Voice-Activated ChatGPT: Harnessing JavaScript for Microphone Input

3 min read · Oct 17, 2024

Introduction

In this blog post, we will use JavaScript to capture voice input from a microphone, convert it to text with the browser’s Web Speech API, and send that text as a prompt to OpenAI’s ChatGPT API. Letting users speak their prompts instead of typing them makes the app more accessible and more convenient to use.

Prerequisites

Before we get started, ensure you have the following:

Basic knowledge of HTML, CSS, and JavaScript.
An API key from OpenAI to access the ChatGPT model.
A browser that supports speech recognition, such as Chrome or Edge.

Step 1: Obtain Your API Key

Sign Up / Log In to OpenAI: Visit the OpenAI website and create an account if you haven’t already.
Get Your API Key: Navigate to the API section and generate your API key. Keep this key secure, as it will be used for authentication.
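
One way to keep the key out of your source files is to ask for it at runtime and hold it only in the browser session. The sketch below is a minimal illustration, not an official OpenAI pattern: the helper name getApiKey and the storage key 'openai_api_key' are assumptions made for this example. It only keeps the key out of your code and version control; anything running in the page can still read it, so an app meant for other users would need to keep the key on a server.

```javascript
// Hypothetical helper: look up the API key at runtime instead of hardcoding it.
// `storage` is any object with getItem/setItem (for example window.sessionStorage).
// `ask` is a function that asks the user for the key (for example window.prompt).
function getApiKey(storage, ask) {
    const saved = storage.getItem('openai_api_key');
    if (saved) {
        return saved; // Reuse the key entered earlier
    }

    const entered = (ask('Enter your OpenAI API key:') || '').trim();
    if (!entered) {
        throw new Error('An OpenAI API key is required.');
    }

    // Kept for as long as `storage` keeps it (sessionStorage: until the tab closes)
    storage.setItem('openai_api_key', entered);
    return entered;
}

// Assumed usage in the browser:
// const apiKey = getApiKey(window.sessionStorage, (msg) => window.prompt(msg));
```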

Step 2: Setting Up Your Project

Create a new directory for your project and create two files: index.html and app.js.

Directory Structure

/voice-chatgpt
    ├── index.html
    └── app.js

Step 3: Creating the HTML Structure

Set up a simple HTML structure in index.html that allows users to start and stop recording their voice input.

HTML Code (index.html)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice Activated ChatGPT</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 20px;
        }
        #response {
            margin-top: 10px;
            border: 1px solid #ccc;
            padding: 10px;
            min-height: 50px;
        }
        .button {
            padding: 10px 20px;
            margin: 10px;
            cursor: pointer;
        }
    </style>
</head>
<body>

    <h1>Voice-Activated Chat with ChatGPT</h1>
    <button id="start-button" class="button">Start Recording</button>
    <button id="stop-button" class="button" disabled>Stop Recording</button>

    <div id="response"></div>

    <script src="app.js"></script>
</body>
</html>

Step 4: Implementing JavaScript Functionality

In the app.js file, we will use the Web Speech API to capture speech from the microphone and convert it to text, and then send that text to the ChatGPT API.

JavaScript Code (app.js)

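// For local experiments only: a key embedded in client-side code is visible to anyone who loads the page
const apiKey = 'YOUR_API_KEY'; // Replace with your actual OpenAI API key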
const startButton = document.getElementById('start-button');
const stopButton = document.getElementById('stop-button');
const responseDiv = document.getElementById('response');

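let recognition;

// Initialize Speech Recognition (Chrome and Edge expose it under the webkit prefix)
const SpeechRecognitionImpl = window.SpeechRecognition || window.webkitSpeechRecognition;
if (SpeechRecognitionImpl) {
    recognition = new SpeechRecognitionImpl();
    recognition.continuous = false; // Stop automatically after the first result
    recognition.interimResults = false; // We want only final results

    recognition.onresult = async (event) => {
        const userMessage = event.results[0][0].transcript;
        // textContent keeps the transcript from being interpreted as HTML
        responseDiv.textContent = "You said: " + userMessage;
        await sendToChatGPT(userMessage);
    };

    recognition.onerror = (event) => {
        console.error('Error occurred in recognition: ' + event.error);
        responseDiv.textContent = "Error occurred while recognizing speech.";
    };

    // Recognition also ends on its own after a result or an error, so reset the buttons here
    recognition.onend = () => {
        startButton.disabled = false;
        stopButton.disabled = true;
    };
} else {
    startButton.disabled = true; // Nothing to start without speech recognition
    alert('Your browser does not support speech recognition. Please use Chrome or Edge.');
}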

startButton.addEventListener('click', () => {
    if (!recognition) return; // Speech recognition is unavailable in this browser
    recognition.start();
    startButton.disabled = true;
    stopButton.disabled = false;
});

stopButton.addEventListener('click', () => {
    recognition.stop();
    startButton.disabled = false;
    stopButton.disabled = true;
});

async function sendToChatGPT(userMessage) {
    appendLine('Loading response...');

    try {
        const response = await fetch('https://api.openai.com/v1/chat/completions', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Authorization': `Bearer ${apiKey}`
            },
            body: JSON.stringify({
                model: "gpt-3.5-turbo",
                messages: [{ role: "user", content: userMessage }]
            })
        });

        const data = await response.json();
        if (!response.ok) {
            // Failures such as a bad key or a rate limit come back as { error: { message } }
            throw new Error(data.error?.message || `HTTP ${response.status}`);
        }
        const botReply = data.choices[0].message.content;
        appendLine(`ChatGPT says: ${botReply}`); // Display the bot's response
    } catch (error) {
        console.error('Error:', error);
        appendLine('Error occurred while fetching response.');
    }
}

// Append a line as plain text so markup in a reply is not interpreted as HTML
function appendLine(text) {
    responseDiv.appendChild(document.createElement('br'));
    responseDiv.appendChild(document.createTextNode(text));
}
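
The request and response handling in app.js can also be pulled into two small functions that are easy to test without a browser or a network call. This is a sketch under stated assumptions: buildChatRequest and extractReply are hypothetical names introduced here, and the error shape handled is the `{ error: { message } }` body the API returns on failures.

```javascript
// Hypothetical helpers; the names are illustrative and not part of any library.

// Build the options object for fetch() from the API key and the spoken prompt.
function buildChatRequest(apiKey, userMessage, model = 'gpt-3.5-turbo') {
    return {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${apiKey}`
        },
        body: JSON.stringify({
            model,
            messages: [{ role: 'user', content: userMessage }]
        })
    };
}

// Pull the reply text out of a parsed response body, or throw a descriptive error.
function extractReply(data) {
    if (data && data.error) {
        throw new Error(data.error.message || 'The API returned an error.');
    }
    const reply = data?.choices?.[0]?.message?.content;
    if (typeof reply !== 'string') {
        throw new Error('Unexpected response shape: no reply text found.');
    }
    return reply.trim();
}
```

Inside sendToChatGPT, these would be used as `fetch(url, buildChatRequest(apiKey, userMessage))` followed by `extractReply(await response.json())`, with the existing catch block handling anything that is thrown.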

Step 5: Running Your Application

Insert Your API Key: Open app.js and replace YOUR_API_KEY with your actual OpenAI API key.
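Open index.html: Launch the index.html file in Chrome or Edge. If the browser does not offer microphone access for a page opened straight from disk, serve the folder from a local web server and open it via http://localhost instead.
Interact with ChatGPT: Click the “Start Recording” button, allow microphone access if the browser asks, and speak your prompt. Recognition ends on its own when you pause, or you can click “Stop Recording” to end it sooner. The application then sends the transcript to ChatGPT and displays the response.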
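
The interaction flow in the last step can be modeled as a small state machine that decides which buttons are enabled at each point. This is only a sketch of one possible design: the state names, the event names, and the nextState function are assumptions introduced here, loosely mirroring the recognition events (result, end, error) and the arrival of ChatGPT’s reply.

```javascript
// Hypothetical model of the UI flow: which buttons are enabled in each state.
const BUTTONS = {
    idle:      { start: true,  stop: false },
    listening: { start: false, stop: true  },
    waiting:   { start: false, stop: false }  // Waiting for ChatGPT's reply
};

// Allowed transitions, keyed by current state and then by event name.
const TRANSITIONS = {
    idle:      { start: 'listening' },
    listening: { result: 'waiting', end: 'idle', error: 'idle' },
    waiting:   { reply: 'idle', error: 'idle' }
};

// Return the next state, or the current one if the event does not apply to it.
function nextState(state, event) {
    return (TRANSITIONS[state] && TRANSITIONS[state][event]) || state;
}
```

After each event, the page would apply the result with `startButton.disabled = !BUTTONS[state].start` and `stopButton.disabled = !BUTTONS[state].stop`. Keeping the rules in one table makes it harder for the two buttons to drift out of sync.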

Conclusion

In this blog post, we built a voice-activated application that lets users talk to ChatGPT through their microphone. The project shows how JavaScript, the Web Speech API, and OpenAI’s ChatGPT API can be combined in a small amount of code to create an intuitive user experience.