Voice-Activated ChatGPT: Harnessing JavaScript for Microphone Input
Introduction
In this blog post, we will use JavaScript to capture voice input from a microphone, convert it to text, and send that text as a prompt to OpenAI’s ChatGPT API. Letting users talk to the AI by voice makes an application more accessible and more convenient to use.
Prerequisites
Before we get started, ensure you have the following:
- Basic knowledge of HTML, CSS, and JavaScript.
- An API key from OpenAI to access the ChatGPT model.
Step 1: Obtain Your API Key
- Sign Up / Log In to OpenAI: Visit the OpenAI website and create an account if you haven’t already.
- Get Your API Key: Navigate to the API section and generate your API key. Keep this key secret: it authenticates every request, and anyone who obtains it can make requests that are billed to your account.
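Because any key placed in browser-side JavaScript is visible to anyone who opens the page, a common alternative is to keep the key on a server and have the page call that server instead. The following is only a sketch of the server-side part, assuming Node.js and an environment variable named `OPENAI_API_KEY`; the helper names `getApiKey` and `buildAuthHeaders` are hypothetical, and the rest of this post still uses the simpler in-browser setup.

```javascript
// Hypothetical server-side helpers (Node.js); not part of the original project.

// Read the key from an environment variable so it never ships to the browser.
// The variable name OPENAI_API_KEY is an assumption made for this sketch.
function getApiKey(env = process.env) {
  const key = env.OPENAI_API_KEY;
  if (!key || key.trim() === '') {
    throw new Error('OPENAI_API_KEY is not set');
  }
  return key.trim();
}

// Build the headers for a request to the OpenAI API using that key.
function buildAuthHeaders(env = process.env) {
  return {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${getApiKey(env)}`
  };
}
```

Failing fast when the variable is missing tends to be easier to debug than sending a request with an empty `Authorization` header.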
Step 2: Setting Up Your Project
Create a new directory for your project and create two files in it: index.html and app.js.
Directory Structure
/voice-chatgpt
├── index.html
└── app.js
Step 3: Creating the HTML Structure
Set up a simple HTML structure in index.html that allows users to start and stop recording their voice input.
HTML Code (index.html)
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Voice Activated ChatGPT</title>
  <style>
    body {
      font-family: Arial, sans-serif;
      margin: 20px;
    }
    #response {
      margin-top: 10px;
      border: 1px solid #ccc;
      padding: 10px;
      min-height: 50px;
    }
    .button {
      padding: 10px 20px;
      margin: 10px;
      cursor: pointer;
    }
  </style>
</head>
<body>
  <h1>Voice-Activated Chat with ChatGPT</h1>
  <button id="start-button" class="button">Start Recording</button>
  <button id="stop-button" class="button" disabled>Stop Recording</button>
  <div id="response"></div>
  <script src="app.js"></script>
</body>
</html>
Step 4: Implementing JavaScript Functionality
In the app.js file, we will implement the logic to capture speech from the microphone, convert it to text using the Web Speech API, and then send the text input to the ChatGPT API.
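The browser-support check used in the listing below can also be isolated into a small helper, which makes it easier to see what is being configured. This is a minimal sketch rather than part of app.js: the names `getSpeechRecognitionCtor` and `createRecognizer` are hypothetical, and it assumes a browser may expose either the standard `SpeechRecognition` constructor or the `webkit`-prefixed one.

```javascript
// Hypothetical helpers; the names are illustrative and not part of app.js.

// Return the speech recognition constructor if the environment provides one.
// Checks the standard name first, then the webkit-prefixed one.
function getSpeechRecognitionCtor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Create a recognizer configured for a single final result,
// or return null when speech recognition is unsupported.
function createRecognizer(globalObj, lang = 'en-US') {
  const Ctor = getSpeechRecognitionCtor(globalObj);
  if (!Ctor) return null;
  const recognizer = new Ctor();
  recognizer.continuous = false;     // stop after the first result
  recognizer.interimResults = false; // report final results only
  recognizer.lang = lang;            // language the recognizer should expect
  return recognizer;
}

// In the browser this would be called as: createRecognizer(window)
```

Returning `null` instead of throwing lets the calling code decide how to tell the user that voice input is unavailable.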
JavaScript Code (app.js)
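// Replace with your actual OpenAI API key. A key in client-side code is visible
// to anyone who loads the page, so use this setup for local testing only.
const apiKey = 'YOUR_API_KEY';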
const startButton = document.getElementById('start-button');
const stopButton = document.getElementById('stop-button');
const responseDiv = document.getElementById('response');
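let recognition;
// Initialize speech recognition (standard name first, then the webkit-prefixed one)
const SpeechRecognitionCtor = window.SpeechRecognition || window.webkitSpeechRecognition;
if (SpeechRecognitionCtor) {
  recognition = new SpeechRecognitionCtor();
  recognition.continuous = false; // Stop automatically after the first result
  recognition.interimResults = false; // We want only final results
  recognition.onresult = async (event) => {
    // First alternative of the first result
    const userMessage = event.results[0][0].transcript;
    // textContent keeps the recognized speech from being parsed as HTML
    responseDiv.textContent = "You said: " + userMessage;
    await sendToChatGPT(userMessage);
  };
  recognition.onerror = (event) => {
    console.error('Error occurred in recognition: ' + event.error);
    responseDiv.textContent = "Error occurred while recognizing speech.";
  };
  // Recognition also ends on its own (after a result, silence, or an error),
  // so reset the buttons here as well as in the stop handler
  recognition.onend = () => {
    startButton.disabled = false;
    stopButton.disabled = true;
  };
} else {
  alert('Your browser does not support speech recognition. Please use Chrome or Edge.');
}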
startButton.addEventListener('click', () => {
  if (!recognition) return; // Speech recognition is unavailable in this browser
  recognition.start();
  startButton.disabled = true;
  stopButton.disabled = false;
});
stopButton.addEventListener('click', () => {
  recognition.stop();
  startButton.disabled = false;
  stopButton.disabled = true;
});
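async function sendToChatGPT(userMessage) {
  // Show status in its own element, set via textContent so the reply is never parsed as HTML
  const status = document.createElement('div');
  status.textContent = 'Loading response...';
  responseDiv.appendChild(status);
  try {
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`
      },
      body: JSON.stringify({
        model: "gpt-3.5-turbo",
        messages: [{ role: "user", content: userMessage }]
      })
    });
    const data = await response.json();
    if (!response.ok) {
      // Failed requests (for example, an invalid key) return an error object instead of choices
      throw new Error(data.error?.message || `Request failed with status ${response.status}`);
    }
    const botReply = data.choices[0].message.content;
    status.textContent = `ChatGPT says: ${botReply}`; // Display the bot's response
  } catch (error) {
    console.error('Error:', error);
    status.textContent = 'Error occurred while fetching response.';
  }
}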
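The request and response handling in sendToChatGPT can also be pulled out of the DOM code, which makes it easier to test without a browser. The sketch below is an illustration under assumptions, not the listing's implementation: `buildChatRequest` and `extractReply` are hypothetical names, and the error handling assumes that a failed request returns a JSON body with an `error.message` field.

```javascript
// Hypothetical helpers; the names are illustrative and not part of app.js.

// Build the fetch options for a single-turn chat completion request.
function buildChatRequest(apiKey, userMessage, model = 'gpt-3.5-turbo') {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: userMessage }]
    })
  };
}

// Pull the reply text out of a parsed response body.
// Throws if the body is an error payload or contains no usable choice.
function extractReply(data) {
  if (data && data.error) {
    throw new Error(data.error.message || 'API returned an error');
  }
  const choice = data && Array.isArray(data.choices) ? data.choices[0] : undefined;
  if (!choice || !choice.message || typeof choice.message.content !== 'string') {
    throw new Error('Response contained no reply');
  }
  return choice.message.content.trim();
}
```

Inside sendToChatGPT, these could be used as `fetch('https://api.openai.com/v1/chat/completions', buildChatRequest(apiKey, userMessage))` followed by `extractReply(await response.json())`.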
Step 5: Running Your Application
- Insert Your API Key: Open app.js and replace YOUR_API_KEY with your actual OpenAI API key.
- Open index.html: Launch the index.html file in your web browser (Chrome or Edge).
- Interact with ChatGPT: Click the “Start Recording” button to begin capturing your voice input, and allow microphone access if the browser asks. Speak your prompt; recognition stops on its own after the first result, or you can click “Stop Recording.” The application will then send the recognized text to ChatGPT and display the response.
Conclusion
In this blog post, we built a voice-activated application that lets users interact with ChatGPT through their microphone. The project combines JavaScript, the Web Speech API, and OpenAI’s ChatGPT API to make talking to the model feel more natural than typing.