Building a Portable ChatGPT Bot: ESP32, I2S Mic, and OLED

Imagine having the power of artificial intelligence right in the palm of your hand, without needing to pull out your phone or open a browser. In this guide, we are going to build a portable "ChatGPT Anywhere" device using an ESP32 microcontroller, an I2S digital microphone, and a small OLED display. This device will capture your voice, send it to OpenAI’s servers, and display the response right on the screen.

Why Use the ESP32 for AI?

The ESP32 is a good fit for this project: it has built-in Wi-Fi for reaching OpenAI's API, a dual-core processor, and an I2S (Inter-IC Sound) peripheral. I2S lets the ESP32 read a digital MEMS microphone such as the INMP441 directly, with no analog front end, so the speech-to-text service receives clean audio and can transcribe your queries accurately.

What You Will Need

To follow this tutorial, you will need the following hardware components:

  • ESP32 Development Board: for example, one built on the ESP32-WROOM-32 module.
  • I2S Microphone: INMP441 is highly recommended for its clarity and digital output.
  • OLED Display: 0.96-inch SSD1306 (I2C version).
  • Push Button: To trigger the recording.
  • Breadboard and Jumper Wires: For prototyping.
  • Power Source: A LiPo battery or a USB power bank for portability.

The Circuit Connections

The ESP32 can route I2S and I2C signals to most of its GPIOs, so these are not the only possible pins. Use the mapping below so your wiring matches the pin numbers used in this guide:

1. INMP441 Microphone to ESP32

  • VDD to 3.3V
  • GND to GND
  • L/R to GND (selects the left channel)
  • WS to GPIO 25
  • SCK to GPIO 32
  • SD to GPIO 33

2. SSD1306 OLED Display to ESP32

  • VCC to 3.3V
  • GND to GND
  • SCL to GPIO 22
  • SDA to GPIO 21

3. Push Button

  • One side to GPIO 4, the other side to GND (using internal pull-up).
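Because the ESP32 routes I2S and I2C through its GPIO matrix, you can move these signals. On the classic ESP32, however, a few pins are best avoided:

  • GPIO 6 to 11 are wired to the module's flash.
  • GPIO 0, 2, 5, 12 and 15 are strapping pins read at boot.
  • GPIO 1 and 3 carry the serial port.
  • GPIO 34 to 39 are input-only and have no internal pull-ups.

The C++ sketch below encodes those rules as a deliberately conservative check. `PinUse`, `pinIsSuitable` and `checkWiring` are hypothetical names for illustration, not part of any ESP32 library.

```cpp
#include <cassert>

// Hypothetical helper for the classic ESP32 (e.g. ESP32-WROOM-32); not a library API.
enum class PinUse { Input, InputPullup, Output };

// GPIOs that exist on the classic ESP32: 0-19, 21-23, 25-27, 32-39.
bool gpioExists(int pin) {
  return (pin >= 0 && pin <= 19) || (pin >= 21 && pin <= 23) ||
         (pin >= 25 && pin <= 27) || (pin >= 32 && pin <= 39);
}

// True if `pin` can serve `use` without special care at boot or while flashing.
bool pinIsSuitable(int pin, PinUse use) {
  if (!gpioExists(pin)) return false;
  if (pin >= 6 && pin <= 11) return false;               // wired to the module's SPI flash
  if (pin == 0 || pin == 2 || pin == 5 || pin == 12 || pin == 15)
    return false;                                        // strapping pins, sampled at boot
  if (pin == 1 || pin == 3) return false;                // UART0 TX/RX, used by Serial
  if (pin >= 34 && use != PinUse::Input) return false;   // 34-39: input only, no internal pull-ups
  return true;
}

// Checks the mapping used in this guide (assert() is active unless NDEBUG is defined).
void checkWiring() {
  assert(pinIsSuitable(25, PinUse::Output));       // INMP441 WS (driven by the ESP32)
  assert(pinIsSuitable(32, PinUse::Output));       // INMP441 SCK (driven by the ESP32)
  assert(pinIsSuitable(33, PinUse::Input));        // INMP441 SD
  assert(pinIsSuitable(21, PinUse::Output));       // OLED SDA
  assert(pinIsSuitable(22, PinUse::Output));       // OLED SCL
  assert(pinIsSuitable(4, PinUse::InputPullup));   // push button
}
```

Calling `checkWiring()` once in `setup()` would catch an accidental move of, say, the button to GPIO 34, where `INPUT_PULLUP` has no effect.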

Setting Up the Software

Before uploading the code, you must have the Arduino IDE installed and configured for ESP32 (by installing the ESP32 board package through the Boards Manager). You will also need an OpenAI API key. Create an account on the OpenAI platform and generate a secret key on its API keys page. Note that API usage is billed separately from a ChatGPT subscription.

Required Libraries

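Install these libraries via the Arduino Library Manager:

  • Adafruit SSD1306 & Adafruit GFX: For controlling the OLED.
  • ArduinoJson: To build requests for, and parse responses from, the OpenAI API.

WiFi, WiFiClientSecure (for encrypted HTTPS connections) and HTTPClient ship with the ESP32 Arduino core, so they do not need to be installed separately.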

The Logic Flow

The bot operates in a simple loop:

  1. It waits for a button press.
  2. Once the button is pressed, the ESP32 records audio from the INMP441 and stores it in a buffer.
  3. The audio is sent to OpenAI’s Whisper API for speech-to-text conversion.
  4. The resulting text is sent to the Chat Completions API (for example, with GPT-3.5 or GPT-4o).
  5. The reply is shown on the OLED display, scrolling through it if it does not fit on one screen.
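For the last step, the default Adafruit GFX font at text size 1 uses 6x8-pixel character cells, so a 128x64 SSD1306 holds 21 characters across and 8 lines down. GFX's built-in wrapping breaks lines mid-word, so one option is to word-wrap the reply yourself and show it one screenful at a time. Below is a minimal sketch in standard C++. `wrapText` and `paginate` are hypothetical helpers, written with `std::string` so they also compile off the device.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: word-wraps `text` into lines of at most `cols` characters.
// Words longer than a whole line are split hard.
std::vector<std::string> wrapText(const std::string& text, std::size_t cols) {
  assert(cols > 0);
  std::vector<std::string> lines;
  std::istringstream words(text);
  std::string word, line;
  while (words >> word) {
    while (word.size() > cols) {  // hard-split over-long words
      if (!line.empty()) { lines.push_back(line); line.clear(); }
      lines.push_back(word.substr(0, cols));
      word = word.substr(cols);
    }
    if (line.empty()) {
      line = word;
    } else if (line.size() + 1 + word.size() <= cols) {
      line += ' ' + word;  // word fits on the current line
    } else {
      lines.push_back(line);  // start a new line
      line = word;
    }
  }
  if (!line.empty()) lines.push_back(line);
  return lines;
}

// Groups wrapped lines into pages of `rows` lines each (one page per screen).
std::vector<std::vector<std::string>> paginate(const std::vector<std::string>& lines,
                                               std::size_t rows) {
  assert(rows > 0);
  std::vector<std::vector<std::string>> pages;
  for (std::size_t i = 0; i < lines.size(); i += rows) {
    std::size_t end = i + rows < lines.size() ? i + rows : lines.size();
    pages.emplace_back(lines.begin() + i, lines.begin() + end);
  }
  return pages;
}
```

On the ESP32, `wrapText(std::string(response.c_str()), 21)` followed by `paginate(lines, 8)` would give the pages. `loop()` could then print each page's lines and pause for a few seconds before showing the next.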

Code Snippet

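Below is a simplified structure of the main loop and the ChatGPT request. It sends a fixed test prompt rather than recorded speech. For the recording and transcription steps, you will need to prepend a WAV header to the raw samples and upload the file to the Whisper endpoint as a multipart/form-data request. No base64 encoding is required.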

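#include <WiFi.h>
#include <WiFiClientSecure.h>
#include <HTTPClient.h>
#include <Adafruit_SSD1306.h>  // also pulls in Adafruit_GFX and Wire
#include <ArduinoJson.h>       // written for ArduinoJson 7.x
#include <driver/i2s.h>        // legacy I2S driver bundled with the ESP32 Arduino core (2.x)

const char* ssid = "YOUR_WIFI_SSID";
const char* password = "YOUR_WIFI_PASSWORD";
const char* apiKey = "YOUR_OPENAI_API_KEY";
const int BUTTON_PIN = 4;                      // other button leg goes to GND
Adafruit_SSD1306 display(128, 64, &Wire, -1);  // 128x64, default I2C pins (SDA 21, SCL 22), no reset pin

void setupOLED() {
  if (!display.begin(SSD1306_SWITCHCAPVCC, 0x3C)) Serial.println("SSD1306 not found");  // 0x3C: usual address
  display.setTextColor(SSD1306_WHITE);  // required: the default GFX color draws nothing on this display
}

void connectWiFi() {
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) delay(250);
}

void setupI2S() {  // pins match the INMP441 wiring above
  i2s_config_t cfg = {};
  cfg.mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX);
  cfg.sample_rate = 16000;
  cfg.bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT;  // INMP441 sends 24-bit samples in 32-bit slots
  cfg.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT;   // L/R tied to GND; if reads are all zero, try ONLY_RIGHT
  cfg.communication_format = I2S_COMM_FORMAT_STAND_I2S;
  cfg.dma_buf_count = 8;
  cfg.dma_buf_len = 256;
  i2s_pin_config_t pins = {};
  pins.mck_io_num = I2S_PIN_NO_CHANGE;    // no master clock needed (field exists in core 2.x and later)
  pins.bck_io_num = 32;                   // SCK
  pins.ws_io_num = 25;                    // WS
  pins.data_out_num = I2S_PIN_NO_CHANGE;  // receive only
  pins.data_in_num = 33;                  // SD
  i2s_driver_install(I2S_NUM_0, &cfg, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pins);
}

void displayResponse(const String& text) {
  display.clearDisplay();
  display.setCursor(0, 0);
  display.println(text);  // Adafruit GFX wraps long lines at the screen edge
  display.display();
}

void setup() {
  Serial.begin(115200);
  pinMode(BUTTON_PIN, INPUT_PULLUP);
  setupOLED();
  connectWiFi();
  setupI2S(); // Initialize the Microphone
}

void loop() {
  if (digitalRead(BUTTON_PIN) == LOW) {
    displayResponse("Listening...");
    // 1. Record Audio  2. Send to Whisper API  3. Send Text to ChatGPT  4. Display Result
    String response = callChatGPT("Who are you?");  // fixed prompt until steps 1-2 are implemented
    displayResponse(response);
    while (digitalRead(BUTTON_PIN) == LOW) delay(10);  // wait for release: one press, one request
  }
}

String callChatGPT(String query) {
  WiFiClientSecure client;
  client.setInsecure();  // skips certificate checks: fine for a prototype, use setCACert() otherwise
  HTTPClient http;
  http.begin(client, "https://api.openai.com/v1/chat/completions");
  http.setTimeout(20000);  // replies can take several seconds
  http.addHeader("Content-Type", "application/json");
  http.addHeader("Authorization", "Bearer " + String(apiKey));
  JsonDocument req;  // ArduinoJson escapes quotes and newlines in the query
  req["model"] = "gpt-3.5-turbo";
  JsonObject msg = req["messages"].to<JsonArray>().add<JsonObject>();
  msg["role"] = "user";
  msg["content"] = query;
  String payload;
  serializeJson(req, payload);
  int httpResponseCode = http.POST(payload);
  String result = "HTTP error " + String(httpResponseCode);  // returned if the request fails
  JsonDocument res;
  if (httpResponseCode == 200 && !deserializeJson(res, http.getString()))
    result = res["choices"][0]["message"]["content"] | "(empty reply)";
  http.end();
  return result;
}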
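Step 2 of the loop, sending the recording to Whisper, goes to `https://api.openai.com/v1/audio/transcriptions` as a `multipart/form-data` upload. The request carries a `model` field (`whisper-1`) and a `file` part holding the WAV data. Since the audio is by far the largest piece, it helps to generate only the small text parts that surround it. The sketch below does that in portable C++. `WhisperMultipart`, `makeWhisperMultipart`, the boundary string and the `audio.wav` filename are illustrative choices, not a prescribed API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the text surrounding the WAV bytes in a multipart/form-data
// upload to https://api.openai.com/v1/audio/transcriptions.
struct WhisperMultipart {
  std::string boundary;
  std::string head;  // "model" field, then the headers of the "file" part
  std::string tail;  // closing boundary

  std::string contentType() const { return "multipart/form-data; boundary=" + boundary; }
  std::size_t contentLength(std::size_t wavBytes) const {
    return head.size() + wavBytes + tail.size();
  }
};

// `boundary` must not occur inside the audio; a long random-looking token is the usual choice.
WhisperMultipart makeWhisperMultipart(const std::string& boundary) {
  assert(!boundary.empty());
  WhisperMultipart m;
  m.boundary = boundary;
  m.head = "--" + boundary + "\r\n"
           "Content-Disposition: form-data; name=\"model\"\r\n\r\n"
           "whisper-1\r\n"
           "--" + boundary + "\r\n"
           "Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n"
           "Content-Type: audio/wav\r\n\r\n";
  m.tail = "\r\n--" + boundary + "--\r\n";
  return m;
}
```

On the ESP32, one option is to copy `head`, the WAV bytes and `tail` into one buffer, set `http.addHeader("Content-Type", m.contentType().c_str())`, and send it with `http.POST(buffer, length)`. The JSON reply carries the transcript in its `text` field.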

Handling Audio Data

The trickiest part of this project is sending audio. The ESP32 has limited RAM, so instead of holding a large recording in memory, send the audio data in chunks or temporarily store the recording in a small SPIFFS/LittleFS file. For best results, record 16-bit mono audio at 16 kHz. It is a widely supported input format for speech-to-text engines and keeps the upload small.
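Some arithmetic shows why memory is tight. At 16 kHz with 2 bytes per 16-bit sample, mono audio takes 32,000 bytes per second, so a 5-second clip is 160,000 bytes, or 160,044 bytes once the 44-byte WAV header is added. The INMP441 also delivers 24-bit samples in 32-bit I2S slots, so each word has to be narrowed to 16 bits before it goes into the file. Below is a sketch of both pieces in portable C++, assuming the mic is read with 32-bit I2S samples. `makeWavHeader` and `toPcm16` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Writes `value` into `out` as `bytes` little-endian bytes (WAV fields are little-endian).
static void putLE(uint8_t* out, uint32_t value, int bytes) {
  for (int i = 0; i < bytes; ++i) out[i] = static_cast<uint8_t>(value >> (8 * i));
}

// Builds the 44-byte header for a PCM WAV file carrying `dataBytes` bytes of samples.
std::array<uint8_t, 44> makeWavHeader(uint32_t dataBytes, uint32_t sampleRate = 16000,
                                      uint16_t bitsPerSample = 16, uint16_t channels = 1) {
  assert(bitsPerSample % 8 == 0 && channels > 0);
  const uint32_t blockAlign = channels * (bitsPerSample / 8u);  // bytes per sample frame
  std::array<uint8_t, 44> h{};
  std::memcpy(&h[0], "RIFF", 4);
  putLE(&h[4], 36 + dataBytes, 4);            // file size minus the first 8 bytes
  std::memcpy(&h[8], "WAVEfmt ", 8);
  putLE(&h[16], 16, 4);                       // size of the fmt chunk for PCM
  putLE(&h[20], 1, 2);                        // format 1 = uncompressed PCM
  putLE(&h[22], channels, 2);
  putLE(&h[24], sampleRate, 4);
  putLE(&h[28], sampleRate * blockAlign, 4);  // byte rate
  putLE(&h[32], blockAlign, 2);
  putLE(&h[34], bitsPerSample, 2);
  std::memcpy(&h[36], "data", 4);
  putLE(&h[40], dataBytes, 4);
  return h;
}

// Converts one 32-bit I2S word from the INMP441 (24-bit sample, MSB-aligned) to 16-bit PCM.
// shift = 16 keeps the top 16 bits; a smaller shift adds gain, and clamping prevents wraparound.
int16_t toPcm16(int32_t raw, int shift = 16) {
  assert(shift >= 0 && shift <= 31);
  int32_t v = raw >> shift;  // arithmetic shift (guaranteed from C++20, what GCC does anyway)
  if (v > INT16_MAX) v = INT16_MAX;
  if (v < INT16_MIN) v = INT16_MIN;
  return static_cast<int16_t>(v);
}
```

Because the data size is written into the header, one approach is to set `dataBytes` to the planned clip length (seconds x 32,000) up front. You then append exactly that many bytes of converted samples to the LittleFS file or the upload.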

Improving the Experience

Once you have the basic version working, consider these upgrades:

  • Text-to-Speech: Add an I2S amplifier with a built-in DAC (like the MAX98357A) and a speaker so the bot can talk back to you.
  • Battery Management: Add a TP4056 charging module to make it truly portable.
  • 3D Printed Case: Design a small enclosure to house the components and make it look like a finished product.
  • System Prompts: Give your bot a personality by adding a "system" message ahead of the user's message in the API call (e.g., "You are a helpful robot assistant named Sparky").
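With the Chat Completions API, a personality is set by sending a message whose `role` is `system` ahead of the user's message. The sketch below builds such a request body by hand in portable C++ and escapes the text properly. Plain string concatenation does not escape anything, so a quote or newline in the transcript would otherwise produce invalid JSON. `jsonEscape` and `buildChatBody` are hypothetical helpers; on the ESP32, ArduinoJson can produce the same body.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes a string for use inside a JSON string literal.
std::string jsonEscape(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {  // other control characters become \u00XX
          char buf[7];
          std::snprintf(buf, sizeof buf, "\\u%04x", c);
          out += buf;
        } else {
          out += static_cast<char>(c);  // printable ASCII and UTF-8 bytes pass through
        }
    }
  }
  return out;
}

// Hypothetical helper: Chat Completions body with a system prompt and one user turn.
std::string buildChatBody(const std::string& model, const std::string& systemPrompt,
                          const std::string& userText) {
  assert(!model.empty());
  return "{\"model\":\"" + jsonEscape(model) + "\",\"messages\":["
         "{\"role\":\"system\",\"content\":\"" + jsonEscape(systemPrompt) + "\"},"
         "{\"role\":\"user\",\"content\":\"" + jsonEscape(userText) + "\"}]}";
}
```

For example, `buildChatBody("gpt-3.5-turbo", "You are a helpful robot assistant named Sparky", transcript)` yields the string to pass to `http.POST()`.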

Conclusion

Building a portable ChatGPT bot is a fantastic way to learn about I2S audio, API integration, and IoT hardware. While it might seem complex, breaking it down into voice input, API processing, and display output makes it manageable. With the ESP32 at the center, you now have a powerful, pocket-sized AI companion!
