ESP32 I2S Audio Processing Project

Project Overview

This is an ESP32-based audio processing project that captures audio over I2S (Inter-IC Sound) from an ES7210, a four-channel audio ADC that this document refers to as the codec. The project provides:

  • Audio recording to SD card in WAV format
  • Automatic Speech Recognition (ASR) integration with cloud APIs
  • WiFi connectivity for cloud services
  • USB Mass Storage (MSC) for file access
  • Power management and sleep modes
  • Time synchronization via NTP

The project is built using the ESP-IDF (Espressif IoT Development Framework) and targets audio applications such as voice recorders, speech recognition systems, or audio processing units.

Key Features

Audio Recording

  • I2S TDM (Time Division Multiplexing) interface with ES7210 codec
  • Configurable sample rate (default 16 kHz), bit width (default 16-bit), and channel count (default 2, stereo)
  • Direct Memory Access (DMA) for efficient audio data transfer
  • WAV file format with proper headers
  • Timestamp-based file naming for recordings
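
As an illustration of what "proper headers" means for PCM audio, the following is a minimal sketch of the canonical 44-byte WAV header. The helper names (`put_le16`, `put_le32`, `wav_write_header`) are hypothetical and are not taken from `format_wav.h`; the project's own header definitions may be laid out differently.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define WAV_HEADER_SIZE 44

/* Store values little-endian regardless of host byte order (hypothetical helpers). */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill hdr with a canonical PCM WAV header; data_bytes is the size of the audio payload. */
static void wav_write_header(uint8_t hdr[WAV_HEADER_SIZE], uint32_t sample_rate,
                             uint16_t channels, uint16_t bits_per_sample,
                             uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));
    uint32_t byte_rate = sample_rate * block_align;

    memcpy(hdr + 0, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);  /* file size minus the 8-byte RIFF preamble */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);              /* fmt chunk size for plain PCM */
    put_le16(hdr + 20, 1);               /* audio format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, byte_rate);
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits_per_sample);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

If the payload size is not known when recording starts, one common approach is to write the header with a placeholder size and rewrite it after the last sample has been stored.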

Connectivity

  • WiFi station mode for internet connectivity
  • SD card mounting via SDMMC interface
  • NTP time synchronization with multiple servers
  • HTTP client for cloud API communication
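
Timestamp-based file names are only meaningful once NTP has set the clock. The sketch below shows one way to derive a recording path from broken-down time and to refuse a clock that still looks unset. The function name `make_recording_path`, the `YYYYMMDD_HHMMSS.wav` pattern, and the year-2020 threshold are assumptions for illustration; the firmware's actual naming scheme may differ.

```c
#include <stdio.h>
#include <stddef.h>
#include <time.h>

/* Build a path such as /sdcard/20240131_235959.wav (assumed naming pattern).
 * Returns 0 on success, -1 if the clock looks unset or the buffer is too small. */
static int make_recording_path(char *out, size_t out_len, const struct tm *t)
{
    char stamp[32];
    int n;

    /* tm_year counts from 1900; an unsynchronized clock usually reports 1970. */
    if (t->tm_year + 1900 < 2020) {
        return -1;
    }
    if (strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", t) == 0) {
        return -1;
    }
    n = snprintf(out, out_len, "/sdcard/%s.wav", stamp);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Names of this length exceed the 8.3 limit, so they would require long-filename support to be enabled in the FAT file system configuration.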

Cloud Integration

  • ASR (Automatic Speech Recognition) using SiliconFlow API
  • Audio transcription capabilities
  • Secure HTTPS communication with bearer token authentication
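
Bearer token authentication amounts to sending an `Authorization: Bearer <token>` header with each request. The following sketch only builds the header value and guards against truncation; `build_bearer_value` is a hypothetical helper, and the request format expected by the ASR service is not shown because the source does not describe it.

```c
#include <stdio.h>
#include <stddef.h>

/* Write "Bearer <token>" into out.
 * Returns 0 on success, -1 if the token is empty or the result would be truncated. */
static int build_bearer_value(char *out, size_t out_len, const char *token)
{
    int n;

    if (token == NULL || token[0] == '\0') {
        return -1;
    }
    n = snprintf(out, out_len, "Bearer %s", token);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

On the target, the resulting string would typically be passed to `esp_http_client_set_header(client, "Authorization", value)` before the request is performed.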

Power Management

  • Dynamic CPU frequency scaling (8 MHz to 240 MHz)
  • Light sleep mode for power conservation
  • Automatic power management configuration

USB Interface

  • USB Mass Storage Class (MSC) implementation
  • SD card access via USB when connected to host
  • Console commands for file system operations

Hardware Configuration

I2S Interface

  • Master clock (MCK): GPIO 38
  • Bit clock (BCK): GPIO 14
  • Word select (WS): GPIO 13
  • Data in (DI): GPIO 12

ES7210 Codec

  • I2C interface: Port 0
  • SDA: GPIO 1
  • SCL: GPIO 2
  • I2C address: 0x41

SD Card Interface

  • Command: GPIO 48
  • Clock: GPIO 47
  • Data 0: GPIO 21
  • Mount point: /sdcard

Additional Pins

  • LED indicator: GPIO 48 (also listed above as the SD card command pin; verify against the board schematic)
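
Collecting the pin assignments in one table makes it easy to check for accidental sharing. The sketch below copies the GPIO numbers from the lists above into a hypothetical `PINS` table and counts duplicates. With these values it reports one clash: GPIO 48 appears as both the SD card command line and the LED indicator.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int gpio;
} pin_t;

/* GPIO numbers as documented in the Hardware Configuration section. */
static const pin_t PINS[] = {
    {"I2S MCK", 38}, {"I2S BCK", 14}, {"I2S WS", 13}, {"I2S DI", 12},
    {"I2C SDA", 1},  {"I2C SCL", 2},
    {"SD CMD", 48},  {"SD CLK", 47},  {"SD D0", 21},
    {"LED", 48},
};
#define PIN_COUNT (sizeof PINS / sizeof PINS[0])

/* Return the number of entry pairs that share a GPIO, printing each clash. */
static int count_pin_conflicts(const pin_t *pins, size_t n)
{
    int conflicts = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (pins[i].gpio == pins[j].gpio) {
                printf("GPIO %d shared by %s and %s\n",
                       pins[i].gpio, pins[i].name, pins[j].name);
                conflicts++;
            }
        }
    }
    return conflicts;
}
```

Calling `count_pin_conflicts(PINS, PIN_COUNT)` on a host machine is enough to run the check; it does not depend on any ESP-IDF headers.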

Building and Running

Prerequisites

  • ESP-IDF v4.x or later installed and configured
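  • ESP32 development board (the GPIO 47/48 assignments and the USB MSC feature suggest a USB-OTG-capable variant such as the ESP32-S3; the original ESP32 has neither)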
  • ES7210 I2S audio codec
  • SD card

Build Process

# Navigate to project directory
cd /path/to/esp32i2s

# Configure project (optional, if needed)
idf.py menuconfig

# Build the project
idf.py build

# Flash to ESP32
idf.py flash

# Monitor serial output
idf.py monitor

Configuration Options

  • Sample rate: 16000 Hz
  • Channel count: 2 (Stereo)
  • Bit width: 16 bits
  • Recording duration: 20 seconds per file
  • DMA buffer count: 8
  • DMA buffer length: 512 samples
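
The values above determine the data rate, the size of each recording, and the DMA memory footprint. The sketch below works through that arithmetic. It assumes that "512 samples" means 512 frames (one sample per channel), which is how the ESP-IDF I2S driver counts DMA buffer length; if the project counts individual samples instead, the DMA figures halve.

```c
#include <stdint.h>

enum {
    SAMPLE_RATE_HZ  = 16000,
    CHANNELS        = 2,
    BITS_PER_SAMPLE = 16,
    RECORD_SECONDS  = 20,
    DMA_BUF_COUNT   = 8,
    DMA_BUF_FRAMES  = 512,   /* assumption: length is counted in frames */
};

/* One frame holds one sample for every channel: 2 * 2 = 4 bytes. */
static uint32_t bytes_per_frame(void)
{
    return (uint32_t)CHANNELS * (BITS_PER_SAMPLE / 8);
}

/* Sustained write rate to the SD card: 64000 bytes/s. */
static uint32_t bytes_per_second(void)
{
    return (uint32_t)SAMPLE_RATE_HZ * bytes_per_frame();
}

/* Audio payload of one 20-second file: 1280000 bytes (plus the WAV header). */
static uint32_t recording_data_bytes(void)
{
    return bytes_per_second() * RECORD_SECONDS;
}

/* Size of a single DMA buffer: 2048 bytes. */
static uint32_t dma_buffer_bytes(void)
{
    return (uint32_t)DMA_BUF_FRAMES * bytes_per_frame();
}

/* Total DMA memory across all buffers: 16384 bytes. */
static uint32_t dma_total_bytes(void)
{
    return (uint32_t)DMA_BUF_COUNT * dma_buffer_bytes();
}

/* Audio time held by one DMA buffer: 32 ms. */
static uint32_t dma_buffer_ms(void)
{
    return (uint32_t)DMA_BUF_FRAMES * 1000u / SAMPLE_RATE_HZ;
}
```

Under these assumptions the eight buffers together hold about 256 ms of audio, which is roughly how long the reading task can stall (for example during an SD card write) before samples are dropped.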

File Structure

esp32i2s/
├── CMakeLists.txt          # Main build configuration
├── partitions.csv          # Partition table
├── sdkconfig              # SDK configuration
├── sdkconfig.defaults     # Default SDK settings
├── main/                  # Main application source
│   ├── main.c             # Main application entry point
│   ├── record.c/h         # Audio recording functionality
│   ├── asr.c/h            # Automatic Speech Recognition
│   ├── base.h             # Common definitions and includes
│   ├── format_wav.h       # WAV file format definitions
│   ├── usb_msc.c          # USB Mass Storage implementation
│   └── ...
└── ...

Development Conventions

Coding Style

  • Follow ESP-IDF coding conventions
  • Use ESP_LOG macros for logging
  • Handle errors with ESP_ERROR_CHECK and ESP_RETURN_ON_FALSE
  • Use FreeRTOS tasks for concurrent operations

Memory Management

  • Use DMA-capable memory allocation for I2S buffers
  • Properly free allocated memory in error paths
  • Monitor heap usage for memory leaks
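
One common way to free memory reliably in error paths is a single cleanup label that every failure jumps to. The sketch below illustrates the pattern on a host machine with `malloc` and a placeholder for the I2S read; `record_to_file` is a hypothetical function, not the project's recording routine.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capture-to-file skeleton with one cleanup path.
 * Returns 0 on success, -1 on any failure; resources are released either way. */
static int record_to_file(const char *path, size_t buf_bytes)
{
    int ret = -1;
    uint8_t *buf = NULL;
    FILE *f = NULL;

    /* On the target this would be heap_caps_malloc(buf_bytes, MALLOC_CAP_DMA). */
    buf = malloc(buf_bytes);
    if (buf == NULL) {
        goto cleanup;
    }

    f = fopen(path, "wb");
    if (f == NULL) {
        goto cleanup;
    }

    /* Placeholder for the I2S read: the sketch writes a zeroed buffer instead. */
    memset(buf, 0, buf_bytes);
    if (fwrite(buf, 1, buf_bytes, f) != buf_bytes) {
        goto cleanup;
    }

    ret = 0;

cleanup:
    if (f != NULL) {
        fclose(f);
    }
    free(buf);  /* free(NULL) is a no-op, so no extra check is needed */
    return ret;
}
```

Because every exit goes through `cleanup`, adding another resource later only requires one more release statement rather than edits to each error branch.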

Power Efficiency

  • Implement sleep modes when idle
  • Use power management configuration appropriately
  • Minimize active periods for battery operation

Testing and Debugging

Serial Monitor

Monitor the serial output for logging information:

  • I2S initialization status
  • SD card mounting results
  • WiFi connection status
  • Recording progress
  • ASR API responses

Console Commands

When USB MSC is active, the following console commands are available:

  • read - Read README.MD file
  • write - Create/update README.MD file
  • size - Show storage capacity
  • expose - Expose storage to USB host
  • status - Show storage exposure status
  • exit - Exit application
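
The expose and status commands imply that the storage has a single owner at any time: either the firmware or the USB host. A minimal sketch of that ownership rule is shown below. The type and function names are hypothetical, and a real implementation would also unmount the file system before handing the card to the host.

```c
#include <stdbool.h>

typedef enum {
    STORAGE_OWNER_APP,       /* firmware may read and write files */
    STORAGE_OWNER_USB_HOST   /* card is exposed over USB MSC */
} storage_owner_t;

static storage_owner_t s_owner = STORAGE_OWNER_APP;

/* "expose": hand the card to the USB host. Returns false if it is already exposed. */
static bool storage_expose(void)
{
    if (s_owner == STORAGE_OWNER_USB_HOST) {
        return false;
    }
    s_owner = STORAGE_OWNER_USB_HOST;
    return true;
}

/* "status": true while the USB host owns the card. */
static bool storage_is_exposed(void)
{
    return s_owner == STORAGE_OWNER_USB_HOST;
}

/* Gate for "read", "write", and "size": allowed only while the firmware owns the card. */
static bool storage_app_may_access(void)
{
    return s_owner == STORAGE_OWNER_APP;
}
```

Each file system command would check `storage_app_may_access()` first and report an error instead of touching the card while it is exposed.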

Security Considerations

  • WiFi credentials are hardcoded in asr.c (WIFI_SSID and WIFI_PASS)
  • API token is hardcoded in asr.c (BEARER_TOKEN)
  • HTTPS communication uses certificate validation
  • Consider storing credentials and tokens in NVS (ideally with NVS encryption enabled) instead of hardcoding them in source

Known Issues and Limitations

  • WiFi credentials and API tokens are hardcoded in source
  • Recording duration is fixed at 20 seconds
  • Only supports PCM WAV format
  • Requires internet connectivity for ASR functionality
  • USB MSC and firmware file system access are mutually exclusive: while the SD card is exposed to a USB host, the application cannot read or write it