ESP32 I2S Audio Processing Project
Project Overview
This ESP32-based project captures audio from an ES7210 codec over I2S (Inter-IC Sound). It provides:
- Audio recording to SD card in WAV format
- Automatic Speech Recognition (ASR) integration with cloud APIs
- WiFi connectivity for cloud services
- USB Mass Storage (MSC) for file access
- Power management and sleep modes
- Time synchronization via NTP
The project is built using the ESP-IDF (Espressif IoT Development Framework) and targets audio applications such as voice recorders, speech recognition systems, or audio processing units.
Key Features
Audio Recording
- I2S TDM (Time Division Multiplexing) interface with ES7210 codec
- Configurable sample rate (16 kHz), bit width (16-bit), and channel count (stereo)
- Direct Memory Access (DMA) for efficient audio data transfer
- WAV file format with proper headers
- Timestamp-based file naming for recordings
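The "proper headers" item refers to the 44-byte RIFF/WAVE header that precedes the PCM data in a canonical WAV file. The following is a minimal sketch in portable C of how such a header can be filled in. The names `wav_build_header`, `put_le16`, and `put_le32` are hypothetical and are not taken from the project's format_wav.h, which may build the header differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_SIZE 44

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)((v >> (8 * i)) & 0xFF);
    }
}

/* Fill `out` with a canonical PCM WAV header describing `data_bytes` of audio. */
void wav_build_header(uint8_t out[WAV_HEADER_SIZE], uint32_t sample_rate,
                      uint16_t channels, uint16_t bits_per_sample,
                      uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));
    uint32_t byte_rate = sample_rate * block_align;

    memcpy(out + 0, "RIFF", 4);
    put_le32(out + 4, 36 + data_bytes);  /* file size minus the 8-byte RIFF preamble */
    memcpy(out + 8, "WAVE", 4);
    memcpy(out + 12, "fmt ", 4);
    put_le32(out + 16, 16);              /* size of the fmt chunk for PCM */
    put_le16(out + 20, 1);               /* audio format 1 = uncompressed PCM */
    put_le16(out + 22, channels);
    put_le32(out + 24, sample_rate);
    put_le32(out + 28, byte_rate);
    put_le16(out + 32, block_align);
    put_le16(out + 34, bits_per_sample);
    memcpy(out + 36, "data", 4);
    put_le32(out + 40, data_bytes);      /* number of PCM bytes that follow */
}
```

Writing the fields byte by byte avoids relying on struct packing or host endianness. With the settings listed in this document (16000 Hz, 2 channels, 16-bit), the block alignment comes out to 4 bytes and the byte rate to 64000 bytes per second.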
Connectivity
- WiFi station mode for internet connectivity
- SD card mounting via SDMMC interface
- NTP time synchronization with multiple servers
- HTTP client for cloud API communication
Cloud Integration
- ASR (Automatic Speech Recognition) using SiliconFlow API
- Audio transcription capabilities
- Secure HTTPS communication with bearer token authentication
Power Management
- Dynamic CPU frequency scaling (8 MHz to 240 MHz)
- Light sleep mode for power conservation
- Automatic power management configuration
USB Interface
- USB Mass Storage Class (MSC) implementation
- SD card access via USB when connected to host
- Console commands for file system operations
Hardware Configuration
I2S Interface
- Master clock (MCK): GPIO 38
- Bit clock (BCK): GPIO 14
- Word select (WS): GPIO 13
- Data in (DI): GPIO 12
ES7210 Codec
- I2C interface: Port 0
- SDA: GPIO 1
- SCL: GPIO 2
- I2C address: 0x41
SD Card Interface
- Command: GPIO 48
- Clock: GPIO 47
- Data 0: GPIO 21
- Mount point: /sdcard
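Recordings are named by timestamp and stored under the mount point. As a rough sketch of how such a path could be composed in plain C: the `rec_YYYYMMDD_HHMMSS.wav` pattern and the function name `make_recording_path` are assumptions for illustration, since the project's actual naming scheme is not shown here.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/*
 * Build "<mount_point>/rec_YYYYMMDD_HHMMSS.wav" from a broken-down time.
 * Returns 0 on success, -1 if formatting fails or the buffer is too small.
 * The file name pattern is hypothetical, not the project's confirmed format.
 */
int make_recording_path(char *out, size_t out_len,
                        const char *mount_point, const struct tm *t)
{
    char stamp[32];

    if (strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", t) == 0) {
        return -1;
    }

    int n = snprintf(out, out_len, "%s/rec_%s.wav", mount_point, stamp);
    if (n < 0 || (size_t)n >= out_len) {
        return -1;  /* output was truncated or an encoding error occurred */
    }
    return 0;
}
```

On the device, the `struct tm` would typically come from `localtime_r()` after NTP synchronization has set the system clock. Names of this length exceed the 8.3 limit, so long file name support needs to be enabled in the FAT configuration for this pattern to work.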
Additional Pins
- LED indicator: GPIO 48
Building and Running
Prerequisites
- ESP-IDF v4.x or later installed and configured
- ESP32 development board
- ES7210 I2S audio codec
- SD card
Build Process
# Navigate to project directory
cd /path/to/esp32i2s
# Configure project (optional, if needed)
idf.py menuconfig
# Build the project
idf.py build
# Flash to ESP32
idf.py flash
# Monitor serial output
idf.py monitor
Configuration Options
- Sample rate: 16000 Hz
- Channel count: 2 (Stereo)
- Bit width: 16 bits
- Recording duration: 20 seconds per file
- DMA buffer count: 8
- DMA buffer length: 512 samples
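These settings determine the data rate, the size of each recording, and how much audio the DMA buffers hold. The sketch below works through that arithmetic. The struct and function names are hypothetical, and it assumes that "512 samples" means 512 frames per DMA buffer, where one frame holds one sample for every channel.

```c
#include <stdint.h>

/* Hypothetical container for the options listed above. */
typedef struct {
    uint32_t sample_rate_hz;
    uint32_t channels;
    uint32_t bits_per_sample;
    uint32_t record_seconds;
    uint32_t dma_buf_count;
    uint32_t dma_buf_frames;  /* assumed: frames (not bytes) per DMA buffer */
} audio_cfg_t;

/* Bytes in one frame, i.e. one sample for each channel. */
uint32_t audio_bytes_per_frame(const audio_cfg_t *c)
{
    return c->channels * (c->bits_per_sample / 8);
}

/* Sustained PCM data rate that the SD card must absorb. */
uint32_t audio_bytes_per_second(const audio_cfg_t *c)
{
    return c->sample_rate_hz * audio_bytes_per_frame(c);
}

/* PCM payload of one recording, excluding the WAV header. */
uint32_t audio_file_data_bytes(const audio_cfg_t *c)
{
    return audio_bytes_per_second(c) * c->record_seconds;
}

/* Total memory occupied by all DMA buffers. */
uint32_t audio_dma_total_bytes(const audio_cfg_t *c)
{
    return c->dma_buf_count * c->dma_buf_frames * audio_bytes_per_frame(c);
}

/* Duration of audio held by a single DMA buffer, in milliseconds. */
uint32_t audio_dma_buffer_ms(const audio_cfg_t *c)
{
    return c->dma_buf_frames * 1000u / c->sample_rate_hz;
}
```

Under these assumptions, the listed values give 4 bytes per frame, 64000 bytes per second, 1,280,000 bytes of PCM data per 20-second file, and 16384 bytes of DMA memory. Each buffer holds 32 ms of audio, so the eight buffers together tolerate roughly 256 ms of delay in the reading task before data is lost.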
File Structure
esp32i2s/
├── CMakeLists.txt # Main build configuration
├── partitions.csv # Partition table
├── sdkconfig # SDK configuration
├── sdkconfig.defaults # Default SDK settings
├── main/ # Main application source
│ ├── main.c # Main application entry point
│ ├── record.c/h # Audio recording functionality
│ ├── asr.c/h # Automatic Speech Recognition
│ ├── base.h # Common definitions and includes
│ ├── format_wav.h # WAV file format definitions
│ ├── usb_msc.c # USB Mass Storage implementation
│ └── ...
└── ...
Development Conventions
Coding Style
- Follow ESP-IDF coding conventions
- Use ESP_LOG macros for logging
- Handle errors with ESP_ERROR_CHECK and ESP_RETURN_ON_FALSE
- Use FreeRTOS tasks for concurrent operations
Memory Management
- Use DMA-capable memory allocation for I2S buffers
- Properly free allocated memory in error paths
- Monitor heap usage for memory leaks
Power Efficiency
- Implement sleep modes when idle
- Use power management configuration appropriately
- Minimize active periods for battery operation
Testing and Debugging
Serial Monitor
Monitor the serial output for logging information:
- I2S initialization status
- SD card mounting results
- WiFi connection status
- Recording progress
- ASR API responses
Console Commands
When USB MSC is active, the following console commands are available:
- read: Read README.MD file
- write: Create/update README.MD file
- size: Show storage capacity
- expose: Expose storage to USB host
- status: Show storage exposure status
- exit: Exit application
Security Considerations
- WiFi credentials are hardcoded in asr.c (WIFI_SSID and WIFI_PASS)
- API token is hardcoded in asr.c (BEARER_TOKEN)
- HTTPS communication uses certificate validation
- Consider using NVS for storing sensitive information instead of hardcoding
Known Issues and Limitations
- WiFi credentials and API tokens are hardcoded in source
- Recording duration is fixed at 20 seconds
- Only supports PCM WAV format
- Requires internet connectivity for ASR functionality
- USB MSC and file system access cannot be used simultaneously