# ESP32 I2S Audio Processing Project

## Project Overview

This ESP32-based project captures audio over I2S (Inter-IC Sound) from an ES7210 codec. It provides:

- Audio recording to SD card in WAV format
- Automatic Speech Recognition (ASR) integration with cloud APIs
- WiFi connectivity for cloud services
- USB Mass Storage (MSC) for file access
- Power management and sleep modes
- Time synchronization via NTP

The project is built with ESP-IDF (Espressif IoT Development Framework) and targets audio applications such as voice recorders, speech recognition systems, and other audio processing devices.

## Key Features

### Audio Recording
- I2S TDM (Time Division Multiplexing) interface with ES7210 codec
- Configurable sample rate (16 kHz), bit width (16-bit), and channel count (stereo)
- Direct Memory Access (DMA) for efficient audio data transfer
- WAV file format with proper headers
- Timestamp-based file naming for recordings

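
The WAV header mentioned above can be illustrated with a minimal host-side sketch. It fills the canonical 44-byte PCM header for a given sample rate, channel count, and bit width. The function name `wav_write_header` and the `put_le16`/`put_le32` helpers are hypothetical and are not taken from `format_wav.h`; the project's own header definitions may be laid out differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_SIZE 44

/* Store values in little-endian byte order, as the WAV format requires. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill a canonical 44-byte PCM WAV header (hypothetical helper). */
void wav_write_header(uint8_t out[WAV_HEADER_SIZE], uint32_t sample_rate,
                      uint16_t channels, uint16_t bits_per_sample,
                      uint32_t data_bytes)
{
    assert(out != NULL);
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));
    uint32_t byte_rate = sample_rate * block_align;

    memcpy(out + 0, "RIFF", 4);
    put_le32(out + 4, 36 + data_bytes);   /* RIFF chunk size = file size - 8 */
    memcpy(out + 8, "WAVE", 4);
    memcpy(out + 12, "fmt ", 4);
    put_le32(out + 16, 16);               /* fmt chunk size for plain PCM */
    put_le16(out + 20, 1);                /* audio format 1 = PCM */
    put_le16(out + 22, channels);
    put_le32(out + 24, sample_rate);
    put_le32(out + 28, byte_rate);
    put_le16(out + 32, block_align);
    put_le16(out + 34, bits_per_sample);
    memcpy(out + 36, "data", 4);
    put_le32(out + 40, data_bytes);       /* size of the PCM payload */
}
```

For a 20-second stereo recording at 16 kHz and 16 bits, the payload is 16000 * 4 * 20 = 1,280,000 bytes, so a call might look like `wav_write_header(hdr, 16000, 2, 16, 1280000)`. If the final length is not known when recording starts, a common approach is to write a placeholder header first and rewrite it once the data size is known.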
### Connectivity
- WiFi station mode for internet connectivity
- SD card mounting via SDMMC interface
- NTP time synchronization with multiple servers
- HTTP client for cloud API communication

### Cloud Integration
- ASR (Automatic Speech Recognition) using the SiliconFlow API
- Audio transcription capabilities
- Secure HTTPS communication with bearer token authentication

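
Bearer token authentication means each request carries an `Authorization` header whose value is the word `Bearer` followed by the token. The following is a minimal sketch of building that value safely into a fixed buffer. The function name `build_bearer_header` is hypothetical; how `asr.c` actually assembles the header is not shown in this README.

```c
#include <assert.h>
#include <stdio.h>

/* Build the value of an HTTP Authorization header: "Bearer <token>".
 * Returns 0 on success, -1 if the token is empty or the buffer is too small. */
int build_bearer_header(char *out, size_t out_len, const char *token)
{
    assert(out != NULL && token != NULL);
    if (token[0] == '\0') {
        return -1;                      /* refuse to send an empty token */
    }
    int n = snprintf(out, out_len, "Bearer %s", token);
    /* snprintf reports the length it wanted; reject truncated output. */
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

With ESP-IDF's HTTP client, the resulting string would typically be passed to `esp_http_client_set_header(client, "Authorization", value)` before the request is performed.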
### Power Management
- Dynamic CPU frequency scaling (8 MHz to 240 MHz)
- Light sleep mode for power conservation
- Automatic power management configuration

### USB Interface
- USB Mass Storage Class (MSC) implementation
- SD card access via USB when connected to host
- Console commands for file system operations

## Hardware Configuration

### I2S Interface
- Master clock (MCK): GPIO 38
- Bit clock (BCK): GPIO 14
- Word select (WS): GPIO 13
- Data in (DI): GPIO 12

### ES7210 Codec
- I2C interface: Port 0
- SDA: GPIO 1
- SCL: GPIO 2
- I2C address: 0x41

### SD Card Interface
- Command: GPIO 48
- Clock: GPIO 47
- Data 0: GPIO 21
- Mount point: `/sdcard`

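
Combining the `/sdcard` mount point with timestamp-based file naming, a recording path could be generated as in the sketch below. The `YYYYMMDD_HHMMSS.wav` pattern and the function name `make_recording_path` are assumptions for illustration; the naming scheme actually used in `record.c` may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Format a recording path such as /sdcard/20240131_235959.wav.
 * The naming pattern is an assumption, not the project's confirmed scheme.
 * Returns 0 on success, -1 if formatting fails or the buffer is too small. */
int make_recording_path(char *out, size_t out_len, const struct tm *when)
{
    assert(out != NULL && when != NULL);
    char stamp[32];
    if (strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", when) == 0) {
        return -1;
    }
    int n = snprintf(out, out_len, "/sdcard/%s.wav", stamp);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

On the device, `when` would usually come from `localtime_r()` once NTP has set the clock. File names longer than the 8.3 format also require long filename support to be enabled in the FAT file system configuration.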
### Additional Pins
- LED indicator: GPIO 48 (the same pin listed for the SD card command line; confirm the assignment against your board before using both)

## Building and Running

### Prerequisites
- ESP-IDF v4.x or later installed and configured
- ESP32 development board (the USB MSC feature needs a chip with native USB, and the GPIO 47/48 assignments under Hardware Configuration suggest an ESP32-S3-class board)
- ES7210 I2S audio codec
- SD card

### Build Process
```bash
# Navigate to project directory
cd /path/to/esp32i2s

# Configure project (optional, if needed)
idf.py menuconfig

# Build the project
idf.py build

# Flash to ESP32
idf.py flash

# Monitor serial output
idf.py monitor
```

### Configuration Options
- Sample rate: 16000 Hz
- Channel count: 2 (Stereo)
- Bit width: 16 bits
- Recording duration: 20 seconds per file
- DMA buffer count: 8
- DMA buffer length: 512 samples

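
The settings above determine the data rate, the size of each recording, and the DMA memory footprint. The sketch below works through that arithmetic. It assumes that "512 samples" means 512 frames per DMA buffer (one sample for every channel per frame); the macro and function names are illustrative and do not come from the project source.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the configuration list above. */
#define SAMPLE_RATE_HZ   16000u
#define CHANNELS         2u
#define BITS_PER_SAMPLE  16u
#define RECORD_SECONDS   20u
#define DMA_BUF_COUNT    8u
#define DMA_BUF_FRAMES   512u  /* assumption: "samples" means frames */

/* Bytes in one frame (one sample for every channel). */
uint32_t bytes_per_frame(void)
{
    assert(BITS_PER_SAMPLE % 8u == 0);
    return CHANNELS * (BITS_PER_SAMPLE / 8u);
}

/* PCM data rate in bytes per second. */
uint32_t bytes_per_second(void)
{
    return SAMPLE_RATE_HZ * bytes_per_frame();
}

/* PCM payload of one recording, excluding the WAV header. */
uint32_t recording_bytes(void)
{
    return bytes_per_second() * RECORD_SECONDS;
}

/* Memory occupied by all DMA buffers together. */
uint32_t dma_total_bytes(void)
{
    return DMA_BUF_COUNT * DMA_BUF_FRAMES * bytes_per_frame();
}

/* Audio time held across all DMA buffers, in milliseconds. */
uint32_t dma_buffered_ms(void)
{
    return DMA_BUF_COUNT * DMA_BUF_FRAMES * 1000u / SAMPLE_RATE_HZ;
}
```

Under these assumptions the stream runs at 64,000 bytes per second, each 20-second file holds 1,280,000 bytes of PCM data (about 1.22 MiB), and the DMA buffers hold 16,384 bytes, or 256 ms of audio. That 256 ms is roughly how long the reading task can stall before incoming samples are lost.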
## File Structure

```
esp32i2s/
├── CMakeLists.txt          # Main build configuration
├── partitions.csv          # Partition table
├── sdkconfig               # SDK configuration
├── sdkconfig.defaults      # Default SDK settings
├── main/                   # Main application source
│   ├── main.c              # Main application entry point
│   ├── record.c/h          # Audio recording functionality
│   ├── asr.c/h             # Automatic Speech Recognition
│   ├── base.h              # Common definitions and includes
│   ├── format_wav.h        # WAV file format definitions
│   ├── usb_msc.c           # USB Mass Storage implementation
│   └── ...
└── ...
```

## Development Conventions

### Coding Style
- Follow ESP-IDF coding conventions
- Use the `ESP_LOGx` macros (`ESP_LOGI`, `ESP_LOGW`, `ESP_LOGE`) for logging
- Handle errors with `ESP_ERROR_CHECK` and `ESP_RETURN_ON_FALSE`
- Use FreeRTOS tasks for concurrent operations

### Memory Management
- Use DMA-capable memory allocation for I2S buffers
- Free allocated memory on every error path
- Monitor heap usage to detect memory leaks

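
The rule about error paths can be sketched as follows: when a later allocation fails, everything allocated earlier is released before returning. The function names are hypothetical, and plain `malloc()` stands in for a DMA-capable allocator so that the sketch runs on a host machine; on the ESP32 the I2S buffer would more likely come from `heap_caps_malloc(size, MALLOC_CAP_DMA)`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate an I2S read buffer and a file write buffer.
 * On any failure nothing stays allocated and both outputs are NULL.
 * Returns 0 on success, -1 on failure. */
int alloc_record_buffers(size_t i2s_bytes, size_t file_bytes,
                         uint8_t **i2s_buf, uint8_t **file_buf)
{
    assert(i2s_buf != NULL && file_buf != NULL);
    *i2s_buf = NULL;
    *file_buf = NULL;

    if (i2s_bytes == 0 || file_bytes == 0) {
        return -1;
    }
    uint8_t *a = malloc(i2s_bytes);     /* stand-in for a DMA-capable alloc */
    if (a == NULL) {
        return -1;
    }
    uint8_t *b = malloc(file_bytes);
    if (b == NULL) {
        free(a);                        /* error path: undo the first alloc */
        return -1;
    }
    *i2s_buf = a;
    *file_buf = b;
    return 0;
}

/* Release both buffers and clear the pointers to prevent reuse. */
void free_record_buffers(uint8_t **i2s_buf, uint8_t **file_buf)
{
    assert(i2s_buf != NULL && file_buf != NULL);
    free(*i2s_buf);
    *i2s_buf = NULL;
    free(*file_buf);
    *file_buf = NULL;
}
```

Clearing the pointers after freeing keeps a second call to `free_record_buffers` harmless, which simplifies shared cleanup code at the end of a recording task.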
### Power Efficiency
- Enter sleep modes when idle
- Match the power management configuration to the workload
- Minimize active periods for battery operation

## Testing and Debugging

### Serial Monitor
Monitor the serial output for logging information:
- I2S initialization status
- SD card mounting results
- WiFi connection status
- Recording progress
- ASR API responses

### Console Commands
When USB MSC is active, the following console commands are available:
- `read` - Print the contents of `README.MD`
- `write` - Create or update `README.MD`
- `size` - Show storage capacity
- `expose` - Expose storage to the USB host
- `status` - Show whether storage is currently exposed
- `exit` - Exit the application

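
As an illustration of how the command words above could be mapped to actions, the sketch below converts an input word into an enum value that a dispatcher can switch on. It is a generic lookup, not the project's implementation: the names `console_cmd_t` and `parse_command` are hypothetical, and how the firmware actually registers its commands is not shown in this README.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    CMD_READ,
    CMD_WRITE,
    CMD_SIZE,
    CMD_EXPOSE,
    CMD_STATUS,
    CMD_EXIT,
    CMD_UNKNOWN
} console_cmd_t;

/* Map a command word to its identifier (case-sensitive match). */
console_cmd_t parse_command(const char *word)
{
    /* Table order must match the enum order above. */
    static const char *const names[] = {
        "read", "write", "size", "expose", "status", "exit"
    };
    assert(word != NULL);
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strcmp(word, names[i]) == 0) {
            return (console_cmd_t)i;
        }
    }
    return CMD_UNKNOWN;
}
```

A caller would typically `switch` on the result and print a short help message for `CMD_UNKNOWN`.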
## Security Considerations

- WiFi credentials are hardcoded in `asr.c` (`WIFI_SSID` and `WIFI_PASS`)
- The API token is hardcoded in `asr.c` (`BEARER_TOKEN`)
- HTTPS communication uses certificate validation
- Consider storing sensitive values in NVS (non-volatile storage) instead of hardcoding them

## Known Issues and Limitations

- WiFi credentials and API tokens are hardcoded in source
- Recording duration is fixed at 20 seconds
- Only PCM WAV output is supported
- ASR requires internet connectivity
- USB MSC and local file system access cannot be used at the same time