Environmental Sound Classification on microcontrollers

Jon Nordby
jon@soundsensing.no
tinyML Summit 2021

Introduction

Environmental Noise Pollution

The environmental pollution that affects most people in Europe

  • 13 million suffering from sleep disturbance
  • 900’000 disability-adjusted life years (DALY) lost

Occupational Noise-induced Hearing Loss

The most prevalent occupational disease in the world

  • 40 million affected by hearing loss from work
  • 4 million disability-adjusted life years (DALY) lost

Noise Monitoring with Machine Learning

Wireless Audio Sensor Networks

Model Constraints

Example target: STM32L476 microcontroller. With 50% of capacity:

  • 64 kB RAM
  • 512 kB FLASH memory
  • 4.5 M operations/second
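A budget check along these lines can tell early whether a candidate model is in the feasible region. This is a minimal sketch: the `Budget` class, the `fits` function and the example model figures are hypothetical, and only the three limits come from the slide (with 1 kB taken as 1024 bytes, which is an assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Resource limits from the slide (50% of an STM32L476)."""
    ram_bytes: int = 64 * 1024       # assumption: 1 kB = 1024 bytes
    flash_bytes: int = 512 * 1024
    ops_per_second: float = 4.5e6


def fits(budget, activation_bytes, weight_bytes,
         ops_per_inference, inferences_per_second):
    """Return True if a model stays within all three limits.

    Activations are assumed to live in RAM and weights in FLASH.
    """
    return (
        activation_bytes <= budget.ram_bytes
        and weight_bytes <= budget.flash_bytes
        and ops_per_inference * inferences_per_second <= budget.ops_per_second
    )


# Hypothetical model: 32 kB activations, 100 kB weights, 2 M ops per window
print(fits(Budget(), 32 * 1024, 100 * 1024, 2e6, inferences_per_second=2.0))
```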

Small models on Urbansound8K

Green: Feasible region on device. 2021 results not published.

Shrinking Convolutional Neural Networks for TinyML Audio

How did we make the model fit on device?

Pipeline

Typical audio pipeline. Spectrogram conversion, CNN on overlapped windows.
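The windowing stage of such a pipeline can be sketched as below. This is an illustration, not the code from the talk: `split_windows`, `classify_clip` and the stand-in `predict_window` callable are hypothetical names, and averaging the per-window predictions is an assumed (though common) way to get one result per clip.

```python
def split_windows(frames, window_length, hop):
    """Cut a sequence of spectrogram frames into fixed-length windows.

    Consecutive windows start `hop` frames apart, so hop < window_length
    gives overlapping windows.
    """
    last_start = len(frames) - window_length
    return [frames[start:start + window_length]
            for start in range(0, last_start + 1, hop)]


def classify_clip(frames, predict_window, window_length, hop):
    """Run the model on each window and average the class probabilities."""
    windows = split_windows(frames, window_length, hop)
    if not windows:
        raise ValueError("clip is shorter than one window")
    predictions = [predict_window(window) for window in windows]
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / len(predictions)
            for c in range(n_classes)]


# Stand-in for the CNN: any callable returning per-class probabilities
def dummy_model(window):
    return [1.0, 0.0] if window[0] < 4 else [0.0, 1.0]


# 10 frames, windows of 4 frames, 50% overlap (hop of 2 frames)
print(classify_clip(list(range(10)), dummy_model, window_length=4, hop=2))
```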

Reduce input dimensionality

  • Lower sample rate
  • Lower frequency range
  • Lower frequency resolution
  • Lower time duration in window
  • Lower time resolution

~10x reduction in compute. And easier to learn!
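The effect of these choices on input size can be estimated with a few lines of arithmetic. The settings below are hypothetical and chosen only to illustrate the calculation; they are not the settings used in the talk. The sketch also assumes that compute scales roughly with the number of input values.

```python
def spectrogram_shape(sample_rate, window_seconds, hop_samples, n_mels):
    """Return (mel bands, time frames) for one analysis window."""
    n_frames = int(window_seconds * sample_rate) // hop_samples
    return n_mels, n_frames


def input_size(shape):
    """Number of values fed to the model per window."""
    bands, frames = shape
    return bands * frames


# Hypothetical baseline and reduced settings (not from the talk)
baseline = spectrogram_shape(sample_rate=44100, window_seconds=1.0,
                             hop_samples=512, n_mels=128)
reduced = spectrogram_shape(sample_rate=16000, window_seconds=1.0,
                            hop_samples=512, n_mels=32)

print(baseline, input_size(baseline))
print(reduced, input_size(reduced))
print(f"reduction: {input_size(baseline) / input_size(reduced):.1f}x")
```

Lowering the sample rate with a fixed hop reduces the number of frames per second, and lowering the number of mel bands reduces the frequency resolution; both shrink the input the CNN has to process.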

Reduce overlap

Models in literature use 95% overlap or more. 20x penalty in inference time!

Often small performance benefit. Use 0% (1x) or 50% (2x).
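The penalty follows from the hop between windows: with overlap fraction p, each new window advances by (1 - p) of its length, so the number of inferences per second of audio grows by 1 / (1 - p). A small sketch of that arithmetic, with hypothetical function names and clip sizes:

```python
def relative_cost(overlap):
    """Inference cost relative to non-overlapping windows: 1 / (1 - overlap)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return 1.0 / (1.0 - overlap)


def count_windows(n_frames, window_length, overlap):
    """Number of full windows that fit in a clip of n_frames."""
    hop = max(1, round(window_length * (1.0 - overlap)))
    if n_frames < window_length:
        return 0
    return (n_frames - window_length) // hop + 1


for overlap in (0.0, 0.5, 0.95):
    print(f"overlap {overlap:.0%}: "
          f"{relative_cost(overlap):.0f}x cost, "
          f"{count_windows(400, 40, overlap)} windows in a 400-frame clip")
```

For a finite clip the window count is slightly below the steady-state factor because of the edges; for continuous monitoring the 1 / (1 - p) factor is the relevant one.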

Use a small model!

Depthwise-separable Convolution

MobileNet, “Hello Edge”, AclNet. 3x3 kernel, 64 filters: 7.5x speedup
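Where the speedup comes from can be seen by counting multiply-accumulate operations (MACs) per output position. This is a rough sketch with hypothetical function names: it counts only the convolutions themselves, so the ratio it prints (about 7.9x for a 3x3 kernel and 64 filters) is in the same range as, but not identical to, the figure on the slide, which may be counted differently.

```python
def standard_conv_macs(kernel, in_channels, out_channels):
    """MACs per output position for a standard KxK convolution."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_macs(kernel, in_channels, out_channels):
    """MACs per output position for a depthwise-separable convolution.

    Depthwise step: one KxK filter per input channel.
    Pointwise step: a 1x1 convolution mixing channels.
    """
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Assumption: 64 input channels; the ratio does not depend on this value
standard = standard_conv_macs(3, 64, 64)
separable = separable_conv_macs(3, 64, 64)
print(f"{standard} vs {separable} MACs: {standard / separable:.1f}x fewer")
```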

Downsampling using max-pooling

Wasteful? Computing convolutions, then throwing away 3/4 of results!

Downsampling using strided convolution

“Learned” downsampling. Striding 2x2: Approx 4x speedup
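The speedup can be checked by counting output positions: a stride-1 convolution followed by 2x2 max-pooling computes every position and keeps a quarter, while a 2x2-strided convolution computes only the quarter that is kept. A minimal sketch, assuming no padding and a hypothetical 32x32 input with 64 channels:

```python
def conv_output_size(size, kernel, stride):
    """Output length along one axis for an unpadded convolution."""
    return (size - kernel) // stride + 1


def conv_macs(height, width, kernel, in_channels, out_channels, stride):
    """Multiply-accumulates for one unpadded convolution layer."""
    out_h = conv_output_size(height, kernel, stride)
    out_w = conv_output_size(width, kernel, stride)
    return out_h * out_w * kernel * kernel * in_channels * out_channels


# Hypothetical layer: 32x32 input, 3x3 kernel, 64 channels in and out
with_pooling = conv_macs(32, 32, 3, 64, 64, stride=1)   # then 2x2 max-pool
with_striding = conv_macs(32, 32, 3, 64, 64, stride=2)  # no pooling needed

print(with_pooling / with_striding)  # 4.0
```

With other input sizes or with padding the ratio is close to, but not always exactly, 4.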

Quantization

  • Using int8 instead of float32.
  • 4x improvement in weights (FLASH) and activations (RAM)
  • 4.6x improvement in runtime using CMSIS-NN SIMD

Ref “CMSIS-NN: Efficient Neural Network Kernels for ARM Cortex-M CPUs”
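The 4x storage figure follows directly from the element sizes (4 bytes for float32, 1 byte for int8). The sketch below illustrates this with a simple symmetric per-tensor quantization scheme. That scheme is an assumption made for illustration; it is not necessarily what CMSIS-NN or any particular toolchain uses, and the function names and weights are hypothetical.

```python
from array import array


def quantize_int8(values):
    """Map floats to int8 using one symmetric scale for the whole tensor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from int8 values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]          # made-up example weights
quantized, scale = quantize_int8(weights)

as_float32 = array("f", weights)          # 4 bytes per element
as_int8 = array("b", quantized)           # 1 byte per element
print(len(as_float32) * as_float32.itemsize,
      len(as_int8) * as_int8.itemsize)    # bytes needed for each format

# The rounding error per weight is at most half a quantization step
errors = [abs(w - d) for w, d in zip(weights, dequantize(quantized, scale))]
print(max(errors) <= scale / 2 + 1e-9)
```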

Latest developments

  • Binary network quantization
  • Neural Architecture Search
  • Streaming inference
  • Learned filterbanks
  • Hardware acceleration
  • Learned pooling

TinyML very actively researched, rapid improvements

Outro

Noise Monitoring example

Automated documentation of noise footprint with respect to regulations
  • Based on Noise Event Detection & Classification
  • Tested successfully at shooting range
  • Expanding now to Construction and Industry noise

Condition Monitoring example

Condition Monitoring of technical equipment using sound.
Developed based on experience from Noise Monitoring.

Conclusions

  1. Audio classification of Environmental Noise can be done directly on sensor
  2. Made possible with a range of efficient CNN techniques
  3. Integrated into Soundsensing IoT sensors
  4. Used for Noise Monitoring & Condition Monitoring

We are open for partners and pilot projects
Get in touch!
contact@soundsensing.no


Questions?

Bonus

Bonus slides after this point

Thesis results

All the info

Thesis: Environmental Sound Classification on Microcontrollers using Convolutional Neural Networks

Report & Code: https://github.com/jonnor/ESC-CNN-microcontroller

All models

Model comparison

List of results

Confusion

Grouped classification

Foreground-only

Unknown class

Thesis Methods

Standard procedure for Urbansound8K

  • Classification problem
  • 4 second sound clips
  • 10 classes
  • 10-fold cross-validation, predefined
  • Metric: Accuracy

Training settings

Training

  • NVIDIA RTX 2060 GPU, 6 GB
  • 10 models x 10 folds = 100 training jobs
  • 100 epochs
  • 3 jobs in parallel
  • 36 hours total

Evaluation

For each fold of each model

  1. Select best model based on validation accuracy
  2. Calculate accuracy on test set

For each model

  • Measure CPU time on device
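The per-fold steps above can be sketched as follows. The data structure (one list of checkpoints per fold, each with a validation and a test accuracy), the function names and the numbers are hypothetical, and averaging the per-fold test accuracies into one score per model is an assumption about how the folds are combined.

```python
from statistics import mean


def evaluate_fold(checkpoints):
    """Pick the checkpoint with the best validation accuracy,
    then report that checkpoint's accuracy on the test set."""
    best = max(checkpoints, key=lambda c: c["val_accuracy"])
    return best["test_accuracy"]


def evaluate_model(folds):
    """Average the per-fold test accuracies (assumed aggregation)."""
    return mean(evaluate_fold(checkpoints) for checkpoints in folds)


# Made-up numbers for two folds with two checkpoints each
folds = [
    [{"val_accuracy": 0.60, "test_accuracy": 0.50},
     {"val_accuracy": 0.70, "test_accuracy": 0.55}],
    [{"val_accuracy": 0.80, "test_accuracy": 0.65},
     {"val_accuracy": 0.75, "test_accuracy": 0.90}],
]
print(evaluate_model(folds))
```

Selecting on validation accuracy rather than test accuracy matters here: in the second fold the checkpoint with the higher test accuracy is not the one selected.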

Mel-spectrogram

More resources

Machine Hearing. ML on Audio

Machine Learning for Embedded / IoT

Thesis Report & Code
