Jon Nordby jon@soundsensing.no
November 19, 2019
Internet of Things specialist
Now:
Environmental Sound Classification on Microcontrollers using Convolutional Neural Networks
Want: 1 year lifetime on a palm-sized battery.
Need: <1 mW system power.
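A rough sanity check of this budget, assuming a typical palm-sized Li-ion cell (the 2000 mAh / 3.7 V figures are assumptions for illustration, not from the slides):

```python
# Energy needed for 1 year at 1 mW average power, versus an assumed
# palm-sized Li-ion cell (2000 mAh at 3.7 V ~= 7.4 Wh).
HOURS_PER_YEAR = 365 * 24  # 8760 h

power_mw = 1.0  # target average system power
energy_needed_wh = power_mw / 1000 * HOURS_PER_YEAR  # 8.76 Wh

battery_wh = 2.0 * 3.7  # 2000 mAh * 3.7 V = 7.4 Wh

print(f"Energy for 1 year at {power_mw} mW: {energy_needed_wh:.2f} Wh")
print(f"Assumed battery capacity: {battery_wh:.2f} Wh")
```

The two numbers are the same order of magnitude, which is why ~1 mW is the target: at 10 mW the same cell lasts only about a month.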
STM32L4 @ 80 MHz. Approx 10 mW.
Human presence detection. VGG8 on 64x64 RGB image, 5 FPS: 7 mW.
Audio ML approx 1 mW
2.9 TOPS/W. AlexNet, 1000 classes, 10 FPS: 41 mW.
Audio models probably < 1 mW.
Given an audio signal of environmental sounds, determine which class it belongs to.
With 50% of STM32L476 capacity:
eGRU: runs on an ARM Cortex-M0 microcontroller; 61% accuracy, but with a non-standard evaluation.
Models in the literature use 95% overlap or more: a 20x penalty in inference time!
Often little accuracy benefit. Use 0% overlap (1x cost) or 50% (2x).
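The cost multiplier follows directly from the hop size: with hop = window × (1 − overlap), the number of analysis windows per clip scales as 1/(1 − overlap). A minimal sketch:

```python
# Inference cost of overlapping analysis windows, relative to
# non-overlapping (overlap = 0). hop = window * (1 - overlap),
# so windows per clip scale as 1 / (1 - overlap).
def relative_cost(overlap):
    """Cost multiplier versus non-overlapping windows."""
    return 1.0 / (1.0 - overlap)

print(relative_cost(0.0))   # 1x
print(relative_cost(0.5))   # 2x
print(relative_cost(0.95))  # 20x penalty
```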
MobileNet, “Hello Edge”, AclNet: depthwise-separable convolutions. 3x3 kernel, 64 filters: 7.5x speedup
EffNet, LD-CNN: spatially-separable convolutions. 5x5 kernel: 2.5x speedup
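Where these speedup figures come from, as a sketch of the standard cost ratios (per output position; the slide's 7.5x is an approximation of the ~7.9x the formula gives):

```python
# Approximate multiply-add cost ratios for two convolution factorizations.
# A standard k x k conv with c_out filters costs k*k*c_out per input
# channel per output position.

def depthwise_separable_speedup(k, c_out):
    """MobileNet-style: k x k depthwise conv followed by 1x1 pointwise conv."""
    standard = k * k * c_out
    separable = k * k + c_out
    return standard / separable

def spatially_separable_speedup(k):
    """EffNet-style: k x k kernel factored into k x 1 followed by 1 x k."""
    return (k * k) / (2 * k)

print(depthwise_separable_speedup(3, 64))  # ~7.9x; the slides quote 7.5x
print(spatially_separable_speedup(5))      # 2.5x
```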
Wasteful? Computing convolutions, then throwing away 3/4 of the results!
Striding means fewer computations and “learned” downsampling.
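The 3/4 figure made concrete: convolution followed by 2x2 max-pooling evaluates the kernel at every position and then keeps one output in four, while a stride-2 convolution evaluates only those positions to begin with. A sketch of the kernel-evaluation counts (64x64 feature map assumed for illustration):

```python
# Kernel evaluations over an h x w feature map ('same' padding assumed):
# conv + 2x2 pooling evaluates at every position, stride-2 at 1 in 4.
def conv_positions(h, w, stride):
    return (h // stride) * (w // stride)

h, w = 64, 64
full = conv_positions(h, w, stride=1)     # conv then pool: 4096 evaluations
strided = conv_positions(h, w, stride=2)  # stride-2 conv: 1024 evaluations
print(full / strided)                     # 4.0
```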
Inference can often use 8-bit integers instead of 32-bit floats
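A minimal sketch of the idea, using symmetric per-tensor post-training quantization. Real deployments (e.g. TensorFlow Lite, CMSIS-NN) add zero points, per-channel scales and calibration; this only illustrates the float-to-int8 round trip:

```python
import numpy as np

def quantize_int8(weights):
    """Map float weights onto int8 codes with a single shared scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)      # quantized int8 codes
print(w_hat)  # close to the original floats
```

The int8 codes need 4x less memory than float32, and integer multiply-accumulate is what instructions like ARM's SIMD extensions accelerate.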
~10 mW power
Can this be faster than the standard FFT? And still perform well?
Machine Hearing. ML on Audio
Machine Learning for Embedded / IoT
Thesis Report & Code
Email: jon@soundsensing.no
Foreground-only
Standard procedure for Urbansound8k: 10 predefined folds
For each fold of each model: train on the other folds, evaluate on the held-out fold
For each model: average the per-fold scores
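A minimal sketch of that evaluation loop, with a dummy `train_and_evaluate` callback standing in for the actual training code:

```python
# Leave-one-fold-out evaluation over predefined folds, as used for
# Urbansound8k: train on 9 folds, test on the held-out one, average.
def cross_fold_accuracy(folds, train_and_evaluate):
    accuracies = []
    for held_out in folds:
        train_folds = [f for f in folds if f != held_out]
        accuracies.append(train_and_evaluate(train_folds, held_out))
    return sum(accuracies) / len(accuracies)

# Dummy callback, just to show the shape of the procedure
folds = list(range(1, 11))
acc = cross_fold_accuracy(folds, lambda train, test: 0.70)
print(acc)  # average of the per-fold accuracies
```

Using the predefined folds matters: re-shuffling clips across folds leaks near-duplicate recordings between train and test, which is one source of the inflated, non-comparable numbers mentioned above.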
And the bugs can be hard to spot
Reduces health through stress and loss of sleep
In Norway
In Europe
Simulation only, no direct measurements