Jon Nordby Head of Data Science & Machine Learning Soundsensing AS jon@soundsensing.no EuroPython 2021
Given input audio, return the timestamps (start, end) for each event class
Events are sounds with a clearly-defined duration or onset.
| Event (time-limited) | Class (continuous) |
|---|---|
| Car passing | Car traffic |
| Honk | Car traffic |
| Word | Speech |
| Gunshot | Shooting |
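Concretely, the output of such a system can be represented as a plain list of (start, end, class) tuples. A sketch; the variable name and values are illustrative:

# Hypothetical output of a Sound Event Detection system
# Each event: (start seconds, end seconds, event class)
detected_events = [
    (0.52, 0.71, 'car passing'),
    (1.93, 2.10, 'honk'),
]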
Fermentation tracking when making alcoholic beverages: beer, cider, wine, etc.
Fermentation activity can be tracked as Bubbles Per Minute (BPM).
Make a system that tracks fermentation activity, outputting Bubbles Per Minute (BPM): capture the airlock sound with a microphone and use Machine Learning to count each “plop”
Need enough data.
| Instances per class | Suitability |
|---|---|
| 100 | Minimal |
| 1000 | Good |
| 10000+ | Very good |
Need realistic data. Capturing natural variation in
Note down characteristics of the sound
Converting to a discrete list of events
To compute the Bubbles Per Minute (BPM), as sketched below
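A minimal sketch of this step, assuming the model outputs one event probability per analysis window at a fixed hop; the names `probabilities`, `threshold`, and `hop_seconds` are illustrative:

import numpy

def count_events(probabilities, threshold=0.5):
    # Each rising edge across the threshold counts as one distinct event
    above = probabilities > threshold
    rising_edges = numpy.logical_and(above[1:], ~above[:-1])
    return numpy.count_nonzero(rising_edges) + int(above[0])

def bubbles_per_minute(probabilities, hop_seconds):
    duration_minutes = (len(probabilities) * hop_seconds) / 60.0
    return count_events(probabilities) / duration_minutes

# Example: 60 windows at 1.0 second hop = 1 minute of audio, 2 plops
p = numpy.array([0.1, 0.9, 0.8, 0.2, 0.95, 0.3] + [0.1] * 54)
print(bubbles_per_minute(p, hop_seconds=1.0))  # -> 2.0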
Github project: jonnor/brewing-audio-event-detection
General Audio ML: jonnor/machinehearing
Slack: Sound of AI community
Now you know the basics of Audio Event Detection with Machine Learning in Python.
Want to deploy Continuous Monitoring with Audio? Consider using the Soundsensing sensors and data platform.
Get in Touch! contact@soundsensing.no
Want to work on Audio Machine Learning in Python? We have many opportunities.
Get in Touch! contact@soundsensing.no
Sound Event Detection with Machine Learning EuroPython 2021
Jon Nordby jon@soundsensing.no Head of Data Science & Machine Learning
Bonus slides after this point
Using a Gaussian Mixture Model with a Hidden Markov Model (GMM-HMM)
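A rough sketch of such a model using the hmmlearn library; the feature extraction, the two-state setup (background vs. plop), and all parameter values are my own illustration, not the exact implementation:

import numpy
from hmmlearn.hmm import GMMHMM

# One feature vector per audio frame, e.g. MFCCs: shape (n_frames, n_features)
features = numpy.random.randn(1000, 13)  # placeholder features for illustration

# Two hidden states (background, event), each modeled as a mixture of Gaussians
model = GMMHMM(n_components=2, n_mix=3, covariance_type='diag', n_iter=50)
model.fit(features)

# Most likely state per frame; contiguous runs of the event state form events
states = model.predict(features)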
How to get more data without gathering “in the wild”?
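One common answer is data augmentation: synthesize new training samples by randomly perturbing the ones you have. A minimal numpy sketch; the transformations and parameter ranges are examples, not the exact recipe used:

import numpy

rng = numpy.random.default_rng()

def augment(audio):
    """Create a new training sample by randomly perturbing an existing one"""
    shift = rng.integers(-len(audio) // 10, len(audio) // 10)  # up to +/-10% time shift
    out = numpy.roll(audio, shift)
    out = out * rng.uniform(0.5, 1.5)                   # random gain change
    out = out + rng.normal(0.0, 0.005, size=len(out))   # add background noise
    return out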
Key: Chopping up incoming stream into (overlapping) audio windows
import datetime
import queue

import numpy
import sounddevice

# Setup audio stream from microphone
audio_queue = queue.Queue()

def audio_callback(indata, frames, time, status):
    # Called by sounddevice for each incoming block of samples
    audio_queue.put(indata.copy())

# samplerate/channels are example values, match them to your model
stream = sounddevice.InputStream(callback=audio_callback,
                                 samplerate=16000, channels=1)

# window_length, hop_length, model, threshold are defined elsewhere
audio_buffer = numpy.zeros((window_length, 1))  # rolling analysis window
new_samples = 0

...

# In classification loop
data = audio_queue.get()
# shift old audio left, add new data at the end
audio_buffer = numpy.roll(audio_buffer, -len(data), axis=0)
audio_buffer[-len(data):] = data
new_samples += len(data)
# check if we have received enough new data to do a new prediction
if new_samples >= hop_length:
    new_samples = 0
    p = model.predict(audio_buffer)  # probability that window contains an event
    if p > threshold:
        print(f'EVENT DETECTED time={datetime.datetime.now()}')
Can one learn Sound Event Detection without annotating the times for each event? Yes!
Criteria for inclusion:
Approx 1000 videos reviewed, 100 usable
Window length: a bit longer than the event length.
Overlapping windows give the classifier multiple chances to see each event.
Increasing overlap (reducing the hop length) increases time resolution! Overlap used for Audio Event Detection here: 10%
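A sketch of how window length and overlap interact, splitting an audio array into overlapping analysis windows; the sample rate and lengths are example values:

import numpy

def split_windows(audio, window_length, overlap=0.1):
    # Hop is the stride between consecutive window starts
    hop = int(window_length * (1.0 - overlap))
    starts = range(0, len(audio) - window_length + 1, hop)
    return numpy.stack([audio[s:s + window_length] for s in starts])

audio = numpy.zeros(16000 * 10)  # 10 seconds at 16 kHz
windows = split_windows(audio, window_length=16000, overlap=0.1)
print(windows.shape)  # (11, 16000): a new window every 0.9 seconds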