Sound Event Detection with Machine Learning

Jon Nordby
Head of Data Science & Machine Learning
Soundsensing AS
jon@soundsensing.no

EuroPython 2021

Introduction

About Soundsensing

Sound Event Detection

Given input audio
return the timestamps (start, end)
for each event class
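
For example (values made up for illustration), the output can be represented as a list of tuples:

# Each detected event: (start time in seconds, end time in seconds, class)
events = [
    (0.52, 0.61, 'plop'),
    (2.13, 2.24, 'plop'),
]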

Events and non-events

Events are sounds with a clearly-defined duration or onset.

Event (time-limited)    Class (continuous)
Car passing             Car traffic
Honk                    Car traffic
Word                    Speech
Gunshot                 Shooting

Application

Fermentation tracking when making alcoholic beverages: beer, cider, wine, etc.

Alcohol is produced via fermentation

Airlock activity

Fermentation tracking

Fermentation activity can be tracked as Bubbles Per Minute (BPM).

Our goal

Make a system that can track fermentation activity,
outputting Bubbles Per Minute (BPM),
by capturing airlock sound with a microphone,
and using Machine Learning to count each “plop”.

Machine Learning needs Data!

Supervised Machine Learning

Data requirements: Quantity

Need enough data.

Instances per class    Suitability
100                    Minimal
1000                   Good
10000+                 Very good

Data requirements: Quality

Need realistic data. Capturing natural variation in

  • the event sound
  • recording devices used
  • recording environment

Check the data

Understand the data

Note down characteristics of the sound

  • Event length
  • Distance between events
  • Variation in the event sound
  • Changes over time
  • Differences between recordings
  • Background noises
  • Other events that could be easily confused

Labeling data manually using Audacity

import pandas

# Load a label track exported from Audacity (tab-separated text file)
labels = pandas.read_csv(path, sep='\t', header=None,
                         names=['start', 'end', 'annotation'],
                         dtype=dict(start=float, end=float, annotation=str))
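
Labels can also be written back out in the same tab-separated format for inspection in Audacity. A sketch, where out_path is an assumed name:

labels.to_csv(out_path, sep='\t', header=False, index=False,
              columns=['start', 'end', 'annotation'])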

Machine Learning system

Audio ML pipeline overview

Spectrogram

import numpy as np
import librosa
import librosa.display

audio, sr = librosa.load(path)
spec = librosa.feature.melspectrogram(y=audio, sr=sr)
spec_db = librosa.power_to_db(spec, ref=np.max)

librosa.display.specshow(spec_db, x_axis='time', y_axis='mel')

CNN classifier model

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = keras.Sequential([
        Conv2D(filters, kernel,
               input_shape=(bands, frames, channels)),
        MaxPooling2D(pool_size=pool),
        # ... more convolution/pooling blocks ...
        Flatten(),
        Dense(1, activation='sigmoid'),  # P(event), used by the event tracker
])

Evaluation

Event Tracker

Converting to discrete list of events

  • Threshold the probability output by the classifier
  • Keep track of whether we are currently in an event or not
  • Separate on/off thresholds (hysteresis) avoid rapid toggling (full function below)
    if not inside_event and probability >= on_threshold:
        inside_event = True
        print('EVENT on', t, probability)
    if inside_event and probability <= off_threshold:
        inside_event = False
        print('EVENT off', t, probability)
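
The same logic as a stand-alone function, turning a sequence of per-window probabilities into (start, end) events. A minimal sketch; the function name and default thresholds are assumptions:

def probabilities_to_events(probabilities, times,
                            on_threshold=0.7, off_threshold=0.3):
    events = []
    inside_event = False
    start = None
    for t, probability in zip(times, probabilities):
        if not inside_event and probability >= on_threshold:
            inside_event = True
            start = t
        elif inside_event and probability <= off_threshold:
            inside_event = False
            events.append((start, t))
    return events

Using separate on/off thresholds prevents a probability hovering near a single threshold from toggling events on and off rapidly.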

Statistics Estimator

To compute the Bubbles Per Minute

  • Use the typical time between events
  • Assumes regularity
  • The median is more robust against outliers (see the sketch below)
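
A minimal sketch of the estimator; the function name is an assumption, and at least two events are needed:

import numpy

def estimate_bpm(event_times):
    # event_times: sorted event onset times, in seconds
    intervals = numpy.diff(event_times)   # time between consecutive events
    typical = numpy.median(intervals)     # median is robust against outliers
    return 60.0 / typical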

Tracking over time using Brewfather

# API documentation: https://docs.brewfather.app/integrations/custom-stream
import requests

url = 'http://log.brewfather.net/stream?id=9MmXXXXXXXXX'
data = dict(name='brewaed-0001', bpm=calculated_bpm)  # BPM from the estimator
r = requests.post(url, json=data)

Outro

More resources


Github project: jonnor/brewing-audio-event-detection


General Audio ML: jonnor/machinehearing


Slack: Sound of AI community

What do you want to make?

Now that you know the basics of Audio Event Detection with Machine Learning in Python, here are some ideas:

  • Popcorn popping
  • Bird call
  • Cough
  • Umm/aaa speech patterns
  • Drum hits
  • Car passing

Continuous Monitoring using Audio ML

Want to deploy Continuous Monitoring with Audio?
Consider using the Soundsensing sensors and data-platform.


Get in Touch! contact@soundsensing.no

Join Soundsensing


Want to work on Audio Machine Learning in Python?
We have many opportunities.

  • Full-time positions
  • Part-time / freelance work
  • Engineering thesis
  • Internships
  • Research or industry partnerships


Get in Touch! contact@soundsensing.no

Questions ?


Sound Event Detection with Machine Learning
EuroPython 2021


Jon Nordby
jon@soundsensing.no
Head of Data Science & Machine Learning

Bonus

Bonus slides after this point

Semi-automatic labelling

Using a Gaussian Mixture Hidden Markov Model (GMM-HMM)

import hmmlearn.hmm, librosa, sklearn.preprocessing

features = librosa.feature.mfcc(y=audio, n_mfcc=13, ...)
X = sklearn.preprocessing.StandardScaler().fit_transform(features.T)
model = hmmlearn.hmm.GMMHMM(n_components=2, ...)  # two states: background / event
model.fit(X)
# score_samples() returns (logprob, posteriors); take P(event state) per frame
probabilities = model.score_samples(X)[1][:, 1]

Synthesize data

How to get more data
without gathering “in the wild”?

  • Mix in different kinds of background noise
  • Vary the Signal-to-Noise Ratio, etc.
  • Useful to estimate performance on tricky, not-yet-seen data
  • Can be used to compensate for a small amount of training data
  • scaper Python library: github.com/justinsalamon/scaper (sketch below)
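
A minimal sketch of soundscape synthesis with scaper. The folder layout, labels and value ranges are assumptions, not from this project:

import scaper

sc = scaper.Scaper(duration=10.0, fg_path='foreground', bg_path='background')
sc.add_background(label=('const', 'noise'),
                  source_file=('choose', []), source_time=('const', 0))
sc.add_event(label=('const', 'plop'),
             source_file=('choose', []), source_time=('const', 0),
             event_time=('uniform', 0, 9), event_duration=('const', 0.5),
             snr=('uniform', 6, 30), pitch_shift=None, time_stretch=None)
sc.generate('soundscape.wav', 'soundscape.jams')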

Streaming inference

Key: Chopping up incoming stream into (overlapping) audio windows

import datetime
import queue

import numpy
import sounddevice

# Setup audio stream from microphone
audio_queue = queue.Queue()

def audio_callback(indata, frames, time, status):
    audio_queue.put(indata.copy())

stream = sounddevice.InputStream(callback=audio_callback, ...)
...

# In classification loop
    data = audio_queue.get()
    # shift old audio over, add new data at the end
    audio_buffer = numpy.roll(audio_buffer, len(data), axis=0)
    audio_buffer[-len(data):] = data
    new_samples += len(data)
    # check if we have received enough new data to do a new prediction
    if new_samples >= hop_length:
        new_samples = 0  # reset counter for the next window
        p = model.predict(audio_buffer)
        if p >= threshold:  # probability above threshold means event
            print(f'EVENT DETECTED time={datetime.datetime.now()}')

Event Detection with Weakly Labeled data

Can one learn Sound Event Detection
without annotating the times for each event?

Yes!

  • Referred to as weakly labeled Sound Event Detection
  • Can be tackled with Multiple Instance Learning (MIL)
  • Inputs: audio clips consisting of 0-N events
  • Labels: true if any events in clip, else false
  • Multiple analysis windows per label
  • Using temporal pooling in the neural network (sketched below)
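
A minimal Keras sketch of this idea; shapes and layer sizes are assumptions. A per-window classifier is applied to every window in a clip, and max-pooling over time yields the clip-level probability:

from tensorflow import keras
from tensorflow.keras import layers

windows, bands, frames = 30, 64, 32  # assumed shapes

# Per-window classifier: P(event) for a single analysis window
window_model = keras.Sequential([
    layers.Conv2D(16, 3, activation='relu',
                  input_shape=(bands, frames, 1)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation='sigmoid'),
])

# Apply across all windows, then take the max over time:
# the clip is positive if any window fires
clip = keras.Input(shape=(windows, bands, frames, 1))
window_probs = layers.TimeDistributed(window_model)(clip)  # (batch, windows, 1)
clip_prob = layers.GlobalMaxPooling1D()(window_probs)      # (batch, 1)
model = keras.Model(clip, clip_prob)
model.compile(optimizer='adam', loss='binary_crossentropy')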

Data collection via Youtube

Criteria for inclusion:

  • Preferably a couple of minutes long, minimum 15 seconds
  • No talking to the camera
  • Mostly stationary camera
  • No audio editing/effects
  • One or more airlocks bubbling
  • Bubbling can be heard by ear

Approximately 1000 videos reviewed; about 100 were usable

Characteristics of Audio Events

  • Duration
  • Tonal/atonal
  • Temporal patterns
  • Percussive
  • Frequency content
  • Temporal envelope
  • Foreground vs background
  • Signal to Noise Ratio

Analysis windows

Window length: a bit longer than the event length.

Overlapping windows give the classifier multiple chances at seeing each event.

Increasing the overlap (reducing the hop) increases time resolution. Overlap for AES: 10%
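
A minimal sketch of the windowing (function and parameter names are my own): chop a spectrogram into overlapping analysis windows.

import numpy

def split_windows(spec, frames=32, overlap=0.1):
    # hop: how far the window advances between consecutive windows
    hop = int(frames * (1.0 - overlap))
    windows = []
    for start in range(0, spec.shape[1] - frames + 1, hop):
        windows.append(spec[:, start:start+frames])
    return numpy.stack(windows)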