Erika Pelaez Presenting “Building a Machine Learning Classifier to Listen to Killer Whales”

UPDATED: Video Added from the Conference!

Introducing Erika Pelaez (@midoridancer) presenting “Building a Machine Learning Classifier to Listen to Killer Whales”.

Data Scientist with experience in, and a love for, Machine Learning. I like making smart things and have worked with diverse types of data, from biological to transportation.

Did you know that Machine Learning can help protect killer whales? Orcasound is a magnificent open source project that connects people with their neighbors, the killer whales of the Pacific Northwest. It provides a platform that continuously broadcasts the underwater sounds of the Puget Sound area to anyone willing to listen. Come to my talk and I’ll show you how we’re applying Machine Learning with the scikit-learn Python library to web-scraped audio to build models for signal collection and classification.

Avid users spend most of their listening time hearing noise or ships, but frankly most of them are there just to hear the orcas. This talk narrates the process we are following to automatically detect sounds and classify them as orca or non-orca (false killer whale and humpback whale).

The first challenge in this project was getting labeled data, since the raw transmissions are unlabeled. My first approach was to build a dataset from scratch by web scraping audio from the internet with Beautiful Soup. I then used a bash script to make sure every file had a maximum duration of 5 s, which is the length of the files Orcasound saves. Feature extraction from the audio files was handled with the librosa Python library. Finally, I built a Random Forest classifier with the scikit-learn Python library that reached an accuracy of 99% (±2%) under 10-fold cross-validation.

The next step will be to make the model accessible through an API and send the collected signals to it for classification. After a signal is detected, a notification could be sent to users so they don’t have to listen to all the noise.
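The talk used a bash script for the 5 s cap, and its contents aren't shown. As a rough Python stand-in (swapped in here, not the original script), the sketch below uses the standard-library `wave` module to copy a clip while keeping at most its first 5 seconds. It assumes uncompressed PCM WAV input (scraped MP3s would need converting first) and that over-long clips are trimmed rather than split into several 5 s pieces; `truncate_wav` and `MAX_SECONDS` are hypothetical names.

```python
import wave

# Hypothetical constant: the clip length Orcasound saves, per the abstract.
MAX_SECONDS = 5.0


def truncate_wav(src_path: str, dst_path: str, max_seconds: float = MAX_SECONDS) -> float:
    """Copy src_path to dst_path, keeping at most max_seconds of audio.

    Assumes uncompressed PCM WAV (the only format the stdlib wave module reads).
    Returns the duration in seconds of the file that was written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        max_frames = int(max_seconds * params.framerate)
        n_frames = min(params.nframes, max_frames)  # shorter clips are kept whole
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Copy the format; the frame count in the header is derived from the
        # amount of data actually written.
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)

    return n_frames / params.framerate
```

Applied over a folder of scraped clips, this leaves files of 5 s or less; clips that are already short pass through unchanged.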
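The feature-extraction and classification steps could look roughly like the sketch below. The talk names librosa, a Random Forest and 10-fold cross-validation, but not the actual features or hyperparameters, so the MFCC mean/std summary, `N_MFCC = 20`, `n_estimators=200`, stratified shuffled folds, and the helper names `extract_features` and `evaluate` are all assumptions. Labels are assumed binary: 1 for orca, 0 for non-orca (false killer whale or humpback).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

N_MFCC = 20  # hypothetical; the talk does not say which features were used


def extract_features(path: str) -> np.ndarray:
    """Summarize a clip as the per-coefficient mean and std of its MFCCs (assumed feature set)."""
    import librosa  # imported lazily so evaluate() can run without librosa installed

    y, sr = librosa.load(path, sr=None, duration=5.0)  # keep native rate, cap at 5 s
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)  # shape (N_MFCC, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def evaluate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> tuple[float, float]:
    """Mean and standard deviation of accuracy over stratified 10-fold cross-validation."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    # Synthetic stand-in for real clip features: two well-separated clusters.
    rng = np.random.default_rng(0)
    X = np.vstack([
        rng.normal(0.0, 1.0, (50, 2 * N_MFCC)),  # "orca" clips
        rng.normal(3.0, 1.0, (50, 2 * N_MFCC)),  # "non-orca" clips
    ])
    y = np.array([1] * 50 + [0] * 50)
    mean_acc, std_acc = evaluate(X, y)
    print(f"accuracy: {mean_acc:.2f} +/- {std_acc:.2f}")
```

With real data, `X` would be built as `np.vstack([extract_features(p) for p in paths])`. Stratified folds keep the orca/non-orca ratio the same in every fold, which matters when one class has fewer clips. The synthetic clusters only exercise the code path and say nothing about real accuracy; how the talk's ± figure was computed isn't stated.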

Come check out Erika Pelaez’s talk at ML4ALL, happening April 28th-30th in amazing Portland, Oregon! Get your tickets to attend here. For the schedule, our excellent sponsors, and docs for the conference, check out the ML4ALL Conference Site!