Anna Widiger (@widiger_anna) has a B.A. in Computational Linguistics from the University of Tübingen. She’s been doing NLP since her very first programming assignment, specializing in Russian morphology, German syntax, cross-lingual named entity recognition, topic modeling, and grammatical error detection.
Anna describes her talk, “Your First NLP Machine Learning Project: Perks and Pitfalls of Unstructured Data,” to us. Faced with words instead of numbers, many data scientists prefer to feed words straight from CSV files into lists without filtering or transformation, but there is a better way! Text normalization improves the quality of your data for later analysis and can increase the accuracy of your machine learning model.
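To make "text normalization" concrete, here is a minimal sketch of the kind of filtering and transformation the abstract has in mind. It is not taken from the talk: the talk uses spaCy, while this version sticks to the Python standard library so it runs anywhere. The `normalize` function, the regular expressions, and the tiny stop-word list are all illustrative assumptions.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "in", "at"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Runs of letters/digits in any script, optionally followed by an apostrophe suffix.
TOKEN_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)?")


def normalize(text):
    """Return lowercased tokens with URLs, @mentions, punctuation, and stop words removed."""
    text = unicodedata.normalize("NFKC", text)  # unify Unicode variants
    text = text.replace("\u2019", "'")          # curly apostrophe -> straight
    text = URL_RE.sub(" ", text)                # drop links
    text = MENTION_RE.sub(" ", text)            # drop @mentions
    tokens = TOKEN_RE.findall(text.lower())     # tokenizing also discards punctuation
    return [t for t in tokens if t not in STOP_WORDS]


tweet = "The pizza at @LuigisPDX is AMAZING!!! https://example.com/menu"
print(normalize(tweet))  # ['pizza', 'amazing']
```

Whether each of these steps helps depends on the task. Dropping @mentions, for example, would be a mistake if the goal were to find out who is being talked about.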
Which text preprocessing steps are necessary and which ones are “nice-to-have” depends on the source of your data and the information you want to extract from it. It’s important to know what goes into the bag of words and which metrics are useful for comparing word frequencies across documents. In this hands-on talk, I will show some do’s and don’ts for processing tweets, Yelp reviews, and multilingual news articles using spaCy.
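As a rough illustration of a bag of words and of one metric for comparing word frequencies across documents, the sketch below computes TF-IDF weights with only the standard library. TF-IDF is a common choice, but the abstract does not say which metrics the talk covers, so treat it as an assumption. The `tf_idf` function and the toy review tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per document."""
    n_docs = len(docs)
    bags = [Counter(tokens) for tokens in docs]  # bag of words: term -> count

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            # relative term frequency times log inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Toy, already-normalized reviews (made up for this example).
docs = [
    ["great", "pizza", "great", "service"],
    ["slow", "service"],
    ["great", "coffee"],
]

for doc_weights in tf_idf(docs):
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```

In the first review, "great" occurs twice but also shows up in another review, so "pizza" ends up with the higher weight. Raw counts alone would have ranked them the other way around.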
- Manuel -> Barriers To Accelerating The Training Of Artificial Neural Networks – A Systemic Perspective
- Ashutosh -> Conducting a Data Science Contest in Your Organization
- Poul -> From Zero to Machine Learning for Everyone
- Anna -> Your First NLP Machine Learning Project: Perks and Pitfalls of Unstructured Data
- Kaleo Ha’o -> Jump or Not to Jump: Solving Flappy Bird with Deep Reinforcement Learning
- Mary & Robert -> Eliminating Machine Bias & Getting from Machine Learning Outputs to Decisions… timely things!
- Kirsten & Rich -> Now, Kirsten Westeinde & Rich Jolly
- Jon & Amy -> Say Hello to Jon and Amy @ML4ALL!
- Igor & Carol -> More @ML4ALL Intros: Meet Igor & Carol! Also, the bike ride map!
- Clair & Ricky -> @ML4ALL Meet Clair J. Sullivan & Ricky Hennessy
- Paige & Suz -> ML4ALL Speakers – Meet Paige & Suz