This week Ryan Zhang presented at the Seattle Scalability Meetup. I also gave a short presentation showing some tools I’ve been using lately: DataGrip, schema migrations, and the Docker containers I work with as I run through those migrations. It was a solid meetup with excellent conversation afterward. Big thanks to everybody who came out and joined us for a round of drinks, amazing cheese curds, and hummus at Collin’s afterwards! It was a great meetup, and I’m looking forward to getting together again on May 28th with Guinevere (@guincodes) presenting “The PR That Wouldn’t Merge”!
Below are other upcoming events where I’ll be either presenting or attending. At the events I’m attending, let’s talk: I’m always interested in meeting new people and learning about what you’re working on, what you’re learning, and which efforts interest you. The same applies at the events where I’m presenting, except I’ll also be up front sharing whatever tidbit of knowledge I’ve come to present. Hopefully it will be useful and informative for you, and we can continue the conversation after the presentation so we all gain more insight, ideas, and ways to move forward more productively with our respective efforts. Here’s a list of the next big meetups and conferences I’m either speaking at or attending; I hope to see and meet many of you dear \m/ readers there!
Introducing Zhi Yang > @zhiiiyang < presenting “Hierarchical Topic Modeling in Cancer Research”.
Topic models have been widely applied to extract topics from a wide range of documents and text collections, e.g., online customer reviews, medical records, scientific journals, legal documents, books, etc. They let us quickly understand the most prominent and commonly shared information embedded in texts without actually reading through the entire collection. In addition, topic models allow us to assess the contribution of each topic and its representation across different documents. Human genomes have been exposed to an assortment of mutational processes, each contributing a unique pattern of somatic mutations. What would happen if we applied the same concept to the somatic mutations obtained from cancer patients and looked for “topics” of mutations? What would these “topics” tell us about our health, genetic risk factors for cancer, and anything else that slips under the radar?
Shiraishi et al. proposed a topic model for somatic mutations that captures the characteristics and burdens contributed by mutational processes. By closely examining the burdens, we’d like to compare them across different categories, for example time, cancer subtype, ethnicity, smoking history, etc. We’d then like to develop the statistical machinery to infer differences between mutational profiles across these categories and associate the variation with known exposures. This tool is potentially useful for identifying novel and existing mutational processes and correlating them with risk factors, which can later be used to monitor treatment effects in personalized medicine and targeted therapy.
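As a rough illustration of the “mutations as words” idea, here’s a minimal sketch (the patient names and mutation data are invented for the example): each patient’s somatic mutations are encoded as counts over substitution-in-context “words”, the analogue of a document’s word counts, producing the matrix that a topic model such as LDA or Shiraishi et al.’s variant would factor into topics.

```python
from collections import Counter

# Toy data: each "document" is one patient's list of somatic mutations,
# written as substitution-in-trinucleotide-context "words" (invented here).
patients = {
    "patient_A": ["A[C>T]G", "A[C>T]G", "T[C>A]C", "A[C>T]G"],
    "patient_B": ["T[C>A]C", "T[C>A]C", "G[T>C]A"],
}

# Build the analogue of a document-term matrix:
# one row per patient, one column per mutation-context "word".
vocab = sorted({m for muts in patients.values() for m in muts})
counts = {p: Counter(muts) for p, muts in patients.items()}
matrix = [[counts[p][w] for w in vocab] for p in patients]

# A topic model would factor this matrix into per-patient topic weights
# (mutational-process burdens) and per-topic mutation profiles; here we
# only show the count matrix such a model consumes.
for patient, row in zip(patients, matrix):
    print(patient, row)
```

The point of the sketch is only the data shaping: once mutations are counted like words, any off-the-shelf topic-modeling machinery can be applied to the resulting matrix.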
Introducing Sachi Parikh > @parikhsachi < presenting “My Journey Learning ML and AI through Self Study as a High School Student”.
Sachi is a high school student in the Bay Area who is interested in AI and machine learning and loves to code, read, and learn. In the talk she’s put together for us, she delves into the path she’s taken to get into this topic. I’ve seen an outline of this path and I’ll admit, I’m impressed, but you’ll have to come and attend the talk to see it!
Introducing Karl Weinmeister > @kweinmeister < presenting “Build, train, and serve your ML models on Kubernetes with Kubeflow”.
Karl is a Developer Advocacy Manager from Google’s Developer Relations Artificial Intelligence and Machine Learning team. Karl has worked extensively in cloud and mobile, and was a contributor to one of the first AI-based crossword puzzle solvers that is still referenced today.
Distributing ML workloads across multiple nodes has become common. To achieve higher and higher levels of accuracy, data scientists are using more data and more complex models than ever before.
Kubeflow is an open-source platform for building, training, and serving ML models. It is built on industry-standard Kubernetes infrastructure and runs in multiple clouds and on-premises.
In this session, we’ll discuss the problems that Kubeflow solves, and how you can use it to create reproducible ML workflows.
That isn’t all, though: Aeva gave a lightning talk too, included below!
Introducing Aeva van der Veen > @aevavoom < presenting “Gaming Rigs and ML Pipelines: how to get started with the tools you already have”.
Aeva is an outspoken open source advocate with over a decade of experience contributing to F/OSS projects and communities. They have been building distributed systems on Linux since ’99, and are best known for their work in the OpenStack community, where they founded Ironic, the Bare-Metal-as-a-Service project. Aeva lives in rainy Seattle and enjoys staying home when not travelling for work.
If you think that only big tech companies or PhD scientists can use ML & AI, I’d like to show you that an individual open-source enthusiast can build and train a model on commodity hardware using Open Data – and then scale that up on a public cloud.
And if you’re a PC gamer, you probably already have all the tools you need!
fast.ai, an easy-to-learn Python ML framework
nvidia-docker on an Ubuntu Gaming PC
public-domain GIS imagery
a couple terabytes of storage space and a fast internet connection
This talk grew out of a startup competition last year: we tried to use public-domain satellite imagery to help predict and prevent forest fires. Even though we chose not to pursue this as a business, it’s an excellent example of how to combine open source software, public data, and a gaming PC to build an ML pipeline.