
In Flight to Apache Cassandra Days

Another flight down to the Bay Area. Today it was Alaska Air Flight 330 from Seattle to San Jose. It was a mostly clear day at the start, with a solid layer of bright cloud cover exiting Washington on the way down to Oregon. As we crossed over that arbitrary human-defined line between Oregon and California, nature presented us with even more perfectly glowing bright cloud cover. This is Cascadia after all, and it's basically covered in clouds the majority of the time. On departure I also noted Bremerton had three aircraft carriers in dock along with the normal plethora of other naval vessels. The amount of naval power in the area is always pretty awe-inspiring.

Why was I in flight once again? I am heading down to teach with Jeff Carpenter (@jscarp) at the South Bay Cassandra User Group's Cassandra Day events. These are single-day events where we cover an introduction to Apache Cassandra, concepts of data modeling for Apache Cassandra, and then a wrap-up of application development with the respective drivers. Now if you aren't in Santa Clara – or ya know, Menlo Park, San Jose, Oakland, San Francisco, or well, the surrounding area – there are other days scheduled! We also have days scheduled that aren't even located in the Bay, so check out the full list of events:

https://www.datastax.com/company/events

NOTE: If you're interested in Seattle, Portland, or Vancouver BC area events, scroll all the way down to the end of this blog entry – I've got more details for you!

Introduction to Apache Cassandra

In the introduction to Apache Cassandra we cover an overview of the architecture and features of the distributed database. We start off with a definition of a distributed hash ring and how Apache Cassandra uses it to spread data storage across the nodes that make up the database cluster. Moving on, we get into the other capabilities, the trade-offs of data replication between nodes, configuration settings, and a lot more.
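
If the hash ring idea is new to you, here's a rough Go sketch of the concept: hash a partition key to a token and find the node that owns that token's range. This is only an illustration of the idea, not Cassandra's actual implementation (Cassandra uses Murmur3 partitioning and virtual nodes), and the node names and token values below are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// node represents a member of the ring that owns the range ending
// at its token (a big simplification of Cassandra's vnodes).
type node struct {
	name  string
	token uint64
}

// tokenFor hashes a partition key onto the ring. Cassandra uses
// Murmur3 for this; FNV-1a is just a stand-in for the sketch.
func tokenFor(partitionKey string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(partitionKey))
	return h.Sum64()
}

// ownerOf walks the sorted ring and returns the first node whose
// token is >= the key's token, wrapping back around to the start.
func ownerOf(ring []node, partitionKey string) node {
	t := tokenFor(partitionKey)
	i := sort.Search(len(ring), func(i int) bool { return ring[i].token >= t })
	if i == len(ring) {
		i = 0 // wrap around the ring
	}
	return ring[i]
}

func main() {
	// A tiny, made-up three node ring with evenly spaced tokens.
	ring := []node{
		{"node-a", 6148914691236517205},  // ~1/3 of the uint64 space
		{"node-b", 12297829382473034410}, // ~2/3
		{"node-c", 18446744073709551615}, // max uint64
	}
	for _, key := range []string{"user:adron", "user:jeff", "list:groceries"} {
		fmt.Printf("%-16s -> %s\n", key, ownerOf(ring, key).name)
	}
}
```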

Data Modeling

For data modeling we start off with a short review of relational database data modeling to provide something that is more familiar for many people. From there we build on concepts around denormalization, breaking apart the various normal forms, and then get into the thinking and approach behind modeling an application in a distributed database, going deeper into the details around Apache Cassandra.
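
As a tiny, concrete taste of that shift in thinking (this example is mine, not necessarily the one used in the workshop): instead of normalizing comments into one table and joining at read time, you typically denormalize into one table per query, like the hypothetical pair below. Every name here is made up purely for illustration.

```go
package main

import "fmt"

// Illustration-only schema strings showing the "one table per query"
// denormalization approach: the same comment data lives in both tables,
// because writes are cheap and each read stays within one partition.
const commentsByUser = `
CREATE TABLE IF NOT EXISTS comments_by_user (
    userid    uuid,
    commentid timeuuid,
    videoid   uuid,
    comment   text,
    PRIMARY KEY ((userid), commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);`

const commentsByVideo = `
CREATE TABLE IF NOT EXISTS comments_by_video (
    videoid   uuid,
    commentid timeuuid,
    userid    uuid,
    comment   text,
    PRIMARY KEY ((videoid), commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);`

func main() {
	// Nothing fancy here; just print the two statements so the
	// denormalized shapes are easy to compare side by side.
	fmt.Println(commentsByUser)
	fmt.Println(commentsByVideo)
}
```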

Application Development

For application development, focusing on the Java language and technology stack, we'll start with some concepts around how the drivers connect to and work with Apache Cassandra. We'll open up some code too, getting into code changes and additions to become more familiar with how the driver works and some of its capabilities.

Most of the code, concepts, and related material in use around Java and its tech stack are directly applicable to C#, JavaScript, and even the community open source Go CQL library.
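
As a minimal sketch of what that driver interaction looks like on the Go side of things – using gocql (github.com/gocql/gocql), the community Go CQL library mentioned above – here's a hedged example. It assumes a Cassandra node on localhost plus a pre-existing keyspace and users table, both of which are made up for this snippet.

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Point the driver at one or more contact points; the driver
	// discovers the rest of the ring from there.
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "examples" // assumed to already exist
	cluster.Consistency = gocql.Quorum

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("connecting to cassandra: %v", err)
	}
	defer session.Close()

	// Insert a row, then read it back. The users table is a
	// made-up example, not something the workshop ships with.
	if err := session.Query(
		`INSERT INTO users (username, email) VALUES (?, ?)`,
		"adron", "adron@example.com",
	).Exec(); err != nil {
		log.Fatalf("insert failed: %v", err)
	}

	var email string
	if err := session.Query(
		`SELECT email FROM users WHERE username = ?`, "adron",
	).Scan(&email); err != nil {
		log.Fatalf("select failed: %v", err)
	}
	fmt.Println("email:", email)
}
```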

Coming soon…

In the coming weeks (ok, maybe a month or two) we'll be updating this material for Apache Cassandra v4, and additionally I'm aiming to line up some half-day and probably some full-day workshops in the Cascadian region: Portland, Seattle, and Vancouver BC. They'll be almost identical except for a few tweaks, but you'll have to RSVP to find out the details!

Also, if you’re in between any of those cities and have a stop on the Amtrak Cascades, let me know and we’ll get an RSVP list started for your city and see if we can get the required attendee count to make it official!

Seth Juarez Presenting "Deep Learning for Computer Vision with Applications for Wildlife Conservation"

Introducing Seth Juarez > @sethjuarez < presenting "Deep Learning for Computer Vision with Applications for Wildlife Conservation".

Seth Juarez holds a Master’s Degree in Computer Science where his field of research was Artificial Intelligence, specifically in the realm of Machine Learning. Seth is a Microsoft Evangelist working with the Channel 9 team. When he is not working in that area, Seth devotes his time to an open source Machine Learning Library, specifically for .NET, intended to simplify the use of popular machine learning models, as well as complex statistics and linear algebra.

Deep Learning is one of the buzziest of tech buzz words at the moment. This session is designed to remove the mysticism from this amazing form of Artificial Intelligence. Attendees will be led through the wilderness of the foundational principles underlying modern machine learning until they reach the oasis of a full-fledged understanding of deep learning for computer vision. While this session is not for the faint of heart, initiates will attain a solid understanding of the basics of machine learning all the way through powerful image-deciphering convolutional neural networks.

Come check out Seth Juarez's talk at ML4ALL happening April 28th-30th in amazing Portland, Oregon! Get your tickets to attend here. For the schedule, our excellent sponsors, and docs for the conference, check out the ML4ALL Conference Site!

Erika Pelaez Presenting “Building a Machine Learning Classifier to Listen to Killer Whales”

Introducing Erika Pelaez > @midoridancer < presenting "Building a Machine Learning Classifier to Listen to Killer Whales".

Data Scientist with experience in and love for Machine Learning. I like making smart things and have worked with diverse types of data, from biological to transportation.

Did you know that Machine Learning can help protect killer whales? Orcasound (https://github.com/orcasound) is a magnificent open source project for people to connect with their neighboring killer whales of the Pacific Northwest. They provide a platform that continuously broadcasts the underwater sounds from the Puget Sound area to anyone willing to listen. Come to my talk and I'll show you how we're applying Machine Learning with the scikit-learn Python library against web-scraped audio to build models that can be used for signal collection and classification.

Avid users spend time listening to mostly noise or ships, but frankly most of them are there just to hear the orca sounds. This talk will narrate the process we are following to automatically detect and classify orca vs. non-orca (false killer whale and humpback whale) sounds.

The first challenge faced in this project was getting labeled data, as the raw transmissions are unlabeled. My first approach was to build a dataset from scratch by web scraping audio from the internet using Beautiful Soup. I then used a bash script to make sure all the files had a maximum duration of 5s, as this is the duration of the files that Orcasound saves. The feature extraction of the audio files was handled with the librosa library for Python. I finally built a Random Forest classifier with the scikit-learn Python library that was able to reach an accuracy of 99 ±2% with 10-fold cross-validation. The next steps will be to make the model accessible through an API and send the collected signals to it for classification; after a signal is detected, a notification could be sent to the users so they don't have to listen to all the noise.

Come check out Erika Pelaez's talk at ML4ALL happening April 28th-30th in amazing Portland, Oregon! Get your tickets to attend here. For the schedule, our excellent sponsors, and docs for the conference, check out the ML4ALL Conference Site!

Learning Go Episode 3 – More Data Types, Casting, Rendering an SVG File, and Writing to Files

Episode Post & Video Links:  1, 2, 3 (this post), 4 (almost done)

Episode 3 of my recurring "Learning Go" Saturday stream really got more into the particulars of Go data types, including integers, strings, more string formatting verbs, concatenation, type casting, and lots of other pedantic details. In the episode I also delve into some OBS details with the audience, and we get the Twitch interface I've set up a bit more streamlined for easier readability. Overall, I think it's looking much better than just the last episode! Hats off to the conversational assist from the audience.

Here's the play-by-play of what was covered in episode 3, with the code in gists, plus the repo available on GitHub. Video below the timeline.

Timeline

0:00 Intro
6:08 The point where I fix the sound. Just skip that first bit!
6:24 Re-introducing the book I'm using as a kind of curriculum guide for these Go learning sessions.
7:44 Quick fix of the VM, a few updates, discussion of Goland updates, and switching the Material Theme to something visually less caustic. Also showing where that setting is in the IDE.
9:52 Getting into the learning flow, starting a new project with Go 1.11.4 using the Goland IDE new project dialog.


10:50 Creating the Github repo for learning-go-episode-3.
12:14 Setting up the initial project and CODING! Finally getting into some coding. It takes a while when we do it from nothing like this, but it’s a fundamentally important part of starting a project!
13:04 From nothing, creating the core basic elements of a Go code file with main.go. In this part I start showing the various ways to declare types, such as int and int64, with options on style.
14:14 Taking a look at printing out the various values of the variables using formatter verbs via the fmt.Printf function.
17:00 Looking at converting values from one type to another type. There are a number of ways to do this in Go.
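
If you'd rather skim than watch, the gist of those few timeline items looks roughly like this. It's a condensed sketch, not the exact code from the episode:

```go
package main

import "fmt"

func main() {
	// A few ways to declare integer variables.
	var count int = 42      // explicit type
	var bigCount int64 = 42 // explicitly 64-bit
	short := 7              // inferred as int

	// Formatter verbs: %d for integers, %T for the type,
	// %v for a default representation.
	fmt.Printf("count=%d bigCount=%d short=%d\n", count, bigCount, short)
	fmt.Printf("types: %T, %T, %T\n", count, bigCount, short)

	// Converting (casting) between types must be explicit in Go.
	total := bigCount + int64(count) + int64(short)
	ratio := float64(total) / 3.0
	fmt.Printf("total=%v ratio=%.2f\n", total, ratio)

	// Strings: concatenation vs. Sprintf-style formatting.
	name := "learning" + "-go"
	labeled := fmt.Sprintf("%s episode %d", name, 3)
	fmt.Println(labeled)
}
```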

I also, just recently, posted a quick spot video and code (blog entry + code) on getting the minimum and maximum value in Go for a number of types. This isn’t the course video, just a quick spot. Keep reading for the main episode below.


18:16 Oh dear, the mouse falls on the ground. The ongoing battle of streaming: falling objects! But yeah, I get into adding a function – one of the earlier functions built in the series – and we add a signature with an int64 return value. I continue with the addition of another function and look at the specifics of the signature.
25:50 I build this code and take a look at the results. At this point, some of the formatting is goofed up, so I take a look at the formatter verbs to figure out what should be used for the output formatting.
33:40 I change a few things and take a look at more output from the various calculations that I’ve made, showing how various int, int64, and related calculations can be seen.
37:10 Adding a constant, what it is, and when and where and why to declare something as a constant.
38:05 Writing out another for loop to output the results of the sets.
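
Condensed into a sketch, the function with an int64 return value, the constant, and the for loop from the items above look roughly like this (again, not the literal code from the stream; the names and numbers are made up):

```go
package main

import "fmt"

// episodes is a constant: set once and known at compile time.
const episodes = 3

// totalMinutes returns an int64, echoing the function signatures
// discussed in the episode.
func totalMinutes(perEpisode int64) int64 {
	return perEpisode * episodes
}

func main() {
	lengths := []int64{96, 112, 143}

	// A classic for loop over the set of results.
	for i := 0; i < len(lengths); i++ {
		fmt.Printf("episode %d ran %d minutes\n", i+1, lengths[i])
	}

	fmt.Printf("roughly %d minutes if each ran %d\n", totalMinutes(120), int64(120))
}
```
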
42:40 A little git work to create a branch, update the .gitignore, and push the content to github. Repo is here btw: https://github.com/Adron/learning-go-episode-3

At this point I had to take a short interruption to get my ssh keys setup for this particular VM so I could push the code! I snagged just a snippet of the video and made a quick spot video out of it too. Seems a useful thing to do.

47:44 Have to add a new ssh key for the virtual machine to github, so this is a good little snippet of a video showing how that is done.
56:38 Building out rendering of an SVG file to create a graphic. The complete snippet is in the episode's gists and the repo; watch the video for more details, troubleshooting, and working through additions and refactoring of the code.
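
As a stand-in here, the following is a loose sketch of the general idea – building SVG markup up as a string and writing it out to a file. It's not the exact code from the episode; the element names and values are invented for illustration:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	values := []int{12, 36, 24, 58, 41}

	var svg strings.Builder
	svg.WriteString(`<svg xmlns="http://www.w3.org/2000/svg" width="400" height="120">`)
	svg.WriteString("\n")

	// One rectangle per value; strconv turns the ints into the
	// strings the SVG attributes need.
	for i, v := range values {
		x := strconv.Itoa(i * 60)
		height := strconv.Itoa(v)
		y := strconv.Itoa(120 - v)
		svg.WriteString(fmt.Sprintf(
			`<rect x="%s" y="%s" width="40" height="%s" fill="steelblue"/>`,
			x, y, height))
		svg.WriteString("\n")
	}
	svg.WriteString("</svg>\n")

	// Write the document out to a file instead of just the console.
	f, err := os.Create("chart.svg")
	if err != nil {
		log.Fatalf("creating file: %v", err)
	}
	defer f.Close()
	if _, err := f.WriteString(svg.String()); err != nil {
		log.Fatalf("writing svg: %v", err)
	}
	fmt.Println("wrote chart.svg")
}
```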

1:15:32 We begin the mission of bumping up the font size in Goland. It’s a little tricky but we get it figured out.
1:33:20 Upon realizing that we need this to output directly to a file instead of just the console, I work a write-out-to-file into the code. Things will work better that way.
1:40:05 Through this process of changing it to output to a file, I have to work through additional string conversions, refactoring, and more. There's a lot of nuance and various things to learn during this section of the video, albeit a little slowly – i.e. LOTS of strconv usage.
2:01:24 First view of the generated SVG file! Yay! Oh dear!
2:09:10 More troubleshooting to try and figure out where the math problem is!
2:22:50 Wrapping up with the math a little off-kilter, but sort of fixed, I move on to taking a look at the build and pushing each of the respective branches to GitHub. Repo is here btw: https://github.com/Adron/learning-go-episode-3

Creating Distributed Database Application Starter Kits

I've boarded a bus, and as always when I board a bus, I code. Unless of course there are people I'm hanging out with, in which case I chit-chat, but right now this is the 212 and I don't know anybody on this chariot anyway. So into the code I go.

I've been re-reviewing the Docker and related collateral we offer at DataStax. In that review it seemed like it would be worth having some starter kit applications to go along with these "default" Docker options. I've created this post to introduce the first language & tech stack of several starter kits I'm going to create.

Starter Kit – The Todo List Template

This first set of starter kits will be based upon a todo list application. It's really simple, minimal in features, and offers a complete top-to-bottom implementation of a service, plus an application on top of that service, all built on Apache Cassandra. In some places, and I'll clearly mark these places, I might add a few DataStax Enterprise features around search, analytics, or graph.

The Todo List

Features: The following details the features, from the user's perspective, that this application will provide. Each implementation will provide all of these features.

  • A user wants to create a user account to create todo lists with.
  • A user wants to be able to store a username, full name, email, and some simple notes with their account.
  • A user wants to be able to create a todo list that is identified by a user-defined name (e.g. "Grocery List", "Guitar List", or "Stuff to do List").
  • A user wants to be able to log out and return, then retrieve a list from the set of their lists.
  • A user wants to be able to delete a todo list.
  • A user wants to be able to update a todo list name.
  • A user wants to be able to add items to a todo list.
  • A user wants to be able to update items in the todo list.
  • A user wants to be able to delete items in a todo list.

Architecture: The following is the architecture of the todo list starter kit application.

  • Database: Apache Cassandra.
  • Service: A small service to manage the data tier of the application.
  • User Interface: A web interface using React or Vue.js (undecided).

As you can see, some of the items are incomplete, but I'll decide on them soon. My next review is to check out what I really want to use for the user interface, and also to get a user account system figured out. I don't really want to create the entire user account system myself, but would instead like to use something like Auth0 or Okta.
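
Purely as a hypothetical sketch of the data tier (the actual kits may use different languages, table names, and a different model entirely), here's roughly what a couple of the user stories above could look like against Apache Cassandra with the Go gocql driver. It assumes a node on localhost and a pre-existing "todos" keyspace, both made up for this snippet.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

// Hypothetical, illustration-only tables: lists partitioned per user,
// items partitioned per list.
var schema = []string{
	`CREATE TABLE IF NOT EXISTS todo_lists (
		username  text,
		list_id   timeuuid,
		list_name text,
		PRIMARY KEY ((username), list_id)
	)`,
	`CREATE TABLE IF NOT EXISTS todo_items (
		list_id   timeuuid,
		item_id   timeuuid,
		details   text,
		completed boolean,
		PRIMARY KEY ((list_id), item_id)
	)`,
}

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "todos" // assumed to already exist
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer session.Close()

	for _, stmt := range schema {
		if err := session.Query(stmt).Exec(); err != nil {
			log.Fatalf("schema: %v", err)
		}
	}

	// "A user wants to be able to create a todo list..."
	listID := gocql.TimeUUID()
	if err := session.Query(
		`INSERT INTO todo_lists (username, list_id, list_name) VALUES (?, ?, ?)`,
		"adron", listID, "Grocery List",
	).Exec(); err != nil {
		log.Fatalf("create list: %v", err)
	}

	// "...and add items to a todo list."
	if err := session.Query(
		`INSERT INTO todo_items (list_id, item_id, details, completed) VALUES (?, ?, ?, ?)`,
		listID, gocql.TimeUUID(), "Coffee beans", false,
	).Exec(); err != nil {
		log.Fatalf("add item: %v", err)
	}
}
```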

May I Ask?

There are numerous things I'd love help with. Are there any user stories you think are missing? Should I add something? What would make these helpful to you? Leave a comment, or tweet at me @Adron. I'd be happy to get some feedback and others' thoughts on the matter so that I can ensure these are simple, to the point, usable, and helpful to people. Cheers!

Lena presents “So You Want to Run Data-Intensive Systems on Kubernetes”

If you're interested in running data-intensive systems (think Apache Cassandra, DataStax Enterprise, Kafka, Spark, TensorFlow, Elasticsearch, Redis, etc.) on Kubernetes, this is a great talk. @Lenadroid covers what options are available in Kubernetes and how architectural features around pods, jobs, stateful sets, and replica sets work together to provide distributed systems capabilities. She then delves into custom resource definitions (CRDs), operators, and Helm charts – features and peripheral capabilities, some still maturing, that can help you host various complex distributed systems. I've included references below the video here, enjoy.

References:

'bash' A.K.A. The Solution for Everything – Bourne Shell as per v7 Unix to Today's Bash

In 1979, Unix v7 started being distributed with the original Bourne Shell. Simply put, it's a program that sits at /bin/sh and runs in the terminal. You may ask, "what's the difference between a shell and the terminal?" Let's cover that right now, because it's something that routinely isn't common knowledge, but it really ought to be, as it sets the basis for understanding a lot of what is going on in Unix-based systems (that includes almost every practical system on a PC, server, in the cloud, on your phone, and more – probably easiest to explain simply as everything that isn't the Microsoft Windows OS).

A Shell and the Terminal

Terminal – A terminal is the text input and output environment on the system.

Shell – This is the command line interpreter that is run at the terminal.

Another point of context is that terminal, shell, and the word console are all used in various ways and sometimes interchangeably. However, these words do not mean the same thing at all; they are distinct, individual parts of the system. For example, the console – which is used in a strangely disingenuous way all over Microsoft phrasing – is the physical terminal of the system, which is where the system terminal, i.e. the thing I've described above, actually runs so that we can type and interface with it as humans.

Albeit, as English does, these definitions aren't always taken as the exact, appropriate, and pedantically correct definitions today. For example, many at Microsoft argue that the console is just the terminal, that the terminal is the console. Sure, ok, that's fine; I can still follow along in the conversation, and it adds context for when someone steps out of line and uses the more historically specific definition in a conversation.

Alright, that's all groovy, so now we can get back to just talking about the shell, all the power it gives the Unix/Linux/POSIX system user, and touch on the terminal or console as needed, with full context of what these things actually are!

Introducing Bash!

Alright, with that little bit of context around the Bourne Shell, let's talk about what we've actually got today running as our shell in our terminal on our console on the computers we work with! Years later, the Bourne Shell had a replacement written for it by Brian Fox. He released it in 1989, and over the years it became the de facto successor of the Bourne Shell. The name 'Bash' stands for Bourne Again SHell.

The Bash command syntax is a superset of the Bourne Shell syntax. It provides a wide range of features that includes ideas drawn from the Korn shell (ksh) and the C shell (csh), such as command line editing, command history, the directory stack, the $RANDOM and $PPID variables, and POSIX command substitution syntax. If many of those things make you think, "WTF are these variables and such?", have no worries – I'll get to 'em soon enough in this series!

But with that, this is the beginning of many short entries on tips, tricks, tutorials, syntax, history, and context of bash, so until next time – cheers!

References & Collected Materials