Property Graph Modeling with an FU Towards Supernodes – Jonathan Lacefield

Some notes along with this talk. Which is about ways to mitigate super nodes, partitioning strategies, and related efforts. Jonathan’s talk is vendor neutral, even though he works at DataStax. Albeit that’s not odd to me, since that’s how we roll at DataStax anyway. We take pride in working with DSE but also with knowing the various products out there, as things are, we’re all database nerds after all. (more below video)

In the video, I found the definition slide for super node was perfect.

supernodes.png

See that super node? Wow, Florida is just covered up by the explosive nature of that super node! YIKES!

In the talk Jonathan also delves deeper into the vertexes, adjacent vertices, and the respective neighbors. With definitions along the way, so it’s a great talk to watch even if you’re not up to speed on graph databases and graph math and all that related knowledge.

impacts

traversal.png

The super node problem he continues on to describe have two specific problems that are detailed; query performance traversals and storage retrieval. Such as a Gremlin traversal (one’s query), moving along creating traversers, until it hits a super node, where a computational explosion occurs.

Whatever your experience, this talk has some great knowledge to expand your ideas on how to query, design, and setup data in your graph databases to work against. Along with that more than a few elements of knowledge about what not to do when designing a schema for your graph data. Give a listen, it’s worth your time.

 

Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation

Part 2 of 3 – Coding Session in Go – Cobra + Viper CLI for Parsing Text Files, Retrieval of Twitter Data, Exports to various file formats, and export to Apache Cassandra.

Part 1 is available “Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files“.

Updated links to each part will be posted at bottom of  this post when I publish them. For code, written walk through, and the like scroll down below the video and timestamps.

Hacking Together a CLI Installing Cassandra, Setting Up the Twitter API, ENV Vars, etc.

0:04 Kick ass intro. Just the standard rocking tune.

3:40 A quick recap. Check out the previous write “Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files” of this series.

4:30 Beginning of completion of twitz parse command for exporting out to XML, JSON, and CSV (already did the text export previous session). This segment also includes a number of refactorings to clean up the functions, break out the control structures and make the code more readable.

In the end of refactoring twitz parse came out like this. The completed list is put together by calling the buildTwitterList() function which is actually in the helpers.go file. Then prints that list out as is, and checks to see if a file export should be done. If there is a configuration setting set for file export then that process starts with a call to exportParsedTwitterList(exportFilename string, exportFormat string, ... etc ... ). Then a simple single level control if then else structure to determine which format to export the data to, and a call to the respective export function to do the actual export of data and writing of the file to the underlying system. There’s some more refactoring that could be done, but for now, this is cleaned up pretty nicely considering the splattering of code I started with at first.

50:00 I walk through a quick install of an Apache Cassandra single node that I’ll use for development use later. I also show quickly how to start and stop post-installation.

Reference: Apache Cassandra, Download Page, and Installation Instructions.

53:50 Choosing the go-twitter API library for Go. I look at a few real quickly just to insure that is the library I want to use.

Reference: go-twitter library

56:35 At this point I go through how I set a Twitter App within the API interface. This is a key part of the series where I take a look at the consumer keys and access token and access token secrets and where they’re at in the Twitter interface and how one needs to reset them if they just showed the keys on a stream (like I just did, shockers!)

57:55 Here I discuss and show where to setup the environment variables inside of Goland IDE to building and execution of the CLI. Once these are setup they’ll be the main mechanism I use in the IDE to test the CLI as I go through building out further features.

1:00:18 Updating the twitz config command to show the keys that we just added as environment variables. I set these up also with some string parsing and cutting off the end of the secrets so that the whole variable value isn’t shown but just enough to confirm that it is indeed a set configuration or environment variable.

1:16:53 At this point I work through some additional refactoring of functions to clean up some of the code mess that exists. Using Goland’s extract method feature and other tooling I work through several refactoring efforts that clean up the code.

1:23:17 Copying a build configuration in Goland. A handy little thing to know you can do when you have a bunch of build configuration options.

1:37:32 At this part of the video I look at the app-auth example in the code library, but I gotta add the caveat, I run into problems using the exact example. But I work through it and get to the first error messages that anybody would get to pending they’re using the same examples. I get them fixed however in the next session, this segment of the video however provides a basis for my pending PR’s and related work I’ll submit to the repo.

The remainder of the video is trying to figure out what is or isn’t exactly happening with the error.

I’ll include the working findem code in the next post on this series. Until then, watch the wrap up and enjoy!

1:59:20 Wrap up of video and upcoming stream schedule on Twitch.

UPDATED SERIES PARTS

    1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
    2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation (this post)

 

Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files

Part 1 of 3 – Coding Session in Go – Cobra + Viper CLI for Parsing Text Files, Retrieval of Twitter Data, Exports to various file formats, and export to Apache Cassandra.

Updated links to each part will be posted at bottom of  this post when I publish them. For code, written walk through, and the like scroll down and I’ll have code samples toward the bottom of the timestamps under the video, so scroll, scroll, scroll.

3:40 Stated goals of application. I go through a number of features that I intend to build into this application, typing them up in a text doc. The following are the items I added to the list.

  1. The ability to import a text file of Twitter handles that are mixed in among a bunch of other text.
  2. The ability to clean up that list.
  3. The ability to export the cleaned up list of handles to a text, csv, JSON, xml, or plain text file.
  4. The ability, using the Twitter API, to pull data form the bio, latest tweets, and other related information.

github8:26 Creating a new Github Repo. A few tips and tricks with creating a new repo with the Github.

9:46 Twitz is brought into existence! Woo
~~ Have to reset up my SSH key real quick. This is a quick tutorial if you aren’t sure how to do it, otherwise skip forward to the part where I continue with project setup and initial coding.

12:40 Call out for @justForFunc and Francesc’s episode on Cobra!

Check out @Francesc’s “Just for Func”. It’s a great video/podcast series of lots and lots of Go!

13:02 Back to project setup. Cloning repo, getting initial README.emd, .gitignore file setup, and related collateral for the basic project. I also add Github’s “issues” files via the Github interface and rebase these changes in later.
14:20 Adding some options to the .gitignore file.
15:20 Set dates and copyright in license file.
16:00 Further setup of project, removing WIKIs, projects, and reasoning for keeping issues.
16:53 Opening Goland up after an update. Here I start going through the specific details of how I setup a Go Project in Goland, setting configuration and related collateral.
17:14 Setup of Goland’s golang Command Line Launcher.
25:45 Introduction to Cobra (and first mention of Viper by association).
26:43 Installation of Cobra with go get github.com/spf13/cobra/cobra/ and the gotchas (remember to have your paths set, etc).
29:50 Using Cobra CLI to build out the command line interface app’s various commands.
35:03 Looking over the generated code and commenting on the comments and their usefulness.
36:00 Wiring up the last bits of Cobra with some code, via main func, for CLI execution.
48:07 I start wiring up the Viper configuration at this point. Onward from here it’s coding, coding, and configuration, and more coding.

Implementing the `twitz config` Command

1:07:20 Confirming config and working up the twitz config command implementation.
1:10:40 First execution of twitz config and I describe where and how Cobra’s documentation strings work through the --help flag and is put into other help files based on the need for the short or long description.

The basic config code when implemented looked something like this end of session. This snippet of code doesn’t do much beyond provide the displace of the configuration variables from the configuration file. Eventually I’ll add more to this file so the CLI can be easier to debug when setting it up.

package cmd

import (
	"fmt"
	"github.com/spf13/cobra"
	"github.com/spf13/viper"
)

// configCmd represents the config command
var configCmd = &cobra.Command{
	Use:   "config",
	Short: "A brief description of your command",
	Long: `A longer description that spans multiple lines and likely contains examples
and usage of using your command. For the custom example:
Cobra is a CLI library for Go that empowers applications.
This application is a tool to generate the needed files
to quickly create a Cobra application.`,
	Run: func(cmd *cobra.Command, args []string) {
		fmt.Printf("Twitterers File: %s\n", viper.GetString("file"))
		fmt.Printf("Export File: %s\n", viper.GetString("fileExport"))
		fmt.Printf("Export Format: %s\n", viper.GetString("fileFormat"))<span data-mce-type="bookmark" id="mce_SELREST_start" data-mce-style="overflow:hidden;line-height:0" style="overflow:hidden;line-height:0;"></span>
	},
}

func init() {
	rootCmd.AddCommand(configCmd)
}

Implementing the `twitz parse` Command

1:14:10 Starting on the implementation of the twitz parse command.
1:16:24 Inception of my “helpers.go” file to consolidate and clean up some of the code.
1:26:22 REDEX Implementation time!
1:32:12 Trying out the REGEX101.com site.

I’ll post the finished code and write up some details on the thinking behind it when I post video two of this application development sessions.

That’s really it for this last session. Hope the summary helps for anybody working through building CLI apps with Go and Cobra. Until next session, happy thrashing code.

UPDATED SERIES PARTS

  1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files (this post)

How to Become a Data Scientist in 6 Months a Hacker’s Approach to Career Planning

I’ve been digging around for some good presentations, giving a listen to some videos from PyData London 2016. Out of the videos I watched, I chose this talk by Tetiana Ivanova. This is a good talk for those looking to get into working as a data scientist. Here’s a few notes I made while watching the video.

Why did Tetiana hack here career? Well, first off, she was a mathematician she wanted to get out of that world of academia and make a change.

…more, including references and links, below the video…

Why does our society support higher education? Tetiana points emphasizes a number of things that that interline in a person’s life such as social pressure, historical changes, prestige, and status signaling to dictate their career options and provide choices in direction. I found this fascinating as I’ve read about, and routinely notice that higher education at an established and known school is largely about prestige and status signaling more than the actual education itself. As we see all the time in society, people will gain a position of status and power all too often based on the prestige and status arbitrarily associated with them because they went to this school or that school or some Ivy League fancy pants school.

But wrap all that up, and Tetiana outlines an important detail, “Prestige is exploitable!”

There’s also some hard realities Tetiana bullet points in a slide that I found worthy of note.

  • Set a realistic time frame.
  • Don’t trust yourself with sticking to deadlines.
  • Make a study plan.
  • Prepare for uncertainty.

I can’t draw enough emphasis to this list. There are also two points that I’d like to point out even more detail.

First is “don’t trust yourself with sticking to deadlines“. The second is “prepare for uncertainty“. If you expect to meet all your deadlines, you are going to dramatically increase your need to prepare for uncertainty. Because you will simply not meet all of your deadlines. For many of us we won’t meet most of our deadlines, let alone all of the deadlines.

Tetiana continues to cover a number of details around willpower, self management and self organization, and an insightful take on the topic of nerd networking. Watch the talk, it’s a good one and will provide a lot of insight, from a clearly introspective and intelligent individual, into clearing and stepping into a data science – or possibly other – career path!

References:

Jonathan Ellis talks about Five Lessons in Distributed Databases

Notes on the talk…

  1. If it’s not SQL it’s not a database. Watch, you’ll get to hear why… ha!

Then Jonathan covers the recent history (sort of recent, the last ~20ish years) of the industry and how we’ve gotten to this point in database technology.

  1. It takes 5+ years to build a database.

Also the tens of millions of dollars with that period of time. Both are needed, in droves, time and money.

…more below the video.

  1. The customer is always right.

Even when they’re clearly wrong, they’re largely right.

For number 4 and 5 you’ll have to watch the video. Lot’s good stuff in this video including comparisons of Cosmos, Dynamo DB, Apache Cassandra, DataStax Enterprise, and how these distributed databases work, their performance (3rd Party metrics are shown) and more details!

7 Tips for Creating Technical Content on The Open Source Show

I’m in a video with the rad Christina Warren!

In this video we walk about 7 tips for creating technical content in a kind of rapid fire back and forth of ideas. Recording this was great, as the way we did it presented us with a chance to put these ideas together like this. Being that both of us have presented and helped people out presenting a few more times then we’ve been able to keep a count of, the crew leading this endeavor basically said, “start brainstorming” as if the show is live. We did that, and it worked rather well. Then the artists, animators, and crew went to work splicing and dicing the video into this watchable format! Lotsa fun, enjoy.

For each of these I’ve elaborated on in the past.

Those 7 Tips for Creating Technical Content

  1. Always be learning!
  2. Know your audience.
  3. Bring ALL the connectors.
  4. Backup, backup, backup your presentations.
  5. Write continuously, regularly, and tell yourself a story.
  6. Try tutorials on a fresh machine.
  7. Observe how people create and present content.

Even more details on all these in the future.

References:

 

 

Lena presents “So You Want to Run Data-Intensive Systems on Kubernetes”

If you’re interested in running data-intensive systems (think Apache Cassandra, DataStax Enterprise, Kafka, Spark, Tensorflow, Elasticsearch, Redis, etc) in Kubernetes this is a great talk. @Lenadroid covers what options are available in Kubernetes, how architectural features around pods, jobs, stateful sets, and replica sets work together to provide distributed systems capabilities. Other features she continues and delves into include custom resource definitions (CRDs), operators, and HELM Charts, which include future and peripheral feature capabilities that can help you host various complex distributed systems. I’ve included references below the video here, enjoy.

References: