Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval

Part 3 of 3 – Coding Session in Go – Cobra + Viper CLI for Parsing Text Files, Retrieval of Twitter Data, and Exports to various file formats.

UPDATED PARTS:

  1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
  2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation
  3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval (this post)

Updated links to each part will be posted at bottom of  this post when I publish them. For code, written walk through, and the like scroll down below the video and timestamps.

0:54 The thrashing introduction.
3:40 Getting started, with a recap of the previous sessions but I’ve not got the sound on so ignore this until 5:20.
5:20 I notice, and turn on the volume. Now I manage to get the recap, talking about some of the issues with the Twitter API. I step through setup of the app and getting the appropriate ID’s and such for the Twitter API Keys and Secrets.
9:12 I open up the code base, and review where the previous sessions got us to. Using Cobra w/ Go, parsing and refactoring that was previously done.
10:30 Here I talk about configuration again and the specifics of getting it setup for running the application.
12:50 Talking about Go’s fatal panic I was getting. The dependency reference to Github for the application was different than what is in application and don’t show the code that is actually executing. I show a quick fix and move on.
17:12 Back to the Twitter API use by using the go-twitter library. Here I review the issue and what the fix was for another issue I was having previous session with getting the active token! Thought the library handled it but that wasn’t the case!
19:26 Now I step through creating a function to get the active oath bearer token to use.
28:30 After deleting much of the code that doesn’t work from the last session, I go about writing the code around handling the retrieval of Twitter results for various passed in Twitter Accounts.

The bulk of the next section is where I work through a number of functions, a little refactoring, and answering some questions from the audience/Twitch Chat (working on a way to get it into the video!), fighting with some dependency tree issues, and a whole slew of silliness. Once that wraps up I get some things committed into the Github repo and wrap up the core functionality of the Twitz Application.

58:00 Reviewing some of the other examples in the go-twitter library repo. I also do a quick review of the other function calls form the library that take action against the Twitter API.
59:40 One of the PR’s I submitted to the project itself I review and merge into the repo that adds documentation and a build badge for the README.md.
1:02:48 Here I add some more information about the configuration settings to the README.md file.

1:05:48 The Twitz page is now updated: https://adron.github.io/twitz/
1:06:48 Setup of the continuous integration for the project on Travis CI itself: https://travis-ci.org/Adron/twitz
1:08:58 Setup fo the actual travis.yml file for Go. After this I go through a few stages of troubleshooting getitng the build going, with some white space in the ole’ yaml file and such. Including also, the famous casing issue! Ugh!
1:26:20 Here I start a wrap up of what is accomplished in this session.

NOTE: Yes, I realize I spaced and forgot the feature where I export it out to Apache Cassandra. Yes, I will indeed have a future stream where I build out the part that exports the responses to Apache Cassandra! So subcribe, stay tuned, and I’ll get that one done ASAP!!!

1:31:10 Further CI troubleshooting as one build is green and one build is yellow. More CI troubleshooting! Learn about the travis yaml here.
1:34:32 Finished, just the bad ass outtro now!

The Codez

In the previous posts I outlined two specific functions that were built out:

  • Part 1 – The config function for the twitz config command.
  • Part 2 – The parse function for the twitz parse command.

In this post I focused on updating both of these and adding additional functions for the bearer token retrieval for auth and ident against the Twitter API and other functionality. Let’s take a look at what the functions looked like and read like after this last session wrap up.

The config command basically ended up being 5 lines of fmt.Printf functions to print out pertinent configuration values and environment variables that are needed for the CLI to be used.


var configCmd = &cobra.Command{
Use: "config",
Short: "A brief description of your command",
Long: `A longer description that spans multiple lines and likely contains examples
and usage of using your command. For the custom example:
Cobra is a CLI library for Go that empowers applications.
This application is a tool to generate the needed files
to quickly create a Cobra application.`,
Run: func(cmd *cobra.Command, args []string) {
fmt.Printf("Twitterers File: %s\n", viper.GetString("file"))
fmt.Printf("Export File: %s\n", viper.GetString("fileExport"))
fmt.Printf("Export Format: %s\n", viper.GetString("fileFormat"))
fmt.Printf("Consumer API Key: %s\n", viper.GetString("consumer_api_key")[0:6])
fmt.Printf("Consumer API Secret: %s\n", viper.GetString("consumer_api_secret")[0:6])
},
}

view raw

config.go

hosted with ❤ by GitHub

The parse command was a small bit changed. A fair amount of the functionality I refactored out to the buildTwitterList() and exportFile, and rebuildForExport functions. The buildTwitterList() I put in the helper.go file, which I’ll cover a littler later. But in this file, which could still use some refactoring which I’ll get to, I have several pieces of functionality; the export to formats functions, and the if else if logic of the exportParsedTwitterList function.


var parseCmd = &cobra.Command{
Use: "parse",
Short: "This command will extract the Twitter Accounts form a text file.",
Long: `This command will extract the Twitter Accounts and clean up or disregard other characters
or text around the twitter accounts to create a simple, clean, Twitter Accounts only list.`,
Run: func(cmd *cobra.Command, args []string) {
completedTwittererList := buildTwitterList()
fmt.Println(completedTwittererList)
if viper.Get("fileExport") != nil {
exportParsedTwitterList(viper.GetString("fileExport"), viper.GetString("fileFormat"), completedTwittererList)
}
},
}
func exportParsedTwitterList(exportFilename string, exportFormat string, twittererList []string) {
if exportFormat == "txt" {
exportTxt(exportFilename, twittererList, exportFormat)
} else if exportFormat == "json" {
exportJson(exportFilename, twittererList, exportFormat)
} else if exportFormat == "xml" {
exportXml(exportFilename, twittererList, exportFormat)
} else if exportFormat == "csv" {
exportCsv(exportFilename, twittererList, exportFormat)
} else {
fmt.Println("Export type unsupported.")
}
}
func exportXml(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting xml export to %s.", exportFilename)
xmlContent, err := xml.Marshal(twittererList)
check(err)
header := xml.Header
collectedContent := header + string(xmlContent)
exportFile(collectedContent, exportFilename+"."+exportFormat)
}
func exportCsv(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting txt export to %s.", exportFilename)
collectedContent := rebuildForExport(twittererList, ",")
exportFile(collectedContent, exportFilename+"."+exportFormat)
}
func exportTxt(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting %s export to %s.", exportFormat, exportFilename)
collectedContent := rebuildForExport(twittererList, "\n")
exportFile(collectedContent, exportFilename+"."+exportFormat)
}
func exportJson(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting %s export to %s.", exportFormat, exportFilename)
collectedContent := collectContent(twittererList)
exportFile(string(collectedContent), exportFilename+"."+exportFormat)
}
func collectContent(twittererList []string) []byte {
collectedContent, err := json.Marshal(twittererList)
check(err)
return collectedContent
}
func rebuildForExport(twittererList []string, concat string) string {
var collectedContent string
for _, twitterAccount := range twittererList {
collectedContent = collectedContent + concat + twitterAccount
}
if concat == "," {
collectedContent = strings.TrimLeft(collectedContent, concat)
}
return collectedContent
}
func exportFile(collectedContent string, exportFile string) {
contentBytes := []byte(collectedContent)
err := ioutil.WriteFile(exportFile, contentBytes, 0644)
check(err)
}

view raw

parse.go

hosted with ❤ by GitHub

Next up after parse, it seems fitting to cover the helpers.go file code. First I have the check function, which simply wraps the routinely copied error handling code snippet. Check out the file directly for that. Then below that I have the buildTwitterList() function which gets the config setting for the file name to open to parse for Twitter accounts. Then the code reads the file, splits the results of the text file into fields, then steps through and parses out the Twitter accounts. This is done with a REGEX (I know I know now I have two problems, but hey, this is super simple!). It basically finds fields that start with an @ and then verifies the alphanumeric nature, combined with a possible underscore, that then remove unnecessary characters on those fields. Wrapping all that up by putting the fields into a string/slice array and returning that string array to the calling code.


func buildTwitterList() []string {
theFile := viper.GetString("file")
theTwitterers, err := ioutil.ReadFile(theFile)
check(err)
stringTwitterers := string(theTwitterers[:])
splitFields := strings.Fields(stringTwitterers)
var completedTwittererList []string
for _, aField := range splitFields {
if strings.HasPrefix(aField, "@") && aField != "@" {
reg, _ := regexp.Compile("[^a-zA-Z0-9_@]")
processedString := reg.ReplaceAllString(aField, "")
completedTwittererList = append(completedTwittererList, processedString)
}
}
return completedTwittererList
}

The next function in the Helpers.go file is the getBearerToken function. This was a tricky bit of code. This function takes in the consumer key and secret from the Twitter app (check out the video at 5:20 for where to set it up). It returns a string and error, empty string if there’s an error, as shown below.

The code starts out with establishing a POST request against the Twitter API, asking for a token and passing the client credentials. Catches an error if that doesn’t work out, but if it can the code then sets up the b64Token variable with the standard encoding functionality when it receives the token string byte array ( lines 9 and 10). After that the request then has the header built based on the needed authoriztaion and content-type properties (properties, values? I don’t recall what spec calls these), then the request is made with http.DefaultClient.Do(req). The response is returned, or error and empty response (or nil? I didn’t check the exact function signature logic). Next up is the defer to ensure the response is closed when everything is done.

Next up the JSON result is parsed (unmarshalled) into the v struct which I now realize as I write this I probably ought to rename to something that isn’t a single letter. But it works for now, and v has the pertinent AccessToken variable which is then returned.


func getBearerToken(consumerKey, consumerSecret string) (string, error) {
req, err := http.NewRequest("POST", "https://api.twitter.com/oauth2/token",
strings.NewReader("grant_type=client_credentials"))
if err != nil {
return "", fmt.Errorf("cannot create /token request: %+v", err)
}
b64Token := base64.StdEncoding.EncodeToString(
[]byte(fmt.Sprintf("%s:%s", consumerKey, consumerSecret)))
req.Header.Add("Authorization", "Basic "+b64Token)
req.Header.Add("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
resp, err := http.DefaultClient.Do(req)
if err != nil {
return "", fmt.Errorf("/token request failed: %+v", err)
}
defer resp.Body.Close()
var v struct {
AccessToken string `json:"access_token"`
}
if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
return "", fmt.Errorf("error parsing json in /token response: %+v", err)
}
if v.AccessToken == "" {
return "", fmt.Errorf("/token response does not have access_token")
}
return v.AccessToken, nil
}

Wow, ok, that’s a fair bit of work. Up next, the findem.go file and related function for twitz. Here I start off with a few informative prints to the console just to know where the CLI has gotten to at certain points. The twitter list is put together, reusing that same function – yay code reuse right! Then the access token is retrieved. Next up the http client is built, the twitter client is passed that and initialized, and the user lookup request is sent. Finally the users are printed out and below that a count and print out of the count of users is printed.


var findemCmd = &cobra.Command{
Use: "findem",
Short: "A brief description of your command",
Long: `A longer description that spans multiple lines and likely contains examples
and usage of using your command. For example:
Cobra is a CLI library for Go that empowers applications.
This application is a tool to generate the needed files
to quickly create a Cobra application.`,
Run: func(cmd *cobra.Command, args []string) {
fmt.Println("Starting Twitter Information Retrieval.")
completedTwitterList := buildTwitterList()
fmt.Printf("Getting Twitter details for: \n%s", completedTwitterList)
accessToken, err := getBearerToken(viper.GetString("consumer_api_key"), viper.GetString("consumer_api_secret"))
check(err)
config := &oauth2.Config{}
token := &oauth2.Token{AccessToken: accessToken}
// OAuth2 http.Client will automatically authorize Requests
httpClient := config.Client(context.Background(), token)
// Twitter client
client := twitter.NewClient(httpClient)
// users lookup
userLookupParams := &twitter.UserLookupParams{ScreenName: completedTwitterList}
users, _, _ := client.Users.Lookup(userLookupParams)
fmt.Printf("\n\nUsers:\n%+v\n", users)
howManyUsersFound := len(users)
fmt.Println(howManyUsersFound)
},
}

view raw

Findem.go

hosted with ❤ by GitHub

I realized, just as I wrapped this up I completely spaced on the Apache Cassandra export. I’ll have those post coming soon and will likely do another refactor to get the output into a more usable state before I call this one done. But the core functionality, setup of the systemic environment needed for the tool, the pertinent data and API access, and other elements are done. For now, that’s a wrap, if you’re curious about the final refactor and the Apache Cassandra export then subscribe to my Twitch @adronhall and/or my YouTube channel ThrashingCode.

UPDATED SERIES PARTS

    1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
    2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation
    3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval (this post)

     

Creating Distributed Database Application Starter Kits

I’ve boarded a bus, and as always, when I board a bus I almost always code. Unless of course there are people I’m hanging out with then I chit chat, but right now this is the 212 and I don’t know anybody on this chariot anyway. So into the code I go.

I’ve been re-reviewing the Docker and related collateral we offer at DataStax. In that review it seems like it would be worth having some starter kit applications along with these “default” Docker options. This post I’ve created to provide the first language & tech stack of several starter kits I’m going to create.

Starter Kit – The Todo List Template

This first set of starter kits will be based upon a todo list application. It’s really simple, minimal in features, and offers a complete top to bottom implementation of a service, and an application on top of that service all built on Apache Cassandra. In some places, and I’ll clearly mark these places, I might add a few DataStax Enterprise features around search, analytics, or graph.

The Todo List

Features: The following detail the features, from the users perspective, that this application will provide. Each implementation will provide all of these features.

  • A user wants to create a user account to create todo lists with.
  • A user wants to be able to store a username, full name, email, and some simple notes with their account.
  • A user wants to be able to create a todo list that is identified by a user defined name. (i.e. “Grocery List”, “Guitar List”, or “Stuff to do List”)
  • A user want to be able to logout and return, then retrieve a list from a list of their lists.
  • A user wants to be able to delete a todo list.
  • A user wants to be able to update a todo list name.
  • A user wants to be able to add items to a todo list.
  • A user wants to be able to update items in the todo list.
  • A user wants to be able to delete items in a todo list.

Architecture: The following is the architecture of the todo list starter kit application.

  • Database: Apache Cassandra.
  • Service: A small service to manage the data tier of the application.
  • User Interface: A web interface using React/Vuejs ??

As you can see, some of the items are incomplete, but I’ll decide on them soon. My next review is to check out what I really want to use for the user interface, and also to get a user account system figured out. I don’t really want to create the entire user interface, but instead would like to use something like Auth0 or Okta.

May I Ask?

There are numerous things I’d love help with. Are there any user stories you think are missing? Should I add something? What would make these helpful to you? Leave a comment, or tweet at me @Adron. I’d be happy to get some feedback and other’s thoughts on the matter so that I can ensure that these are simple, to the point, usable, and helpful to people. Cheers!

Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation

Part 2 of 3 – Coding Session in Go – Cobra + Viper CLI for Parsing Text Files, Retrieval of Twitter Data, Exports to various file formats, and export to Apache Cassandra.

UPDATED PARTS:

  1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
  2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation (this post)
  3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval

Updated links to each part will be posted at bottom of  this post when I publish them. For code, written walk through, and the like scroll down below the video and timestamps.

Hacking Together a CLI Installing Cassandra, Setting Up the Twitter API, ENV Vars, etc.

0:04 Kick ass intro. Just the standard rocking tune.

3:40 A quick recap. Check out the previous write “Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files” of this series.

4:30 Beginning of completion of twitz parse command for exporting out to XML, JSON, and CSV (already did the text export previous session). This segment also includes a number of refactorings to clean up the functions, break out the control structures and make the code more readable.

In the end of refactoring twitz parse came out like this. The completed list is put together by calling the buildTwitterList() function which is actually in the helpers.go file. Then prints that list out as is, and checks to see if a file export should be done. If there is a configuration setting set for file export then that process starts with a call to exportParsedTwitterList(exportFilename string, exportFormat string, ... etc ... ). Then a simple single level control if then else structure to determine which format to export the data to, and a call to the respective export function to do the actual export of data and writing of the file to the underlying system. There’s some more refactoring that could be done, but for now, this is cleaned up pretty nicely considering the splattering of code I started with at first.


var parseCmd = &cobra.Command{
Use: "parse",
Short: "This command will extract the Twitter Accounts form a text file.",
Long: `This command will extract the Twitter Accounts and clean up or disregard other characters
or text around the twitter accounts to create a simple, clean, Twitter Accounts only list.`,
Run: func(cmd *cobra.Command, args []string) {
completedTwittererList := buildTwitterList()
fmt.Println(completedTwittererList)
if viper.Get("fileExport") != nil {
exportParsedTwitterList(viper.GetString("fileExport"), viper.GetString("fileFormat"), completedTwittererList)
}
},
}
func exportParsedTwitterList(exportFilename string, exportFormat string, twittererList []string) {
if exportFormat == "txt" {
exportTxt(exportFilename, twittererList, exportFormat)
} else if exportFormat == "json" {
exportJson(exportFilename, twittererList, exportFormat)
} else if exportFormat == "xml" {
exportXml(exportFilename, twittererList, exportFormat)
} else if exportFormat == "csv" {
exportCsv(exportFilename, twittererList, exportFormat)
} else {
fmt.Println("Export type unsupported.")
}
}
func exportXml(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting xml export to %s.", exportFilename)
xmlContent, err := xml.Marshal(twittererList)
check(err)
header := xml.Header
collectedContent := header + string(xmlContent)
exportFile(collectedContent, exportFilename+"."+exportFormat)
}
func exportCsv(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting txt export to %s.", exportFilename)
collectedContent := rebuildForExport(twittererList, ",")
exportFile(collectedContent, exportFilename+"."+exportFormat)
}
func exportTxt(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting %s export to %s.", exportFormat, exportFilename)
collectedContent := rebuildForExport(twittererList, "\n")
exportFile(collectedContent, exportFilename+"."+exportFormat)
}
func exportJson(exportFilename string, twittererList []string, exportFormat string) {
fmt.Printf("Starting %s export to %s.", exportFormat, exportFilename)
collectedContent := collectContent(twittererList)
exportFile(string(collectedContent), exportFilename+"."+exportFormat)
}
func collectContent(twittererList []string) []byte {
collectedContent, err := json.Marshal(twittererList)
check(err)
return collectedContent
}
func rebuildForExport(twittererList []string, concat string) string {
var collectedContent string
for _, twitterAccount := range twittererList {
collectedContent = collectedContent + concat + twitterAccount
}
if concat == "," {
collectedContent = strings.TrimLeft(collectedContent, concat)
}
return collectedContent
}
func exportFile(collectedContent string, exportFile string) {
contentBytes := []byte(collectedContent)
err := ioutil.WriteFile(exportFile, contentBytes, 0644)
check(err)
}

view raw

parse.go

hosted with ❤ by GitHub

50:00 I walk through a quick install of an Apache Cassandra single node that I’ll use for development use later. I also show quickly how to start and stop post-installation.

Reference: Apache Cassandra, Download Page, and Installation Instructions.

53:50 Choosing the go-twitter API library for Go. I look at a few real quickly just to insure that is the library I want to use.

Reference: go-twitter library

56:35 At this point I go through how I set a Twitter App within the API interface. This is a key part of the series where I take a look at the consumer keys and access token and access token secrets and where they’re at in the Twitter interface and how one needs to reset them if they just showed the keys on a stream (like I just did, shockers!)

57:55 Here I discuss and show where to setup the environment variables inside of Goland IDE to building and execution of the CLI. Once these are setup they’ll be the main mechanism I use in the IDE to test the CLI as I go through building out further features.

1:00:18 Updating the twitz config command to show the keys that we just added as environment variables. I set these up also with some string parsing and cutting off the end of the secrets so that the whole variable value isn’t shown but just enough to confirm that it is indeed a set configuration or environment variable.

1:16:53 At this point I work through some additional refactoring of functions to clean up some of the code mess that exists. Using Goland’s extract method feature and other tooling I work through several refactoring efforts that clean up the code.

1:23:17 Copying a build configuration in Goland. A handy little thing to know you can do when you have a bunch of build configuration options.

1:37:32 At this part of the video I look at the app-auth example in the code library, but I gotta add the caveat, I run into problems using the exact example. But I work through it and get to the first error messages that anybody would get to pending they’re using the same examples. I get them fixed however in the next session, this segment of the video however provides a basis for my pending PR’s and related work I’ll submit to the repo.

The remainder of the video is trying to figure out what is or isn’t exactly happening with the error.

I’ll include the working findem code in the next post on this series. Until then, watch the wrap up and enjoy!

1:59:20 Wrap up of video and upcoming stream schedule on Twitch.

UPDATED SERIES PARTS

    1. Twitz Coding Session in Go – Cobra + Viper CLI for Parsing Text Files
    2. Twitz Coding Session in Go – Cobra + Viper CLI with Initial Twitter + Cassandra Installation (this post)
    3. Twitz Coding Session in Go – Cobra + Viper CLI Wrap Up + Twitter Data Retrieval

 

Lena presents “So You Want to Run Data-Intensive Systems on Kubernetes”

If you’re interested in running data-intensive systems (think Apache Cassandra, DataStax Enterprise, Kafka, Spark, Tensorflow, Elasticsearch, Redis, etc) in Kubernetes this is a great talk. @Lenadroid covers what options are available in Kubernetes, how architectural features around pods, jobs, stateful sets, and replica sets work together to provide distributed systems capabilities. Other features she continues and delves into include custom resource definitions (CRDs), operators, and HELM Charts, which include future and peripheral feature capabilities that can help you host various complex distributed systems. I’ve included references below the video here, enjoy.

References:

Cassandra Datacenter & Racks

The last post in this series is Distributed Database Things to Know: Consistent Hashing.

Let’s talk about the analogy of Apache Cassandra Datacenter & Racks to actual datacenter and racks. I kind of enjoy the use of the terms datacenter and racks to describe architectural elements of Cassandra. However, as time moves on the relationship between these terms and why they’re called datacenter and racks can be obfuscated.

Take for instance, a datacenter could just be a cloud provider, an actual physical datacenter location, a zone in Azure, or region in some other provider. What an actual Datacenter in Cassandra parlance actually is can vary, but the origins of why it’s called a Datacenter remains the same. The elements of racks also can vary, but also remain the same.

Origins: Racks & Datacenters?

Let’s cover the actual things in this industry we call datacenter and racks first, unrelated to Apache Cassandra terms.

Racks: The easiest way to describe a physical rack is to show pictures of datacenter racks via the ole’ Google images.

racks.png

A rack is something that is located in a data-center, or even just someone’s garage in some odd scenarios. Ya know, if somebody wants serious hardware to work with. The rack then has a number of servers, often various kinds, within that rack itself. As you can see from the images above there’s a wide range of these racks.

Datacenter: Again the easiest way to describe a datacenter is to just look at a bunch of pictures of datacenter, albeit you see lots of racks again. But really, that’s what a datacenter is, is a building that has lots and lots of racks.

data-center.png

However in Apache Cassandra (and respectively DataStax Enterprise products) a datacenter and rack do not directly correlate to a physical rack or datacenter. The idea is more of an abstraction than hard mapping to the physical realm. In turn it is better to think of datacenter and racks as a way to structure and organize your DataStax Enterprise or Apache Cassandra architecture. From a tree perspective of organizing your cluster, think of things in this hierarchy.

  • Cluster
    • Datacenter(s)
      • Rack(s)
        • Server(s)
          • Node (vnode)

Apache Cassandra Datacenter

An Apache Cassandra Datacenter is a group of nodes, related and configured within a cluster for replication purposes. Setting up a specific set of related nodes into a datacenter helps to reduce latency, prevent transactions from impact by other workloads, and related effects. The replication factor can also be setup to write to multiple datacenter, providing additional flexibility in architectural design and organization. One specific element of datacenter to note is that they must contain only one node type:

Depending on the replication factor, data can be written to multiple datacenters. Datacenters must never span physical locations.Each datacenter usually contains only one node type. The node types are:

  • Transactional: Previously referred to as a Cassandra node.
  • DSE Graph: A graph database for managing, analyzing, and searching highly-connected data.
  • DSE Analytics: Integration with Apache Spark.
  • DSE Search: Integration with Apache Solr. Previously referred to as a Solr node.
  • DSE SearchAnalytics: DSE Search queries within DSE Analytics jobs.

Apache Cassandra Racks

An Apache Cassandra Rack is a grouped set of servers. The architecture of Cassandra uses racks so that no replica is stored redundantly inside a singular rack, ensuring that replicas are spread around through different racks in case one rack goes down. Within a datacenter there could be multiple racks with multiple servers, as the hierarchy shown above would dictate.

To determine where data goes within a rack or sets of racks Apache Cassandra uses what is referred to as a snitch. A snitch determines which racks and datacenter a particular node belongs to, and by respect of that, determines where the replicas of data will end up. This replication strategy which is informed by the snitch can take the form of numerous kinds of snitches, some examples include;

  • SimpleSnitch – this snitch treats order as proximity. This is primarily only used when in a single-datacenter deployment.
  • Dynamic Snitching – the dynamic snitch monitors read latencies to avoid reading from hosts that have slowed down.
  • RackInferringSnitch – Proximity is determined by rack and datacenter, assumed corresponding to 3rd and 2nd octet of each node’s IP address. This particular snitch is often used as an example for writing a custom snitch class since it isn’t particularly useful unless it happens to match one’s deployment conventions.

In the future I’ll outline a few more snitches, how some of them work with more specific detail, and I’ll get into a whole selection of other topics. Be sure to subscribe to the blog, the ole’ RSS feed works great too, and follow @CompositeCode for blog updates. For discourse and hot takes follow me @Adron.

Distributed Database Things to Know Series

  1. Consistent Hashing
  2. Apache Cassandra Datacenter & Racks (this post)