Property Graph Modeling with an FU Towards Supernodes – Jonathan Lacefield

Some notes along with this talk. Which is about ways to mitigate super nodes, partitioning strategies, and related efforts. Jonathan’s talk is vendor neutral, even though he works at DataStax. Albeit that’s not odd to me, since that’s how we roll at DataStax anyway. We take pride in working with DSE but also with knowing the various products out there, as things are, we’re all database nerds after all. (more below video)

In the video, I found the definition slide for super node was perfect.

supernodes.png

See that super node? Wow, Florida is just covered up by the explosive nature of that super node! YIKES!

In the talk Jonathan also delves deeper into the vertexes, adjacent vertices, and the respective neighbors. With definitions along the way, so it’s a great talk to watch even if you’re not up to speed on graph databases and graph math and all that related knowledge.

impacts

traversal.png

The super node problem he continues on to describe have two specific problems that are detailed; query performance traversals and storage retrieval. Such as a Gremlin traversal (one’s query), moving along creating traversers, until it hits a super node, where a computational explosion occurs.

Whatever your experience, this talk has some great knowledge to expand your ideas on how to query, design, and setup data in your graph databases to work against. Along with that more than a few elements of knowledge about what not to do when designing a schema for your graph data. Give a listen, it’s worth your time.

 

Jonathan Ellis talks about Five Lessons in Distributed Databases

Notes on the talk…

  1. If it’s not SQL it’s not a database. Watch, you’ll get to hear why… ha!

Then Jonathan covers the recent history (sort of recent, the last ~20ish years) of the industry and how we’ve gotten to this point in database technology.

  1. It takes 5+ years to build a database.

Also the tens of millions of dollars with that period of time. Both are needed, in droves, time and money.

…more below the video.

  1. The customer is always right.

Even when they’re clearly wrong, they’re largely right.

For number 4 and 5 you’ll have to watch the video. Lot’s good stuff in this video including comparisons of Cosmos, Dynamo DB, Apache Cassandra, DataStax Enterprise, and how these distributed databases work, their performance (3rd Party metrics are shown) and more details!

DataStax Developer Days

Over the last week I had the privilege and adventure of coming out to Chicago and Dallas to teach about operations and security capabilities of DataStax Enterprise. More about that later in this post, first I’ll elaborate on and answer the following:

  • What is DataStax Developer Day? Why would you want to attend?
  • Where are the current DataStax Developer Day events that have been held, and were future events are going to be held?
  • Possibilities for future events near a city you live in.

What is DataStax Developer Day?

The way we’ve organized this developer day event at DataStax, is focused around the DataStax Enterprise built on Apache Cassandra product, however I have to add the very important note that this isn’t merely just a product pitch type of thing, you can and will learn about distributed databases and systems in a general sense too. We talk about a number of the core principles behind distributed systems such as the pivotally important consistent hash ring, datacenter and racks, gossip, replication, snitches, and more. We feel it’s important that there’s enough theory that comes along with the configuration and features covered to understand who, what, where, why, and how behind the configuration and features too.

The starting point of the day’s course material is based on the idea that one has not worked with or played with a Apache Cassandra or DataStax Enterprise. However we have a number of courses throughout the day that delve into more specific details and advanced topics. There are three specific tracks:

  1. Cassandra Track – this track consists of three workshops: Core Cassandra, Cassandra Data Modeling, and Cassandra Application Development. [more details]
  2. DSE Track – this track consists of three workshops: DataStax Enterprise Search, DataStax Enterprise Analytics, and DataStax Enterprise Graph. [more details]
  3. Bonus Content – This track has two workshops: DataStax Enterprise Overview and DataStax Enterprise Operations and Security.  [more details]

Why would you want to attend?

  • One huge rad awesome reason is that the developer day events are FREE. But really, nothing is ever free right? You’d want to take a day away from the office to join us, so there’s that.
  • You also might want to even stay a little later after the event as we always have a solidly enjoyable happy hour so we can all extend conversations into the evening and talk shop. After all, working with distributed databases, managing data, and all that jazz is honestly pretty enjoyable when you’ve got awesome systems like this to work with, so an extended conversation into the evening is more than worth it!
  • You’ll get a firm basis of knowledge and skillset around the use, management, and more than a few ideas about how Apache Cassandra and DataStax Enterprise can extend your system’s systemic capabilities.
  • You’ll get a chance to go beyond merely the distributed database system of Apache Cassandra itself and delve into graph, what it is and how it works, analytics, and search too. All workshops take a look at the architecture, uses, and what these capabilities will provide your systems.
  • You’ll also have one on one time with DataStax engineers, and other technical members of the team to ask questions, talk about architecture and solutions that you may be working on, or generally discuss any number of graph, analytics, search, or distributed systems related questions.

Where are the current DataStax Developer Day events that have been held, and were future events are going to be held? So far we’ve held events in New York City, Washington DC, Chicago, and Dallas. We’ve got two more events scheduled with one in London, England and one in Paris, France.

Future events? With a number of events completed and a few on the calendar, we’re interested in hearing about future possible locations for events. Where are you located and where might an event of this sort be useful for the community? I can think of a number of cities, but organizing them into order to know where to get something scheduled next is difficult, which is why the team is looking for input. So ping me via @Adron, email, or just send me a quick message from here.

DSE6 + .NET v?

Project Repo: Interoperability Black Box

First steps. Let’s get .NET installed and setup. I’m running Ubuntu 18.04 for this setup and start of project. To install .NET on Ubuntu one needs to go through a multi-command process of keys and some other stuff, fortunately Microsoft’s teams have made this almost easy by providing the commands for the various Linux distributions here. The commands I ran are as follows to get all this initial setup done.

wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.asc.gpg
sudo mv microsoft.asc.gpg /etc/apt/trusted.gpg.d/
wget -q https://packages.microsoft.com/config/ubuntu/18.04/prod.list
sudo mv prod.list /etc/apt/sources.list.d/microsoft-prod.list
sudo chown root:root /etc/apt/trusted.gpg.d/microsoft.asc.gpg
sudo chown root:root /etc/apt/sources.list.d/microsoft-prod.list

After all this I could then install the .NET SDK. It’s been so long since I actually installed .NET on anything that I wasn’t sure if I just needed the runtime, the SDK, or what I’d actually need. I just assumed it would be safe to install the SDK and then install the runtime too.

sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-2.1

Then the runtime.

sudo apt-get install aspnetcore-runtime-2.1

logoAlright. Now with this installed, I wanted to also see if Jetbrains Rider would detect – or at least what would I have to do – to have the IDE detect that .NET is now installed. So I opened up the IDE to see what the results would be. Over the left hand side of the new solution dialog, if anything isn’t installed Rider usually will display a message that X whatever needs installed. But it looked like everything is showing up as installed, “yay for things working (at this point)!

rider-01

Next up is to get a solution started with the pertinent projects for what I want to build.

dse2

Kazam_screenshot_00001

For the next stage I created three projects.

  1. InteroperationalBlackBox – A basic class library that will be used by a console application or whatever other application or service that may need access to the specific business logic or what not.
  2. InteroperationalBlackBox.Tests – An xunit testing project for testing anything that might need some good ole’ testing.
  3. InteroperationalBlackBox.Cli – A console application (CLI) that I’ll use to interact with the class library and add capabilities going forward.

Alright, now that all the basic projects are setup in the solution, I’ll go out and see about the .NET DataStax Enterprise driver. Inside Jetbrains Rider I can right click on a particular project that I want to add or manage dependencies for. I did that and then put “dse” in the search box. The dialog pops up from the bottom of the IDE and you can add it by clicking on the bottom right plus sign in the description box to the right. Once you click the plus sign, once installed, it becomes a little red x.

dse-adding-package

Alright. Now it’s almost time to get some code working. We need ourselves a database first however. I’m going to setup a cluster in Google Cloud Platform (GCP), but feel free to use whatever cluster you’ve got. These instructions will basically be reusable across wherever you’ve got your cluster setup. I wrote up a walk through and instructions for the GCP Marketplace a few weeks ago. I used the same offering to get this example cluster up and running to use. So, now back to getting the first snippets of code working.

Let’s write a test first.

[Fact]
public void ConfirmDatabase_Connects_False()
{
    var box = new BlackBox();
    Assert.Equal(false, box.ConfirmConnection());
}

In this test, I named the class called BlackBox and am planning to have a parameterless constructor. But as things go tests are very fluid, or ought to be, and I may change it in the next iteration. I’m thinking, at least to get started, that I’ll have a method to test and confirm a connection for the CLI. I’ve named it ConfirmConnection for that purpose. Initially I’m going to test for false, but that’s primarily just to get started. Now, time to implement.

namespace InteroperabilityBlackBox
using System;
using Dse;
using Dse.Auth;

namespace InteroperabilityBlackBox
{
    public class BlackBox
    {
        public BlackBox()
        {}

        public bool ConfirmConnection()
        {
            return false;
        }
    }
}

That gives a passing test and I move forward. For more of the run through of moving from this first step to the finished code session check out this

By the end of the coding session I had a few tests.

using Xunit;

namespace InteroperabilityBlackBox.Tests
{
    public class MakingSureItWorksIntegrationTests
    {
        [Fact]
        public void ConfirmDatabase_Connects_False()
        {
            var box = new BlackBox();
            Assert.Equal(false, box.ConfirmConnection());
        }

        [Fact]
        public void ConfirmDatabase_PassedValuesConnects_True()
        {
            var box = new BlackBox("cassandra", "", "");
            Assert.Equal(false, box.ConfirmConnection());
        }

        [Fact]
        public void ConfirmDatabase_PassedValuesConnects_False()
        {
            var box = new BlackBox("cassandra", "notThePassword", "");
            Assert.Equal(false, box.ConfirmConnection());
        }
    }
}

The respective code for connecting to the database cluster, per the walk through I wrote about here, at session end looked like this.

using System;
using Dse;
using Dse.Auth;

namespace InteroperabilityBlackBox
{
    public class BlackBox : IBoxConnection
    {
        public BlackBox(string username, string password, string contactPoint)
        {
            UserName = username;
            Password = password;
            ContactPoint = contactPoint;
        }

        public BlackBox()
        {
            UserName = "ConfigValueFromSecretsVault";
            Password = "ConfigValueFromSecretsVault";
            ContactPoint = "ConfigValue";
        }

        public string ContactPoint { get; set; }
        public string UserName { get; set; }
        public string Password { get; set; }

        public bool ConfirmConnection()
        {
            IDseCluster cluster = DseCluster.Builder()
                .AddContactPoint(ContactPoint)
                .WithAuthProvider(new DsePlainTextAuthProvider(UserName, Password))
                .Build();

            try
            {
                cluster.Connect();
                return true;
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
                return false;
            }

        }
    }
}

With my interface providing the contract to meet.

namespace InteroperabilityBlackBox
{
    public interface IBoxConnection
    {
        string ContactPoint { get; set; }
        string UserName { get; set; }
        string Password { get; set; }
        bool ConfirmConnection();
    }
}

Conclusions & Next Steps

After I wrapped up the session two things stood out that needed fixed for the next session. I’ll be sure to add these as objectives for the next coding session at 3pm PST on Thursday.

  1. The tests really needed to more resiliently confirm the integrations that I was working to prove out. My plan at this point is to add some Docker images that would provide the development integration tests a point to work against. This would alleviate the need for something outside of the actual project in the repository to exist. Removing that fragility.
  2. The application, in its “Black Box”, should do something. For the next session we’ll write up some feature requests we’d want, or maybe someone has some suggestions of functionality they’d like to see implemented in a CLI using .NET Core working against a DataStax Enterprise Cassandra Database Cluster? Feel free to leave a comment or three about a feature, I’ll work on adding it during the next session.

Chapter 2 in My Twitch Streaming

A while back I started down the path of getting a Twitch Channel started. At this point I’ve gotten a channel setup which I’ve dubbed Thrashing Code albeit it still just has “adronhall” all over it. I’ll get those details further refined as I work on it more.

Today I recorded a new Twitch stream about doing a twitch stream and created an edited video of all the pieces and cameras and angles. I could prospectively help people get started, it’s just my experiences so far and efforts to get everything connected right. The actual video stream recording is available, and I’ll leave it on the channel. However the video I edited will be available and I’ll post a link here.

Tomorrow will be my first official Twitch stream at 3pm PST. If you’re interested in watching check out my Twitch profile here follow and it’ll ping you when I go live. This first streaming session, or episode, or whatever you want to call it, will include a couple topics. I’ll be approaching these topics from that of someone just starting, so if you join help hold me to that! Don’t let me skip ahead or if you notice I left out something key please join and chat at me during the process. I want to make sure I’m covering all the bases as I step through toward achieving the key objectives. Which speaking of…

Tomorrow’s Mission Objectives

  1. Create a DataStax Enterprise Cassandra Cluster in Google Cloud Platform.
  2. Create a .NET project using the latest cross-platform magical bits that will have a library for abstracting the data source(s), a console interface for using the application, and of course a test project.
  3. Configure & connect to the distributed database cluster.

Mission Stretch Objectives

  1. Start a github repo to share the project with others.
  2. Setup some .github templates for feature request issues or related issues.
  3. Write up some Github Issue Feature requests and maybe even sdd some extra features to the CLI for…??? no idea ??? determine 2-3 during the Twitch stream.

If you’d like to follow along, here’s what I have installed. You’re welcome to a range of tooling to follow along with that is the same as what I’ve got here or a variance of other things. Feel free to bring up tooling if you’re curious about it via chat and I’ll answer questions where and when I can.

  • Ubuntu v18.04
  • .NET core v2.1
  • DataStax Enterprise v6

A Collection of Links & Tour of DataStax Docker Images

Another way to get up and running with a DataStax Enterprise 6 setup on your local machine is to use the available (and supported by DataStax) Docker images. For additional description of what each of the images is, what is contained on the images, I read up on via Kathryn Erickson’s (@012345) blog entry “DataStax Now Offering Docker Images for Development“. Also there’s a video Jeff Carpenter (@jscarp) put together which talks from the 5.x version of the release (since v6 wasn’t released at the time).

Continue reading “A Collection of Links & Tour of DataStax Docker Images”