DSE6 + .NET v?

Project Repo: Interoperability Black Box

First steps. Let’s get .NET installed and setup. I’m running Ubuntu 18.04 for this setup and start of project. To install .NET on Ubuntu one needs to go through a multi-command process of keys and some other stuff, fortunately Microsoft’s teams have made this almost easy by providing the commands for the various Linux distributions here. The commands I ran are as follows to get all this initial setup done.

[sourcecode language=”bash”]
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg –dearmor > microsoft.asc.gpg
sudo mv microsoft.asc.gpg /etc/apt/trusted.gpg.d/
wget -q https://packages.microsoft.com/config/ubuntu/18.04/prod.list
sudo mv prod.list /etc/apt/sources.list.d/microsoft-prod.list
sudo chown root:root /etc/apt/trusted.gpg.d/microsoft.asc.gpg
sudo chown root:root /etc/apt/sources.list.d/microsoft-prod.list
[/sourcecode]

After all this I could then install the .NET SDK. It’s been so long since I actually installed .NET on anything that I wasn’t sure if I just needed the runtime, the SDK, or what I’d actually need. I just assumed it would be safe to install the SDK and then install the runtime too.

[sourcecode language=”bash”]
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-2.1
[/sourcecode]

Then the runtime.

[sourcecode language=”bash”]
sudo apt-get install aspnetcore-runtime-2.1
[/sourcecode]

Alright. Now with this installed, I wanted to also see if Jetbrains Rider would detect – or at least what would I have to do – to have the IDE detect that .NET is now installed. So I opened up the IDE to see what the results would be. Over the left hand side of the new solution dialog, if anything isn’t installed Rider usually will display a message that X whatever needs installed. But it looked like everything is showing up as installed, “yay for things working (at this point)!”

Next up is to get a solution started with the pertinent projects for what I want to build.

For the next stage I created three projects.

InteroperationalBlackBox – A basic class library that will be used by a console application or whatever other application or service that may need access to the specific business logic or what not.
InteroperationalBlackBox.Tests – An xunit testing project for testing anything that might need some good ole’ testing.
InteroperationalBlackBox.Cli – A console application (CLI) that I’ll use to interact with the class library and add capabilities going forward.

Alright, now that all the basic projects are setup in the solution, I’ll go out and see about the .NET DataStax Enterprise driver. Inside Jetbrains Rider I can right click on a particular project that I want to add or manage dependencies for. I did that and then put “dse” in the search box. The dialog pops up from the bottom of the IDE and you can add it by clicking on the bottom right plus sign in the description box to the right. Once you click the plus sign, once installed, it becomes a little red x.

Alright. Now it’s almost time to get some code working. We need ourselves a database first however. I’m going to setup a cluster in Google Cloud Platform (GCP), but feel free to use whatever cluster you’ve got. These instructions will basically be reusable across wherever you’ve got your cluster setup. I wrote up a walk through and instructions for the GCP Marketplace a few weeks ago. I used the same offering to get this example cluster up and running to use. So, now back to getting the first snippets of code working.

Let’s write a test first.

[sourcecode language=”csharp”]
[Fact]
public void ConfirmDatabase_Connects_False()
{
var box = new BlackBox();
Assert.Equal(false, box.ConfirmConnection());
}
[/sourcecode]

In this test, I named the class called BlackBox and am planning to have a parameterless constructor. But as things go tests are very fluid, or ought to be, and I may change it in the next iteration. I’m thinking, at least to get started, that I’ll have a method to test and confirm a connection for the CLI. I’ve named it ConfirmConnection for that purpose. Initially I’m going to test for false, but that’s primarily just to get started. Now, time to implement.

[sourcecode language=”csharp”]
namespace InteroperabilityBlackBox
using System;
using Dse;
using Dse.Auth;

namespace InteroperabilityBlackBox
{
public class BlackBox
{
public BlackBox()
{}

public bool ConfirmConnection()
{
return false;
}
}
}
[/sourcecode]

That gives a passing test and I move forward. For more of the run through of moving from this first step to the finished code session check out this

By the end of the coding session I had a few tests.

[sourcecode language=”csharp”]
using Xunit;

namespace InteroperabilityBlackBox.Tests
{
public class MakingSureItWorksIntegrationTests
{
[Fact]
public void ConfirmDatabase_Connects_False()
{
var box = new BlackBox();
Assert.Equal(false, box.ConfirmConnection());
}

[Fact]
public void ConfirmDatabase_PassedValuesConnects_True()
{
var box = new BlackBox(“cassandra”, “”, “”);
Assert.Equal(false, box.ConfirmConnection());
}

[Fact]
public void ConfirmDatabase_PassedValuesConnects_False()
{
var box = new BlackBox(“cassandra”, “notThePassword”, “”);
Assert.Equal(false, box.ConfirmConnection());
}
}
}
[/sourcecode]

The respective code for connecting to the database cluster, per the walk through I wrote about here, at session end looked like this.

[sourcecode language=”csharp”]
using System;
using Dse;
using Dse.Auth;

namespace InteroperabilityBlackBox
{
public class BlackBox : IBoxConnection
{
public BlackBox(string username, string password, string contactPoint)
{
UserName = username;
Password = password;
ContactPoint = contactPoint;
}

public BlackBox()
{
UserName = “ConfigValueFromSecretsVault”;
Password = “ConfigValueFromSecretsVault”;
ContactPoint = “ConfigValue”;
}

public string ContactPoint { get; set; }
public string UserName { get; set; }
public string Password { get; set; }

public bool ConfirmConnection()
{
IDseCluster cluster = DseCluster.Builder()
.AddContactPoint(ContactPoint)
.WithAuthProvider(new DsePlainTextAuthProvider(UserName, Password))
.Build();

try
{
cluster.Connect();
return true;
}
catch (Exception e)
{
Console.WriteLine(e);
return false;
}

}
}
}
[/sourcecode]

With my interface providing the contract to meet.

[sourcecode language=”csharp”]
namespace InteroperabilityBlackBox
{
public interface IBoxConnection
{
string ContactPoint { get; set; }
string UserName { get; set; }
string Password { get; set; }
bool ConfirmConnection();
}
}
[/sourcecode]

Conclusions & Next Steps

After I wrapped up the session two things stood out that needed fixed for the next session. I’ll be sure to add these as objectives for the next coding session at 3pm PST on Thursday.

The tests really needed to more resiliently confirm the integrations that I was working to prove out. My plan at this point is to add some Docker images that would provide the development integration tests a point to work against. This would alleviate the need for something outside of the actual project in the repository to exist. Removing that fragility.
The application, in its “Black Box”, should do something. For the next session we’ll write up some feature requests we’d want, or maybe someone has some suggestions of functionality they’d like to see implemented in a CLI using .NET Core working against a DataStax Enterprise Cassandra Database Cluster? Feel free to leave a comment or three about a feature, I’ll work on adding it during the next session.

Project Repo: https://github.com/Adron/InteroperabilityBlackBox
File an Feature Request: https://github.com/Adron/InteroperabilityBlackBox/issues/new?template=feature_request.md

Chapter 2 in My Twitch Streaming

A while back I started down the path of getting a Twitch Channel started. At this point I’ve gotten a channel setup which I’ve dubbed Thrashing Code albeit it still just has “adronhall” all over it. I’ll get those details further refined as I work on it more.

Today I recorded a new Twitch stream about doing a twitch stream and created an edited video of all the pieces and cameras and angles. I could prospectively help people get started, it’s just my experiences so far and efforts to get everything connected right. The actual video stream recording is available, and I’ll leave it on the channel. However the video I edited will be available and I’ll post a link here.

Tomorrow will be my first official Twitch stream at 3pm PST. If you’re interested in watching check out my Twitch profile here follow and it’ll ping you when I go live. This first streaming session, or episode, or whatever you want to call it, will include a couple topics. I’ll be approaching these topics from that of someone just starting, so if you join help hold me to that! Don’t let me skip ahead or if you notice I left out something key please join and chat at me during the process. I want to make sure I’m covering all the bases as I step through toward achieving the key objectives. Which speaking of…

Tomorrow’s Mission Objectives

Create a DataStax Enterprise Cassandra Cluster in Google Cloud Platform.
Create a .NET project using the latest cross-platform magical bits that will have a library for abstracting the data source(s), a console interface for using the application, and of course a test project.
Configure & connect to the distributed database cluster.

Mission Stretch Objectives

Start a github repo to share the project with others.
Setup some .github templates for feature request issues or related issues.
Write up some Github Issue Feature requests and maybe even sdd some extra features to the CLI for…??? no idea ??? determine 2-3 during the Twitch stream.

If you’d like to follow along, here’s what I have installed. You’re welcome to a range of tooling to follow along with that is the same as what I’ve got here or a variance of other things. Feel free to bring up tooling if you’re curious about it via chat and I’ll answer questions where and when I can.

Ubuntu v18.04
.NET core v2.1
DataStax Enterprise v6

Distributed Systems: Cassandra, DataStax, a Short SITREP

SITREP = Situation Report. It’s military speak. 💂🏻‍♂️

Apache Cassandra is one of the most popular databases in use today. It has many characteristics and distinctive architectural details. In this post I’ll provide a description and some details for a number of these features and characteristics, divided as such. Then, after that (i.e. toward the end, so skip there if you just want to the differences) I’m doing to summarize key differences with the latest release of the DataStax Enterprise 6 version of the database.

Cassandra Characteristics

Cassandra is a linearly scalable, highly available, fault tolerant, distributed database. That is, just to name a few of the most important characteristics. The Cassandra database is also cross-platform (runs on any operating systems), multi-cloud (runs on and across multiple clouds), and can survive regional data center outages or even in multi-cloud scenarios entire cloud provider outages!

Columnar Store, Column Based, or Column Family? What? Ok, so you might have read a number of things about what Cassandra actually is. Let’s break this down. First off, a columnar or column store or column oriented database guarantees data location for a single column in a node on disk. The column may span a bunch of or all of the rows that depend on where or how you specify partitions. However, this isn’t what the Cassandra Database uses. Cassandra is a column-family database.

A column-family storage architecture makes sure the data is stored based on locality of the data at the partition level, not the column level. Cassandra partitions group rows and columns split by a partition key, then clustered together by a specified clustering column or columns. To query Cassandra, because of this, you must know the partition key in order to avoid full data scans!

Cassandra has these partitions that guarantee to be on the same node and sort strings table (referred to most commonly as an SSTable *) in the same location within that file. Even though, depending on the compaction strategy, this can change things and the partition can be split across multiple files on a disk. So really, data locality isn’t guaranteed.

Column-family stores are great for high throughput writes and the ability to linearly scale horizontally (ya know, getting lots and lots of nodes in the cloud!). Reads using the partition key are extremely fast since this key points to exactly where the data resides. However, this often – at least last I know of – leads to a full scan of the data for any type of ad-hoc query.

A sort of historically trivial but important point is the column-family term comes from the storage engine originally used based on a key value store. The value was a set of column value tuples, which where often referenced as family, and later this family was abstracted into partitions, and then the storage engine was matched to that abstraction. Whew, ok, so that’s a lot of knowledge being coagulated into a solid eh! [scuse’ my odd artful language use if you visualized that!]

With all of this described, a that little history sprinkled in, when reading the description of Cassandra in the README.asc file of the actual Cassandra Github Repo things make just a little more sense. In the file it starts off with a description,

Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

Now that I’ve covered the 101 level of what Cassandra is I’ll give a look at DataStax and their respective offering.

DataStax

DataStax Enterprise at first glance might be a bit confusing since immediate questions pop up like, “Doesn’t DataStax make Cassandra?”, “Isn’t DataStax just selling support for Cassandra?”, or “Eh, wha, who is DataStax and what does this have to do with Cassandra?”. Well, I’m gonna tell ya all about where we are today regarding all of these things fit.

Performance

DataStax provides a whole selection of amenities around a database, which is derived from the Cassandra Distributed Database System. The core product and these amenities are built into what we refer to as the “DataStax Enterprise 6“. Some of specific differences are that the database engine itself has been modified out of band and now delivers 2x the performance of the standard Cassandra implemented database engine. I was somewhat dubious when I joined but after the third party benchmarks where completed that showed the difference I grew more confident. My confidence in this speed increase grew as I’ve gotten to work with the latest version I can tell in more than a few situations that it’s faster.

Read Repair & NodeSync

If you already use Cassandra, read repair works a certain way and that still works just fine in DataStax Enterprise 6. But one also has the option of using NodeSync which can help eliminate scripting, manual intervention, and other repair operations.

Spark SQL Connectivity

There’s also an always on SQL Engine for automated uptime for apps using DataStax Enterprise Analytics. This provides a better level of analytics requests and end -user analytics. Sort of on this related note, DataStax Studio also has notebook support for Spark SQL now. Writing one’s Spark SQL gets a little easier with this option.

Multi-Cloud / Hybrid-Cloud

Another huge advantage of DataStax Enterprise is going multi-cloud or hybrid-cloud with DataStax Enterprise Cassandra. Between the Lifecycle Manager (LCM), OpsCenter, and related tooling getting up and running with a cluster across a varying range of data-centers wherever they may be is quick and easy.

Summary

I’ll be providing deeper dives into the particular technology, the specific differences, and more in the future. For now I’ll wrap up this post as I’ve got a few others coming distinctively related to distributed database systems themselves ranging from specific principles (like CAP Theorem) to operational (how to and best ways to manage) and development (patterns and practices of developing against) related topics.

Overall the solutions that DataStax offers are solid advantages if you’re stepping into any large scale data (big data or whatever one would call their plethora of data) needs. Over the coming months I’ve got a lot of material – from architectural research and guidance to tactical coding implementation work – that I’ll be blogging about and providing. I’m really looking forward to exploring these capabilities, being the developer advocate to DataStax for the community of users, and learning a thing or three million.

The Conversations and Samples of Multi-Cloud

Over the last few weeks the I’ve been putting together multi-cloud conversations and material related to multi-cloud implementation, conversations, and operational situations that exist today. I took a quick look at some of my repos on Github and realized I’d put together a multi-cloud Node.js sample app some time ago and should update it. I’ll get to that, hopefully, but I also stumbled into some tweets and other material I wanted to collect a few of them together.

Some Demo Code for Multi-Cloud

DataStax Enterprise Deployed Multi-cloud Demo
Our Killrvideo Reference Application Work @ DataStax which we’ll have more coverage of, blog entries, and related material coming out in the coming weeks about what we’ve been doing to make this reference application multi-cloud ready and then multi-cloud capable.

Conversations on Multi-cloud

Mitchell Hashimoto of HashiCorp posted a well written comment/article on what he’s been seeing (for some time) on Reddit.
A well worded tweet… lot’s of talk per Google’s somewhat underlying push for GKE on prem. Which means more clouds, more zones, and more multi-cloud options.

MyPOV: @googlecloud announces GKE on prem. It's just another zone.

Here's to #multicloud and a consistent approach across all deployment types

This is something @awscloud can't do which is a differentiator for hybrid models#GoogleNEXT18 #cio pic.twitter.com/CYoCZ8krZ5

— @ces R “Ray” Wang 王瑞光 #1A #RuleTheWorld #CES2021 (@rwang0) July 24, 2018

Distributed Data Show Conversations

Leave a comment, tweet at me (@adron), let me know your thoughts or what you’re working on re: multi-cloud. I’m curious to learn about and know more war stories.

Oh, exFAT Doesn’t Work on Linux

But to the rescue comes the search engine. I found some material on the matter and, as I’ve learned frequently, don’t count out Linux when it comes to support of nearly everything on Earth. Sure enough, there’s support for exFAT (really, why wouldn’t there be?)

Check out this repo: https://github.com/relan/exfat

There’s of course the git clone and make and make install path or there’s also the apt install path.

git clone https://github.com/relan/exfat.git
cd exfat
autoreconf --install
./configure
make

Then make install.

make install

Of course, as with things on Linux, no reboot needed just use it now to mount a drive.

mount.exfat-fuse /dev/spec /mnt/exfat

To note, if you’re using Ubuntu 18.04 the support will just be available now so re-click on the attached drive or memory device you’ve just attached and it will now appear. Pretty sweet. If you want to use apt just run this command.

apt install exfat-fuse

That’s it. Now you’ve

Adron's Composite Code

DSE6 + .NET v?

Conclusions & Next Steps

Like this:

Chapter 2 in My Twitch Streaming

Tomorrow’s Mission Objectives

Mission Stretch Objectives

Like this:

Oh, exFAT Doesn’t Work on Linux

Like this:

Conclusions & Next Steps

Share this:

Like this:

Tomorrow’s Mission Objectives

Mission Stretch Objectives

Share this:

Like this:

Cassandra Characteristics

DataStax

Performance

Read Repair & NodeSync

Spark SQL Connectivity

Multi-Cloud / Hybrid-Cloud

Summary

Share this:

Like this:

Some Demo Code for Multi-Cloud

Conversations on Multi-cloud

Share this:

Like this:

Share this:

Like this: