Emerald City Technology Conversations Premiere with Archis Gore

Today I’ve cut the final, and the video and audio are getting better and better as I work out exactly how these conversations are best recorded and streamed! Archis and I had a conversation a few weeks back and the premiere is going live today at 2:00pm. My plan with these conversations is pretty straightforward: after each recording I’ll do some post-processing on the video and audio, set the conversation to premiere on YouTube and Twitch, and join the premieres to answer questions in chat and generally add more conversation to the conversation.

For more information, or to come and join me for a conversation, check out this post: “Join Me for a Live Stream Conversation on Programming, Infrastructure, Data, Databases, or Your Opinions!”.

Today, enjoy the conversation with Archis and I’ll see you in chat!

DataStax Developer Days

Over the last week I had the privilege and adventure of coming out to Chicago and Dallas to teach about the operations and security capabilities of DataStax Enterprise. More about that later in this post. First, I’ll elaborate on and answer the following:

  • What is DataStax Developer Day? Why would you want to attend?
  • Where have DataStax Developer Day events been held so far, and where are future events going to be held?
  • Possibilities for future events near a city you live in.

What is DataStax Developer Day?

The way we’ve organized this developer day event at DataStax, it is focused on DataStax Enterprise, the product built on Apache Cassandra. However, I have to add the very important note that this isn’t merely a product pitch; you can and will learn about distributed databases and systems in a general sense too. We talk about a number of the core principles behind distributed systems, such as the pivotally important consistent hash ring, datacenters and racks, gossip, replication, snitches, and more. We feel it’s important that enough theory comes along with the configuration and features covered to understand the who, what, where, why, and how behind them.
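To give a feel for the consistent hash ring idea mentioned above, here is a minimal Python sketch. It is an illustration only, not how Cassandra or DataStax Enterprise implement it: the `HashRing` class, the `token` helper, the node names, and the use of MD5 as the hash are all assumptions made for the example, and it ignores virtual nodes, racks, and datacenters entirely.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # The ring is the list of (token, node) pairs sorted by token.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Return the nodes that would hold a copy of the given key."""
        if not self.ring:
            return []
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


# Hypothetical four-node cluster.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
```

The point of the sketch is the property that makes the ring "consistent": adding or removing one node only changes ownership for keys next to that node's token, rather than reshuffling every key.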

The starting point of the day’s course material is based on the idea that one has not worked with or played with Apache Cassandra or DataStax Enterprise. However, we have a number of courses throughout the day that delve into more specific details and advanced topics. There are three specific tracks:

  1. Cassandra Track – this track consists of three workshops: Core Cassandra, Cassandra Data Modeling, and Cassandra Application Development. [more details]
  2. DSE Track – this track consists of three workshops: DataStax Enterprise Search, DataStax Enterprise Analytics, and DataStax Enterprise Graph. [more details]
  3. Bonus Content – This track has two workshops: DataStax Enterprise Overview and DataStax Enterprise Operations and Security.  [more details]

Why would you want to attend?

  • One huge rad awesome reason is that the developer day events are FREE. But really, nothing is ever free, right? You’d need to take a day away from the office to join us, so there’s that.
  • You also might want to even stay a little later after the event as we always have a solidly enjoyable happy hour so we can all extend conversations into the evening and talk shop. After all, working with distributed databases, managing data, and all that jazz is honestly pretty enjoyable when you’ve got awesome systems like this to work with, so an extended conversation into the evening is more than worth it!
  • You’ll get a firm basis of knowledge and skills around the use and management of Apache Cassandra and DataStax Enterprise, and more than a few ideas about how they can extend your system’s capabilities.
  • You’ll get a chance to go beyond merely the distributed database system of Apache Cassandra itself and delve into graph, what it is and how it works, analytics, and search too. All workshops take a look at the architecture, uses, and what these capabilities will provide your systems.
  • You’ll also have one-on-one time with DataStax engineers and other technical members of the team to ask questions, talk about architecture and solutions that you may be working on, or generally discuss any number of graph, analytics, search, or distributed systems related questions.

Where have DataStax Developer Day events been held so far, and where are future events going to be held? So far we’ve held events in New York City, Washington DC, Chicago, and Dallas. We’ve got two more events scheduled, one in London, England and one in Paris, France.

Future events? With a number of events completed and a few on the calendar, we’re interested in hearing about possible locations for future events. Where are you located, and where might an event of this sort be useful for the community? I can think of a number of cities, but putting them in order to know where to schedule something next is difficult, which is why the team is looking for input. So ping me via @Adron, email, or just send me a quick message from here.

Security, A Rant Sort Of… and a shocker.

Here’s a shocking statement for a lot of people in technology and especially outside of technology.  “Your money is at greater risk because it isn’t in a cloud.”  Here’s another shocker: “Your medical information is at greater risk at your doctor’s on-premises office than if it were stored and protected by access control in the cloud.”

Is that shocking to you?  If you aren’t shocked, you probably know a lot about cloud technology.  The cloud is more secure than most IT departments, physical server locations, secure government installations, and other environments one might imagine.

Why Am I Writing This Blog Entry?

While I was listening to Steve Riley’s talk on AWS Security I started this blog entry.  A few of the questions that were brought up made me realize how little of the physical and platform level security is actually understood.  Even though this was about AWS it also applies to Azure, Google, and other cloud environments and platforms.  After several weeks of studying Azure and several years of working with Cloud type technology at Webtrends this statement shocked me, “A bank or a medical entity wouldn’t put its data in the cloud.”*  I couldn’t help but think that someone posing this statement as a fact (even though I know that it is absolutely not a fact) is sorely misinformed about cloud computing and technology.

Well, I wanted to rebut this statement myself, but Steve handled the question as a rock star presenter would.  I still want to elaborate on this topic, though.  Also check my previous blog entry “Your Cloud, My Cloud, Security in the Cloud” (* See Addendum), as I touched on this topic from the vantage point of web analytics.  What we have here is the conversation about data that truly needs to be secure.

Cloud Security – Physical

The cloud environments have physical locations all over the world.  None of these locations is advertised or easily located; they are obfuscated and unlisted for security reasons.  Once you get to one of these facilities, the location has numerous physical security restrictions, including time-based access codes, security cards, and in some cases retinal scanners, and the list goes on.  In addition, many of these security methods are used concurrently with others.

In addition to this, the people maintaining the cloud technology centers don’t have access to the data.  They do not know how to gain access to the specific drives or machines that hold the data of specific instances, nor could someone tell them how without extensive work.  That alone provides an immediate level of security, both for the data and physically.  That leads me to this next point.

Data Security in the Cloud

Having data spread across virtualized storage media is a step into another realm of security.  Data is spread across multiple storage locations for more than just security reasons.  Because of the virtualized nature of this storage, the actual data lives in a number of locations that are shared among machines.  These machines are not maintained in relation to these storage points.  The storage points are tracked by the machines, in secure ways, so that only the owning account can access that data.  In addition to this spread of the data, the storage is actually moved from point to point across machines at various times to maintain uptime and redundancy.  This also increases the complexity of finding the data by nefarious means.

One final point of physical security for data is that each customer has completely segmented data stored in separate virtual instances.  This separation is equivalent to two storefront businesses side by side: they are separated by a physical wall, just as customers’ data is separated in the cloud.  This is important to grasp on many levels.  Nobody would question placing one business next to another (entire cities have existed for hundreds of years that way), and the same holds for businesses within the cloud.

Security at the Platform Level…

…I wanted to continue on this topic but I’m going to hold off.  Right now, for work and personally, I’m researching a number of additional security ideas within the cloud, including physical, data, access control, and other security principles.  I’ll have that write-up for another day, inclusive of the platform level security.

…as for now, that wraps up this semi-ranting piece.

Your Cloud, My Cloud, Security in the Cloud

I had a great conversation the other night while at the Seattle Web Analytics Wednesday (#waw) with Carlos (@inflatemouse) and a dozen others.  @inflatemouse brought up the idea that an analytics provider using the cloud increases, or at least possibly increases, the risk of a security breach of the data.  This is, after all, a valid point, but because of the inherent way web analytics works, this both is and is not a concern.

Web Analytics is Inherently Insecure

Web analytics data is collected with a JavaScript tag.  Omniture, Webtrends, Google, Yahoo, and all of the other analytics providers use JavaScript.  JavaScript is a scripting language that is not compiled; it is stored in plain text in the page or an include, or passed into the URI when needed.  This plain text JavaScript is all over the place and can be read merely by looking at it.  So the absolute first point of data collection, the JavaScript tag, is 100% insecure.
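To make the "passed into the URI" point concrete, here is a small Python sketch of how easily the values in a tag's tracking request can be read back. The beacon URL, its host, and its parameter names are all made up for illustration; no real provider's request format is implied.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical tracking request fired by a JavaScript tag.
# The host and parameter names are invented for this example.
beacon = (
    "http://stats.example.com/collect.gif"
    "?page=%2Fcheckout&visitor=abc123&referrer=search"
)


def read_beacon(url):
    """Return the query parameters of a tracking request as a plain dict."""
    query = urlsplit(url).query
    # parse_qs returns a list per parameter; keep the first value of each.
    return {name: values[0] for name, values in parse_qs(query).items()}


fields = read_beacon(beacon)
for name, value in fields.items():
    print(f"{name} = {value}")
```

Anything between the browser and the collection server that can see an unencrypted request like this can do the same few lines of parsing, which is why nothing private belongs in it.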

The majority of data is not private, so this insecurity isn’t a huge risk, or at least should not be.  If it is, you have larger issues before you even contemplate using an on-premises and cloud solution to bump up your compute and storage capabilities.  Collecting data that needs to be secure via web analytics is an absolute no.  Do NOT collect secure, private, or other important pieces of data this way.  If you have even the slightest legal breach in this context, your entire analytics provision could have this data scraped, possibly used in court in a class action suit, or exploited in other ways.

For the rest of this write up, I will assume that you’ve appropriately encrypted, or enabled SSL, or otherwise secured your analytics or data collection in some way.

Getting that Boost on Black Friday

E-commerce has gotten HUGE over the last decade.  The last Black Friday sales and holiday season saw the largest e-commerce activity in history.  Omniture, Webtrends, and all of the other web analytics providers often see a tenfold increase in web traffic over this period of time.  Sometimes, for some clients, this traffic is handled flawlessly by racks and racks of computers sitting in multiple colocation facilities around the world.  However, for some clients that have exceedingly large traffic boosts, data is lost.  (Yes, ALL the providers lose data, more so during these massive boosts.)  The reason is simple: the machines can’t process in time or handle the incoming traffic because the extra throughput isn’t available to scale.
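The arithmetic behind that data loss is simple enough to sketch in Python. The function name and every number below are hypothetical, chosen only to show the shape of the problem, not to describe any provider’s real traffic or capacity.

```python
def events_lost_per_second(baseline_rate, surge_multiplier, capacity):
    """Events per second that exceed fixed collection capacity during a surge."""
    incoming = baseline_rate * surge_multiplier
    # Anything above capacity has nowhere to go and is dropped.
    return max(0, incoming - capacity)


# Hypothetical figures: 2,000 events/s on a normal day, a tenfold
# Black Friday surge, and hardware sized for 15,000 events/s.
lost = events_lost_per_second(baseline_rate=2_000, surge_multiplier=10, capacity=15_000)
print(f"events lost per second at peak: {lost}")

# The same surge against capacity that can scale past the peak loses nothing.
print(events_lost_per_second(baseline_rate=2_000, surge_multiplier=10, capacity=25_000))
```

With fixed racks, the only way to bring the first number down to zero is to buy hardware for the peak and leave it idle the rest of the year.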

Enter the cloud.  The cloud has vastly more scalability, almost an infinite supply by comparison, than any of the infrastructure available to the analytics providers.  As a matter of fact, the cloud has more scale available than all of the analytics providers.  This is actually saying a lot, because I know Webtrends (and maybe some of the others) does an amazing job with its scalability and data collection, arguably more accurate and consistent than any of the other providers (especially since many of them just sample and "guess" at the data).

So when you extend your capabilities to the cloud for web analytics, do you really increase your security vulnerability?  Most of the providers of web analytics have their own array of security measures, which I won’t go into here.  However, does introducing the cloud change anything?  Does it alter the architecture so significantly as to introduce legitimate security concerns?

Immediately, from a functional point of view, assuming good architecture, intelligent system design, and good security practices are in use already, introducing the cloud should be, and is, transparent to clients.  For the provider it should not increase legal concerns, functional concerns, or otherwise, provided the aforementioned items are taken care of appropriately.  But that is just it: every single current provider has legacy architecture and various other elements that do not provide a solid basis for a migration to the cloud for that extra bump of power and storage.

So what should be done?  What if a provider wants that extra power?  Can the technical debts be paid to use the awesome promises of the cloud?  Is the security really secure enough?

Probably not.  Probably so.  But . . .

This provides a prospective opportunity for a new web analytics solution.  It provides a great opportunity for a modern cloud-based solution that offers more than just a mere JavaScript tag collecting insecure, unencrypted data for analysis.  It provides the grand opportunity to design an architecture that could truly lead the industry into the future.  Will Webtrends, Omniture, Unica, or someone else step in to lead the analytics industry into the future?

At this point I’m not really sure, but it definitely is an interesting thought and a conversation that I have had with a lot of people at #altnet meetings and cloud meetups, and with cloud architects, engineers, and others that have similar curiosities.  I impatiently await someone or some business taking the lead!