The Keys & Relationships of Relational Databases

This is a continuation of my posts on relational databases, started here. Previous posts on this theme, “Designing a Relational Database – Data Modeling, “Let’s Talk About Database Schema“, and “The Exasperating Topic of Database Indexes“.

In the context of relational databases, “keys” and “relationships” are fundamental concepts that help in ensuring data integrity, organizing data efficiently, and enabling meaningful data retrieval. Let’s break these down:

1. Keys

A key is an attribute (or a set of attributes) that uniquely identifies a record in a table. Various types of keys exist in relational databases:

  • Primary Key: This uniquely identifies each record in a table. A primary key cannot have NULL values, and each value must be unique. A table can have only one primary key, which may consist of single or multiple columns.
  • Foreign Key: This is an attribute or set of attributes in a table that refers to the primary key in another table. It establishes a relationship between two tables. A foreign key ensures records in one table correspond to records in another.
  • Unique Key: Similar to a primary key, it uniquely identifies each record in the table. The difference is that a table can have more than one unique key, but only one primary key. Unique keys can have NULL values, but the values must be unique.
  • Composite Key: When more than one attribute is used to uniquely identify a record in the table, such a key is called a composite key.
  • Candidate Key: Any attribute or set of attributes that could serve as the primary key is a candidate key. It means it can uniquely identify records in the table.
  • Super Key: It is a set of attributes that, when taken collectively, can be used to uniquely identify records. A super key can have additional attributes that are not strictly necessary for unique identification.

2. Relationships

Relationships in relational databases determine how tables connect to one another and how data correlates:

  • One-to-One (1:1) Relationship: In this relationship, one record in a table is related to one and only one record in another table. For instance, each employee in a company might have one unique work ID.
  • One-to-Many (1:N) or Many-to-One (N:1) Relationship: One record in a table can be related to one or more records in another table. For example, one customer can place many orders, but each order is made by one customer.
  • Many-to-Many (M:N) Relationship: Multiple records in one table are related to multiple records in another table. For instance, students and courses. One student can enroll in many courses, and one course can have many students. This relationship is typically resolved in relational databases using a junction table (or bridge table).

Relationships are typically enforced using Foreign Keys. When a foreign key in one table refers to the primary key of another, it creates a linkage that ensures the validity of the data and maintains referential integrity.

Continue reading “The Keys & Relationships of Relational Databases”

A Recap Of My Top 4 Tech Article Reads From Pocket

Sometimes I get overwhelmed with the number of articles that are in my pocket. I’ve got articles on livability, transit, cycling, auto issues, node.js, java, javascript, coding practices, software craftsmanship, feminism, heavy metal, death metal, black metal, jazz, progressive jazz, fusion jazz, NASA news, space discoveries, space research, Star Trek news, Star Wars news, information on sci-fi books and a slight spattering of politics and some other just interesting nonsensical stuff.

Here's a shot of Pocket on OS-X with an article about Seattle's Tech Advantage over many American cities being rooted in urban density. Which, I'd also argue, gives Seattle a unique advantage (And is a serious pain point for Microsoft's misstep into the suburbs decades ago)
Here’s a shot of Pocket on OS-X with an article about Seattle’s Tech Advantage over many American cities being rooted in urban density. Which, I’d also argue, gives Seattle a unique advantage (And is a serious pain point for Microsoft’s misstep into the suburbs decades ago)

I’ve taken the time to sort through this list of articles, pick out the top technical articles and get this down to a manageable level again. In the process I’ve created this list of solid articles that I’ve now officially read or found useful in some way and present it here for you dear reader. Enjoy, I hope they’re useful to you too.

Article Recon, The Top

  1. Zef Hemel wrote up a piece titled “Docker: Using Linux Containers to Support Portable Application Deployment“. In the article Zef delves into a number of things that are key to understanding Docker and the notion of portland application deployment. Other topics covered include isolation, security, reproducing deployments and resource constraints. The article closes with an example of  application containers and their respective deployment.
  2. 7 Javascript Basics Many Developers Aren’t Using (Properly) albeit slightly useful, I found this one more entertaining. It does give some small insight to the scope of oddities that JavaScript has and how one can easily miss the basics in JavaScript.
  3. Even though the article is from late last year, “The Premature Return to SQL” is a good read. As Alex Popescu   states it, “This pisses me off. A lot.” I too find myself pissed off a lot at the naive understanding and decisions making around SQL or alternate options. It’s almost as if some people decide to just flip a coin to make these determinations with zero insight into what they’re actually attempting to do.
  4. The article “No Deadlines for You! Software Dev Without Estimates, Specs or Other Lies” is spectacular in laying out how bullshit specs and estimates are. They’re almost entirely wasted effort on the developers part. In my own opinion it is often a failure (and yeah, I’ve been in management and leadership too, and removed these issues) of management to understand in the slightest what is actually being built or how it is being built. A lack of vision on behalf of the project is a sure fire sign that the original estimates are already completely off, the design and build out of whatever it is will likely be wrong and a host of other issues. Building software isn’t a bridge, it’s more like a painting, you decide as you go. There is no paint by numbers in software development.

Anyway, that’s my list from the 50+ tech articles that were in my Pocket app. Maybe on day I can get disciplined enough to keep the list limited to really good reads and I’ll start putting together a “My Top Pocket Reads this Month” blog entries? That sounds like it could be useful. Until then, happy coding.

Riak in a .NET World

Jeremiah's Demo Works, IT WORKS IT WORKS!
Jeremiah’s Demo Works, IT WORKS IT WORKS!

A few days ago Troy Howard, Jeremiah Peschka and I all traveled via Amtrak Cascades up to Seattle. The mission was simple, Jeremiah was presenting “Riak in a .NET World”, I was handling logistics and Troy was handling video.

So I took the video that Troy shot, I edited it, put together some soundtrack to it and let Jeremiah’s big data magic shine. He covers the basics around RDBMSes, SQL Server in this case but easily it applies to any RDBMS in large part. These basics bring us up to where and why an architecture needs to shift from an RDBMS solution to a distributed solution like Riak. After stepping through some of the key reasons to move to Riak, Jeremiah walks through a live demo of using CorrugatedIron, the .NET Client for Riak (Github repo). During the walk through he covers the specific characteristics of how CorrugatedIron interacts with Riak through indexs, buckets and during puts and pulls of data.

Toward the end of the video Joseph Blomstedt @jtuple, Troy Howard @thoward37, Jeremiah Peschka @peschkaj, Clive Boulton @iC and Richard Turner @bitcrazed. Also note, I’ve enabled download for this specific video since it is actually a large video (1.08GB total). So you may want to download and watch it if you don’t have a super reliable high speed internet connection.

Also for more on Jeremiah’s work check out http://www.brentozar.com/articles/riak/  and contact him at http://www.brentozar.com/contact/

The Database Deluge… Who’s Who

These are the top NoSQL Solutions in the market today that are open source, readily available, with a strong and active community, and actively making forward progress in development and innovations in the technology. I’ve provided them here, in no order, with basic descriptions, links to their main website presence, and with short lists of some of their top users of each database. Toward the end I’ve provided a short summary of the database and the respective history of the movement around No SQL and the direction it’s heading today.

Cassandra

http://cassandra.apache.org/

Cassandra is a distributed databases that offers high availability and scalability. Cassandra supports a host of features around replicating data across multiple datacenters, high availability, horizontal scaling for massive linear scaling, fault tolerance and a focus, like many NoSQL solutions around commodity hardware.

Cassandra is a hybrid key-value & row based database, setup on top of a configuration focused architecture. Cassandra is fairly easy to setup on a single machine or a cluster, but is intended for use on a cluster of machines. To insure the availability of features around fault tolerance, scaling, et al you will need to setup a minimal cluster, I’d suggest at least 5 nodes (5 nodes being my personal minimum clustered database setup, this always seems to be a solid and safe minimum).

Cassandra also has a query language called CQL or Cassandra Query Langauge. Cassandra also support Apache Projects Hive, Pig with Hadoop integration for map reduce.

Who uses Cassandra?

  • IBM
  • HP
  • Netflix
  • …many others…

HBase

http://hbase.apache.org/

In the book, Seven Databases in Seven Weeks, the Apache HBase Project is described as a nail gun. You would not use HBase to catalog your sales list just like you wouldn’t use a nail gun to build a dollhouse. This is an apt description of HBase.

HBase is a column-oriented database. It’s very good at scaling out. The origins of HBase are rooted in BigTable by Google. The proprietary database is described in in the 2006 white paper, “Bigtable: A Distributed Storage System for Structured Data.”

HBase stores data in buckets called tables, the tables contain cells that are at the intersection of rows and columns. Because of this HBase has a lot of similar characteristics to a relational database. However the similarities are only in name.

HBase also has several features that aren’t available in other databases, such as; versioning, compression, garbage collection and in memory tables. One other feature that is usually only available in relational databases is strong consistency guarantees.

The place where HBase really shines however is in queries against enormous datasets.

HBase is designed architecturally to be fault tolerate. It does this through write-ahead logging and distributed configuration. At the core of the architecture HBase is built on Hadoop. Hadoop is a sturdy, scalable computing platform that provides a distribute file system and mapreduce capabilities.

Who is using it?

  • Facebook uses HBase for its messaging infrastructure.
  • Stumpleupon uses it for real-time data storage and analytics.
  • Twitter uses HBase for data generation around people search & storing logging & monitoring data.
  • Meetup uses it for site data.
  • There are many others including Yahoo!, eBay, etc.

Mongo

http://www.mongodb.org/

MongoDB is built and maintained by a company called 10gen. MongoDB was released in 2009 and has been rising in popularity quickly and steadily since then. The name, contrary to the word mongo, comes from the word humongous. The key goals behind MongoDB are performance and easy data access.

The architecture of MongoDB is around document database principles. The data can be queried in an ad-hoc way, with the data persisted in a nested way. This database also, like most NoSQL databases enforces no schema, however can have specific document fields that can be queried off of.

Who is using it?

  • Foursquare
  • bit.ly
  • CERN for collecting data from the large Hadron Collider
  • …others…

Redis

http://redis.io/

Redis stands for Remote Dictionary Service. The most common capability Redis is known for, is blindingly fast speed. This speed comes from trading durability. At a base level Redis is a key-value store, however sometimes classifying it isn’t straight forward.

Redis is a key-value store, and often referred to as a data structure server with keys that can be string, hashes, lists, sets and sorted sets. Redis is also, stepping away from only being a key-value store, into the realm of being a publish-subscribe and queue stack. This makes Redis one very flexible tool in the tool chest.

Who is using it?

  • Blizzard (You know, that World of Warcraft game maker)  😉
  • Craigslist
  • flickr
  • …others…

Couch

http://couchdb.apache.org/

Another Apache Project, CouchDB is the idealized JSON and REST document database. It works as a document database full of key-value pairs with the values a set number of types including nested with other key-value objects.

The primary mode of querying CouchDB is to use incremental mapreduce to produce indexed views.

One other interesting characteristic about CouchDB is that it’s built with the idea of a multitude of deployment scenarios. CouchDB might be deployed to some big servers or may be a mere service running on your Android Phone or Mac OS-X Desktop.

Like many NoSQL options CouchDB is RESTful in operation and uses JSON to send data to and from clients.

The Node.js Community also has an affinity for Couch since NPM and a lot of the capabilities of Couch seem like they’re just native to JavaScript. From the server aspect of the database to the JSON format usage to other capabilities.

Who uses it?

  • NPM – Node Package Manager site and NPM uses CouchDB for storing and providing the packages for Node.js.

Couchbase (UPDATED January 18th)

Ok, I realized I’d neglected to add Couchbase (thus the Jan 18th update), which is an open source and interesting solution built off of Membase and Couch. Membase isn’t particularly a distributed database, or database, but between it and couch joining to form Couchbase they’ve turned it into a distributed database like couch except with some specific feature set differences.

A lot of the core architecture features of Couch are available, but the combination now adds auto-sharding clusters, live/hot swappable upgrades and changes, memchaced APIs, and built in data caching.

Who uses it?

  • Linkedin
  • Orbitz
  • Concur
  • …and others…

Neo4j

http://www.neo4j.org/

Neo4j steps away from many of the existing NoSQL databases with its use of a graph database model. It stored data as a graph, mathematically speaking, that relates to the other data in the database. This database, of all the databases among the NoSQL and SQL world, is very whiteboard friendly.

Neo4j also has a varied deployment model, being able to deploy to a small or large device or system. It has the ability to store dozens of billions of edges and nodes.

Who is using it?

  • Accenture
  • Adobe
  • Lufthansa
  • Mozilla
  • …others…

Riak

Riak is a key-value, distributed, fault tolerant, resilient database written in Erlang.  It uses the Riak Core project as a codebase for the distributed core of the system. I further explained Riak, since yes, I work for Basho who are the makers of Riak, in a separate blog entry “Riak is… A Big List of Things“. So for a description of the features around Riak check that out.

Who is using Riak?

In Summary

One of the things you’ll notice with a lot of these databases and the NoSQL movement in general is that it originated from companies needing to go “web scale” and RDBMSs just couldn’t handle or didn’t meet the specific requirements these companies had for the data. NoSQL is in no way a replacement to relational or SQL databases except in these specific cases where need is outside of the capability or scope of SQL & Relational Databases and RDBMSs.

Almost every NoSQL database has origins that go pretty far back, but the real impetus and push forward with the technology came about with key efforts at Google and Amazon Web Services. At Google it was with BigTable Paper and at Amazon Web Services it was with the Dynamo Paper. As time moved forward with the open source community taking over as the main innovator and development model around big data and the NoSQL database movement. Today the Apache Project has many of the projects under its guidance along with other companies like Basho and 10gen.

In the last few years, many of the larger mainstays of the existing database industry have leapt onto the bandwagon. Companies like Microsoft, Dell, HP and Oracle have made many strategic and tactical moves to stay relevant with this move toward big data and nosql databases solutions. However, the leadership is still outside of these stalwarts and in the hands of the open source community. The related companies and organizations that are focused on that community such as 10gen, Basho and the Apache Organization still hold much of the future of this technology in the strategic and tactical actions that they take since they’re born from and significant parts of the community itself.

For an even larger list of almost every known NoSQL Database in existence check out NoSQL Database .org.