Designing a Relational Database – Data Modeling

Designing a system against a relational database involves several key elements that ensure data integrity, efficient query performance, and maintainability. In this article I am going to cover the first of those key elements. I will eventually cover the full list of topics, but this post is specifically about the first one: data modeling.

Data Modeling

The general idea with data modeling for a relational database (and for other types of databases) is to build the database in a way that caters to your specific usage needs. This involves multiple layered tasks. In each of the sections below I’ll first define the task simply, then elaborate on the characteristics of that particular task.
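As a quick hypothetical illustration of catering to usage needs, suppose the application’s hottest path is looking up orders by customer. The minimal sketch below (all table and column names invented for the example) makes that relationship explicit and cheap to query:

create table customers
(
    customer_id uuid PRIMARY KEY,
    name text NOT NULL
);

create table orders
(
    order_id uuid PRIMARY KEY,
    customer_id uuid NOT NULL REFERENCES customers (customer_id),
    placed_at timestamp DEFAULT now()
);

create index idx_orders_customer_id on orders (customer_id);

The foreign key enforces integrity between the two tables, and the index on it keeps the common orders-by-customer query fast.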

Continue reading “Designing a Relational Database – Data Modeling”

MariaDB & SkySQL DBAAS

This is a getting started guide for MariaDB SkySQL. Let’s start with two prerequisite definitions:

  1. MariaDB – MariaDB is an open-source relational database management system (RDBMS) that is a fork of MySQL, another popular open-source database system. It was created by the original developers of MySQL after concerns arose about the acquisition of MySQL by Oracle Corporation in 2010. MariaDB is designed to be a drop-in replacement for MySQL, which means that many applications and tools developed for MySQL can also work seamlessly with MariaDB without requiring significant changes. It retains much of the same syntax, APIs, and commands as MySQL, making the transition relatively straightforward for users.
  2. MariaDB SkySQL – MariaDB SkySQL is a cloud-native Database as a Service (DBaaS) offering provided by MariaDB Corporation, the company behind the development of the MariaDB open-source database system. SkySQL is designed to simplify database management, deployment, and scaling by providing a fully managed and highly available MariaDB database solution in the cloud.

Some key features of MariaDB include:

  1. High Performance: MariaDB incorporates optimizations and improvements to enhance query execution speed and overall performance.
  2. Storage Engines: MariaDB supports multiple storage engines, including the popular InnoDB and Aria engines. Each engine has its own characteristics and performance attributes, allowing users to choose the one that best fits their requirements (see the example following this list).
  3. Security: MariaDB includes various security enhancements, such as data encryption at rest and in transit, improved authentication methods, and better access control mechanisms.
  4. Open Source: MariaDB is fully open source, which means its source code is available for anyone to inspect, modify, and contribute to.
  5. Community and Development: MariaDB has a vibrant and active community of developers and contributors who work on its continued development and improvement.
  6. Compatibility: As mentioned earlier, MariaDB aims for compatibility with MySQL, allowing applications developed for MySQL to work with minimal changes.
  7. Extensions: MariaDB introduces some features not present in MySQL, such as the Aria storage engine, thread pooling, and more advanced geographic information system (GIS) functionality.
  8. Replication and Clustering: Like MySQL, MariaDB supports various replication methods and clustering solutions for high availability and fault tolerance.
  9. Plugins: MariaDB offers a plugin architecture that allows users to add custom functionality and features to the database system.
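To illustrate the storage engine flexibility from item 2 above, here is a minimal sketch of selecting an engine per table. The ENGINE clause is standard MariaDB DDL; the tables themselves are made up for the example:

create table accounts
(
    account_id int PRIMARY KEY,
    balance decimal(12,2)
) ENGINE=InnoDB;

create table activity_log
(
    log_id int AUTO_INCREMENT PRIMARY KEY,
    message text
) ENGINE=Aria;

InnoDB provides full transactional guarantees for the account data, while Aria is a crash-safe, non-transactional engine suited to something like the log table.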

To elaborate further on the specifics of MariaDB SkySQL, here are some of the features of the DBaaS (Database as a Service), with a quick connection sketch after the list:

  1. Managed Service: SkySQL takes care of database administration tasks such as provisioning, backup, monitoring, maintenance, and security updates. This allows users to focus more on their applications and less on managing the underlying database infrastructure.
  2. High Availability: SkySQL offers built-in high availability configurations that ensure database uptime and data durability. This includes automatic failover and replication setups.
  3. Scalability: SkySQL supports both vertical and horizontal scaling. Vertical scaling involves adjusting the resources of a single database instance, while horizontal scaling involves distributing data across multiple nodes for improved performance and capacity.
  4. Security: Security features such as encryption at rest and in transit, role-based access control, and network security protocols are integrated to help protect sensitive data.
  5. Multi-Cloud Support: SkySQL is designed to work across various cloud providers, enabling users to choose the cloud environment that best suits their needs. It supports popular cloud platforms like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
  6. Compatibility: SkySQL maintains compatibility with the MariaDB database, which means applications developed for MariaDB can run seamlessly on SkySQL with minimal modifications.
  7. Global Distributed Architecture: SkySQL offers the capability to deploy databases across multiple geographic regions for improved performance and data availability across different parts of the world.
  8. Managed Upgrades: Regular updates and improvements to the MariaDB database engine are managed by the SkySQL service, ensuring that your databases remain up to date without manual intervention.
  9. Pay-as-You-Go: SkySQL’s pricing model is typically based on usage, allowing users to pay for the resources they consume. This can be cost-effective for businesses as it eliminates the need to invest in and maintain dedicated database infrastructure.
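As a quick sketch of connecting to a SkySQL instance once it’s provisioned, here is a hypothetical example using the standard mariadb command line client. The host, port, and user below are placeholders; the real values come from the connection details SkySQL shows for your service:

mariadb --host example-db.mdb0000001.db.skysql.net --port 5001 \
        --user db_user --password --ssl

SkySQL connections are typically made over TLS, hence the --ssl flag.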
Continue reading “MariaDB & SkySQL DBAAS”

Shortlist of Database as a Service Providers

Here’s a shortlist of top Database as a Service providers for various open source databases like MariaDB, PostgreSQL, MongoDB, Apache Cassandra, Redis, Elasticsearch, and Neo4j:

  1. MariaDB:
  2. PostgreSQL:
  3. MongoDB:
  4. Apache Cassandra:
  5. Redis:
  6. Elasticsearch:
  7. Neo4j (Graph Database):

Splitting a Postgres Timestamp with Generated Columns & GraphQL Query with Hasura

Recently I created a video short on how to split out a timestamp column for Hasura. This included the SQL for Postgres via a schema migration and also details on how this appears in the Hasura user interface. You can check out the video here.

The breakout of what I show in the video is also available in a GitHub repository.

https://github.com/Adron/graphql-relational-concept-mapping

Postgres Table Creation SQL

Here is the specific SQL that creates the table, with the timestamp broken out into year, month, and day as generated columns.

create table standard_relational_model.users_data
(
    user_id uuid PRIMARY KEY,
    address_id uuid,
    signup_date timestamp DEFAULT now(),
    year int  GENERATED ALWAYS AS (date_part('year', signup_date)) STORED,
    month int  GENERATED ALWAYS AS (date_part('month', signup_date)) STORED,
    day int  GENERATED ALWAYS AS (date_part('day', signup_date)) STORED,
    points int,
    details jsonb
);

In this SQL the signup_date column is the timestamp column that I want split out into year, month, and day. I’ve set it up with a default function call of now() just to seed the column and not require entry when inserting a new row. With that seed, the generated columns of year, month, and day use the date_part() function to extract the particular value from the signup_date column and store it in the respective column.

The other columns are just there for other references.
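To see the generated columns do their work, here is a hypothetical insert; note that year, month, and day are never supplied, Postgres computes them from signup_date. (This assumes Postgres 13+, where gen_random_uuid() is built in; on earlier versions the pgcrypto extension provides it.)

insert into standard_relational_model.users_data (user_id, points)
values (gen_random_uuid(), 42);

select signup_date, year, month, day
from standard_relational_model.users_data;

The returned row shows the now() seeded timestamp split out across the three generated columns automatically.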

The Hasura Console

In the Hasura Console those columns show up with syntax that differs from the migration that created them. For the day column, for example, the console displays:

date_part('day'::text, signup_date)

The above is for day; the month and year columns are designated the same way, each with its respective date part.

When data is added to the table, the results come back as follows, shown via both GraphQL and SQL.

GraphQL

The query.

query MyQuery {
   users_data {
    signup_date
    year
    month
    day
  }
}

The results.

{
  "data": {
    "users_data": [
      {
        "signup_date": "1999-04-21T00:00:00",
        "year": 1999,
        "month": 4,
        "day": 21
      },
      ... etc ...
      {
        "signup_date": "2007-01-02T00:00:00",
        "year": 2007,
        "month": 1,
        "day": 2
      },
      {
        "signup_date": "2021-06-29T00:09:48.359247",
        "year": 2021,
        "month": 6,
        "day": 29
      }
    ]
  }
}

SQL

The query.

select signup_date, year, month, day
from standard_relational_model.users_data;

The results.

1999-04-21 00:00:00.000000,1999,4,21
2012-07-04 00:00:00.000000,2012,7,4
2019-06-24 00:00:00.000000,2019,6,24
2013-03-07 00:00:00.000000,2013,3,7
2007-01-02 00:00:00.000000,2007,1,2
2021-06-29 00:09:48.359247,2021,6,29
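A nice side effect of having these as stored columns is that they can be filtered, grouped, and indexed directly. For example, a quick count of signups per year:

select year, count(*) as signups
from standard_relational_model.users_data
group by year
order by year;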

That is how to build generated columns in Postgres and how they’re available via Hasura to expose via GraphQL!

Dynamic Data Generation with JavaScript

This video shows the process detailed below in this blog entry, to provide the choice of video or a quick read! 👍🏻😁

I coded up some JavaScript recently to generate some data for a table, and it seemed relatively useful, so here it is, ready to use as you like. (The complete js file is at the end, below the descriptions of the individual code segments.) This simple data generation script is something I put together to create a csv for some quick data imports into a database (Postgres, SQL Server, or anything you may want). With that in mind, I initialized the repo and added the libraries I would need.

npm install faker

const faker = require('faker');
// fs is a Node.js built-in module, so it doesn't need an npm install.
const fs = require('fs');

Next up I included the column header row for the csv. I decided to go ahead and set up the variable at this point, as the rest of the csv data would be appended to that same variable. There is probably a faster way to do this, but it was the quickest path to getting something working right now.

After the column row, I also set up the base 8 UUIDs that would serve as the project_id values, to be used at random throughout data generation. The idea behind this is that the project_id values represent the range of values that would exist in the data Subhendu would have, and all the ip and other recorded data would be recorded with, and related to, a specific project_id. I used a UUID generation site to generate these first 8 values; that site is available here.

After that I went ahead and added the for loop that would be used to step through and generate each record.

var data = "id,country,ip,created_at,updated_at,project_id\n";
let project_ids = [
    'c16f6dd8-facb-406f-90d9-45529f4c8eb7',
    'b6dcbc07-e237-402a-bf11-12bf2226c243',
    '33f45cab-0e14-4830-a51c-fd44a62d1adc',
    '5d390c9e-2cfa-471d-953d-f6727972aeba',
    'd6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10',
    'e72c0ed8-d649-4c53-97c5-da793d7a8228',
    'bf020fd2-2514-4709-8108-a2810e61c503',
    'ead66a4a-968a-448c-a796-51c6a1da0c20'];

for (var i = 0; i < 500000; i++) {
    // TODO: Generation will go here.
}

The next thing I wanted to sort out was the two dates. One would be the created_at value and the other the updated_at value. The updated_at date needed to occur after the created_at date, for obvious reasons. To calculate this I added functions to perform the randomization: first two functions to add days and hours to a date, then the random values to add for each, and finally the calculated dates themselves.

// Returns a new Date offset from datetime by the given number of days.
function addDays(datetime, days) {
    let date = new Date(datetime.valueOf());
    date.setDate(date.getDate() + days);
    return date;
}

// Returns a new Date offset from datetime by the given number of hours.
function addHours(datetime, hours) {
    let time = new Date(datetime.valueOf());
    time.setTime(time.getTime() + (hours * 60 * 60 * 1000));
    return time;
}

// Random offsets, then work backward from updated_at to get created_at.
var days = faker.datatype.number({min: 0, max: 7});
var hours = faker.datatype.number({min: 0, max: 24});

var updated_at = new Date(faker.date.past());
var created_at = addHours(addDays(updated_at, -days), -hours);

With the datetime stamps set up for the row data generation, I moved on to selecting the specific project_id for the row.

// Pick one of the 8 project_id UUIDs at random.
var proj_id = project_ids[faker.datatype.number({min: 0, max: 7})];

One other thing I knew I’d need to do is filter out the ' or , values located in the country names that get selected. The way I clean that data to ensure it doesn’t break the SQL bulk import process is kind of cheap, and I wouldn’t do this with production data, but it works great for generated data like this.

// Replace commas and apostrophes so country names don't break the csv.
var cleanCountry = faker.address.country().replace(",", " ").replace("'", " ");

If you’re curious why I calculate these values before generating the actual row: I like to keep the row-building calls to either a preset variable or at most one dot level deep. You’ll see that in the row-level data being generated below; note this block sits inside the for loop set up earlier.

// Append one csv row to the data variable declared with the header earlier.
data +=
    faker.datatype.uuid() + "," +
    cleanCountry + "," +
    faker.internet.ip() + "," +
    created_at.toISOString() + "," +
    updated_at.toISOString() + "," +
    proj_id + "\n";

Now the last step is to create the file for all these csv rows.

fs.writeFile('kundu_table_data.csv', data, function (err) {
  if (err) return console.log(err);
  console.log('Data file written.');
});

The results: running the script writes a kundu_table_data.csv file with 500,000 generated rows, ready to bulk import into Postgres, SQL Server, or whatever database you’re targeting.
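For reference, here is the complete js file assembled from the segments above; the only change from the segments is the assembly itself, with the per-row generation sitting inside the for loop:

const faker = require('faker');
const fs = require('fs');

// Returns a new Date offset from datetime by the given number of days.
function addDays(datetime, days) {
    let date = new Date(datetime.valueOf());
    date.setDate(date.getDate() + days);
    return date;
}

// Returns a new Date offset from datetime by the given number of hours.
function addHours(datetime, hours) {
    let time = new Date(datetime.valueOf());
    time.setTime(time.getTime() + (hours * 60 * 60 * 1000));
    return time;
}

// csv header row, plus the 8 project_id UUIDs to pick from at random.
var data = "id,country,ip,created_at,updated_at,project_id\n";
let project_ids = [
    'c16f6dd8-facb-406f-90d9-45529f4c8eb7',
    'b6dcbc07-e237-402a-bf11-12bf2226c243',
    '33f45cab-0e14-4830-a51c-fd44a62d1adc',
    '5d390c9e-2cfa-471d-953d-f6727972aeba',
    'd6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10',
    'e72c0ed8-d649-4c53-97c5-da793d7a8228',
    'bf020fd2-2514-4709-8108-a2810e61c503',
    'ead66a4a-968a-448c-a796-51c6a1da0c20'];

for (var i = 0; i < 500000; i++) {
    // Random offsets, working backward from updated_at to get created_at.
    var days = faker.datatype.number({min: 0, max: 7});
    var hours = faker.datatype.number({min: 0, max: 24});
    var updated_at = new Date(faker.date.past());
    var created_at = addHours(addDays(updated_at, -days), -hours);

    var proj_id = project_ids[faker.datatype.number({min: 0, max: 7})];
    var cleanCountry = faker.address.country().replace(",", " ").replace("'", " ");

    // Append one csv row.
    data +=
        faker.datatype.uuid() + "," +
        cleanCountry + "," +
        faker.internet.ip() + "," +
        created_at.toISOString() + "," +
        updated_at.toISOString() + "," +
        proj_id + "\n";
}

fs.writeFile('kundu_table_data.csv', data, function (err) {
    if (err) return console.log(err);
    console.log('Data file written.');
});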