Splitting a Postgres Timestamp with Generated Columns & GraphQL Query with Hasura

Recently I created a video short on how to split out a timestamp column for Hasura. This included the SQL for Postgres via a schema migration and also details on how this appears in the Hasura user interface. You can check out the video here.

The breakdown of what I show in the video is also available in a GitHub repository.

https://github.com/Adron/graphql-relational-concept-mapping

Postgres Table Creation SQL

Here is the SQL that creates the table, with the timestamp broken out into year, month, and day as generated columns.

create table standard_relational_model.users_data
(
    user_id uuid PRIMARY KEY,
    address_id uuid,
    signup_date timestamp DEFAULT now(),
    year int  GENERATED ALWAYS AS (date_part('year', signup_date)) STORED,
    month int  GENERATED ALWAYS AS (date_part('month', signup_date)) STORED,
    day int  GENERATED ALWAYS AS (date_part('day', signup_date)) STORED,
    points int,
    details jsonb
);

In this SQL, signup_date is the timestamp column that I want split out into year, month, and day. I’ve given it a default of now() just to seed the column so a value isn’t required when inserting a new row. With that default in place, the generated columns year, month, and day each use the date_part() function to extract their particular value from signup_date and store it in the respective column.

The other columns are just there for additional reference.
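To see the generated columns in action, here is a minimal sketch of an insert and select; the user_id value is just a placeholder UUID, and the insert omits signup_date so the now() default kicks in.

insert into standard_relational_model.users_data (user_id, points)
values ('123e4567-e89b-12d3-a456-426614174000', 42);

select signup_date, year, month, day
from standard_relational_model.users_data
where user_id = '123e4567-e89b-12d3-a456-426614174000';

-- year, month, and day are computed from signup_date automatically;
-- generated columns cannot be written to directly.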

The Hasura Console

In the Hasura Console those columns would look something like this.

Notice that the syntax displayed for these is slightly different from the migration that created them.

date_part('day'::text, signup_date)

The above, of course, is for day; month and year are displayed the same way with their respective date parts.

When data is added to the table, the results return as follows, first with GraphQL and then with SQL.

GraphQL

The query.

query MyQuery {
  users_data {
    signup_date
    year
    month
    day
  }
}

The results.

{
  "data": {
    "users_data": [
      {
        "signup_date": "1999-04-21T00:00:00",
        "year": 1999,
        "month": 4,
        "day": 21
      },
      ... etc ...
      {
        "signup_date": "2007-01-02T00:00:00",
        "year": 2007,
        "month": 1,
        "day": 2
      },
      {
        "signup_date": "2021-06-29T00:09:48.359247",
        "year": 2021,
        "month": 6,
        "day": 29
      }
    ]
  }
}
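A nice side effect of having year, month, and day as real columns is that they can be used directly in query arguments. Here is a quick sketch, assuming the same users_data root field Hasura generated above.

query UsersFrom2021 {
  users_data(where: {year: {_eq: 2021}}, order_by: {month: asc}) {
    signup_date
    year
    month
    day
  }
}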

SQL

The query.

select signup_date, year, month, day
from standard_relational_model.users_data;

The results.

1999-04-21 00:00:00.000000,1999,4,21
2012-07-04 00:00:00.000000,2012,7,4
2019-06-24 00:00:00.000000,2019,6,24
2013-03-07 00:00:00.000000,2013,3,7
2007-01-02 00:00:00.000000,2007,1,2
2021-06-29 00:09:48.359247,2021,6,29

That is how to build generated columns in Postgres and how they’re exposed via Hasura for GraphQL queries!

Dynamic Data Generation with JavaScript

This video shows the process detailed below in this blog entry, to provide the choice of video or a quick read! 👍🏻😁

I recently coded up some JavaScript to generate data for a table and it seemed useful, so here it is, ready to use however you may. (The complete js file is below the descriptions of the individual code segments.) This simple data generation script is something I put together to create a csv for some quick data imports into a database (Postgres, SQL Server, or anything else you may want). With that in mind, I initialized the repo and added the libraries I would need.

npm install faker

const faker = require('faker'); // data generation library
const fs = require('fs');       // fs is built into Node, so it doesn't need an npm install

Next up I included the column header row for the csv. I decided to go ahead and set up the variable at this point, as the rest of the csv data would be appended to it. There is probably a faster way to do this, but it was the quickest path to getting something working right now.

After the column row, I also set up the base set of eight UUIDs that would relate to the project_id values, to use randomly throughout data generation. The idea behind this is that the project_id values are the range of values that would be in the data that Subhendu would have, and all the ip and other recorded data would be recorded with and related to a specific project_id. I used a UUID generation site to generate these first 8 values; that site is available here.

After that I went ahead and added the for loop that would be used to step through and generate each record.

var data = "id,country,ip,created_at,updated_at,project_id\n";
let project_ids = [
    'c16f6dd8-facb-406f-90d9-45529f4c8eb7',
    'b6dcbc07-e237-402a-bf11-12bf2226c243',
    '33f45cab-0e14-4830-a51c-fd44a62d1adc',
    '5d390c9e-2cfa-471d-953d-f6727972aeba',
    'd6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10',
    'e72c0ed8-d649-4c53-97c5-da793d7a8228',
    'bf020fd2-2514-4709-8108-a2810e61c503',
    'ead66a4a-968a-448c-a796-51c6a1da0c20'];

for (var i = 0; i < 500000; i++) {
    // TODO: Generation will go here.
}

The next things I wanted to sort out were the two dates. One would be the created_at value and the other the updated_at value. The updated_at date needed to occur after the created_at date, for obvious reasons. To get this calculated I added functions to perform the randomization: first two functions to add days and hours to a date, then the random values to add for each, and then the calculated dates.

function addDays(datetime, days) {
    let date = new Date(datetime.valueOf());
    date.setDate(date.getDate() + days);
    return date;
}

function addHours(datetime, hours) {
    let time = new Date(datetime.valueOf())
    time.setTime(time.getTime() + (hours*60*60*1000));
    return time;
}

var days = faker.datatype.number({min:0, max:7})
var hours = faker.datatype.number({min:0, max:24})

var updated_at = new Date(faker.date.past())
var created_at = addHours(addDays(updated_at, -days), -hours)

With the timestamps set up for the row data generation, I moved on to selecting the specific project_id for the row.

var proj_id = project_ids[faker.datatype.number({min:0, max: 7})]

One other thing I knew I’d need to do was filter out the ' and , characters in the country names that would be selected. The way I clean that data to ensure it doesn’t break the SQL bulk import process is kind of cheap, and I wouldn’t do this with production data, but it works great for generated data like this.

var cleanCountry = faker.address.country().replace(",", " ").replace("'", " ")

If you’re curious why I’m calculating these before I do the general data generation and set up the row, it’s because I like to keep the row of actual data calls to either a set variable assignment or at most one dot level deep, as you’ll see in the row-level data being generated below.

data +=
    faker.datatype.uuid() + "," +
    cleanCountry + "," +
    faker.internet.ip() + "," +
    created_at.toISOString() + "," +
    updated_at.toISOString() + "," +
    proj_id + "\n"

Now the last step is to create the file for all these csv rows.

fs.writeFile('kundu_table_data.csv', data, function (err) {
  if (err) return console.log(err);
  console.log('Data file written.');
});

The script writes the results out to kundu_table_data.csv, ready for a bulk import into the database.
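Since the individual segments above belong inside the for loop, here is the complete file assembled from them, a sketch that mirrors the segments with the per-row calculations moved into the loop where the TODO placeholder sits.

const faker = require('faker');
const fs = require('fs');

// Helper functions to offset a date by days and hours.
function addDays(datetime, days) {
    let date = new Date(datetime.valueOf());
    date.setDate(date.getDate() + days);
    return date;
}

function addHours(datetime, hours) {
    let time = new Date(datetime.valueOf());
    time.setTime(time.getTime() + (hours * 60 * 60 * 1000));
    return time;
}

// Header row, plus the eight project_id UUIDs to pick from.
var data = "id,country,ip,created_at,updated_at,project_id\n";
let project_ids = [
    'c16f6dd8-facb-406f-90d9-45529f4c8eb7',
    'b6dcbc07-e237-402a-bf11-12bf2226c243',
    '33f45cab-0e14-4830-a51c-fd44a62d1adc',
    '5d390c9e-2cfa-471d-953d-f6727972aeba',
    'd6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10',
    'e72c0ed8-d649-4c53-97c5-da793d7a8228',
    'bf020fd2-2514-4709-8108-a2810e61c503',
    'ead66a4a-968a-448c-a796-51c6a1da0c20'];

for (var i = 0; i < 500000; i++) {
    // Random offsets so created_at lands before updated_at.
    var days = faker.datatype.number({min: 0, max: 7});
    var hours = faker.datatype.number({min: 0, max: 24});

    var updated_at = new Date(faker.date.past());
    var created_at = addHours(addDays(updated_at, -days), -hours);

    // Pick one of the eight project ids at random.
    var proj_id = project_ids[faker.datatype.number({min: 0, max: 7})];

    // Strip commas and apostrophes so the csv import doesn't break.
    var cleanCountry = faker.address.country().replace(",", " ").replace("'", " ");

    data +=
        faker.datatype.uuid() + "," +
        cleanCountry + "," +
        faker.internet.ip() + "," +
        created_at.toISOString() + "," +
        updated_at.toISOString() + "," +
        proj_id + "\n";
}

fs.writeFile('kundu_table_data.csv', data, function (err) {
    if (err) return console.log(err);
    console.log('Data file written.');
});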

Scaling PostgreSQL + Top 12 Curated Posts

A video introduction into the basics of scaling a relational database like PostgreSQL.

There are two main ways to scale data storage, especially databases, and the resources available to store and process that data.

Horizontal Scaling (scale-out): This is done by adding more individual machines, creating a cluster for the database to operate across. Various databases handle this in different ways, ranging from distributed databases, which are specifically built to scale horizontally, to relational databases, which require different strategies such as sharding the data itself.

Vertical Scaling (scale-up): This includes increasing the individual system resources allocated to a database on a single vertically integrated machine, such as the CPU or number of CPUs, memory, and disk or disks.

For horizontal scaling, the approach often requires sharding the data across multiple databases and then pooling those resources together via connection pooling, load balancers, and other infrastructure to direct the right requests and traffic at the right sharded database resource.
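As a small, concrete illustration of splitting data by a key, here is a sketch using Postgres’s built-in hash partitioning; the table and column names are just examples. This partitions within a single instance rather than sharding across machines, but the routing-rows-by-key idea is the same one a sharded setup builds on.

create table events (
    event_id uuid not null,
    project_id uuid not null,
    recorded_at timestamp default now()
) partition by hash (project_id);

-- Four partitions; a sharded architecture would place these on separate machines
-- and route queries to the right one via the pooling and balancing layer described above.
create table events_p0 partition of events for values with (modulus 4, remainder 0);
create table events_p1 partition of events for values with (modulus 4, remainder 1);
create table events_p2 partition of events for values with (modulus 4, remainder 2);
create table events_p3 partition of events for values with (modulus 4, remainder 3);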

Sometimes, from a purely process-centric horizontal scaling perspective, we just need more query and processing power while the data can still live on a single machine. In this case, we might put a proxy in front of connection poolers (the “bouncers”) that relay requests and responses to and from the individual machines.

Vertical scaling often requires new hardware and a migration to the larger machine. Sometimes, however, it may just require more CPU, RAM, or drive storage. If it is a singular need, it can often be met by adding or swapping out one of those elements. For example, if a database is continually hitting the peak processing capacity of its CPUs while handling queries, one might be able to add another processor, or change to a faster processor or one with more cores. If memory is overloaded, the same applies: simply increase the maximum memory in the system. Of course, the underlying limitations of vertical scaling come up when you already have the fastest CPU or the memory is already maxed out. In that situation one might need to think about scaling horizontally, as described above.

The following is a list of twelve blog entries that expand on the topic in many ways. I’ve found these (along with more than a few others) very useful in my own efforts around scaling Postgres (and other databases).

  1. Vertically Scaling PostgreSQL
  2. Vertically Scale Your PostgreSQL Infrastructure w/ pgpool -3- Adding Another Standby
  3. Vertically scale your PostgreSQL infrastructure with pgpool — 2 — Automatic failover and reconfiguration
  4. How well can PostgreSQL scale horizontally?
  5. How to Horizontally Scale Your Postgres Database Using Citus
  6. Horizontal scalability with Sharding in PostgreSQL — Where it is going Part 1 of 3
  7. Lessons learned scaling PostgreSQL database to 1.2bn records/month
  8. IDEAS FOR SCALING POSTGRESQL TO MULTI-TERABYTE AND BEYOND
  9. Scaling PostgreSQL Using Connection Poolers & Load Balancers
  10. Vertically Scaling PostgreSQL
  11. Scaling PostgreSQL for Large Amounts of Data
  12. Scaling Postgres

Quick Answers: What is the UUID column type good for and what exactly is a UUID?

The question has come up a few times recently about what UUIDs are and what they’re good for. In this quick answers post I explain what they are and provide reference links to further material, as well as a Hasura Short video covering implementation and use in Postgres & Hasura.

A Hasura Bit

“A Hasura Bit – What is the UUID column type good for and what exactly is a UUID?” video.

UUID

UUID stands for universally unique identifier. Another common term for a UUID is GUID, or globally unique identifier, which is the term Microsoft created for their UUIDs. Just know that a GUID is a UUID; it’s simply a company naming convention versus the standard industry naming convention.

UUIDs are standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). Specifically, UUIDs are specified by an ISO/IEC standard; if you’d like to know more about the standards they’re based on, check out the Wikipedia page @ https://en.wikipedia.org/wiki/Universally_unique_identifier

UUID Format

In the canonical textual representation, the 16 octets of a UUID are represented as 32 hexadecimal (base-16) digits, displayed across five groups separated by hyphens in the 8-4-4-4-12 format. Although there are only 32 hexadecimal digits, the four hyphens bring the total display length to 36 characters. If storing a UUID as a string, for example, the string needs to be able to hold 36 characters.
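For example, the commonly cited sample UUID below shows the 8-4-4-4-12 grouping.

123e4567-e89b-12d3-a456-426614174000
   8       4    4    4        12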

Uses for a UUID

The key use case for a UUID is as a completely unique value identifying a row or a subset of related data. UUIDs are perfect for primary keys in a database, or for any type of key where uniqueness across a system matters.

UUIDs can also be generated from many different origin points without any significant concern for collisions (i.e. duplicate UUIDs). For example, the database itself has functions that can generate a UUID as a default value when a row is inserted, which means a client inserting data doesn’t need to generate one. Alternatively, that responsibility can be flipped over to the client side, and the client’s development stack (Go UUID generation, for example) can generate the UUID. That enables a primary key entity to be created with a UUID, which can then be used for the related foreign key records and so on down the chain of a relationship. Once all of these are created on the client side they can be inserted in a batch, and if ordered appropriately the batch can be made transactional to ensure the integrity of the data.
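As a minimal sketch of the database-side approach in Postgres (gen_random_uuid() is built in as of Postgres 13; on earlier versions the uuid-ossp extension’s uuid_generate_v4() serves the same purpose), with example table and column names:

create table orders (
    order_id uuid primary key default gen_random_uuid(),
    customer_id uuid,
    created_at timestamp default now()
);

-- order_id is omitted here, so Postgres generates the UUID at insert time;
-- a client-side stack could instead generate it and supply it explicitly.
insert into orders (customer_id)
values ('123e4567-e89b-12d3-a456-426614174000');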

New @ Hasura, Me!

I wouldn’t assume you knew this, dear reader, but I’ve joined the amazing team at Hasura! Over the last few years I’ve gotten back into a number of data-oriented development efforts, often related to my own interest in database systems and web development. From that angle Hasura has a superb technology solution and a solid team, with founders @tanmaigo and @rajoshighosh leading the crew, and I’ll introduce many of them to you all in the coming weeks and months! 👍🏻

What is Hasura?

Hasura is a GraphQL API server that can run serverless via Hasura Cloud, be deployed to any cloud provider, or run locally on your own infrastructure. It is open source too, so you can dig in and check out how things work via the GitHub repo.

First Steps @ Hasura

My first step after joining was to put together an infrastructure-as-code deployment using Terraform and write the blog entry “Setup Postgres and a GraphQL API with Hasura on Azure”. It’s a fairly thorough post, albeit always open to critique, and will have a follow-up real soon! Some of the next steps will include further data modeling, covering various relational database paradigms and how those map to Hasura, and lots of additional information. If there’s something you’d like to see, or technologies you’d like to see me put together, do reach out to me @Adron and I’ll see about getting some customized content put together!

One of the first endeavors I’ve started tackling is coordinating new learning material around the features, capabilities, patterns, practices, and ideas behind GraphQL, development around and uses of GraphQL, the Hasura product, and how all of these technologies fit together to make software development [pick awesome adjectives here: better, faster, etc.] when one is working toward an idea!

See you deep in the code and data, science, data extraction, transformation, loading, and all the software development around it all! 🚀

References

For JavaScript, Go, Python, Terraform, infrastructure, infrastructure as code, web dev, data management, data science, data extraction, transformation, loading, and coding around all of this, I stream regularly on Twitch at https://twitch.tv/adronhall and post the VODs, along with entirely new tech and metal content, to YouTube at https://youtube.com/c/ThrashingCode. I’ll also be regularly participating in, scheduling, and adding content on Hasura’s Twitch and YouTube channels, so follow and subscribe over there as well.

Last, another way to get updates on just the bare minimum of the content and coding I’m doing is to register for the Thrashing Code Newsletter!