Setting Up TimescaleDB for Time Series Data with Postgres.app for all your “Time Scales”!

I used an image from The Geological Society of America‘s “Time Scale” but this isn’t about that, but it kind of is and I’ll have a future post about using TimescaleDB to organize “Time Scale” like this, but that’ll be a little bit in the future. For now, I posted that Time Scale for the LOLz. 🤙🏻

I’ve been using TimescaleDB for time-series data on and off for a while now. I recently fired up Postgres.app for local development. It’s one of the cleanest ways to get PostgreSQL running on macOS, and adding TimescaleDB is surprisingly straightforward once you know where to look.

Time-series data is everywhere—sensor readings, application metrics, user events, IoT data. Regular PostgreSQL can handle it, but once you’re dealing with millions of rows, you’ll notice queries slowing down. TimescaleDB solves this by turning your time-series tables into hypertables that automatically partition by time, compress old data, and optimize queries. The best part? It’s still PostgreSQL, so all your existing tools and SQL knowledge work exactly the same.

I’ve built enough dashboards and monitoring systems to know when you need proper time-series handling versus when you can get away with a regular table and a good index. Once you’re dealing with millions of rows and need to query across time ranges efficiently, TimescaleDB is worth the setup.

Continue reading →

Setup TimescaleDB with Docker Compose: A Step-by-Step Guide

This tutorial will guide you through setting up and using TimescaleDB using Docker Compose.

Prerequisites

Docker installed on your system
Docker Compose installed on your system

Getting Started

Start the TimescaleDB Container

Navigate to the directory containing the docker-compose.yml file and run:
```
docker-compose up -d
```
This will start TimescaleDB in detached mode. The database will be accessible on port 5432.
Connection Details
- Host: localhost
- Port: 5432
- Database: timescale
- Username: timescale
- Password: timescale

Verify the Installation

You can connect to the database using psql or any PostgreSQL client:

docker exec -it timescale-timescaledb-1 psql -U timescale -d timescale

Once connected, you can verify TimescaleDB is properly installed:

SELECT default_version, installed_extensions FROM pg_available_extensions WHERE name = 'timescaledb';

Continue reading →

Relational Database Query Optimization

This is a continuation of my posts on relational databases, started here. Previous posts on this theme, “Data Modeling“, “Let’s Talk About Database Schema“, “The Exasperating Topic of Database Indexes“, and “The Keys & Relationships of Relational Databases“.

Query optimization refers to the process of selecting the most efficient way to execute a SQL query. The goal is to retrieve the requested data as quickly and efficiently as possible, making the best use of system resources. Given that there can be multiple strategies to retrieve the same set of data, the optimizer’s role is to choose the best one based on factors like data distribution, statistics, and available resources.

Query Optimizers

Continue reading →

Dynamic Data Generation with JavaScript

This video shows the process detailed below in this blog entry, to provide the choice of video or a quick read! 👍🏻😁

I coded up some JavaScript to generate some data for a table recently and it seemed relatively useful, so here it is ready to use as you may. (The complete js file is below the description of the individual code segments below). This file simple data generation is something I put together to create a csv for some quick data imports into a database (Postgres, SQL Server, or anything you may want). With that in mind, I added the libraries and initialized the repo with the libraries I would need.

npm install faker
npm install fs

faker = require('faker');
fs = require('fs');

Next up I included the column row of data for the csv. I decided to go ahead and setup the variable at this point, as it would be needed as I would add the rest of the csv data to the variable itself. There is probably a faster way to do this, but this was the quickest path from the perspective of getting something working right now.

After the colum row, I also setup the base 8 UUIDs that would related to the project_id values to randomly use throughout data generation. The idea behind this is that the project_id values are the range of values that would be in the data that Subhendu would have, and all the ip and other recorded data would be recorded with and related to a specific project_id. I used a UUID generation site to generate these first 8 values, that site is available here.

After that I went ahead and added the for loop that would be used to step through and generate each record.

var data = "id,country,ip,created_at,updated_at,project_id\n";
let project_ids = [
    'c16f6dd8-facb-406f-90d9-45529f4c8eb7',
    'b6dcbc07-e237-402a-bf11-12bf2226c243',
    '33f45cab-0e14-4830-a51c-fd44a62d1adc',
    '5d390c9e-2cfa-471d-953d-f6727972aeba',
    'd6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10',
    'e72c0ed8-d649-4c53-97c5-da793d7a8228',
    'bf020fd2-2514-4709-8108-a2810e61c503',
    'ead66a4a-968a-448c-a796-51c6a1da0c20'];

for (var i = 0; i < 500000; i++) {
    // TODO: Generation will go here.
}

The next thing that I wanted to sort out are the two dates. One would be the created_at value and the other the updated_at value. The updated_at date needed to show as occurring after the created_at date, for obvious reasons. To make sure I could get this calculated I added a function to perform the randomization! First two functions to get additions for days and hours, then getting the random value to add for each, then getting the calculated dates.

function addDays(datetime, days) {
    let date = new Date(datetime.valueOf());
    date.setDate(date.getDate() + days);
    return date;
}

function addHours(datetime, hours) {
    let time = new Date(datetime.valueOf())
    time.setTime(time.getTime() + (hours*60*60*1000));
    return time;
}

var days = faker.datatype.number({min:0, max:7})
var hours = faker.datatype.number({min:0, max:24})

var updated_at = new Date(faker.date.past())
var created_at = addHours(addDays(updated_at, -days), -hours)

With the date time stamps setup for the row data generation I moved on to selecting the specific project_id for the row.

var proj_id = project_ids[faker.datatype.number({min:0, max: 7})]

One other thing that I knew I’d need to do is filter for the ' or , values located in the countries that would be selected. The way I clean that data to ensure it doesn’t break the SQL bulk import process is kind of cheap and in production data I wouldn’t do this, but it works great for generated data like this.

var cleanCountry = faker.address.country().replace(",", " ").replace("'", " ")

If you’re curious why I’m calculating these before I do the general data generation and set the row up, I like to keep the row of actual data calls to either a set variable assignment or at most one dot level deep in my calls. As you’ll see now in the row level data being generated below.

data2 += 
    faker.datatype.uuid() + "," +
    cleanCountry + "," +
    faker.internet.ip() + "," +
    created_at.toISOString() + "," +
    updated_at.toISOString() + "," +
    proj_id + "\n"

Now the last step is to create the file for all these csv rows.

fs.writeFile('kundu_table_data.csv', data, function (err) {
  if (err) return console.log(err);
  console.log('Data file written.');
});

The results.

Setup Postgres, and GraphQL API with Hasura on Azure

Key Technologies: Hasura, Postgres, Terraform, Docker, and Azure.

I created a data model to store railroad systems, services, scheduled, time points, and related information, detailing the schema “Beyond CRUD n’ Cruft Data-Modeling” with a few tweaks. The original I’d created for Apache Cassandra, and have since switched to Postgres giving the option of primary and foreign keys, relations, and the related connections for the model.

In this post I’ll use that schema to build out an infrastructure as code solution with Terraform, utilizing Postgres and Hasura (OSS).

Prerequisites

Terraform & Azure CLI – Installation & Setup.
Docker – Installation & Setup on Windows/Windows Home, MacOS, Linux Ubuntu, Fedora, Debian, or CentOS.
Hasura CLI – Installing the binary globally, and other options.

Docker Compose Development Environment

For the Docker Compose file I just placed them in the root of the repository. Add a docker-compose.yaml file and then added services. The first service I setup was the Postgres/PostgreSQL database. This is using the standard Postgres image on Docker Hub. I opted for version 12, I do want it to always restart if it gets shutdown or crashes, and then the last of the obvious settings is the port which maps from 5432 to 5432.

For the volume, since I might want to backup or tinker with the volume, I put the db_data location set to my own Codez directory. All my databases I tend to setup like this in case I need to debug things locally.

The POSTGRES_PASSWORD is an environment variable, thus the syntax ${PPASSWORD}. This way no passwords go into the repo. Then I can load the environment variable via a standard export POSTGRES_PASSWORD="theSecretPasswordHere!" line in my system startup script or via other means.

services:
  postgres:
    image: postgres:12
    restart: always
    volumes:
      - db_data:/Users/adron/Codez/databases
    environment:
      POSTGRES_PASSWORD: ${PPASSWORD}
    ports:
      - 5432:5432

For the db_data volume, toward the bottom I add the key value setting to reference it.

volumes:
  db_data:

Next I added the GraphQL solution with Hasura. The image for the v1.1.0 probably needs to be updated (I believe we’re on version 1.3.x now) so I’ll do that soon, but got the example working with v1.1.0. Next I’ve got the ports mapped to open 8080 to 8080. Next, this service will depend on the postgres service already detailed. Restart, also set on always just as the postgres service. Finally two evnironment variables for the container:

HASURA_GRAPHQL_DATABASE_URL – this variable is the base postgres URL connection string.
HASURA_GRAPHQL_ENABLE_CONSOLE – this is the variable that will set the console user interface to initiate. We’ll definitely want to have this for the development environment. However in production I’d likely want this turned off.

  graphql-engine:
    image: hasura/graphql-engine:v1.1.0
    ports:
      - "8080:8080"
    depends_on:
      - "postgres"
    restart: always
    environment:
      HASURA_GRAPHQL_DATABASE_URL: postgres://postgres:logistics@postgres:5432/postgres
      HASURA_GRAPHQL_ENABLE_CONSOLE: "true"

At this point the commands to start this are relatively minimal, but in spite of that I like to create a start and stop shell script. My start script and stop script simply look like this:

Starting the services.

docker-compose up -d

For the first execution of the services you may want to skip the -d and instead watch the startup just to become familiar with the events and connections as they start.

Stopping the services.

docker-compose down

🚀 That’s it for the basic development environment, we’re launched and ready for development. With the services started, navigate to https://localhost:8080/console to start working with the user interface, which I’ll have a more details on the “Beyond CRUD n’ Cruft Data-Modeling” swap to Hasura and Postgres in an upcoming blog post.

For full syntax of the docker-compose.yaml check out this gist: https://gist.github.com/Adron/0b2ea637b5e00681f4d62404805c3a00

Terraform Production Environment

For the production deployment of this stack I want to deploy to Azure, use Terraform for infrastructure as code, and the Azure database service for Postgres while running Hasura for my API GraphQL tier.

For the Terraform files I created a folder and added a main.tf file. I always create a folder to work in, generally, to keep the state files and initial prototyping of the infrastructre in a singular place. Eventually I’ll setup a location to store the state and fully automate the process through a continues integration (CI) and continuous delivery (CD) process. For now though, just a singular folder to keep it all in.

For this I know I’ll need a few variables and add those to the file. These are variables that I’ll use to provide values to multiple resources in the Terraform templating.

variable "database" {
  type = string
}

variable "server" {
  type = string
}

variable "username" {
  type = string
}

variable "password" {
  type = string
}

One other variable I’ll want so that it is a little easier to verify what my Hasura connection information is, will look like this.

output "hasura_url" {
  value = "postgres://${var.username}%40${azurerm_postgresql_server.logisticsserver.name}:${var.password}@${azurerm_postgresql_server.logisticsserver.fqdn}:5432/${var.database}"
}

Let’s take this one apart a bit. There is a lot of concatenated and interpolated variables being wedged together here. This is basically the Postgres connection string that Hasura will need to make a connection. It includes the username and password, and all of the pertinent parsed and string escaped values. Note specifically the %40 between the ${var.username} and ${azurerm_postgresql_server.logisticsserver.name} variables while elsewhere certain characters are not escaped, such as the @ sign. When constructing this connection string, it is very important to be prescient of all these specific values being connected together. But, I did the work for you so it’s a pretty easy copy and paste now!

Next I’ll need the Azure provider information.

provider "azurerm" {
  version = "=2.20.0"
  features {}
}

Note that there is a features array that is just empty, it is now required for the provider to designate this even if the array is empty.

Next up is the resource group that everything will be deployed to.

resource "azurerm_resource_group" "adronsrg" {
  name     = "adrons-rg"
  location = "westus2"
}

Now the Postgres Server itself. Note the location and resource_group_name simply map back to the resource group. Another thing I found a little confusing, as I wasn’t sure if it was a Terraform name or resource name tag or the server name itself, is the “name” key value pair in this resource. It is however the server name, which I’ve assigned var.server. The next value assigned “B_Gen5_2” is the Azure designator, which is a bit cryptic. More on that in a future post.

After that information the storage is set to, I believe if I RTFM’ed correctly to 5 gigs of storage. For what I’m doing this will be fine. The backup is setup for 7 days of retention. This means I’ll be able to fall back to a backup from any of the last seven days, but after 7 days the backups are rolled and the last day is deleted to make space for the newest backup. The geo_redundant_backup_enabled setting is set to false, because with Postgres’ excellent reliability and my desire to not pay for that extra reliability insurance, I don’t need geographic redundancy. Last I set auto_grow_enabled to true, albeit I do need to determine the exact flow of logic this takes for this particular implementation and deployment of Postgres.

The last chunk of details for this resource are simply the username and password, which are derived from variables, which are derived from environment variables to keep the actual username and passwords out of the repository. The last two bits set the ssl to enabled and the version of Postgres to v9.5.

resource "azurerm_postgresql_server" "logisticsserver" {
  name = var.server
  location = azurerm_resource_group.adronsrg.location
  resource_group_name = azurerm_resource_group.adronsrg.name
  sku_name = "B_Gen5_2"

  storage_mb                   = 5120
  backup_retention_days        = 7
  geo_redundant_backup_enabled = false
  auto_grow_enabled            = true

  administrator_login          = var.username
  administrator_login_password = var.password
  version                      = "9.5"
  ssl_enforcement_enabled      = true
}

Since the database server is all setup, now I can confidently add an actual database to that database. Here the resource_group_name pulls from the resource group resource and the server_name pulls from the server resource. The name, being the database name itself, I derive from a variable too. Then the character set is UTF8 and collation is set to US English, which are generally standard settings on Postgres being installed for use within the US.

resource "azurerm_postgresql_database" "logisticsdb" {
  name                = var.database
  resource_group_name = azurerm_resource_group.adronsrg.name
  server_name         = azurerm_postgresql_server.logisticsserver.name
  charset             = "UTF8"
  collation           = "English_United States.1252"
}

The next thing I discovered, after some trial and error and a good bit of searching, is the Postgres specific firewall rule. It appears this is related to the Postgres service in Azure specifically, as for a number of trials and many errors I attempted to use the standard available firewalls and firewall rules that are available in virtual networks. My understanding now is that the Postgres Servers exist outside of that paradigm and by relation to that have their own firewall rules.

This firewall rule basically attaches the firewall to the resource group, then the server itself, and allows internal access between the Postgres Server and the Hasura instance.

resource "azurerm_postgresql_firewall_rule" "pgfirewallrule" {
  name                = "allow-azure-internal"
  resource_group_name = azurerm_resource_group.adronsrg.name
  server_name         = azurerm_postgresql_server.logisticsserver.name
  start_ip_address    = "0.0.0.0"
  end_ip_address      = "0.0.0.0"
}

The last and final step is setting up the Hasura instance to work with the Postgres Server and the designated database now available.

To setup the Hasura instance I decided to go with the container service that Azure has. It provides a relatively inexpensive, easier to setup, and more concise way to setup the server than setting up an entire VM or full Kubernetes environment just to run a singular instance.

The first section sets up a public IP address, which of course I’ll need to change as the application is developed and I’ll need to provide an actual secured front end. But for now, to prove out the deployment, I’ve left it public, setup the DNS label, and set the OS type.

The next section in this resource I then outline the container details. The name of the container can be pretty much whatever you want it to be, it’s your designator. The image however is specifically hasura/graphql-engine. I’ve set the CPU and memory pretty low, at 0.5 and 1.5 respectively as I don’t suspect I’ll need a ton of horsepower just to test things out.

Next I set the port available to port 80. Then the environment variables HASURA_GRAPHQL_SERVER_PORT and HASURA_GRAPHQL_ENABLE_CONSOLE to that port to display the console there. Then finally that wild concatenated interpolated connection string that I have setup as an output variable – again specifically for testing – HASURA_GRAPHQL_DATABASE_URL.

resource "azurerm_container_group" "adronshasure" {
  name                = "adrons-hasura-logistics-data-layer"
  location            = azurerm_resource_group.adronsrg.location
  resource_group_name = azurerm_resource_group.adronsrg.name
  ip_address_type     = "public"
  dns_name_label      = "logisticsdatalayer"
  os_type             = "Linux"


  container {
    name   = "hasura-data-layer"
    image  = "hasura/graphql-engine"
    cpu    = "0.5"
    memory = "1.5"

    ports {
      port     = 80
      protocol = "TCP"
    }

    environment_variables = {
      HASURA_GRAPHQL_SERVER_PORT = 80
      HASURA_GRAPHQL_ENABLE_CONSOLE = true
    }
    secure_environment_variables = {
      HASURA_GRAPHQL_DATABASE_URL = "postgres://${var.username}%40${azurerm_postgresql_server.logisticsserver.name}:${var.password}@${azurerm_postgresql_server.logisticsserver.fqdn}:5432/${var.database}"
    }
  }

  tags = {
    environment = "datalayer"
  }
}

With all that setup it’s time to test. But first, just for clarity here’s the entire Terraform file contents.

provider "azurerm" {
  version = "=2.20.0"
  features {}
}

resource "azurerm_resource_group" "adronsrg" {
  name     = "adrons-rg"
  location = "westus2"
}

resource "azurerm_postgresql_server" "logisticsserver" {
  name = var.server
  location = azurerm_resource_group.adronsrg.location
  resource_group_name = azurerm_resource_group.adronsrg.name
  sku_name = "B_Gen5_2"

  storage_mb                   = 5120
  backup_retention_days        = 7
  geo_redundant_backup_enabled = false
  auto_grow_enabled            = true

  administrator_login          = var.username
  administrator_login_password = var.password
  version                      = "9.5"
  ssl_enforcement_enabled      = true
}

resource "azurerm_postgresql_database" "logisticsdb" {
  name                = var.database
  resource_group_name = azurerm_resource_group.adronsrg.name
  server_name         = azurerm_postgresql_server.logisticsserver.name
  charset             = "UTF8"
  collation           = "English_United States.1252"
}

resource "azurerm_postgresql_firewall_rule" "pgfirewallrule" {
  name                = "allow-azure-internal"
  resource_group_name = azurerm_resource_group.adronsrg.name
  server_name         = azurerm_postgresql_server.logisticsserver.name
  start_ip_address    = "0.0.0.0"
  end_ip_address      = "0.0.0.0"
}

resource "azurerm_container_group" "adronshasure" {
  name                = "adrons-hasura-logistics-data-layer"
  location            = azurerm_resource_group.adronsrg.location
  resource_group_name = azurerm_resource_group.adronsrg.name
  ip_address_type     = "public"
  dns_name_label      = "logisticsdatalayer"
  os_type             = "Linux"


  container {
    name   = "hasura-data-layer"
    image  = "hasura/graphql-engine"
    cpu    = "0.5"
    memory = "1.5"

    ports {
      port     = 80
      protocol = "TCP"
    }

    environment_variables = {
      HASURA_GRAPHQL_SERVER_PORT = 80
      HASURA_GRAPHQL_ENABLE_CONSOLE = true
    }
    secure_environment_variables = {
      HASURA_GRAPHQL_DATABASE_URL = "postgres://${var.username}%40${azurerm_postgresql_server.logisticsserver.name}:${var.password}@${azurerm_postgresql_server.logisticsserver.fqdn}:5432/${var.database}"
    }
  }

  tags = {
    environment = "datalayer"
  }
}

variable "database" {
  type = string
}

variable "server" {
  type = string
}

variable "username" {
  type = string
}

variable "password" {
  type = string
}

output "hasura_url" {
  value = "postgres://${var.username}%40${azurerm_postgresql_server.logisticsserver.name}:${var.password}@${azurerm_postgresql_server.logisticsserver.fqdn}:5432/${var.database}"
}

To run this, similarly to how I setup the dev environment, I’ve setup a startup and shutdown script. The startup script named prod-start.sh has the following commands. Note the $PUSERNAME and $PPASSWORD are derived from environment variables, where as the other two values are just inline.

cd terraform

terraform apply -auto-approve \
    -var 'server=logisticscoresystemsdb' \
    -var 'username='$PUSERNAME'' \
    -var 'password='$PPASSWORD'' \
    -var 'database=logistics'

For the full Terraform file check out this gist: https://gist.github.com/Adron/6d7cb4be3a22429d0ff8c8bd360f3ce2

Executing that script gives me results that, if everything goes right, looks similarly to this.

./prod-start.sh 
azurerm_resource_group.adronsrg: Creating...
azurerm_resource_group.adronsrg: Creation complete after 1s [id=/subscriptions/77ad15ff-226a-4aa9-bef3-648597374f9c/resourceGroups/adrons-rg]
azurerm_postgresql_server.logisticsserver: Creating...
azurerm_postgresql_server.logisticsserver: Still creating... [10s elapsed]
azurerm_postgresql_server.logisticsserver: Still creating... [20s elapsed]


...and it continues.

Do note that this process will take a different amount of time and is completely normal for it to take ~3 or more minutes. Once the server is done in the build process a lot of the other activities start to take place very quickly. Once it’s all done, toward the end of the output I get my hasura_url output variable so that I can confirm that it is indeed put together correctly! Now that this is preformed I can take next steps and remove that output variable, start to tighten security, and other steps. Which I’ll detail in a future blog post once more of the application is built.

... other output here ...


azurerm_container_group.adronshasure: Still creating... [40s elapsed]
azurerm_postgresql_database.logisticsdb: Still creating... [40s elapsed]
azurerm_postgresql_database.logisticsdb: Still creating... [50s elapsed]
azurerm_container_group.adronshasure: Still creating... [50s elapsed]
azurerm_postgresql_database.logisticsdb: Creation complete after 51s [id=/subscriptions/77ad15ff-226a-4aa9-bef3-648597374f9c/resourceGroups/adrons-rg/providers/Microsoft.DBforPostgreSQL/servers/logisticscoresystemsdb/databases/logistics]
azurerm_container_group.adronshasure: Still creating... [1m0s elapsed]
azurerm_container_group.adronshasure: Creation complete after 1m4s [id=/subscriptions/77ad15ff-226a-4aa9-bef3-648597374f9c/resourceGroups/adrons-rg/providers/Microsoft.ContainerInstance/containerGroups/adrons-hasura-logistics-data-layer]

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

hasura_url = postgres://postgres%40logisticscoresystemsdb:theSecretPassword!@logisticscoresystemsdb.postgres.database.azure.com:5432/logistics

Now if I navigate over to logisticsdatalayer.westus2.azurecontainer.io I can view the Hasura console! But where in the world is this fully qualified domain name (FQDN)? Well, the quickest way to find it is to navigate to the Azure portal and take a look at the details page of the container itself. In the upper right the FQDN is available as well as the IP that has been assigned to the container!

Navigating to that FQDN URI will bring up the Hasura console!

Next Steps

From here I’ll take up next steps in a subsequent post. I’ll get the container secured, map the user interface or CLI or whatever the application is that I build lined up to the API end points, and more!

References

Postgres
Hasura (OSS)
My past post about the data model I’ll be working with, “Beyond CRUD n’ Cruft Data-Modeling”
adron-ecosystem repository (Github code for this example)

Sign Up for Thrashing Code

For JavaScript, Go, Python, Terraform, and more infrastructure, web dev, and coding in general I stream regularly on Twitch at https://twitch.tv/adronhall, post the VOD’s to YouTube along with entirely new tech and metal content at https://youtube.com/c/ThrashingCode.

Tag: postgresql

Setting Up TimescaleDB for Time Series Data with Postgres.app for all your “Time Scales”!

Like this:

Setup TimescaleDB with Docker Compose: A Step-by-Step Guide

Prerequisites

Getting Started

Like this:

Relational Database Query Optimization

Query Optimizers

Like this:

Dynamic Data Generation with JavaScript

Like this:

Setup Postgres, and GraphQL API with Hasura on Azure

Prerequisites

Docker Compose Development Environment

Terraform Production Environment

Next Steps

References

Sign Up for Thrashing Code

Like this:

Share this:

Like this:

Prerequisites

Getting Started

Share this:

Like this:

Query Optimizers

Share this:

Like this:

Share this:

Like this:

Prerequisites

Docker Compose Development Environment

Terraform Production Environment

Next Steps

References

Sign Up for Thrashing Code

Share this:

Like this: