A video introduction into the basics of scaling a relational database like PostgreSQL.
There are two main ways to scale data storage, especially databases, and the resources available to store and process that data.
Horizontal Scaling (scale-out): This is done through adding more individual machines in some way, creating a cluster for the database to operate. Various databases handle this in different ways ranging from a system like a distributed database, which is specifically built to scale horizontally, to relational databases which require different strategies like sharding the data itself.
Vertical Scaling (scale-up): This includes increasing the individual system resources allocated to a database on a single vertically integrated machine, such as the CPU or number of CPUs, memory, and disk or disks.
For horizontal scaling the approach often requires things like possible sharding of the data across multiple databases and then pooling those resources together via connection pooling, load balancers, and other infrastructure resources to direct the correct requests and traffic at the right sharded database resource.
Sometimes from a purely process-centric horizontal scaling perspective, we just need more query and processing power but the data can be stored on a singular machine. In this case, we might implement a proxy to direct traffic across some bouncers that would then get requests and responses to and from the individual machines.
Vertical scaling often requires new hardware and a migration to the larger machine. Sometimes however it may just require more CPU, RAM, or more drive storage. If it is a singular need it often can be met by adding one of those elements or changing out one. For example, if a databases is continually hitting the peak processing of the CPUs to handle queries, one might be able to just add another processor or change the processor to a faster processor or one with more cores to process with. In the case of memory being overloaded, the same for that, simply increase the maximum memory in the system. Of course, the underlying limitations of vertical scaling always come up when you already have the fastest CPU or the memory is already maxed out. In that situation one might need to think about horizontally scaling as described in the previous described example.
The following is a top X list of blog entries that cover the expansion of the topic in many ways. I’ve found these (along with more than a few others) very useful in my own efforts around scaling Postgres (and other databases).
- Vertically Scaling PostgreSQL
- Vertically Scale Your PostgreSQL Infrastructure w/ pgpool -3- Adding Another Standby
- Vertically scale your PostgreSQL infrastructure with pgpool — 2 — Automatic failover and reconfiguration
- How well can PostgreSQL scale horizontally?
- How to Horizontally Scale Your Postgres Database Using Citus
- Horizontal scalability with Sharding in PostgreSQL — Where it is going Part 1 of 3
- Lessons learned scaling PostgreSQL database to 1.2bn records/month
- IDEAS FOR SCALING POSTGRESQL TO MULTI-TERABYTE AND BEYOND
- Scaling PostgreSQL Using Connection Poolers & Load Balancers
- Vertically Scaling PostgreSQL
- Scaling PostgreSQL for Large Amounts of Data
- Scaling Postgres