Getting COPY For Bulk CSV Working on a Container Running PostgreSQL

(Video is available at the end of this post, be sure to check it out for further details)

First off, check out this post if you’re curious how I put together the csv file. There is a ton of data out there that you can just download that is in csv format, but I wanted to have data that was specific to a DDL (Data Definition Langauge) I wrote. Once I had a file, I went through the following steps to make it easily repeated through automation.

With that file make sure, at least for this example, that the first row of data has the column names as the commas seperated values. It should look something like this (per my example mentioned above), and obviously have far more data than the example below.

id,country,ip,created_at,updated_at,project_id
4cf25606-5f3e-4575-86ea-c585513fcc39,Solomon Islands,169.137.96.151,2020-12-25T05:04:26.124Z,2020-12-30T11:04:26.124Z,d6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10
cbeaf6ab-23cc-43c5-a740-7191f59d8cc5,Solomon Islands,106.92.223.22,2020-12-25T05:04:26.124Z,2020-12-30T11:04:26.124Z,d6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10
17a7ec0b-2758-4721-b104-f368c8109b6a,Solomon Islands,248.8.206.229,2020-12-25T05:04:26.124Z,2020-12-30T11:04:26.124Z,d6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10

Now my next steps was to get this file copied to the server where the database is running. To check the containers I have running I call a quick docker ps to get a list.

The container list of running containers.

In this list, the PostgreSQL server is the container named pgdb. To copy the file to that server, the following command will get the job done.

docker cp ./name_of_file.csv pgdb:/name_of_file.csv

If you’d like the destination name of the file to be different, that is possible in this command too. I like to keep them the same to prevent confusion. For example, if I log into the server and have a bunch of terminals open, seeing the file named on things one place and another in the other place, that’ll throw a wrench into things.

With the file copied, I can now run the PostgreSQL command to bulk copy the data into a table. The command to get that done looks like this.

COPY the_table_where_the_data_will_go FROM '/name_of_file.csv' CSV HEADER;

That’s it! Well ok not really, this is one of the simpliest use cases. Such as, if any columns don’t map one to one with the column names of the csv file to the table column names, this command won’t work as is. Check out the COPY documentation to get the details on how to use the command if you have various things you need to tweak for your own work. However, for this basic premise and for what I want to talk about next, that’s it! 😁

To get into more details about how I automated this process, check out the video.

A Story of an Ivory Tower Architect Clusterfuck

Astronaut Architect or Ivory Tower Architect – Both of these are generally terms of dirision toward software architects who have become disconnected from anything related to actual implementation, coding, or related knowledge and work that has to do with the work of building software applications and services.

Astronaut Architecturalism or Ivory Tower Architecturalism – Topics that an architect might bring up which could be a sign they have reached the Ivory Tower or Astronaut level of architecture work with software. This ism is something to be very concerned about, as sometimes these individuals can completely derail and wreck a project with lofty – yet extremely disconnected ideas – about how or in what way the software should be built for a project.

Chief Emotive Product Technology Architect

Circa 2004 – In the city of Portland, Oregon.

The team, a reasonably sized startup team, of ~12 people sat around the table. It was a oval shaped table, specifically in this conference room so that all members could have involvement while seated at the table, in a conference room called “King Arthur’s Round Table”. Oh, the “LOLz”. The 12 individuals had been working on a project that was a B-round funded venture that had to do with musical things. These 12 individuals had so far, self-organized and been working at a reasonable clip, to get this software built and realize the ideas behind the venture.

The 12 had been gathered here by their manager so that a new member of the team could be introduced. Nobody knew who this person was or what the role of this person would be, except of course the hiring manager. Because I suppose, that was how things were going to be done in this year of 2004 with this startup with this manager. Yes, all sorts of indicators were already being fired off for the experienced of the 12. As these indicators fired off in their mind, concerns were growing, but the experienced had kept their mouths shut so far and offered the manage the benefit of the doubt.

The manager, who I’ll call Thomasen stood before everyone, in conflict with the ideals of Arthur’s Round Table. Thomasen mentioned the ideals of Arthur’s Round Table, which seemed ironic since he stood there before – removed from equal voice by the mere act of standing and spoke to everyone. But also it seemed fitting, for he then announced that he had hired a new member for the team. This unilateral decision of Thomasen’s seemed as if done in spite of the ideals of the Round Table, in spite of his just stating the ideals of the Round Table. What kind of fresh hell was being birthed at this moment? Well, that’s were things get lit, if not outright explosive!

Ericson, one of the experienced developer’s asked, “Ok, you just made a unilateral decision to hire someone, but what even is this person’s role? What are they gonna do here?” Thomasen replied, “I’ve hired Jacobson Tirelorn to join us as the Chief Emotive Product Technology Architect!” To which, the metal head of the team whipped to attention in his seat and inquired, a bit forcefully as a metal head might be imagined to do, “What the Holy Hell fuck is a Chief Emotive Product Technology Architect?”

Thomasen replied, “Kirk, jeez man you don’t have to be so aggresive and vulgar, could you tone it back about 10 notches?” Kirk, not one to take bullshit at even volume level 1, let alone at 11, responded kindly with a simple statement, “Nope.” Thomasen breathed heavily, the sigh clearly evident to all in the room. The newer folks of the 12 watched this with apprehension, while those that were familiar with Kirk’s lack of accepting bullshit, smirked as he laid out the guantlet for Thomasen. Thomasen responded finally after this long sigh, “Ok Kirk.”

Thomasen continued, “I’ve hired Jacobson Tirelorn in this role to help use build our product and really get connected on an emotional level to our users. Kirk, again sniped in, “We don’t have users yet we’re a startup in stealth mode.” Thomasen growing frustrated, “I know but we will.” Kirk, “Sure. But what exactly is the emotional level of our users?” Grunting almost, “I’m getting to that point Kirk.”

Thomasen elaborated, “Jacobson, the architect will also help us significantly figure out what technology we’ll use for the product and how all the parts should fit appropriately to make a highly scalable solution for our users!”

Lasinia raised her hand, “Can I ask a question?” to which Thomasen responded calmly, “Sure Lasinia, what’s your question?” Lasinia started off, “Per what you said, I decided to raise my hand since I’m confused about the Round Table ideals, if we all have an equal seat why are you standing and why do I have to raise my hand to speak?” Another of the 12 responded, “you don’t have to raise your hand…” “yeah but I felt like it because we’re not really following the ideals very well if Thomasen is standing up, Thomasen, could you sit down and continue telling us about this architect you hired?”

Thomasen pulled up a chair and sat down. Muttering somewhat under his breath, but still clear to the 12, “God dammit everybody.” He sat and continued finally.

Eventually he ended. The era of theh Chief Emotive Product Technology Architect was upon the team, regardless of what that meant!

The Chief Emotive Product Technology Architect Era Begins

Two weeks after the first Round Table, Jaconson arrived and called a meeting for Wednesday. It being Monday everybody easily accepted the meeting for 8am Wednesday. On Tuesday morning Jacobson sent out a document detailing all sorts of great and lofty goals for his architect role. He was, after all exuberently ready to get things started! Maximum emotional user support and all!

Around mid-day Tuesday, after the document with the details had arrived that morning Jacobson sent out another calendar invite for 9am Wednesday. Again, everybody went along with this and now had an 8am and 9am meeting scheduled for Wednesday! However, the foreshadowing got amped up another level when Jacobson is his zest and zeal sent out another meeting for 1pm that Wednesday. This meeting invite, however, was sent out at 4:51pm on Tuesday, which would have placed the meetings at 8am, 9am, and 1pm. The scheduling, and titles, went something like this.

8am – Software Release Schedule
9am – Emotional User Stories
1pm – Design Studies

This was going to be one helluva kick off of the Chief Emotive Product Technology Architect Era!

The day rolled around. Having sent out that 1pm invite at 4:51pm multiple people had not received the invite in time to know about the meeting before the 8am meeting kicked off. Thus, this is when the horror of the era truly began.

8am everybody stood at the door to the conference room, again King Arthur’s Round Table conference room. The time ticked past 8am. This seemed normal, it was after all Portland and rarely are people precisely on time on the west coast. 8:03am ticked by. Still, none of the 12 were concerned yet as Thomasen arrived to open the door. Again, this whole irony here, being the door to the Round Table conference room was locked, as if someone was going to sneak in and steal the massive multi-hundred pound table?

Thomasen and the 12 all sat down at the table. Being 2004, Thomasen was the only one with a laptop, another important piece of context here. In 2004 most people still programmed at desktop machines, thus, barely any of the 12 knew of the 1pm meeting, but were still ready to get things started and give this thing a shot!

8:31am arrived and Jacobson walks in as if no tardy one moment. He then pulls out a chair and sits down at the table and announces, “Hello everybody, it’s great to be here I’m Jacobson Tirelorn and I’m going to help you get this software solution, your processes, and your emotional well being figured out!” Kirk, yes Kirk again, spearheads the next immediate assertion, “Confirming, you’re going to help with the software solution architecture, the processes for the project, and our emotional well being? Why would I want you to help me with my emotional well being? Is there an assumption my emotional well being is fucked up?”

Now I need to paint some context here. Jacobson had entered the room casually, as if not late, but also wearing the garb of a 1970s era hippy. Multicolored attire and jeans, which wasn’t to crazy for a startup in Portland, but was still slightly dated and odd for this era. To add to this off-date situation, Jacobson looked like he was approximately 18 or so years old.

Thomasen chimed in, “Kirk, we’ll talk about your emotional well being later, let’s go ahead and let Jacobson get into the architecture and planning first.” Kirk “alright.” shrugs.

Jacobson starts in on some wording, “Alright, what we’re building is going to need to scale, massively, at a moments notice and currently we’ve only got the capacity for about 20% of this capability. So we need to really amp up our systems and our architecture to handle 150% of our expected capacity! With that in mind I’ve drawn this architecture diagram to help us get there.”

He then commences to put this on display, after messing with a projetor for approximately 5 minutes. What he then shows looks like a soup bowl was spilled out onto paper, which bright colors and odd shapes representing the Sun Servers, which were drawn not with the icons of the era that represented a Sun Server but with actual suns showing bright yellow. The network connections and backplane were shown with fuzzy ropes and other tangled bushes. It almost looked more like a wildlife sketch about the magic of the birds and bees than anything to do with software.

Jacobson then made a statement that immediately shaped the project, “What I’ve done here is use a little artistic license to draw up the server and network diagram that will get us to the needed scale!”

All of the 12, in their minds, and with their coursing eyes looking amongst each other thought in horror and disbelief. The all imagined that an oil train had just derailed and fell from a cliff into the ocean. As Jacobson continued the idea spread and so did the oil spill idea, as if it were lit on fire to burn uncontrollably.

Jacobson continued. He talked for the raminder of the meeting and then stated, “everybody take a bio break now and grab a snack for the next meeting, we’ll all meet back here at 9:05am to get started on the emotional user studies.”

At 9:05 everyone except Jacobson and Kirk entered the room and sat down. Nobody spoke, but just sat and waited patiently, some snacking on food and others just pondering what was going on. At 9:11 Jacobson walks back in and shows another chart that resembled Maslav’s Heirarchy. He then asks, “Where is Kirk?”

Kirk walks back in at 9:12, just a mere minute later and Jacobson says, “Good you’re here, a little late but we can get started now.” Kirk states, “you walked back in here a minute ago, it’s 9:12 now. You were late, I’m just following your lead.” Jacobson seems to not even acknowledge that Kirk has spoken and being a spiel about emotional well being.

Just a few minutes into this Sarah, one of the 12, new to the team asks, “What are the key aspects of everybody’s well being should we primarily be focused on?” “We should be focused 100% on the zen of people’s well being.”

Kirk stands up. Walks toward the door and leaves. Thomasen responds, “go ahead and continue Jacobson I’ll see what the problem is.” Jacobson continues.

Meanwhile Thomasen and Kirk have a chat, Kirk states simply that this guy is bullshit and you can keep me and I’m just going to work on the project or you can toss him, but I’m not going to work with this guy while he spouts this nonsense. Kirk having been key so far the lead of the development teams and practically a founder, leaves Thomasen to agree to the terms. Kirk informs him he’ll give Jacobson a chance but isn’t going to listen to the emotional zen nonsense, so he’ll be in other meetings, and Thomasen seems relieved, and life continues for the project.

Thomasen rejoins the meeting, and the meeting runs on and on and on and on. At 12:07 Jacobson says, “everybody take lunch now and we’ll get back to things later! thanks all!” To which everybody, releived, heads off to lunch. But do note, not a single person of the 12, except for Kirk, has been back to their desks to work or read emails on their desktops. Not a soul, except Kirk, knows now that there is a 1pm meeting. Not even Thomasen.

1pm rolls around and Kirk walks into the conference room ready for design details. Jacobson enters and immediately states, “well I guess since you’re the lead nobody else really needs to join us.” But Kirk knowing the anti-pattern in that states, “Well, others should join us as everybody needs to know the design of the system that is going to be working on the design of the system.” “Well, you Kirk can tell them later right?” Kirk smiles wryly and laughing says, “Alright, this is turning into a train wreck already so yeah, let’s go with that idea.”

Jacobson informs Kirk of his ideas.

Kirk finishs the 1pm meeting with the conclusion that Jacobson is an Ivory Tower Architect of no use to the team that needs to implement the product.

Three months pass. A status meeting is called by the CEO of the company. It starts off plainly.

“So how are things going team? I’d like to get a preview look at the product before our coming release in two weeks.”

Kirk looks up and asks, “In two weeks?”

Jacobson responds, “Yeah, we’re releasing in two weeks.”

Kirk and Sarah both look toward each other and simultaneously ask, “We’re releasing what exactly?”

Jacobson starts to respond and says, “It’ll be version one of the prod…”

Thomasen cuts in, “Wait a second, just to clarify we’re releasing a beta of the version 1 of the product.”

Sarah, “So like, an alpha product?”

Jacobson, “You could say that, but it’ll be more like a beta, using the design patterns we’ve implemented and architecture I’ve designed.”

Larry, one of the 12, softspoke and rarely speaks unless specifically asked a question, “What architecture and design patterns?”

Kirk says, “The ones Jacobson dreamed up, but don’t worry, it’s those that I’ve shown you but we don’t really call them design pattersn. We’ve just been calling it our architecture.”

Larry, almost assured “Oh, that, ok. But, per Sarah’s question, what are we releasing exactly?”

Jacobson says, “The v1 product.”

Thomasen corrects, “The beta v1 product.”

The CEO asks, “What about the alpha product?”

Thomasen “There is no alpha product.”

Jacobson emotes “Well, we could just rename this the alpha v1 product.”

CEO stands, causing consternation among everybody, “What do you mean rename, we don’t have an alpha product? How can we not have an alpha product and we’re about to release a beta product? Why do you keep skiping the beta part and saying v1 and Thomasen keeps interjecting beta? Who the fuck is in charge of this?”

“I am, and Jacobson is building the architecture design.”

“No he’s not” interupts Kirk, “I’m building the design based on Jacobson’s architecture.”

As you can imagine, none of this is getting resolved quickly, so let’s fast forward to the results of this CEO scheduled meeting.

Post Trash Fire Meeting SITREP

The SITREP, or situational report, went something like this. The meeting went on for another twenty plus minutes of the CEO, Jacobson, and Thomasen being confused about alpha, beta, and v1 variations and Kirk eventually sat back and just let it happen unabated. Sarah got up about 15 minutes into the meeting and nobody noticed she left. In addition to leaving the meeting though she went to her desk, got her personal things and left never to return again.

The others among the 12 listened and eventually faded out and started ignoring the clusterfuck that unfolded before them that day. Two others besides Sarah left within the next three days. A week after that meeting, Kirk decided he was done too and resigned. Jacobson, being the root of many of these problems was fired by Thomasen, which then the CEO in a fit of rage, merely 2 weeks after this meeting as the startup fell apart, fired Thomasen.

4 Weeks after that meeting the CEO was then fired, even though a founder, and the board having all the power to do this started the process of rehiring everybody to fill the roles of those that had left. First they managed to get Kirk to come in for a conversation about what exactly went wrong.

Board inquired “So Kirk, thanks for coming to speak with us. Could you tell us about what exactly happened? We’re not really sure ourselves.”

Kirk happily, ya know happily for a metal head, smiled and simple said, “Not really, I know, but it’s not worth going into.” and then turned just a bit and opened up a laptop he’d bought. “However, here’s some design and ideas about what should be built to acheive what you want and how much it’ll cost to have me come in and get it done.” The board started looking at the architecture and then asked, “Could you elaborate on what all this architecture shows us?”

Kirk went on to detail, in depth the technical challenges and what the design would require to meet or exceed the needs of the prospective userbase. Then, the board flipped the page after they had felt they understood enough. On this next page of details Kirk had layed out a SOW, or Statement of Work, to detail what he’d cost and what he would do to make this happen. The board was agast at Kirk’s hefty 2004 hourly price tag of $100 per hour. They stated they would have to think about it.

A week later they decided, after talking to Jacobson, to hire Jacobson at $80 bucks an hour to lead the effort around the architecture that Kirk had shown them. At least, to the best of their memory.

The project kicked back off again, now many weeks behind. At this point you might know what happened next, Jacobson cratered. He left a blast radius of unhired roles, unfinished design that didn’t make sense, and a massively unfinished project. This took about 4 months before the board wised up to their poor hiring decision, and having hired this astronaut architect stuck in his lofty ivory tower, they then opted to fold and reposition the company, eating the entirity of the losses.

TLDR Summary

2004 was a whopper of a year, considering the economy hadn’t even improved much after the 2000 apacolyptic tech crash! It was a time when many startups had survived with just enough of their hubris and naivety intact that they were trying some pretty crazy things trying to get products and services delivered. In spite of that, many continued failing.

At this point I want to add an important caveat, that the names are changed but this is the recollection of events from multiple people involved in this particular startup. This wildly happened more than a few times during this period when many startups went under. However, many others started to rise from the ashes during this time too.

Moral of the story, beware the astronaut or ivory tower architects!

Dynamic Data Generation with JavaScript

This video shows the process detailed below in this blog entry, to provide the choice of video or a quick read! 👍🏻😁

I coded up some JavaScript to generate some data for a table recently and it seemed relatively useful, so here it is ready to use as you may. (The complete js file is below the description of the individual code segments below). This file simple data generation is something I put together to create a csv for some quick data imports into a database (Postgres, SQL Server, or anything you may want). With that in mind, I added the libraries and initialized the repo with the libraries I would need.

npm install faker
npm install fs
faker = require('faker');
fs = require('fs');

Next up I included the column row of data for the csv. I decided to go ahead and setup the variable at this point, as it would be needed as I would add the rest of the csv data to the variable itself. There is probably a faster way to do this, but this was the quickest path from the perspective of getting something working right now.

After the colum row, I also setup the base 8 UUIDs that would related to the project_id values to randomly use throughout data generation. The idea behind this is that the project_id values are the range of values that would be in the data that Subhendu would have, and all the ip and other recorded data would be recorded with and related to a specific project_id. I used a UUID generation site to generate these first 8 values, that site is available here.

After that I went ahead and added the for loop that would be used to step through and generate each record.

var data = "id,country,ip,created_at,updated_at,project_id\n";
let project_ids = [
    'c16f6dd8-facb-406f-90d9-45529f4c8eb7',
    'b6dcbc07-e237-402a-bf11-12bf2226c243',
    '33f45cab-0e14-4830-a51c-fd44a62d1adc',
    '5d390c9e-2cfa-471d-953d-f6727972aeba',
    'd6ef3dfd-9596-4391-b0ef-3d7a8a1a6d10',
    'e72c0ed8-d649-4c53-97c5-da793d7a8228',
    'bf020fd2-2514-4709-8108-a2810e61c503',
    'ead66a4a-968a-448c-a796-51c6a1da0c20'];

for (var i = 0; i < 500000; i++) {
    // TODO: Generation will go here.
}

The next thing that I wanted to sort out are the two dates. One would be the created_at value and the other the updated_at value. The updated_at date needed to show as occurring after the created_at date, for obvious reasons. To make sure I could get this calculated I added a function to perform the randomization! First two functions to get additions for days and hours, then getting the random value to add for each, then getting the calculated dates.

function addDays(datetime, days) {
    let date = new Date(datetime.valueOf());
    date.setDate(date.getDate() + days);
    return date;
}

function addHours(datetime, hours) {
    let time = new Date(datetime.valueOf())
    time.setTime(time.getTime() + (hours*60*60*1000));
    return time;
}

var days = faker.datatype.number({min:0, max:7})
var hours = faker.datatype.number({min:0, max:24})

var updated_at = new Date(faker.date.past())
var created_at = addHours(addDays(updated_at, -days), -hours)

With the date time stamps setup for the row data generation I moved on to selecting the specific project_id for the row.

var proj_id = project_ids[faker.datatype.number({min:0, max: 7})]

One other thing that I knew I’d need to do is filter for the ' or , values located in the countries that would be selected. The way I clean that data to ensure it doesn’t break the SQL bulk import process is kind of cheap and in production data I wouldn’t do this, but it works great for generated data like this.

var cleanCountry = faker.address.country().replace(",", " ").replace("'", " ")

If you’re curious why I’m calculating these before I do the general data generation and set the row up, I like to keep the row of actual data calls to either a set variable assignment or at most one dot level deep in my calls. As you’ll see now in the row level data being generated below.

data2 += 
    faker.datatype.uuid() + "," +
    cleanCountry + "," +
    faker.internet.ip() + "," +
    created_at.toISOString() + "," +
    updated_at.toISOString() + "," +
    proj_id + "\n"

Now the last step is to create the file for all these csv rows.

fs.writeFile('kundu_table_data.csv', data, function (err) {
  if (err) return console.log(err);
  console.log('Data file written.');
});

The results.

Scaling PostgreSQL + Top 12 Curated Posts

A video introduction into the basics of scaling a relational database like PostgreSQL.

There are two main ways to scale data storage, especially databases, and the resources available to store and process that data.

Horizontal Scaling (scale-out): This is done through adding more individual machines in some way, creating a cluster for the database to operate. Various databases handle this in different ways ranging from a system like a distributed database, which is specifically built to scale horizontally, to relational databases which require different strategies like sharding the data itself.

Vertical Scaling (scale-up): This includes increasing the individual system resources allocated to a database on a single vertically integrated machine, such as the CPU or number of CPUs, memory, and disk or disks.

For horizontal scaling the approach often requires things like possible sharding of the data across multiple databases and then pooling those resources together via connection pooling, load balancers, and other infrastructure resources to direct the correct requests and traffic at the right sharded database resource.

Sometimes from a purely process-centric horizontal scaling perspective, we just need more query and processing power but the data can be stored on a singular machine. In this case, we might implement a proxy to direct traffic across some bouncers that would then get requests and responses to and from the individual machines.

Vertical scaling often requires new hardware and a migration to the larger machine. Sometimes however it may just require more CPU, RAM, or more drive storage. If it is a singular need it often can be met by adding one of those elements or changing out one. For example, if a databases is continually hitting the peak processing of the CPUs to handle queries, one might be able to just add another processor or change the processor to a faster processor or one with more cores to process with. In the case of memory being overloaded, the same for that, simply increase the maximum memory in the system. Of course, the underlying limitations of vertical scaling always come up when you already have the fastest CPU or the memory is already maxed out. In that situation one might need to think about horizontally scaling as described in the previous described example.

The following is a top X list of blog entries that cover the expansion of the topic in many ways. I’ve found these (along with more than a few others) very useful in my own efforts around scaling Postgres (and other databases).

  1. Vertically Scaling PostgreSQL
  2. Vertically Scale Your PostgreSQL Infrastructure w/ pgpool -3- Adding Another Standby
  3. Vertically scale your PostgreSQL infrastructure with pgpool — 2 — Automatic failover and reconfiguration
  4. How well can PostgreSQL scale horizontally?
  5. How to Horizontally Scale Your Postgres Database Using Citus
  6. Horizontal scalability with Sharding in PostgreSQL — Where it is going Part 1 of 3
  7. Lessons learned scaling PostgreSQL database to 1.2bn records/month
  8. IDEAS FOR SCALING POSTGRESQL TO MULTI-TERABYTE AND BEYOND
  9. Scaling PostgreSQL Using Connection Poolers & Load Balancers
  10. Vertically Scaling PostgreSQL
  11. Scaling PostgreSQL for Large Amounts of Data
  12. Scaling Postgres