Fruit and Snakes: Frequent Mutative Mongo User Database with Python

The end result of this code base is used in another post I made, titled “Java Time with Introspective GraphQL on Chaos Database AKA Pre-Refactor Prototype Mutating Database Spring Boot Java Hack App”.

Recently I had a singular mission: build a GraphQL API against a Mongo database where one could query the underlying collections, documents, and fields, with the assumption that users would be adding, and possibly removing, those collections, documents, and fields as they needed.

That sounds straightforward enough, but before even getting started on the GraphQL API I needed some type of environment that would mimic this process. That is what this article is about: creating a test bed for those criteria.

The Mongo Database & Environment

The first thing I did was set up a new Python environment using virtualenv. I wrote about that a bit in the past; if you want to dig deeper, the post is available here.

virtualenv fruit_schema_watcher

Next up I created a git repo with git init, then added a README.md, LICENSE (MIT), and .gitignore file. The next obvious need was a Mongo database! I got cracking on a docker-compose file, which ended up looking like this.

version: '3.1'  
  
services:  
  mongo:  
    image: mongo:latest  
    container_name: mongodb_container  
    ports:  
      - "27017:27017"  
    environment:  
      MONGO_INITDB_ROOT_USERNAME: root  
      MONGO_INITDB_ROOT_PASSWORD: examplepass  
    volumes:  
      - mongo-data:/data/db  
  
volumes:  
  mongo-data:

With that server running, I went ahead and created a database called test manually. I’d just do all the work from here on out with that particular database.
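As a quick sanity check that the container and credentials work, here is a minimal sketch using pymongo. The fruit collection name and document shape are just my assumptions for illustration, not anything fixed in the project; the credentials match the docker-compose file above.

from pymongo import MongoClient

# Credentials come from the docker-compose file above; adjust if yours differ.
client = MongoClient("mongodb://root:examplepass@localhost:27017/")

# Mongo creates the database and collection lazily on first insert.
db = client["test"]
fruit = db["fruit"]  # hypothetical collection name for this test bed

fruit.insert_one({"name": "mango", "color": "orange", "in_stock": True})
print(db.list_collection_names())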

Continue reading “Fruit and Snakes: Frequent Mutative Mongo User Database with Python”

How do you write time-elapsed stamps in Python?

A common practice in any coding is to get a timestamp for the start and stop of a process, and then calculate the elapsed time (ya know, because that means we don’t have to mentally calc it ourselves the zillion times the code will run!). The following is an example of how I usually do this particular activity:

import datetime

start_time = datetime.datetime.now()
print(f"Process started at: {start_time.strftime('%Y-%m-%d %H:%M:%S')}")

# ...do the process and all here that is being timed...

end_time = datetime.datetime.now()
print(f"Process ended at: {end_time.strftime('%Y-%m-%d %H:%M:%S')}")

# Subtracting the two datetimes gives a timedelta, which prints as H:MM:SS.
time_elapsed = end_time - start_time
print(f"Time elapsed: {time_elapsed}")

Now, my question is, how do YOU do this in Python? Are there other tricks, cleaner ways to do it?

A Survey of 21 ETL Tools for Python

Here are summaries of each of the tools, along with examples of how to implement the ETL (Extract, Transform, Load) process using each one within a Python workflow:

  1. Apache Spark: Apache Spark is a powerful open-source cluster-computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s commonly used for processing large-scale data and running complex ETL pipelines. Example Implementation:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ETLExample") \
    .getOrCreate()

# Load data from source
source_data = spark.read.csv("source_data.csv", header=True, inferSchema=True)

# Apply transformations
transformed_data = source_data.select("column1", "column2").filter(source_data["column3"] > 10)

# Write data to destination
transformed_data.write.parquet("transformed_data.parquet")

spark.stop()
  2. Apache Airflow: Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It allows you to define complex ETL workflows as directed acyclic graphs (DAGs) and manage their execution. Example Implementation: Define a DAG in a Python script:
from airflow import DAG
from airflow.operators.python import PythonOperator  # airflow.operators.python_operator on Airflow 1.x
from datetime import datetime

def etl_process():
    # Your ETL logic here
    pass

default_args = {
    'start_date': datetime(2023, 8, 1),
}

# schedule_interval belongs on the DAG itself, not in default_args.
dag = DAG(
    'etl_workflow',
    default_args=default_args,
    schedule_interval='0 0 * * *',  # Run daily at midnight
)

etl_task = PythonOperator(
    task_id='etl_task',
    python_callable=etl_process,
    dag=dag,
)
Continue reading “A Survey of 21 ETL Tools for Python”

Language Stack Installation for Python & Go

Previously I’ve gone through the steps I take to get a solid development machine set up, from the base operating system load to the browser and basic IDEs I install. Now I’ve got more videos, along with the respective notes and details, about the two language stacks I set up next: Python and Go.

Python

The Python stack can often be somewhat confusing. Depending on the operating system, I set up the stack a little differently.

MacOS

For MacOS I’ve written two posts about this previously, one titled “Getting Started with Python Right!” and the other “Unbreaking Python Through Virtual Environments”. Those two posts cover most of the nuance of getting a base Python stack installed on MacOS and then using virtual environments to manage project-specific versions per repository.

Linux

For Linux, usually a Debian variant, the systems tend to have Python 3 installed by default. I then take the next step of installing pip3 and work from there. From that point forward, the IDE, PyCharm from JetBrains, uses virtualenv to set up a virtual environment per repository.

Python Setup && Reasons

For more details, I’ve created a video that walks through setting up Python 3 on Ubuntu and verifying the install. It also uses PyCharm to set up a small verification app, showing how virtualenv creates a specific environment for the new verification project.

The reasons for installing Python first are numerous. One of the first is that Python is required for installing and using numerous Python-related CLIs, such as AWS’s CLI, among many others. It pays off to have a good install at the system level (i.e. not just in a virtual environment, but executable at the terminal) to ensure it is available for any and all CLIs that need it. If you’re into data science work, that’s a huge second reason, because Python is used in about every aspect of data science, machine learning, and related endeavors.

Go Setup && Reasons

My choice of Go as the second language stack install is driven by two primary reasons:

  1. I like writing Go and use it myself for a number of tasks. For example, it is ridiculously quick and takes minimal work to build a CLI that ships as a single binary executable.
  2. I use Go for a lot of other work-related efforts, around Kubernetes, Docker, Terraform, and others.

With that, here’s the quick install and initial project for verification setup.

If you’d like to take a few other quick tours of Go, here are some posts, with videos: putting together a Go module project and writing an initial test in under 3 minutes, and setting up an HTTP server in about 15 minutes.

That’s it for now. However, if you’re interested in joining me for next steps in language stack setup, writing some JavaScript, Go, Python, Terraform, and more infrastructure, web dev, and all sorts of coding, I stream regularly on Twitch at https://twitch.tv/adronhall and post the VODs to YouTube, along with entirely new tech and metal content, at https://youtube.com/c/ThrashingCode. Feel free to check out a coding session, ask questions, interject, or just come and enjoy the tunes!

For more on my open source efforts and related projects, sign up for the Thrashing Code Newsletter!

Using Python’s Flask to Build a Basic API, Creating the didactic-engine-flask

I’ve worked with Python almost entirely from the maintenance programmer’s perspective. That is, I take code already written, make edits, add features, and then redeploy it. I’ve created exactly zero greenfield applications in Python. That changes today, however, with the creation of the didactic-engine-flask app!

Prerequisites

The only prerequisite for this article is knowing git if you want to pull down the code, and even that isn’t necessary; I’m starting from ground zero. If you’re just getting started with Python, be sure to read “Getting Started With Python Right!” and “Unbreaking Python Through Virtual Environments” about setting up your environment. These two entries cover enough to ensure you won’t end up with broken, conflicted, and convoluted Python environments.

Mission: Build a Flask-based API.

This post is about a singular thing: building an API with Flask. It won’t be about data modeling, databases, or wrapping middleware into the mix. It’s pure and simple Flask, with just the bare necessities needed to get an API working and responding appropriately to requests. Continue reading “Using Python’s Flask to Build a Basic API, Creating the didactic-engine-flask”
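To give a sense of where this is headed, here is a minimal sketch of a bare Flask API route. The endpoint name and payload are placeholders of mine for illustration, not necessarily what the didactic-engine-flask repo ends up with.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/health", methods=["GET"])
def health():
    # A trivial endpoint that responds with JSON, just to prove the API is up.
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(debug=True)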