A Survey of 21 ETL Tools for Python

Below are summaries of each tool, along with examples of how to implement the ETL (Extract, Transform, Load) process with it in a Python workflow:

  1. Apache Spark: Apache Spark is a powerful open-source cluster-computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It’s commonly used for processing large-scale data and running complex ETL pipelines. Example Implementation:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ETLExample") \
    .getOrCreate()

# Load data from source
source_data = spark.read.csv("source_data.csv", header=True, inferSchema=True)

# Apply transformations
transformed_data = source_data.select("column1", "column2").filter(source_data["column3"] > 10)

# Write data to destination
transformed_data.write.parquet("transformed_data.parquet")

spark.stop()
  2. Apache Airflow: Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It allows you to define complex ETL workflows as directed acyclic graphs (DAGs) and manage their execution. Example Implementation: Define a DAG in a Python script:
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path
from datetime import datetime

def etl_process():
    # Your ETL logic here
    pass

default_args = {
    'start_date': datetime(2023, 8, 1),
}

# schedule_interval is a DAG argument; it has no effect inside default_args.
# Newer Airflow releases rename it to `schedule`.
dag = DAG(
    'etl_workflow',
    default_args=default_args,
    schedule_interval='0 0 * * *',  # Run daily at midnight
)

etl_task = PythonOperator(
    task_id='etl_task',
    python_callable=etl_process,
    dag=dag,
)
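For comparison, the same extract, transform, load shape can be sketched with only the Python standard library. This is a minimal illustration, not a replacement for either tool: the column names mirror the Spark example above, and the in-memory sample data and the function names extract, transform, and load are assumptions made up for this sketch.

```python
import csv
import io


def extract(source):
    """Read CSV rows from a file-like object into dictionaries."""
    return list(csv.DictReader(source))


def transform(rows):
    """Keep column1 and column2 for rows where column3 is greater than 10."""
    return [
        {"column1": row["column1"], "column2": row["column2"]}
        for row in rows
        if float(row["column3"]) > 10
    ]


def load(rows, destination):
    """Write the transformed rows as CSV to a file-like object."""
    writer = csv.DictWriter(destination, fieldnames=["column1", "column2"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical in-memory sample standing in for source_data.csv.
    sample = io.StringIO("column1,column2,column3\na,b,5\nc,d,20\n")
    output = io.StringIO()
    load(transform(extract(sample)), output)
    print(output.getvalue())
```

Keeping the three stages as separate functions makes each one easy to swap out, which is roughly the role the Spark reader and writer or an Airflow task play at larger scale.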

Language Stack Installation for Python & Go

Previously I’ve gone through the steps I take to get a solid development machine set up, from the base operating system load to the browser and basic IDEs I install. Now I’ve got more videos, with the respective notes and details, about the two language stacks I set up next: Python and Go.

Python

With regard to the Python stack, this one can often be somewhat confusing. Depending on the operating system, I set up the stack a little differently.

MacOS

For MacOS I’ve written two posts about this previously: “Getting Started with Python Right!” and “Unbreaking Python Through Virtual Environments”. Those two posts cover most of the nuance of getting a base Python stack installed on MacOS and then using virtual environments to manage project-specific versions per repository.

Linux

For Linux, usually a Debian variant, the system tends to have Python 3 installed by default. I then take the next step of installing pip3 and work from there. The IDE, PyCharm from JetBrains, uses virtualenv to set up virtual environments per repository from that point forward.
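To make the per-repository environment idea concrete, here is a minimal sketch that creates one from Python itself. It uses the standard library venv module rather than virtualenv (a deliberate swap so the sketch has no third-party dependency), and the function name create_environment and the .venv directory name are assumptions chosen for illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir):
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps the sketch fast; pass True to bootstrap pip as well.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # A throwaway directory stands in for a real repository checkout.
    with tempfile.TemporaryDirectory() as project:
        env = create_environment(project)
        # pyvenv.cfg records which base interpreter the environment points at.
        print((env / "pyvenv.cfg").read_text())
```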

Python Setup && Reasons

For more details, I’ve created this video that walks through setting up Python 3 on Ubuntu and verifying the install. It also uses PyCharm to set up a small verification app, which shows how virtualenv sets up a specific environment for the new verification project.

The reasons for installing Python first are numerous. One of the first is that Python is required for installing and using numerous Python-related CLIs, such as AWS’s CLI, among many others. It pays off to have a good install at the system level (i.e. not just in a virtual environment, but executable at the terminal on the system) to ensure it is available for any and all CLIs that need it. If you’re into data science work, that’s a huge second reason, because Python is used in just about every aspect of data science, machine learning, and related endeavors.
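One quick way to check the system-level point is to ask Python whether it is running inside a virtual environment and which tools are visible on the PATH. The following is a rough sketch under stated assumptions: the helper names in_virtualenv and find_tools are made up for illustration, and python3 and pip3 are simply the tool names mentioned in this post.

```python
import shutil
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at it while sys.base_prefix
    # still points at the base installation.
    return sys.prefix != sys.base_prefix


def find_tools(names):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    for tool, path in find_tools(["python3", "pip3"]).items():
        print(tool, "->", path or "not found on PATH")
```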

Go Setup && Reasons

I go for Go as my second language stack install for two primary reasons:

  1. I like writing Go and use it myself for a number of reasons. For example, it is ridiculously quick and minimal work to build a CLI that requires only a single binary executable.
  2. I use Go for a lot of other work-related efforts, around Kubernetes, Docker, Terraform, and others.

With that, here’s the quick install and initial project for verification setup.

If you’d like to take a few other quick tours of Go, here are some posts, with videos, putting together a Go module project and writing an initial test in under 3 minutes and setting up an HTTP server in about 15 minutes.

That’s it for now. However, if you’re interested in joining me for next steps, language stack setup, and more, along with writing some JavaScript, Go, Python, Terraform, and more infrastructure, web dev, and all sorts of coding, I stream regularly on Twitch at https://twitch.tv/adronhall and post the VODs to YouTube, along with entirely new tech and metal content, at https://youtube.com/c/ThrashingCode. Feel free to check out a coding session, ask questions, interject, or just come and enjoy the tunes!

For more on my open source efforts and related projects, sign up for the Thrashing Code Newsletter!

Using Python’s Flask to Build a Basic API, Creating the didactic-engine-flask

I’ve worked with Python almost entirely from the maintenance programmer perspective. That is, I take other code written already, make edits, add features, and then redeploy it. I’ve created exactly zero greenfield applications in Python. That changes today however, with the creation of the didactic-engine-flask app!

Prerequisites

The only prereq to understanding this article is knowing git if you want to get the code, but it isn’t necessary. I’m starting this from ground zero. If you’re just getting started with Python, be sure to read “Getting Started With Python Right!” and “Unbreaking Python Through Virtual Environments” about setting up your environment. These two entries cover enough to ensure you won’t end up with broken, conflicted, and convoluted Python environments.

Mission: Build a Flask based API.

This post is about a singular thing: building an API with Flask. It won’t be about data modeling, databases, or wrapping middleware into the mix. It’s pure and simple Flask, with just the bare necessities needed to get an API working and responding appropriately to requests.
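As a rough idea of what those bare necessities look like, here is a minimal sketch of a Flask app with a single JSON endpoint. It is not the code from didactic-engine-flask; the /status route and its payload are assumptions chosen for illustration, and it assumes Flask is installed in the active virtual environment.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/status", methods=["GET"])
def status():
    """Return a small JSON document so clients can confirm the API is up."""
    return jsonify({"status": "ok"})


if __name__ == "__main__":
    # Development server only; a production deployment would sit behind a WSGI server.
    app.run(debug=True)
```

With the script running, a GET request to /status on the local development server should come back with the JSON body defined in the handler.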

Unbreaking Python Through Virtual Environments

A few days ago I wrote a post, “Getting Started With Python Right!”, about what I’d found to be the best way to set up MacOS for Python development. In it I also added links and a few details to get a Windows or Linux machine set up right for Python development.

However, there’s more, as @tlockney (Thomas Lockney) pointed out in a comment! He detailed,

  1. Since you’re using pyenv, the version of pip you use should always be the one associated with the current version of Python, which won’t be the case when you later switch versions.
  2. People who aren’t clear on what’s going on will likely copy the code you included verbatim, so instead of aliasing the version of pip referenced in the pyenv path, it’s going to always point at the brew installed version.
  3. You really should use virtualenv for pretty much everything and try to avoid ever installing libraries into your global Python environment. If you need tools accessible outside a virtualenv, check out pipx.
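One common way to act on the first point is to invoke pip as a module of whichever interpreter is currently active, rather than relying on a pip alias. The sketch below builds such a command; the helper name pip_command is hypothetical, and running it assumes pip is installed for the active interpreter.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    # sys.executable is the active Python, so this pip always matches it,
    # even after switching versions with pyenv or activating a virtualenv.
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Prints the pip version together with the Python installation it belongs to.
    subprocess.run(pip_command("--version"), check=False)
```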


Getting Started With Python Right!

UPDATE: I’ve added a post with details about virtualenv to complement this one, per Thomas’ comment in the comments section below! Read this post for MacOS-specific system Python setup, then read that one for more options on how to set things up right!

Recently I sat down to get started on some Python work, specifically on MacOS. I’ve written code in Python before and tried out both 2.x and 3.x. I’m going with 3.x for this and upcoming articles, but one thing became apparent: the Python tech stack is still ferociously fragmented in a number of ways, and this post is to provide some clarity if this is your first endeavor into the stack.