Building DataDiluvium: A Data Generation Tool – Part 2: Core Implementation and Schema Handling

In Part 1, we set up our development environment and outlined the project structure. Now, we’ll dive into implementing the core functionality of DataDiluvium, starting with SQL schema parsing and the user interface for schema input.

SQL Schema Parsing Implementation

1. SQL Parser Setup

The SQL parser is implemented in src/app/lib/parsers/sqlParser.ts. This module is responsible for:

  1. Parsing SQL schema definitions using the node-sql-parser library
  2. Extracting table and column information
  3. Mapping SQL data types to appropriate data generators

The parser works by:

  1. Taking a SQL string input
  2. Converting it to an Abstract Syntax Tree (AST)
  3. Traversing the AST to extract:
    • Table names
    • Column names
    • Data types
    • Default values
    • Foreign key relationships

The SchemaColumn type defines the structure of our parsed columns:

export type SchemaColumn = {
  tableName: string;
  columnName: string;
  dataType: string;
  defaultValue: string | null;
  generator?: string;
  referencedTable?: string;
  referencedColumn?: string;
};
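To make the traversal concrete, here is a simplified, dependency-free sketch of the column-extraction step. The real sqlParser.ts walks the AST produced by node-sql-parser; the regex below merely stands in for that traversal so the shape of the output is easy to see, and it only covers a subset of what the library handles.

```typescript
type SchemaColumn = {
  tableName: string;
  columnName: string;
  dataType: string;
  defaultValue: string | null;
};

function extractColumns(sql: string): SchemaColumn[] {
  const columns: SchemaColumn[] = [];
  const tableRe = /CREATE TABLE\s+(\w+)\s*\(([\s\S]*?)\);/gi;
  let match: RegExpExecArray | null;
  while ((match = tableRe.exec(sql)) !== null) {
    const [, tableName, body] = match;
    // Split column definitions on commas that are NOT inside parentheses,
    // so types like DECIMAL(10,2) stay intact.
    for (const raw of body.split(/,(?![^()]*\))/)) {
      const def = raw.trim();
      // Skip table-level constraints such as FOREIGN KEY (...)
      if (/^(PRIMARY|FOREIGN|UNIQUE|CONSTRAINT|KEY)\b/i.test(def)) continue;
      const col = def.match(/^(\w+)\s+(\w+(?:\(\d+(?:,\d+)?\))?)/);
      if (!col) continue;
      const dflt = def.match(/DEFAULT\s+(\S+)/i);
      columns.push({
        tableName,
        columnName: col[1],
        dataType: col[2].toUpperCase(),
        defaultValue: dflt ? dflt[1] : null,
      });
    }
  }
  return columns;
}
```

Running this over the users table from the examples below yields one SchemaColumn per column, with created_at carrying CURRENT_TIMESTAMP as its default. In the actual module, the same information comes from node-sql-parser's AST nodes rather than pattern matching.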

Building DataDiluvium: A Data Generation Tool – Part 1: Prerequisites and Project Overview

(To read up on how to use the site, check out the previous post Effortless Data Generation for Developers.)

DataDiluvium is a web-based tool I built to help developers, database administrators, and data engineers generate realistic test data based on SQL schema definitions. The tool takes SQL table definitions as input and produces sample data in various formats, making it easier to populate development and testing environments with meaningful data.

Project Overview

The core functionality of DataDiluvium includes:

  • SQL schema parsing and validation
  • Customizable data generation rules per column
  • Support for foreign key relationships
  • Multiple export formats (JSON, CSV, XML, Plain Text, SQL Inserts)
  • Real-time preview of generated data
  • Dark mode support
  • Responsive design

Effortless Data Generation for Developers

DataDiluvium is a web-based tool available at datadiluvium.com that helps developers, database administrators, and data engineers generate realistic test data from SQL schema definitions. Whether you’re setting up a development environment, creating test scenarios, or preparing data for demonstrations, DataDiluvium streamlines the process of data generation.

What is DataDiluvium?

Purpose

DataDiluvium serves several key purposes:

  1. Development Environment Setup: Quickly populate development databases with meaningful test data
  2. Testing: Generate consistent test data for automated testing scenarios
  3. Demonstrations: Create realistic data sets for product demonstrations
  4. Data Migration Testing: Validate data migration scripts with generated test data
  5. Schema Validation: Test database schema designs with realistic data

Key Features

  • SQL schema parsing and validation
  • Customizable data generation rules
  • Support for foreign key relationships
  • Multiple export formats (JSON, CSV, XML, Plain Text, SQL Inserts)
  • Real-time preview of generated data
  • Dark mode support
  • Responsive design

How to Use DataDiluvium

1. Accessing the Application

  1. Visit datadiluvium.com
  2. No account required – start using immediately
  3. Your data is processed locally in your browser

2. Defining Your Schema

  1. Navigate to the Schema page

  2. Enter your SQL schema definition in the text area. Example:

    CREATE TABLE users (
        id INT PRIMARY KEY,
        username VARCHAR(50) NOT NULL,
        email VARCHAR(100) NOT NULL,
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE orders (
        id INT PRIMARY KEY,
        user_id INT,
        total_amount DECIMAL(10,2),
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY (user_id) REFERENCES users(id)
    );
    
  3. The application will automatically:

    • Parse your schema
    • Validate the structure
    • Suggest appropriate data generators
    • Show a preview of the parsed schema
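One plausible way the "suggest appropriate data generators" step could work is a simple mapping from column name hints and SQL data types to generator choices. The generator names below are illustrative placeholders, not necessarily the identifiers DataDiluvium uses internally:

```typescript
// Suggest a generator from the column name and SQL data type.
// Generator names here are placeholders for illustration.
function suggestGenerator(columnName: string, dataType: string): string {
  const name = columnName.toLowerCase();
  const type = dataType.toUpperCase();
  if (name === "id") return "Sequential Number";
  if (name.includes("email")) return "Email Address";
  if (name.includes("username")) return "Username";
  if (type.startsWith("DATETIME") || type.startsWith("DATE")) return "Date";
  if (type.startsWith("DECIMAL") || type.startsWith("FLOAT")) return "Random Decimal";
  if (type.startsWith("INT")) return "Random Number";
  return "Custom Text";
}
```

For the users table above, this would suggest Sequential Number for id, Username for username, Email Address for email, and Date for created_at; any suggestion can then be overridden per column in the next step.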

3. Configuring Data Generation

  1. For each column, you can:

    • Select a data generator
    • Set custom parameters
    • Define relationships
  2. Available generators include:

    • Sequential Numbers
    • Usernames
    • Email addresses
    • Dates
    • Foreign Keys
    • Custom text
    • And more…
  3. Set the number of rows to generate:

    • Global row count for all tables
    • Table-specific row counts
    • Preview sample data before generation
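The row-count options above (a global count plus table-specific overrides) could be captured in a small configuration shape like the following. This is a hypothetical sketch of the idea, not DataDiluvium's actual internal config:

```typescript
// Hypothetical shape: a global default row count with optional
// per-table overrides, as described in the steps above.
type GenerationConfig = {
  defaultRowCount: number;
  tableRowCounts?: Record<string, number>;
};

function rowsFor(table: string, config: GenerationConfig): number {
  return config.tableRowCounts?.[table] ?? config.defaultRowCount;
}

const config: GenerationConfig = {
  defaultRowCount: 100,
  tableRowCounts: { categories: 10 },
};
```

With this config, categories gets its override of 10 rows while every other table falls back to the global count of 100.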

4. Generating Data

  1. Click the “Generate” button
  2. Review the generation summary
  3. Confirm the generation
  4. Wait for the process to complete

5. Exporting Data

  1. Choose your preferred export format:

    • JSON: Standard JSON format with columns and rows
    • JSON (rich): Array of objects with column names as keys
    • CSV: Comma-separated values with headers
    • XML: Structured XML format
    • Plain Text: Human-readable format with numbered rows
    • SQL Inserts: Ready-to-use SQL INSERT statements
  2. Click the “Export” button

  3. Files will be downloaded automatically:

    • One file per table
    • Named according to the table name
    • Appropriate file extension based on format
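The difference between the two JSON formats is worth spelling out. Sketched below: plain "JSON" pairs a columns array with row arrays, while "JSON (rich)" emits one object per row keyed by column name. The exact field names are assumptions about the output shape, not a spec:

```typescript
// A table in the plain "JSON" shape: columns plus row arrays.
type Table = { columns: string[]; rows: (string | number)[][] };

// Convert to the "JSON (rich)" shape: one object per row,
// with column names as keys.
function toRichJson(table: Table): Record<string, string | number>[] {
  return table.rows.map((row) =>
    Object.fromEntries(table.columns.map((col, i) => [col, row[i]]))
  );
}

const users: Table = {
  columns: ["id", "username"],
  rows: [[1, "avery"], [2, "blake"]],
};
```

Here toRichJson(users) yields objects like { id: 1, username: "avery" }, which is usually the more convenient shape for application code, while the plain form is more compact for large exports.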

Best Practices

1. Schema Design

  • Use clear, descriptive table and column names
  • Include appropriate constraints
  • Define foreign key relationships
  • Use appropriate data types

2. Data Generation

  • Start with a small number of rows for testing
  • Use appropriate generators for each column type
  • Consider data relationships when setting up foreign keys
  • Preview data before generating large sets

3. Export Selection

  • Choose JSON for application development
  • Use CSV for spreadsheet applications
  • Select SQL Inserts for direct database population
  • Consider Plain Text for human review

Example Workflow

Scenario: Setting up a Development Environment

  1. Define Schema

    CREATE TABLE products (
        id INT PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        price DECIMAL(10,2),
        category_id INT,
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE categories (
        id INT PRIMARY KEY,
        name VARCHAR(50) NOT NULL
    );
    
  2. Configure Generators

    • id: Sequential Number
    • name: Product Name
    • price: Random Decimal (10-1000)
    • category_id: Foreign Key to categories
    • created_at: Current Date
  3. Generate Data

    • Set 100 rows for products
    • Set 10 rows for categories
    • Generate and review
  4. Export

    • Choose SQL Inserts format
    • Download and execute in your development database
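Conceptually, the SQL Inserts export turns each generated row into an INSERT statement. A minimal sketch of that transformation, with simplified quoting (the real exporter presumably handles NULLs and dialect-specific escaping):

```typescript
// Build one INSERT statement from a table name, its columns, and a row.
// Quoting is simplified: strings are single-quoted with '' escaping,
// numbers pass through as-is.
function toInsert(
  table: string,
  columns: string[],
  row: (string | number)[]
): string {
  const values = row
    .map((v) =>
      typeof v === "string" ? `'${v.replace(/'/g, "''")}'` : String(v)
    )
    .join(", ");
  return `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${values});`;
}
```

For the categories table in this workflow, a generated row such as [1, "Books"] would come out as INSERT INTO categories (id, name) VALUES (1, 'Books'); and the downloaded file would simply concatenate one such statement per row.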

Tips and Tricks

1. Performance

  • Generate data in smaller batches for large schemas
  • Use appropriate generators for better performance
  • Preview data before large generations

2. Data Quality

  • Use meaningful generators for each column type
  • Consider data relationships
  • Validate generated data before use

3. Export Formats

  • JSON (rich) for application development
  • CSV for data analysis
  • SQL Inserts for database population
  • Plain Text for quick review

Support and Resources

  • Visit datadiluvium.com for the latest version
  • Check the documentation for detailed guides
  • Review sample schemas in the SQL samples section
  • Contact support for questions or feedback

Conclusion

DataDiluvium provides a user-friendly and powerful solution for generating test data from SQL schemas. Whether you’re a developer setting up a new project or a database administrator preparing test environments, DataDiluvium streamlines the process of data generation and helps ensure data quality and consistency.

Writing Swift Like a Pro: Clean Code, Separation of Concerns, and AI-Assisted Development

Swift is a powerful, multi-paradigm language that blends object-oriented programming (OOP), protocol-oriented programming (POP), and functional programming (FP). With the rise of artificial intelligence (AI) tools such as GitHub Copilot, Cursor, and Tabnine, it has become easier than ever to generate code quickly. However, while AI accelerates development, it does not replace software craftsmanship. If you already have strong coding practices and discipline, AI can be a valuable assistant. If you rely on AI blindly, you risk generating unreadable, unmaintainable, and inefficient code at scale.

Writing high-quality Swift code requires discipline in applying good naming conventions, maintaining a clear separation of concerns (SoC), following the five principles of SOLID design (Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion), and ensuring maintainability through best practices. While AI-generated code can be useful, it must be refactored and reviewed for clarity and correctness.


Swift Coding Conventions: Writing Maintainable, Readable, and Elegant Code

Swift – Apple’s slick and safety-focused (? still not sure I’m bought in on that phrasing) programming language – has become the backbone of Apple’s development. Whether you’re hacking on an indie app or shipping production-grade enterprise software, writing consistent, readable, and idiomatic Swift is non-negotiable. Swift has a robust set of guidelines to follow, and I wrote a thing or three about its style of OOP, POP, and FP rolled into one language here, but let’s cut through the noise and hit the key conventions that will level up your Swift code.
