Building DataDiluvium: A Data Generation Tool – Part 2: Core Implementation and Schema Handling

In Part 1, we set up our development environment and outlined the project structure. Now, we’ll dive into implementing the core functionality of DataDiluvium, starting with SQL schema parsing and the user interface for schema input.

SQL Schema Parsing Implementation

1. SQL Parser Setup

The SQL parser is implemented in src/app/lib/parsers/sqlParser.ts. This module is responsible for:

  1. Parsing SQL schema definitions using the node-sql-parser library
  2. Extracting table and column information
  3. Mapping SQL data types to appropriate data generators

The parser works by:

  1. Taking a SQL string input
  2. Converting it to an Abstract Syntax Tree (AST)
  3. Traversing the AST to extract:
    • Table names
    • Column names
    • Data types
    • Default values
    • Foreign key relationships

The SchemaColumn type defines the structure of our parsed columns:

export type SchemaColumn = {
  tableName: string;
  columnName: string;
  dataType: string;
  defaultValue: string | null;
  generator?: string;
  referencedTable?: string;
  referencedColumn?: string;
};
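To make the traversal concrete, here is a simplified, dependency-free sketch of the column-extraction step. The real sqlParser.ts walks the AST produced by node-sql-parser; the regex below merely stands in for that traversal so the shape of the output is easy to see, and it only covers a subset of what the library handles.

```typescript
type SchemaColumn = {
  tableName: string;
  columnName: string;
  dataType: string;
  defaultValue: string | null;
};

function extractColumns(sql: string): SchemaColumn[] {
  const columns: SchemaColumn[] = [];
  const tableRe = /CREATE TABLE\s+(\w+)\s*\(([\s\S]*?)\);/gi;
  let match: RegExpExecArray | null;
  while ((match = tableRe.exec(sql)) !== null) {
    const [, tableName, body] = match;
    // Split column definitions on commas that are NOT inside parentheses,
    // so types like DECIMAL(10,2) stay intact.
    for (const raw of body.split(/,(?![^()]*\))/)) {
      const def = raw.trim();
      // Skip table-level constraints such as FOREIGN KEY (...)
      if (/^(PRIMARY|FOREIGN|UNIQUE|CONSTRAINT|KEY)\b/i.test(def)) continue;
      const col = def.match(/^(\w+)\s+(\w+(?:\(\d+(?:,\d+)?\))?)/);
      if (!col) continue;
      const dflt = def.match(/DEFAULT\s+(\S+)/i);
      columns.push({
        tableName,
        columnName: col[1],
        dataType: col[2].toUpperCase(),
        defaultValue: dflt ? dflt[1] : null,
      });
    }
  }
  return columns;
}
```

Running this over the users table from the examples below yields one SchemaColumn per column, with created_at carrying CURRENT_TIMESTAMP as its default. In the actual module, the same information comes from node-sql-parser's AST nodes rather than pattern matching.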

Building DataDiluvium: A Data Generation Tool – Part 1: Prerequisites and Project Overview

(To read up on how to use the site, check out the previous post Effortless Data Generation for Developers.)

DataDiluvium is a web-based tool I built to help developers, database administrators, and data engineers generate realistic test data based on SQL schema definitions. The tool takes SQL table definitions as input and produces sample data in various formats, making it easier to populate development and testing environments with meaningful data.

Project Overview

The core functionality of DataDiluvium includes:

  • SQL schema parsing and validation
  • Customizable data generation rules per column
  • Support for foreign key relationships
  • Multiple export formats (JSON, CSV, XML, Plain Text, SQL Inserts)
  • Real-time preview of generated data
  • Dark mode support
  • Responsive design

Effortless Data Generation for Developers

DataDiluvium is a web-based tool available at datadiluvium.com that helps developers, database administrators, and data engineers generate realistic test data from SQL schema definitions. Whether you’re setting up a development environment, creating test scenarios, or preparing data for demonstrations, DataDiluvium streamlines the process of data generation.

What is DataDiluvium?

Purpose

DataDiluvium serves several key purposes:

  1. Development Environment Setup: Quickly populate development databases with meaningful test data
  2. Testing: Generate consistent test data for automated testing scenarios
  3. Demonstrations: Create realistic data sets for product demonstrations
  4. Data Migration Testing: Validate data migration scripts with generated test data
  5. Schema Validation: Test database schema designs with realistic data

Key Features

  • SQL schema parsing and validation
  • Customizable data generation rules
  • Support for foreign key relationships
  • Multiple export formats (JSON, CSV, XML, Plain Text, SQL Inserts)
  • Real-time preview of generated data
  • Dark mode support
  • Responsive design

How to Use DataDiluvium

1. Accessing the Application

  1. Visit datadiluvium.com
  2. No account required – start using immediately
  3. Your data is processed locally in your browser

2. Defining Your Schema

  1. Navigate to the Schema page

  2. Enter your SQL schema definition in the text area. Example:

    CREATE TABLE users (
        id INT PRIMARY KEY,
        username VARCHAR(50) NOT NULL,
        email VARCHAR(100) NOT NULL,
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE orders (
        id INT PRIMARY KEY,
        user_id INT,
        total_amount DECIMAL(10,2),
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
        FOREIGN KEY (user_id) REFERENCES users(id)
    );
    
  3. The application will automatically:

    • Parse your schema
    • Validate the structure
    • Suggest appropriate data generators
    • Show a preview of the parsed schema
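One plausible way the "suggest appropriate data generators" step could work is a simple mapping from column name hints and SQL data types to generator choices. The generator names below are illustrative placeholders, not necessarily the identifiers DataDiluvium uses internally:

```typescript
// Suggest a generator from the column name and SQL data type.
// Generator names here are placeholders for illustration.
function suggestGenerator(columnName: string, dataType: string): string {
  const name = columnName.toLowerCase();
  const type = dataType.toUpperCase();
  if (name === "id") return "Sequential Number";
  if (name.includes("email")) return "Email Address";
  if (name.includes("username")) return "Username";
  if (type.startsWith("DATETIME") || type.startsWith("DATE")) return "Date";
  if (type.startsWith("DECIMAL") || type.startsWith("FLOAT")) return "Random Decimal";
  if (type.startsWith("INT")) return "Random Number";
  return "Custom Text";
}
```

For the users table above, this would suggest Sequential Number for id, Username for username, Email Address for email, and Date for created_at; any suggestion can then be overridden per column in the next step.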

3. Configuring Data Generation

  1. For each column, you can:

    • Select a data generator
    • Set custom parameters
    • Define relationships
  2. Available generators include:

    • Sequential Numbers
    • Usernames
    • Email addresses
    • Dates
    • Foreign Keys
    • Custom text
    • And more…
  3. Set the number of rows to generate:

    • Global row count for all tables
    • Table-specific row counts
    • Preview sample data before generation
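The row-count options above (a global count plus table-specific overrides) could be captured in a small configuration shape like the following. This is a hypothetical sketch of the idea, not DataDiluvium's actual internal config:

```typescript
// Hypothetical shape: a global default row count with optional
// per-table overrides, as described in the steps above.
type GenerationConfig = {
  defaultRowCount: number;
  tableRowCounts?: Record<string, number>;
};

function rowsFor(table: string, config: GenerationConfig): number {
  return config.tableRowCounts?.[table] ?? config.defaultRowCount;
}

const config: GenerationConfig = {
  defaultRowCount: 100,
  tableRowCounts: { categories: 10 },
};
```

With this config, categories gets its override of 10 rows while every other table falls back to the global count of 100.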

4. Generating Data

  1. Click the “Generate” button
  2. Review the generation summary
  3. Confirm the generation
  4. Wait for the process to complete

5. Exporting Data

  1. Choose your preferred export format:

    • JSON: Standard JSON format with columns and rows
    • JSON (rich): Array of objects with column names as keys
    • CSV: Comma-separated values with headers
    • XML: Structured XML format
    • Plain Text: Human-readable format with numbered rows
    • SQL Inserts: Ready-to-use SQL INSERT statements
  2. Click the “Export” button

  3. Files will be downloaded automatically:

    • One file per table
    • Named according to the table name
    • Appropriate file extension based on format
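The difference between the two JSON formats is worth spelling out. Sketched below: plain "JSON" pairs a columns array with row arrays, while "JSON (rich)" emits one object per row keyed by column name. The exact field names are assumptions about the output shape, not a spec:

```typescript
// A table in the plain "JSON" shape: columns plus row arrays.
type Table = { columns: string[]; rows: (string | number)[][] };

// Convert to the "JSON (rich)" shape: one object per row,
// with column names as keys.
function toRichJson(table: Table): Record<string, string | number>[] {
  return table.rows.map((row) =>
    Object.fromEntries(table.columns.map((col, i) => [col, row[i]]))
  );
}

const users: Table = {
  columns: ["id", "username"],
  rows: [[1, "avery"], [2, "blake"]],
};
```

Here toRichJson(users) yields objects like { id: 1, username: "avery" }, which is usually the more convenient shape for application code, while the plain form is more compact for large exports.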

Best Practices

1. Schema Design

  • Use clear, descriptive table and column names
  • Include appropriate constraints
  • Define foreign key relationships
  • Use appropriate data types

2. Data Generation

  • Start with a small number of rows for testing
  • Use appropriate generators for each column type
  • Consider data relationships when setting up foreign keys
  • Preview data before generating large sets

3. Export Selection

  • Choose JSON for application development
  • Use CSV for spreadsheet applications
  • Select SQL Inserts for direct database population
  • Consider Plain Text for human review

Example Workflow

Scenario: Setting up a Development Environment

  1. Define Schema

    CREATE TABLE products (
        id INT PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        price DECIMAL(10,2),
        category_id INT,
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE categories (
        id INT PRIMARY KEY,
        name VARCHAR(50) NOT NULL
    );
    
  2. Configure Generators

    • id: Sequential Number
    • name: Product Name
    • price: Random Decimal (10-1000)
    • category_id: Foreign Key to categories
    • created_at: Current Date
  3. Generate Data

    • Set 100 rows for products
    • Set 10 rows for categories
    • Generate and review
  4. Export

    • Choose SQL Inserts format
    • Download and execute in your development database
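Conceptually, the SQL Inserts export turns each generated row into an INSERT statement. A minimal sketch of that transformation, with simplified quoting (the real exporter presumably handles NULLs and dialect-specific escaping):

```typescript
// Build one INSERT statement from a table name, its columns, and a row.
// Quoting is simplified: strings are single-quoted with '' escaping,
// numbers pass through as-is.
function toInsert(
  table: string,
  columns: string[],
  row: (string | number)[]
): string {
  const values = row
    .map((v) =>
      typeof v === "string" ? `'${v.replace(/'/g, "''")}'` : String(v)
    )
    .join(", ");
  return `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${values});`;
}
```

For the categories table in this workflow, a generated row such as [1, "Books"] would come out as INSERT INTO categories (id, name) VALUES (1, 'Books'); and the downloaded file would simply concatenate one such statement per row.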

Tips and Tricks

1. Performance

  • Generate data in smaller batches for large schemas
  • Use appropriate generators for better performance
  • Preview data before large generations

2. Data Quality

  • Use meaningful generators for each column type
  • Consider data relationships
  • Validate generated data before use

3. Export Formats

  • JSON (rich) for application development
  • CSV for data analysis
  • SQL Inserts for database population
  • Plain Text for quick review

Support and Resources

  • Visit datadiluvium.com for the latest version
  • Check the documentation for detailed guides
  • Review sample schemas in the SQL samples section
  • Contact support for questions or feedback

Conclusion

DataDiluvium provides a user-friendly and powerful solution for generating test data from SQL schemas. Whether you’re a developer setting up a new project or a database administrator preparing test environments, DataDiluvium streamlines the process of data generation and helps ensure data quality and consistency.

Writing Swift Like a Pro: Clean Code, Separation of Concerns, and AI-Assisted Development

Swift is a powerful, multi-paradigm language that blends object-oriented programming (OOP), protocol-oriented programming (POP), and functional programming (FP). With the rise of artificial intelligence (AI) tools such as GitHub Copilot, Cursor, and Tabnine, it has become easier than ever to generate code quickly. However, while AI accelerates development, it does not replace software craftsmanship. If you already have strong coding practices and discipline, AI can be a valuable assistant. If you rely on AI blindly, you risk generating unreadable, unmaintainable, and inefficient code at scale.

Writing high-quality Swift code requires discipline in applying good naming conventions, maintaining a clear separation of concerns (SoC), following the five principles of SOLID design (Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion), and ensuring maintainability through best practices. While AI-generated code can be useful, it must be refactored and reviewed for clarity and correctness.


Swift Coding Conventions: Writing Maintainable, Readable, and Elegant Code

Swift – Apple’s slick and safety-focused (? still not sure I’m bought in on that phrasing) programming language – has become the backbone of Apple’s development. Whether you’re hacking on an indie app or shipping production-grade enterprise software, writing consistent, readable, and idiomatic Swift is non-negotiable. Swift has a robust set of guidelines to follow, and I wrote a thing or three about its style of OOP, POP, and FP rolled into one language here, but let’s cut through the noise and hit the key conventions that will level up your Swift code.
