Jonathan Ellis talks about Five Lessons in Distributed Databases

Notes on the talk…

  1. If it’s not SQL it’s not a database. Watch, you’ll get to hear why… ha!

Then Jonathan covers the recent history (sort of recent, the last ~20ish years) of the industry and how we’ve gotten to this point in database technology.

  2. It takes 5+ years to build a database.

Also factor in the tens of millions of dollars that go with that period of time. Both are needed in droves: time and money.

…more below the video.

  3. The customer is always right.

Even when they’re clearly wrong, they’re largely right.

For numbers 4 and 5 you’ll have to watch the video. Lots of good stuff in this video, including comparisons of Cosmos DB, DynamoDB, Apache Cassandra, and DataStax Enterprise, how these distributed databases work, their performance (third-party metrics are shown), and more details!

Distributed Systems: Cassandra, DataStax, a Short SITREP

SITREP = Situation Report. It’s military speak. 💂🏻‍♂️

Apache Cassandra is one of the most popular databases in use today. It has many characteristics and distinctive architectural details. In this post I’ll provide a description and some details for a number of these features and characteristics. Then, after that (i.e. toward the end, so skip there if you just want the differences), I’m going to summarize the key differences in the latest release of the DataStax Enterprise 6 version of the database.

Cassandra Characteristics

Cassandra is a linearly scalable, highly available, fault tolerant, distributed database, and that is just to name a few of its most important characteristics. The Cassandra database is also cross-platform (it runs on any operating system), multi-cloud (it runs on and across multiple clouds), and can survive regional data center outages or, in multi-cloud scenarios, even entire cloud provider outages!

Columnar Store, Column Based, or Column Family? What? Ok, so you might have read a number of things about what Cassandra actually is. Let’s break this down. First off, a columnar (or column store, or column oriented) database guarantees data locality for a single column on a node’s disk. The column may span many or all of the rows, depending on where or how you specify partitions. However, this isn’t what the Cassandra database uses. Cassandra is a column-family database.

A column-family storage architecture makes sure the data is stored based on locality at the partition level, not the column level. Cassandra groups rows and columns into partitions by a partition key, then clusters them together by a specified clustering column or columns. Because of this, to query Cassandra you must know the partition key in order to avoid full data scans!
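
To make that concrete, here’s a minimal CQL sketch (table and column names are just hypothetical examples, not from any particular system):

-- Hypothetical table: readings are partitioned by sensor_id, and rows
-- within each partition are clustered (sorted) by reading_time.
CREATE TABLE sensor_readings (
    sensor_id    uuid,
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((sensor_id), reading_time)
);

-- Fast: the partition key pins the query to a single partition.
SELECT reading_time, value
FROM sensor_readings
WHERE sensor_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Everything with the same sensor_id lands in the same partition, on the same replica nodes, sorted by reading_time.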

Cassandra partitions are guaranteed to live together on the same node, stored within a sorted strings table (referred to most commonly as an SSTable) in the same location within that file. Though, depending on the compaction strategy, a partition can end up split across multiple files on disk. So really, strict data locality isn’t guaranteed.
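
That compaction strategy is itself just per-table configuration. As a sketch, switching the hypothetical table from above to leveled compaction looks like this (the class names are the stock Cassandra strategies):

-- Leveled compaction trades write amplification for fewer SSTables
-- touched per read; size-tiered is the default strategy.
ALTER TABLE sensor_readings
WITH compaction = { 'class': 'LeveledCompactionStrategy' };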

Column-family stores are great for high throughput writes and the ability to linearly scale horizontally (ya know, getting lots and lots of nodes in the cloud!). Reads using the partition key are extremely fast since this key points to exactly where the data resides. However, anything else often – at least as far as I know – leads to a full scan of the data for any type of ad-hoc query.
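
CQL actually makes you acknowledge that cost explicitly. Sticking with the hypothetical table from above, a predicate that skips the partition key gets rejected unless you opt in to the scan:

-- Without the partition key this would touch every partition in the
-- cluster, so Cassandra requires ALLOW FILTERING to run it at all.
SELECT reading_time, value
FROM sensor_readings
WHERE value > 100
ALLOW FILTERING;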

A sort of historically trivial but important point: the column-family term comes from the original storage engine, which was based on a key-value store. The value was a set of column value tuples, which were often referenced as a family. Later this family was abstracted into partitions, and then the storage engine was matched to that abstraction. Whew, ok, so that’s a lot of knowledge being coagulated into a solid eh!  [scuse’ my odd artful language use if you visualized that!]

With all of this described, and that little history sprinkled in, reading the description of Cassandra in the README.asc file of the actual Cassandra Github Repo makes just a little more sense. The file starts off with a description,

Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

Now that I’ve covered the 101 level of what Cassandra is, I’ll take a look at DataStax and their respective offering.

DataStax

DataStax Enterprise at first glance might be a bit confusing, since immediate questions pop up like, “Doesn’t DataStax make Cassandra?”, “Isn’t DataStax just selling support for Cassandra?”, or “Eh, wha, who is DataStax and what does this have to do with Cassandra?”. Well, I’m gonna tell ya all about where we are today and how all of these things fit.

Performance

DataStax provides a whole selection of amenities around a database which is derived from the Cassandra distributed database system. The core product and these amenities are built into what we refer to as “DataStax Enterprise 6”. Some of the specific differences are that the database engine itself has been modified out of band and now delivers 2x the performance of the standard Cassandra database engine. I was somewhat dubious when I joined, but after the third-party benchmarks were completed and showed the difference I grew more confident. My confidence in this speed increase grew further as I worked with the latest version; in more than a few situations I can tell that it’s faster.

Read Repair & NodeSync

If you already use Cassandra, read repair works a certain way and that still works just fine in DataStax Enterprise 6. But one also has the option of using NodeSync which can help eliminate scripting, manual intervention, and other repair operations.
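
As a sketch of how lightweight that is, NodeSync gets enabled per table with a simple table option (keyspace and table names here are hypothetical, reusing the earlier example):

-- Turn on continuous background repair for one table; no cron jobs
-- or scripted repair runs needed for this table afterward.
ALTER TABLE my_keyspace.sensor_readings
WITH nodesync = { 'enabled' : 'true' };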

Spark SQL Connectivity

There’s also an always-on SQL engine for automated uptime for apps using DataStax Enterprise Analytics. This provides a better level of service for analytics requests and end-user analytics. On a related note, DataStax Studio also has notebook support for Spark SQL now. Writing one’s Spark SQL gets a little easier with this option.
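
For instance, a Spark SQL query in a Studio notebook reads like plain SQL over your Cassandra tables. A rough sketch, assuming the hypothetical table from the Cassandra section above is exposed to Spark:

-- Aggregate across a Cassandra table exposed through Spark SQL.
SELECT sensor_id, avg(value) AS avg_value
FROM my_keyspace.sensor_readings
GROUP BY sensor_id;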

Multi-Cloud / Hybrid-Cloud

Another huge advantage of DataStax Enterprise is going multi-cloud or hybrid-cloud with DataStax Enterprise Cassandra. Between the Lifecycle Manager (LCM), OpsCenter, and related tooling, getting up and running with a cluster across a varying range of data centers, wherever they may be, is quick and easy.

Summary

I’ll be providing deeper dives into the particular technology, the specific differences, and more in the future. For now I’ll wrap up this post, as I’ve got a few others coming that are distinctly related to distributed database systems themselves, ranging from specific principles (like CAP theorem) to operational (how to and best ways to manage) and development (patterns and practices of developing against) related topics.

Overall, the solutions that DataStax offers are solid advantages if you’re stepping into any large scale data (big data or whatever one would call their plethora of data) needs. Over the coming months I’ve got a lot of material, from architectural research and guidance to tactical coding implementation work, that I’ll be blogging about and providing. I’m really looking forward to exploring these capabilities, being the developer advocate at DataStax for the community of users, and learning a thing or three million.

A Recap Of My Top 4 Tech Article Reads From Pocket

Sometimes I get overwhelmed with the number of articles that are in my Pocket queue. I’ve got articles on livability, transit, cycling, auto issues, Node.js, Java, JavaScript, coding practices, software craftsmanship, feminism, heavy metal, death metal, black metal, jazz, progressive jazz, fusion jazz, NASA news, space discoveries, space research, Star Trek news, Star Wars news, information on sci-fi books, and a slight spattering of politics and some other just interesting nonsensical stuff.

Here’s a shot of Pocket on OS-X with an article about Seattle’s Tech Advantage over many American cities being rooted in urban density. Which, I’d also argue, gives Seattle a unique advantage (And is a serious pain point for Microsoft’s misstep into the suburbs decades ago)

I’ve taken the time to sort through this list of articles, pick out the top technical articles, and get this down to a manageable level again. In the process I’ve created this list of solid articles that I’ve now officially read or found useful in some way, and present it here for you, dear reader. Enjoy, I hope they’re useful to you too.

Article Recon, The Top

  1. Zef Hemel wrote up a piece titled “Docker: Using Linux Containers to Support Portable Application Deployment“. In the article Zef delves into a number of things that are key to understanding Docker and the notion of portable application deployment. Other topics covered include isolation, security, reproducing deployments and resource constraints. The article closes with an example of application containers and their respective deployment.
  2. 7 Javascript Basics Many Developers Aren’t Using (Properly) is only slightly useful; I found it more entertaining than anything. It does give some small insight into the scope of oddities that JavaScript has and how one can easily miss the basics in JavaScript.
  3. Even though the article is from late last year, “The Premature Return to SQL” is a good read. As Alex Popescu states it, “This pisses me off. A lot.” I too find myself pissed off a lot at the naive understanding and decision making around SQL or alternate options. It’s almost as if some people decide to just flip a coin to make these determinations, with zero insight into what they’re actually attempting to do.
  4. The article “No Deadlines for You! Software Dev Without Estimates, Specs or Other Lies” is spectacular in laying out how bullshit specs and estimates are. They’re almost entirely wasted effort on the developers’ part. In my own opinion it is often a failure of management (and yeah, I’ve been in management and leadership too, and removed these issues) to understand in the slightest what is actually being built or how it is being built. A lack of vision on behalf of the project is a surefire sign that the original estimates are already completely off, that the design and build-out of whatever it is will likely be wrong, and a host of other issues. Building software isn’t a bridge, it’s more like a painting: you decide as you go. There is no paint by numbers in software development.

Anyway, that’s my list from the 50+ tech articles that were in my Pocket app. Maybe one day I can get disciplined enough to keep the list limited to really good reads and start putting together “My Top Pocket Reads This Month” blog entries? That sounds like it could be useful. Until then, happy coding.

Sorry Database Nerds, Nobody Actually Gives a Shit…

So I’ve been in more than a few conversations about data structures, various academic notions, and other ideas about where and how data should be stored. I’ve been on projects and managed projects that involve teams of people determining how to manage data so that other people can just not manage data. They want to focus on business use and not the data mechanisms underneath. The root of everything around databases really boils down to a single thing – how can we store X and retrieve X – nobody actually trying to get business done or change the world is going to dig into the data storage mechanisms if they don’t have to. To summarize,

nobody actually gives a shit…

At least nobody does until the database breaks, or somebody has to be hired to manage or tune queries, or some other problem comes up. In the ideal world we could just put data into the ether and have it come back when we ask for it. Unfortunately we have to keep caring about where the data is, how it’s stored, the schema (even in schema-less systems you still need to know the schema of the data at some point; it’s just another abstraction to push off dealing with the database), how to back up and recover, data gravity, proximity, and a host of other concerns. Wouldn’t it be cool if we could just work on our app or business? Wouldn’t it be nice to just, well, focus on things we actually give a shit about?

Managed Data Systems!

The whole *aaS and PaaS world has been pushing to simplify operations to the point that the primary, if not the only, concern is the business itself. This is a pretty big step in many ways, and holds a lot of hope and promise around fixing the data gravity, proximity, management, and related concerns. One provider of services that has an interesting start around the NoSQL realm is Orchestrate.io. I’ll have more about them in the future, as I’ll actually be working on hacking on some code against their platform. They’re currently solving a number of the mentioned issues, which is great: a solid starting point that takes us past the draconian nature of the old approach to NoSQL and relational databases in general.

There have been some others, such as MongoLab, that have created a sort of DBaaS. That, however, doesn’t fill the gap that Orchestrate.io is filling. So far almost every *aaS database or other solution has merely been a single type of database that a developer can just throw data at in a single kind of way. Not really flexible, really only abstracting some manual work, and not providing much additional value around using the actual data. Orchestrate.io is bridging these together with search, replication, and other features to provide a platform on which multiple options are available via the API. Key-value, geo, time series, and others are all coming together for them nicely. Having all the options actually creates real added value, versus just providing one single way to do one thing.

Intelligent Data Systems?

After checking out and interviewing Orchestrate.io recently I’ve stumbled into a few other ideas that would be perfect for them to implement, or for the open source community to take a stab at. What would happen if the systems storing the data knew where to put things? What would be the case for providing an intelligent indexing policy or architecture at the schema design decision layer, the area where a person usually must intervene? Could it be done?

Imagine a decision tier that scans the data and makes decisions about revamping the way it is stored: as key-value, geo, time series, or some other method. Could it be done in real time? Would it have to go through some type of processing system? The options around implementing something like this are numerous, but this leaves a lot of space for providing value around the data by reducing the complexity of this decision making.

Imagine you have key-value data that needs to be associative based on graph principles, and that you must store in a highly available system with pertinent real-time data provided based on those graph relations. A decision layer, to create an intelligent data system, could monitor the data and determine the frequent query paths against it. If the data is growing old, it could move data from real-time to archival storage via the key-value store. Other decisions could be made to push data segments up into a cache tier or some other mechanism to provide real-time graph connections to client queries. These are all decisions that would normally need to be made by somebody working on the data, but they could be put into a set of rules to allow re-allocation of the data via automated mechanisms into better storage options. Why keep old data that isn’t queried in the active in-memory graph store? Push it to the distributed key store. Why keep the graph data on drive when it can be in memory with correlated keys in an in-memory key-value store, backed by an on-drive key-value store? All valid decisions, all becoming better understood day by day. It’s about time some of this decision process started to be automated.

What are your thoughts? Pro-intelligent data systems or anti-intelligent data systems? Think it’ll work or is it the wrong approach? Maybe the system should approach some other zenith or axiom point to become truly abstracted and transparent?

Widgetz SQL Server DB + ASP.NET MVC 3 Web Application Scaffolding

This is a quick walkthrough of how to set up a database, create some model objects, and build a scaffold for CRUD (Create, Read, Update, and Delete), all using ASP.NET MVC 3 and Visual Studio 2010.

Open Visual Studio 2010 and start a database project as shown.

SQL Server 2008 Database Project in Visual Studio 2010 (Click for full size image)

Next create an ASP.NET MVC 3 Web Application in this solution.

ASP.NET MVC 3 Web Application (Click for full size image)

Now open the Server Explorer in Visual Studio 2010 (you might have to click on View to find it if it isn’t already available in the main IDE). Right click on the Data Connections node in the Server Explorer tree. Select Create Database and provide a name for the database.

Create a New SQL 2008 Database

New Database Dialog

Next, in the Server Explorer, right click on the newly created database and select New Query. Then paste this SQL create script into the query window and run it by right clicking and selecting Execute.

CREATE TABLE [dbo].[Widgetz] (
    [Id]           UNIQUEIDENTIFIER NOT NULL,
    [Name]         NVARCHAR (50)    NOT NULL,
    [Description]  NTEXT            NULL,
    [Created]      DATETIME         NOT NULL,
    [Terminated]   DATETIME         NULL,
    [Addon]        BIT              NOT NULL,
    [Active]       BIT              NOT NULL,
    [Cost]         MONEY            NULL,
    [ChildWidgetz] INT              NULL
);

ALTER TABLE [dbo].[Widgetz]
    ADD CONSTRAINT [PK_Widgetz] PRIMARY KEY CLUSTERED ([Name] ASC) WITH (ALLOW_PAGE_LOCKS = ON, ALLOW_ROW_LOCKS = ON, PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF);

ALTER TABLE [dbo].[Widgetz]
    ADD CONSTRAINT [DF_Widgetz_Created] DEFAULT (getdate()) FOR [Created];

ALTER TABLE [dbo].[Widgetz]
    ADD CONSTRAINT [DF_Widgetz_Active] DEFAULT ((1)) FOR [Active];

ALTER TABLE [dbo].[Widgetz]
    ADD CONSTRAINT [DF_Widgetz_Addon] DEFAULT ((0)) FOR [Addon];
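
With the table in place, a quick sanity check shows the default constraints doing their job. A minimal sketch, assuming the table above: insert a row that leaves out Created, Active, and Addon, then read it back.

-- Created, Active, and Addon are omitted on purpose; the DF_* default
-- constraints fill them in (getdate(), 1, and 0 respectively).
INSERT INTO [dbo].[Widgetz] ([Id], [Name], [Description])
VALUES (NEWID(), N'Sample Widget', N'Just a test row.');

SELECT [Name], [Created], [Active], [Addon]
FROM [dbo].[Widgetz];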

Once the table is created, bring the Solution Explorer into focus and right click on the Database Project, selecting Import Database Objects and Settings.

Import Database Objects and Settings...

Next select the correct database connection, and leave everything else as it is set.

Selecting the appropriate database, options, and click next... (Click for the full size image)

Next the wizard will finish importing the database objects.

Finished (Click for full size image)

If you look through the folder tree of the project you’ll find individual files for the table, defaults, and other object settings.

Now right click on the Models directory of the ASP.NET MVC 3 project that was created earlier. Select Add and then New Item. In that dialog add a new ADO.NET Entity Data Model.

Yup, just when you think it had gone away... ADO.NET (Click for full size image)

A dialog will appear, select the option to Generate from database and click next.

Entity Data Model Wizard (Click for full size image)

Next select the correct database connection again, then click Yes to save the sensitive data (which, if you’re using Windows Authentication, isn’t really going to save anything dangerous). Make sure the entities are named appropriately and click Next.

Options for Creating the ADO.NET Entity Model

The next screen, which may take a moment or two to display the list of tables and objects, will appear. Select the table that has been created and click Finish.

Entity Data Model Wizard (Click for full size image)

Once done and finish is clicked, the entity model file will show the display view of the entity that was just created.

Entity Model

Now right click on the Controllers folder and select the option to add a new controller.

Add Controller (Click for full size image)

Be sure to click the advanced options and select the layout page.

Advanced Options (Click for full size image)

Now click on Ok and then Add. Visual Studio 2010 will now generate the scaffolding for the controller and views around CRUD operations.

The Scaffolding in ASP.NET MVC 3

This might seem like an absurdly easy thing to do, and you’d be right. There are, however, many steps to turn this into a feasible, well designed user interface and provide a solid and intelligent user experience (UX) to the end user. Also, this is arguably not a very maintainable bit of work. If the end user ONLY wants to operate on the data with CRUD, then this is great. But the minute something else is needed, a very different, testable, and maintainable architecture should be utilized instead of this generated scaffolding. In coming blog entries I will be covering some of the ways to create testable, maintainable, and better designed architecture around more than CRUD.  🙂

CODE: All the code for this project is available in the Lesson 1 – ScaffoldGeneratedWidgetz.

A SQL Server .NET ASP.NET MVC RESTful Web Services Facade – Part I

Did I get enough of the acronyms and key words in the header?  It looks like soup!  :O

This is a somewhat messy project to build a prototype layer around SQL Server. The reason for this, shockingly, is to allow a SQL Server to be used by frameworks and systems that normally don’t or can’t access the database directly. In my particular scenario we’re working on getting Ruby on Rails running with JRuby in a Windows environment. Because we will need to utilize a lot of SQL Server databases, it seemed like a great idea to build out a layer over the SQL Server (or Servers) so that a Ruby on Rails web app, ASP.NET MVC, or even a PHP or pure JavaScript application could access the data in the database. What better way to do that than to create a RESTful web services facade over the database.

Some of you might be thinking “Why not use RIA Services?!?!?! Are you mad!!” Well, there is a big problem: RIA Services doesn’t work against SQL 2000 or SQL 2005, which is the database technology that this particular requirement dictated. Well, now that you have context, I’ll dig straight into what I did building this prototype out.

Kick Out a SQL Server Database Project

I need some data, and a database, with just some of the standard junk you’d expect in a production database. One of the best ways to throw together a database in a really short amount of time, with data, is to use a SQL Server Database Project.

New Database Project (Click for larger image)

You might see this and think, “But you said that the facade is against a SQL Server 2000 or 2005 database!” Well, it is, but to get a database running locally and have this project type work, I’m using my local SQL Server 2008 Express installation. However, I’m limiting myself to data types primarily available to SQL Server 2000 and 2005. So no worries, this works just fine against those archaic databases.  😛

First I ran the following script to create the database and some sample tables with various data types.

-- Guarded so the script also runs cleanly on a machine where the database doesn't exist yet.
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'SomeExistingOrMigratedDatabase')
DROP DATABASE SomeExistingOrMigratedDatabase
GO
CREATE DATABASE SomeExistingOrMigratedDatabase
GO
USE SomeExistingOrMigratedDatabase
GO
IF  EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Person_Village]') AND parent_object_id = OBJECT_ID(N'[dbo].[Person]'))
ALTER TABLE [dbo].[Person] DROP CONSTRAINT [FK_Person_Village]
GO
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Person]') AND type in (N'U'))
DROP TABLE [dbo].[Person]
GO
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SomeFlatDenormalizedDataTable]') AND type in (N'U'))
DROP TABLE [dbo].[SomeFlatDenormalizedDataTable]
GO
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Village]') AND type in (N'U'))
DROP TABLE [dbo].[Village]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Village]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Village](
	[Id] [uniqueidentifier] NOT NULL,
	[Village] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_Village] PRIMARY KEY CLUSTERED
(
	[Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
)
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SomeFlatDenormalizedDataTable]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[SomeFlatDenormalizedDataTable](
	[Id] [uniqueidentifier] NOT NULL,
	[StarzDate] [datetime] NOT NULL,
	[Numerals] [int] NULL,
	[Numberals] [int] NULL,
	[Monies] [decimal](14, 4) NOT NULL,
	[Day] [int] NOT NULL,
	[Month] [int] NOT NULL,
	[Year] [int] NOT NULL,
	[BigNonsense] [ntext] COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[Flotsam] [float] NULL,
	[Jetsam] [float] NULL,
	[SmallishText] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[BiggishText] [nvarchar](2999) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_SomeFlatDenormalizedDataTable] PRIMARY KEY CLUSTERED
(
	[Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
)
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Person]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Person](
	[Id] [uniqueidentifier] NOT NULL,
	[Name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
	[DateOfBirth] [datetime] NOT NULL,
	[VillageId] [uniqueidentifier] NULL,
 CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED
(
	[Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
)
END
GO
IF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Person_Village]') AND parent_object_id = OBJECT_ID(N'[dbo].[Person]'))
ALTER TABLE [dbo].[Person]  WITH CHECK ADD  CONSTRAINT [FK_Person_Village] FOREIGN KEY([VillageId])
REFERENCES [dbo].[Village] ([Id])
GO
IF  EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Person_Village]') AND parent_object_id = OBJECT_ID(N'[dbo].[Person]'))
ALTER TABLE [dbo].[Person] CHECK CONSTRAINT [FK_Person_Village]
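
Once that script has run, a quick sanity query confirms the FK_Person_Village relationship is wired up (it’ll just return zero rows until data gets generated below):

-- Each person with their (optional) village, via the VillageId FK.
SELECT p.[Name], p.[DateOfBirth], v.[Village]
FROM [dbo].[Person] p
LEFT JOIN [dbo].[Village] v ON v.[Id] = p.[VillageId]
GO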

Once the database and tables are created, import the database into the database project. To do this, right click on the Database Project and select “Import Database Objects and Settings…” from the context menu.

Import Database Objects and Settings...

Select the database just created and click Start. Once the script generation is done, navigate into the project directories and you will see the following scripts have been created.

Generated Scripts (click for larger image)

Next create a new data generation plan in the Data Generation Plans folder (notice I already cheated and have one in the above image).

Creating a Data Generation Plan

Open up the file this creates (I called mine BuildSomeData.dgen). In the file, note that I selected the relationship between the Village and Person tables and set the ratio to 60:1. When you change the data in the Village table, it then automatically updates how much data will be generated for the Person table.

Data Generation Plan

When all that is done, hit F5, select the database and the data will be generated. That gets us a database with data to use as an existing source. From here I’ll jump into creating the actual Facade Layer.

NOTES: Once you generate data, depending on how much you decided to generate, you may want to see how big your database is by using the sp_spaceused stored procedure, which has been available in SQL Server for as long as I can remember.
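
A minimal example, assuming the database from this walkthrough:

USE SomeExistingOrMigratedDatabase
GO
-- Reports total database size plus space reserved and used by data
-- and indexes for the current database.
EXEC sp_spaceused
GO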

Code for this project is available here: https://github.com/Adron/ExistingSqlServerProject