I was talking with Tory Adams @BEZEI2K about working with Orchestrate's services. We're totally sold on what they offer and are looking forward to a lot of the technology that is in the works. The day-to-day building against Orchestrate is super easy, and setting up collections for dev, test, or whatever is so easy that nothing has stood in our way. Except one thing…
Every once in a while we have to work disconnected. For whatever reason it might be: the Comcast cable goes out, we decide to jump on a train, or one of us ends up on one of those Q400 puddle jumpers that doesn't have wifi! Regardless of being disconnected from wifi, cable or internet connectivity, we still want to be able to code and test!
In Memory Orchestrate Wrapper
Enter the idea of creating an in-memory Orchestrate database wrapper. Using something like convict.js, one could easily redirect all the connections as necessary when developing locally, as sketched below. That way development continues right along, and when the application is pushed live it's redirected to the appropriate Orchestrate connections and keys!
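To give an idea of what that redirect could look like, here's a minimal sketch using convict.js to pick a data source from NODE_ENV. The in-memory module path and the exact Orchestrate client call here are my assumptions for illustration, not anything from an existing project:

[sourcecode language="javascript"]
// Minimal sketch: pick the data source based on environment via convict.js.
var convict = require('convict');

var config = convict({
  env: {
    doc: 'The application environment.',
    format: ['production', 'development', 'offline'],
    default: 'development',
    env: 'NODE_ENV'
  },
  orchestrateApiKey: {
    doc: 'Orchestrate API key, only needed when actually connected.',
    format: String,
    default: '',
    env: 'ORCHESTRATE_API_KEY'
  }
});
config.validate();

// './lib/in-memory-orchestrate' is a hypothetical local stand-in exposing
// the same calls the app would make against the real client.
var db = config.get('env') === 'offline'
  ? require('./lib/in-memory-orchestrate')()
  : require('orchestrate')(config.get('orchestrateApiKey'));
[/sourcecode]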
This in-memory "fake" or "mock" would need to have key value, events, and graph storage set up just like Orchestrate. With this available in memory, one could also easily write tests against the fake and test connected or disconnected without any additional mocking. Not to say that's a good or bad idea, but one more tool in the tool chest doesn't hurt!
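For a sense of what the key value portion of such a fake might start out as, here's a bare-bones sketch. The put/get method names and response shapes are purely illustrative, not the actual Orchestrate client API, and events and graph storage would need the same treatment:

[sourcecode language="javascript"]
// Bare-bones in-memory key value store, purely illustrative.
function InMemoryStore() {
  this.collections = {};
}

InMemoryStore.prototype.put = function (collection, key, value) {
  this.collections[collection] = this.collections[collection] || {};
  this.collections[collection][key] = value;
  return Promise.resolve({ statusCode: 201 });
};

InMemoryStore.prototype.get = function (collection, key) {
  var coll = this.collections[collection] || {};
  return key in coll
    ? Promise.resolve({ statusCode: 200, body: coll[key] })
    : Promise.reject({ statusCode: 404 });
};

module.exports = InMemoryStore;
[/sourcecode]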
If something like this doesn't pop up in the next week or three, I might just have to kick off this project myself! If anybody is interested, please reach out to me and let's discuss! I'm open to writing it in JavaScript, C#, Java or whatever poison pill you'd prefer. (I'm polyglot, so I'm not going to limit my options!!)
Other Ideas, Development Shop Swap
Another idea that I've been pondering is setting up a development shop swap. I'll leave the reader to determine what that means! 😉 Feel free to throw down any ideas this might bring up and I'll incorporate them into the eventual implementation. I'll have more information about that idea right here once the project gets rolling. In the meantime, happy coding!
NOTE: If you just want to check out the code bits, scroll down to the sub-title #symphonize #hacking. It's also important to note that I'm putting the library through a fairly big refactor at the moment so that everything aligns with the documentation I've recently created. Because of that, many things may not be implemented yet, but we're moving toward v0.1.0, which will be a functional implementation of the library available via npm, based entirely on the documentation and specs that I outline after the history.
There are two main reasons why I chose Orchestrate.io and a data generation library as the two things I wanted to combine. The first is that I knew the Orchestrate.io team and really dug what they were building. I wanted to work with it and check out how well it would work for my use cases in the future. The ability to sit down and discuss with them what they were building was great (I interviewed Matt Heitzenroder @roder, which you can watch in Orchestrate.io, Stop Dealing With the Database Infrastructure!). The second reason is that my own startup, which I'm co-founding with Aaron Gray (@agray), needed key value and graph data storage of some type, somewhere, and Orchestrate.io looked like a perfect fit. After some research and giving it a go, it fit very well into what we are building.
December then rolled into the standard holiday doldrums and slowdowns. So fast forward to January, post a few rounds of beer and good tidings, and I got the 3rd in the series published, titled Getting Serious With Symphony.js – JavaScript TDD/BDD Coding Practices (3/3). The post doesn't speak much to symphony.js usage but instead to my efforts to use TDD and BDD practices while writing the library.
Slowly I made progress in building the library, and it's finally in a mostly releasable state now. I use this library daily while working with the code base for Deconstructed, and I imagine I'll keep using it for many other projects. I hope others might find uses for it too and maybe even add capabilities or ideas. Just ping me via Twitter @adron or Github @adron, or add an issue on Github, and I'll be happy to accept pull requests for new features or code refactoring, add you to the project, or whatever else you're interested in.
#symphonize #hacking
Now for the nitty gritty. If you’re up for using or contributing to the project check out the symphonize.js github pages site first. It’s got all the information to help get you kick started. However, you can keep reading as I’ve included much of the information there along with the examples from the README.md below.
NOTE: As I mentioned at the top of this blog entry, the functional implementation of the code isn't available via npm just yet. Some others and I are ripping through a good refactor to align the implementation of the library with the rewritten and newly available documentation – included below and at the github pages.
[sourcecode language="bash"]
git clone git@github.com:YourUserName/symphonize.git
cd symphonize
npm install
[/sourcecode]
Using The Library
The intended usage is to instantiate the JavaScript object and then call generate. That's it, a super simple process. The code would look like this:
[sourcecode language="javascript"]var Symphonize = require('../bin/symphonize');
var symphonize = new Symphonize();
[/sourcecode]
A basic constructor invocation like this utilizes the generate.json file as the source to generate data from. To set the configuration programmatically instead, just pass the JSON configuration into the constructor.
[sourcecode language=”javascript”]
var configJson = {"schema":"keyvalue"};
var Symphonize = require('../bin/symphonize');
var symphonize = new Symphonize(configJson);
[/sourcecode]
Once the Symphonize data generator has been created call the generate() method as shown.
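A minimal sketch of that call, assuming a synchronous generate() that hands back the generated records (the exact return shape will depend on the configured schema):

[sourcecode language="javascript"]
// Generate records according to the loaded configuration.
var generatedData = symphonize.generate();
console.log(generatedData);
[/sourcecode]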
That's basically it. But you say, it's supposed to do X, Y or Z. Well, that's where the JSON configuration data comes into play. In the configuration data you can set the data fields, what type of data each will generate, the specific schema, how many records to create, and more.
generate.json
The library comes with the generate.json file already set up with a working example. Currently the generation file looks like this:
[sourcecode language=”javascript”]
{
"schema": "keyvalue", /* keyvalue, graph, event, geo */
"count": 20, /* X values to generate. */
"write_source": "console", /* console, orchestrateio and whatever other data sources that might come up. */
"fields": {
/* generates a random name. */
"fieldName": "name",
/* generates a random dice roll of a d20. */
"fieldTwo": "d20",
/* A single lorem ipsum random statement is generated. */
"fieldSentence": "sentence",
/* A random guid is generated. */
"fieldGuid": "guid" }
}
[/sourcecode]
Configuration File Definitions
Each of the configuration options that are available has a default in the configuration file. The default is listed in italics with each definition of the configuration option listed below.
“schema” : This is used to select what type of data structure is going to be generated. The default is keyvalue for this option.
“count” : This provides the total records that are to be generated by the library. The default is 1 for this option.
“write_source” : This provides the location to output the generated data to. The default is console for this option.
“fields” : This is a JSON field within the JSON configuration file that provides configuration options for the fields, the number of fields, and the data each generates. The default is one field, with a default data type of guid. Each entry in this JSON option is a self-contained JSON name and value pair. It looks like this (also shown above in part):[sourcecode language="javascript"]{
"someBoolean": "boolean",
"someChar": "character",
"aFloat": "float",
"GetAnInt": "integer",
"fieldTwo": "d20",
"diceRollD10": "d10",
"_string": {
"fieldName": "NameOfFieldForString",
"length": 5,
"pool": "abcdefgh"
},
"_sentence": {
"fieldName": "NameOfFiledOfSentences",
"sentence": "5"
},
"fieldGuid": "guid"
}
[/sourcecode]
Fields Configuration: For each of the fields you can either set the field to a particular data type or leave it empty. If the field name and value pair is left empty, the field defaults to guid. The simple field and data generation types are listed below. More complex nested generation types are covered afterward under Complex Field Configuration.
“boolean“: This generates a boolean value of true or false.
“character“: This generates a single character, such as ‘1’, ‘g’ or ‘N’.
“float“: This generates a float value, similar to something like -211920142886.5024.
“integer“: This generates an integer value, similar to something like 1, 14 or 24032.
“d4“: This generates a random integer value based on a dice roll of one four-sided die. The integer range being 1-4.
“d6“: This generates a random integer value based on a dice roll of one six-sided die. The integer range being 1-6.
“d8“: This generates a random integer value based on a dice roll of one eight-sided die. The integer range being 1-8.
“d10“: This generates a random integer value based on a dice roll of one ten-sided die. The integer range being 1-10.
“d12“: This generates a random integer value based on a dice roll of one twelve-sided die. The integer range being 1-12.
“d20“: This generates a random integer value based on a dice roll of one twenty-sided die. The integer range being 1-20.
“d30“: This generates a random integer value based on a dice roll of one thirty-sided die. The integer range being 1-30.
“d100“: This generates a random integer value based on a dice roll of one hundred-sided die. The integer range being 1-100.
“guid“: This generates a random globally unique identifier. This value would be similar to ‘F0D8368D-85E2-54FB-73C4-2D60374295E3’, ‘e0aa6c0d-0af3-485d-b31a-21db00922517’ or ‘1627f683-efeb-4db8-8174-a5f2e3378c87’.
Complex Field Configuration: Some fields require more complex configuration for data generation, simply because the data needs some baseline of what the range or length of the values needs to be. The following list details each of these. It is also important to note that these complex field configurations do not have defaults; each value must be set in the JSON configuration or an error will be thrown detailing that a complex field type wasn't designated. Each of these complex field types is a JSON name and value parameter. The name is the data type to generate, with a preceding underscore '_', and the value holds the configuration parameters for that particular data type.
“_string“: This generates string data based on the length and pool parameters. Required fields for this include fieldName, length and pool. The JSON would look like this:[sourcecode language="javascript"]"_string": {
"fieldName": "NameOfFieldForString",
"length": 5,
"pool": "abcdefgh"
}
[/sourcecode]
Samples of the result would look like this for the field: ‘abdef’, ‘hgcde’ or ‘ahdfg’.
“_hash“: This generates a hash based on the length and casing parameters. Required fields for this include fieldName, length and casing. The JSON would look like this:[sourcecode language="javascript"]"_hash": {
"fieldName": "HashFieldName",
"length": 25,
"casing": ‘upper’
}
[/sourcecode]
Samples of the result would look like this for the field: ‘e5162f27da96ed8e1ae51def1ba643b91d2581d8’ or ‘3F2EB3FB85D88984C1EC4F46A3DBE740B5E0E56E’.
“_name”: This generates a name based on the middle, middle_initial and prefix parameters. Required fields for this include fieldName, middle, middle_initial and prefix. The JSON would look like this:[sourcecode language="javascript"]"_name": {
"fieldName": "nameFieldName",
"middle": true,
"middle_initial": true,
"prefix": true
}
[/sourcecode]
Samples of the result would look like this for the field: ‘Dafi Vatemi’, ‘Nelgatwu Powuku Heup’, ‘Ezme I Iza’, ‘Doctor Suosat Am’, ‘Mrs. Suosat Am’ or ‘Mr. Suosat Am’.
So that covers the kick start of how you'll eventually be able to set up the library, use it, and generate data. Until then, jump into the project and give us a hand.
So I've been in more than a few conversations about data structures, various academic conversations and other notions about where and how data should be stored. I've been on projects and managed projects that involve teams of people determining how to manage data so that other people don't have to manage data at all. They want to focus on business use and not the data mechanisms underneath. The root of everything around databases really boils down to a single thing: how can we store X and retrieve X? Nobody actually trying to get business done or change the world is going to dig into the data storage mechanisms if they don't have to. To summarize,
nobody actually gives a shit…
At least nobody does until the database breaks, or somebody has to be hired to manage or tune queries, or some other problem comes up. In the ideal world we could just put data into the ether and have it come back when we ask for it. Unfortunately we have to keep caring about where the data is, how it's stored, the schema (even in schema-less systems you still need to know the schema of the data at some point; it's just another abstraction to push off dealing with the database), how to back up and recover, data gravity, proximity and a host of other concerns. Wouldn't it be cool if we could just work on our app or business? Wouldn't it be nice to just, well, focus on things we actually give a shit about?
Managed Data Systems!
The whole *aaS and PaaS world has been pushing to simplify operations to the point that the primary, if not the only, concern is the business itself. This is a pretty big step in many ways, and it holds a lot of hope and promise around fixing the data gravity, proximity, management and related concerns. One provider of services that has an interesting start in the NoSQL realm is Orchestrate.io. I'll have more about them in the future, as I'll actually be hacking on some code against their platform. They're currently solving a number of the mentioned issues, which is great – a solid starting point that takes us past the draconian nature of the old approach to NoSQL and relational databases in general.
There have been some others, such as Mongo Labs, that have created a sort of DBaaS. This, however, doesn't fill the gap that Orchestrate.io is filling. So far almost every *aaS database or other solution has merely been a single type of database that a developer can just throw data at in a single kind of way. That's not really flexible, and it really only abstracts some manual work without providing much additional value add around using the actual data. Orchestrate.io is bridging these together with search, replication and other features to provide a platform on which multiple options are available via the API. Key value, geo, time series and others are all coming together for them nicely. Having all the options actually creates a real value add, versus just providing one single way to do one thing.
Intelligent Data Systems?
After checking out and interviewing Orchestrate.io recently I've stumbled onto a few other ideas that would be perfect for them to implement, or for the open source community to take a stab at. What would happen if the systems storing the data knew where to put things? What would it take to provide an intelligent indexing policy or architecture at the schema design decision layer, the area where a person usually must intervene? Could it be done?
Imagine a decision tier that scans the data and makes decisions about revamping the way it is stored – against a key value, geo, time series or other method. Could it be done in real time? Would it have to go through some type of processing system? The options for implementing something like this are numerous, and that leaves a lot of space for providing value add around the data by reducing the complexity of this decision making.
Imagine you have key value data that needs to be associative based on graph principles, and that you must store in a highly available system with pertinent real-time data provided based on those graph relations. A decision layer, to create an intelligent data system, could monitor the data and determine the frequent query paths against it. If the data is growing old it could move that data from real-time to archival via the key value store. Other decisions could be made to push data segments up into a cache tier or some other mechanism to provide real-time graph connections to client queries. These are all decisions that would normally need to be made by somebody working on the data, but they could be put into a set of rules that allows re-allocation of the data into better storage options via automated mechanisms. Why keep old data that isn't queried in the active in-memory graph store? Push it to the distributed key value store. Why keep the graph data on drive when it can be in memory with correlated keys in an in-memory key value store, backed by an on-drive key value store? All valid decisions, all becoming better understood day by day. It's about time some of this decision process started to be automated.
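Purely as a thought experiment, one of those rules might read something like the sketch below. Every name here is hypothetical; the point is just that the policy becomes plain code once the stores expose a little metadata about query recency:

[sourcecode language="javascript"]
// Hypothetical rule: anything not queried in 30 days gets evicted from the
// in-memory graph store and pushed down to the distributed key value store.
var THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;

function reallocateStaleData(graphStore, keyValueStore, now) {
  graphStore.keys().forEach(function (key) {
    if (now - graphStore.lastQueriedAt(key) > THIRTY_DAYS) {
      keyValueStore.put(key, graphStore.get(key));
      graphStore.remove(key);
    }
  });
}
[/sourcecode]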
What are your thoughts? Pro-intelligent data systems or anti-intelligent data systems? Think it’ll work or is it the wrong approach? Maybe the system should approach some other zenith or axiom point to become truly abstracted and transparent?