Java Time with Introspective GraphQL on Chaos Database AKA Pre- Refactor Prototype Mutating Database Spring Boot Java Hack App

With the previous work to get a testing environment built and running done (in Python), I was ready to get started on the GraphQL API as previously described. As a refresher, this description,

singular mission to build a GraphQL API against a Mongo database where the idea is, one could query the underlying collections, documents, and fields with the assumption that users would be adding or possibly removing said collections, documents, and fields as they needed.

My intent is to build with with a Java + Spring stack. Just like with the Python app in the previous post, the first thing I like to do is just get the baseline GraphQL API “Hello World” app up and running.

At the end of this post I’ll include/link the Github repository.

Phase 1: Getting the Initial GraphQL API Compiling & Running with a “Hello World”.

Prerequisites & Setup

  • The post previous to this “Fruit and Snakes: Frequent Mutative Mongo User Database with Python” I created the Mongo Database and setup the app that would create, every few seconds, new collections, documents, and other collateral to put into a Mongo database for the sole purpose of creating this GraphQL API.
  • I’ll be using Java 17 for this work, so to ensure the least risk of versioning issues, get Java 17. The same goes for Spring 3. I’ve shown my selections from the Spring Initializr (not using Intellij? Cool, get a start with the Spring Initializr Site) in the screenshots that follow.
Continue reading “Java Time with Introspective GraphQL on Chaos Database AKA Pre- Refactor Prototype Mutating Database Spring Boot Java Hack App”

Top 3 Ways to Make Sausage with MongoDB & Java

I’ve been working with Java a ton this year, more so than the previous years, so I decided to put together the top three Java libraries for accessing MongoDB that I’ve been using. That top 3 list shapes up like this.

  1. MongoDB Java Driver: This is the official Java driver provided by MongoDB. It allows Java applications to connect to MongoDB and work with data. The driver supports synchronous and asynchronous interaction with MongoDB and provides a rich set of features for database operations.
    • Key Methods:
      • MongoClients.create(): To create a new client connection to the database.
      • MongoDatabase.getCollection(): To access a collection from the database.
      • MongoCollection.find(): To find documents within a collection.
  2. Morphia: Morphia is an Object-Document Mapper (ODM) for MongoDB and Java. It provides a higher-level, object-oriented API to interact with MongoDB, and maps Java objects to MongoDB documents.
    • Key Methods:
      • Datastore.createQuery(): To create a query for the type of entity you want to retrieve.
      • Datastore.save(): To save an entity to the database.
      • Query.asList(): To execute a query and get the results as a list.
  3. Spring Data MongoDB: Part of the larger Spring Data project, Spring Data MongoDB provides integration with MongoDB to work with the data as easily as if it were a relational database. It’s a popular choice for Spring-based applications.
    • Key Methods:
      • MongoRepository.findAll(): To find all documents in a collection.
      • MongoRepository.save(): To save a given entity.
      • MongoRepository.findById(): To find a document by its ID.

These libraries offer comprehensive methods for connecting to and working with MongoDB from Java applications. They are widely used in the Java community and are supported by a large number of developers and organizations. Let’s dive deeper with each, and also more specifically talk about some of their respective query methods.

Continue reading “Top 3 Ways to Make Sausage with MongoDB & Java”

Fruit and Snakes: Frequent Mutative Mongo User Database with Python

The end results of this code base are used in another post I made, titled “Java Time with Introspective GraphQL on Chaos Database AKA Pre- Refactor Prototype Mutating Database Spring Boot Java Hack App“.

Recently I had a singular mission to build a GraphQL API against a Mongo database where the idea is, one could query the underlying collections, documents, and fields with the assumption that users would be adding or possibly removing said collections, documents, and fields as they needed.

That sounds somewhat straight forward enough, but before even getting started with the GraphQL API I really needed some type of environment that would mimic this process. That is what this article is about, creating a test bed for this criteria.

The Mongo Database & Environment

First thing I did was setup a new Python environment using virtualenv. I wrote about that a bit in the past if you want to dig into that deeper, the post is available here.

virtualenv fruit_schema_watcher

Next up I created a git repo with git init then added a README.md, LICENSE (MIT), and .gitignore file. The next obvious thing was the need for a Mongo database! I went to cracking on a docker-compose file, which formed up to look like this.

version: '3.1'  
  
services:  
  mongo:  
    image: mongo:latest  
    container_name: mongodb_container  
    ports:  
      - "27017:27017"  
    environment:  
      MONGO_INITDB_ROOT_USERNAME: root  
      MONGO_INITDB_ROOT_PASSWORD: examplepass  
    volumes:  
      - mongo-data:/data/db  
  
volumes:  
  mongo-data:

With that server running, I went ahead and created a database called test manually. I’d just do all the work from here on out with that particular database.

Continue reading “Fruit and Snakes: Frequent Mutative Mongo User Database with Python”

Shortlist of Database as a Service Providers

Some top database providers for various open source databases like MariaDB, PostgreSQL, MongoDB, Apache Cassandra, Redis, Elasticsearch, and Neo4j:

  1. MariaDB:
  2. PostgreSQL:
  3. MongoDB:
  4. Apache Cassandra:
  5. Redis:
  6. Elasticsearch:
  7. Neo4j (Graph Database):

Sorry Database Nerds, Nobody Actually Gives a Shit…

So I’ve been in more than a few conversations about data structures, various academic conversations and other notions about where and how data should be stored. I’ve been on projects and managed projects that involve teams of people determining how to manage data so that other people can just not manage data. They want to focus on business use and not the data mechanisms underneath. The root of everything around databases really boils down to a single thing – how can we store X and retrieve X – nobody actually trying to get business done or change the world is going to dig into the data storage mechanisms if they don’t have to. To summarize,

nobody actually gives a shit…

At least nobody does until the database breaks, or somebody has to be hired to manage or tune queries or something or some other problem comes up. In the ideal world we could just put data into the ether and have it come back when we ask for it. Unfortunately we have to keep caring for where the data is, how it’s stored, the schema (even in schema-less, you still need to know the schema of the data at some point, it’s just another abstraction to push off dealing with the database), how to backup, recover, data gravity, proximity and a host of other concerns. Wouldn’t it be cool if we could just work on our app or business? Wouldn’t it be nice to just, well, focus on things we actually give a shit about?

Managed Data Systems!

The whole *aaS and PaaS World has been pushing to simplify operations to the point that the primary, if not the only concern, is the business itself. This is a pretty big step in many ways, but holds a lot of hope and promise around fixing the data gravity, proximity, management and related concerns. One provider of services that has an interesting start around the NoSQL realm is Orchestrate.io. I’ll have more about them in the future, as I’ll actually be working on hacking on some code against their platform. They’re currently solving a number of the mentioned issues. Which is great, a solid starting point that takes us past the draconian nature of the old approach to NoSQL and Relational Databases in general.

There has been some others, such as Mongo Labs or such, that have created a sort of DBaaS. This however doesn’t fill the gap that Orchestrate.io is filling. So far almost every *aaS database or other solution has merely been a single type of database that a developer can just throw data at in a single kind of way. Not really flexible, and really only abstracting some manual work, but not providing much additional value add around using the actual data. Orchestrate.io is bridging these together with search, replication and other features to provide a platform on which multiple options are available via the API. Key value, geo, time series and others are all coming together for them nicely. Having all the options actually creates a real value add, versus just provide one single way to do one thing.

Intelligent Data Systems?

After checking out and interviewing Orchestrate.io recently I’ve stumbled into a few other ideas. It would be perfect for them to implement or for the open source community to take a stab at. What would happen if the systems storing the data knew where to put things? What would be the case for providing an intelligent indexing policy or architecture at the schema design decision layer, the area where a person usually must intervene? Could it be done?

A decision tier that scans and makes decisions on the data to revamp the way it is stored against a key value, geo, time series or other method. Could it be done in real time? Would it have to go through some type of processing system? The options around implementing something like this are numerous, but this just leaves a lot of space for providing value add around the data to reduce the complexity of this decision making.

Imagine you have key value data, that needs to be associative based on graph principles, that you must store in a highly available system with pertinent real-time data provided based on those graph relations. A decision layer, to create an intelligent data system, could monitor the data and determine the frequent query paths against the data. If the data is growing old it could move data from real-time to archival via the key value. Other decisions could be made to push up data segments into a cache tier or some other mechanism to provide realtime graph connections to client queries. These are all decisions that would need to be made by somebody working on the data, but could be put into a set of rules to allow for re-allocation of the data via automated mechanisms into better storage options. Why keep old data that isn’t queried in the active in memory graph store, push it to the distributed key store. Why keep the graph data on drive when it can be in memory with correlated keys in a key value in memory store, backed by an on drive key value? All valid decisions, all becoming better understood day by day. It’s about time some of this decision process started to be automated.

What are your thoughts? Pro-intelligent data systems or anti-intelligent data systems? Think it’ll work or is it the wrong approach? Maybe the system should approach some other zenith or axiom point to become truly abstracted and transparent?