Java Time with Introspective GraphQL on Chaos Database AKA Pre-Refactor Prototype Mutating Database Spring Boot Java Hack App

With the previous work to get a testing environment built and running (in Python) done, I was ready to get started on the GraphQL API as previously described. As a refresher, here’s that description,

singular mission to build a GraphQL API against a Mongo database where the idea is, one could query the underlying collections, documents, and fields with the assumption that users would be adding or possibly removing said collections, documents, and fields as they needed.

My intent is to build this with a Java + Spring stack. Just like with the Python app in the previous post, the first thing I like to do is get the baseline GraphQL API “Hello World” app up and running.

Phase 1: Getting the Initial GraphQL API Compiling & Running with a “Hello World”

Prerequisites & Setup

  • I’ll be using Java 17 for this work, so to ensure the least risk of versioning issues, get Java 17. The same goes for Spring Boot 3. I’ve shown my selections from the Spring Initializr (not using IntelliJ? Cool, get a start with the Spring Initializr site) in the screenshots that follow.

Next I add the configuration and schema. The GraphQL schema goes in the /src/main/resources/graphql directory. By convention, Spring for GraphQL looks for a file named schema.graphqls there, so that is exactly what I named it, and added the following.

type Query {
    hello: String
}

Next up I want to add two configuration sections to the application.properties file. First, I always add a section to turn off the whitelabel error page, both as a precursor to setting up proper error handling and so that any errors that pop up don’t start off covered up by whitelabel pages. Second is the GraphiQL interface, to make it easy to test out the API once it is up and running.

server.error.whitelabel.enabled=false
server.error.include-stacktrace=always
server.error.include-message=always
server.error.include-binding-errors=always

spring.graphql.graphiql.enabled=true
spring.graphql.graphiql.path=graphiql

If you’re familiar with Spring Boot’s convention-based approach combined with annotations, dependency injection, and all of that, then a lot of this and the respective *magic* that happens makes sense. For example, the above two files, schema.graphqls and application.properties, are used to spin up the respective configuration and other dependency-injected elements of the application without any need to codify their inception. It is all happening and connected behind the scenes. As I move through this material I’ll make a point to describe some of those inner workings, but if anything isn’t clear, leave a comment and I’ll elaborate and update the post with further details. In some cases, I may write up a whole additional post just to make sure I’ve covered all the details needed to understand all that Spring Boot *magic*.

Writing the First Code

The first code that I have in the project is the FruitcollectorApplication class, annotated with @SpringBootApplication as the main method starting point. This won’t need to change and can be left as the starting point of the API. With this initial setup that code looks like this.

package com.orchard.fruitcollector;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class FruitcollectorApplication {

    public static void main(String[] args) {
        SpringApplication.run(FruitcollectorApplication.class, args);
    }

}

To get the most basic of things running, next I’ll get a QueryResolver class written up. This class gets annotated with @Controller to designate it as the core controller of the API, and I’ll add a hello() method annotated with @QueryMapping. That annotation maps the method to the hello query in the GraphQL schema.

package com.orchard.fruitcollector;

import org.springframework.graphql.data.method.annotation.QueryMapping;
import org.springframework.stereotype.Controller;

@Controller
public class QueryResolver {

    @QueryMapping
    public String hello() {
        return "Hello World!";
    }
}

With that, I can now execute the API and everything should build, and I can spin up GraphiQL via http://localhost:8080/graphiql.
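
For a quick smoke test in GraphiQL, the hello query and its response look like this (response shown in the standard GraphQL JSON shape):

{
    hello
}

which returns:

{
    "data": {
        "hello": "Hello World!"
    }
}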

Phase 2: Writing Up Basic GraphQL Queries for Mongo with Spring

The first query that I want is one that returns a list of the available collections. It’ll connect to the particular Mongo database and return the collections in that database. No need for any parameters passed in or other criteria, just a clean “give me everything” query. The first addition I’ll make, before any code, is to add the query to the GraphQL schema. The full schema for the app now looks like this:

type Query {
    hello: String
    collections: [String]
}

Implementing the query mapping then follows this flow. First, add a @QueryMapping annotated method, public List<String> collections() {}, and add the appropriate import, java.util.List, for that method. Next, add an Iterable<String> to receive the results of MongoTemplate’s getCollectionNames().

In the animated video I use the refactoring and code generation options in IntelliJ to put together the MongoTemplate field and the constructor that populates it, all from the collections method implementation. IntelliJ is super useful like that. Altogether the additional code in QueryResolver looks like this, with the hello method included for completeness.

package com.orchard.fruitcollector;

import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.graphql.data.method.annotation.QueryMapping;
import org.springframework.stereotype.Controller;

import java.util.ArrayList;
import java.util.List;

@Controller
public class QueryResolver {

    private final MongoTemplate mongoTemplate;

    // Spring injects the auto-configured MongoTemplate via constructor injection.
    public QueryResolver(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    @QueryMapping
    public String hello() {
        return "Hello World!";
    }

    @QueryMapping
    public List<String> collections() {
        // Returns the names of all collections in the configured database.
        Iterable<String> mongoCollection = mongoTemplate.getCollectionNames();

        List<String> collections = new ArrayList<>();
        for (String collectionName : mongoCollection) {
            collections.add(collectionName);
        }
        return collections;
    }
}

The last addition is to add the connection string to the project’s properties file. In the application.properties file add the following line for the Mongo database. Of course, replace “root”, “examplepass”, and “test” with your respective user, password, and database to access in Mongo. One caveat worth noting: since the root user from the docker-compose setup is created in Mongo’s admin database, you may need to append ?authSource=admin to the URI if authentication fails.

spring.data.mongodb.uri=mongodb://root:examplepass@localhost:27017/test

That is all that is needed for implementation. However, know full well that a solid refactor, abstracting this code into various layers to keep it manageable and scalable for ongoing implementation, still needs to be done. This is an *example* at most, that I’m building out to get started with this API. More on refactoring toward something that is better ready for deployment as I move on, so keep reading.

With the implementation in place, the collections query and its results against the database from Fruit and Snakes: Frequent Mutative Mongo User Database with Python look something like this.
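
For reference, that query is simply the following; the collection names in the response depend on whatever the Python app has created at that moment (the ones shown here are illustrative):

{
    collections
}

{
    "data": {
        "collections": ["fruit", "users"]
    }
}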

The next query I want is something that will just give the document results of a collection. No paging, just the full results. A simple enough request; the schema addition would look like docsByCollectionName(collectionName: String!): [String] for the GraphQL schema. The complete schema now looks like this.

type Query {
    hello: String
    collections: [String]
    docsByCollectionName(collectionName: String!): [String]
}

For the Java implementation there is a new element I need to add to this method: a parameter! This is easily done by marking the parameter with the @Argument annotation (imported from org.springframework.graphql.data.method.annotation, along with com.mongodb.client.MongoCollection, com.mongodb.client.FindIterable, and org.bson.Document for the types below). The method then follows the same flow as the previous one.

@QueryMapping
public List<String> docsByCollectionName(@Argument String collectionName) {
    List<String> documents = new ArrayList<>();

    MongoCollection<Document> collection = mongoTemplate.getCollection(collectionName);
    FindIterable<Document> docs = collection.find();

    for (Document doc : docs) {
        documents.add(doc.toString());
    }

    return documents;
}

In this method, I’ve used MongoCollection<Document> to pull the documents, then copied each one, via toString, into the documents list to return.

As you can see in the short vid, the query now grabs the document results and you get the raw strings returned for processing via the API call.
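
For a concrete sense of those raw strings: a call like the one below (collection and field names are hypothetical) returns each document rendered by org.bson.Document.toString(), which looks roughly like Document{{_id=652f1a…, name=apple, color=red}}.

{
    docsByCollectionName(collectionName: "fruit")
}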

Thoughts & Strategy

At this point I’ll add a few more query calls just to provide a few options on how to query and what ways to use Spring to get these queries. However, I want to dive into some of the concepts and issues around querying against a Mongo DB instance like this.

  1. Data Query Execution Time: As you might have noticed in some of the queries already created, they execute lazily. You initiate the “query”, then set up its criteria, and only then does it execute. Because of the convention-based approach of Spring for GraphQL and Spring Data for Mongo, it isn’t immediately clear when the query will be executed. It is generally safe to assume the query executes when you begin manipulating the results; that isn’t always the case, but it often is (see the sketch after this list). This specific tenet of operation for these particular libraries is important to note for troubleshooting, and also for reasoning about building criteria and executing the queries within the code itself.
  2. REGEX Efficiency in Mongo: Regular expressions in Mongo are a special and very powerful tool. Regular expressions / regex are very powerful to start with, albeit often considered cryptic and difficult to deal with. With Mongo, however, since there isn’t a fixed schema or set of where clauses to work with for queries, it is often desirable to use something like regex to query against cumbersome, or disparate, Mongo documents. The documents themselves, not being held to a schema, could hold any number of elements, and using regex to query against the full document can be a life saver when one doesn’t know the specific field or element of a document to query for. Implementation details on regex come a little later in the post.
  3. One very significant problem with querying a Mongo database as I am doing in this particular scenario is that GraphQL’s features and specification capabilities aren’t really used. The data consumer must know the collection, field, or other part of the document to query against, and its exact spelling. Whereas GraphQL, per the specification, should have the objects, nested objects, types, and related elements one would query against defined in the schema. In this particular API that isn’t available. Ideally the API system would dynamically update the schema, but that is cumbersome and shifts functionality to an infrastructure level, breaching separation of concerns and introducing difficulty that often isn’t easily overcome in organizations, i.e. the skills or the time aren’t always available to the team to design around this complexity and build a respective solution.
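
To make the lazy execution in point 1 concrete, here is a minimal sketch using the driver types already in this post (the field name and value are illustrative):

// Building the operation; no round trip to Mongo has happened yet.
FindIterable<Document> docs = collection.find(new Document("name", "apple"));

// The query actually executes once iteration begins.
for (Document doc : docs) {
    System.out.println(doc);
}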

Alright, just a few thoughts at this point, onward to the last few method/query implementations and I’ll wrap up this post.

Phase 3: The Final Query Methods: Search String, Regex, & By Field

Next query is the docsBySearchString query. I’ll add the query to the GraphQL Schema, which gives me a complete schema that looks like this.

type Query {
    hello: String
    collections: [String]
    docsByCollectionName(collectionName: String!): [String]
    docsBySearchString(collectionName: String!, fieldToSearch: String!, searchString: String!): [String]
}

Implementing that back on the Java side would then look like this. This method returns results based on the search string matching the value of a particular field (an exact match, as written).

@QueryMapping
public List<String> docsBySearchString(@Argument String searchString, @Argument String collectionName, @Argument String fieldToSearch) {

    MongoCollection<Document> collection = mongoTemplate.getCollection(collectionName);

    // An exact-match filter on the given field, wrapped in an $or (requires java.util.Arrays).
    // With a single clause, the $or is equivalent to new Document(fieldToSearch, searchString).
    Document query = new Document("$or", Arrays.asList(new Document(fieldToSearch, searchString)));

    FindIterable<Document> result = collection.find(query);

    List<String> documents = new ArrayList<>();
    for (Document doc : result) {
        documents.add(doc.toString());
    }

    return documents;
}
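
Calling it from GraphiQL requires all three arguments; remember this matches the field value exactly (the names here are hypothetical):

{
    docsBySearchString(collectionName: "fruit", fieldToSearch: "name", searchString: "apple")
}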

Next up is regular expressions. This is a powerful query, as regex can cover a wide range of matching functionality. The GraphQL schema with this addition now looks like this.

type Query {
    hello: String
    collections: [String]
    docsByCollectionName(collectionName: String!): [String]
    docsBySearchString(collectionName: String!, fieldToSearch: String!, searchString: String!): [String]
    docsByRegex(collectionName: String!, fieldToSearch: String!, regex: String!): [String]
}

The Java side would look like this. One note: the regex(…) call here is the Mongo driver’s Filters.regex helper, brought in via a static import, and I pass the resulting Bson filter directly to find() rather than wrapping it in a $or Document.

@QueryMapping
public List<String> docsByRegex(@Argument String regex, @Argument String collectionName, @Argument String fieldToSearch) {

    MongoCollection<Document> collection = mongoTemplate.getCollection(collectionName);

    // The "i" option makes the match case-insensitive. regex(...) is the driver's
    // Filters.regex helper, so this needs:
    //   import org.bson.conversions.Bson;
    //   import static com.mongodb.client.model.Filters.regex;
    // Passing the Bson filter straight to find() also sidesteps wrapping it in a plain
    // Document (as a single-clause $or would), which the default codecs may not encode.
    Bson query = regex(fieldToSearch, regex, "i");

    FindIterable<Document> result = collection.find(query);

    List<String> documents = new ArrayList<>();
    for (Document doc : result) {
        documents.add(doc.toString());
    }

    return documents;
}
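
As a usage example, a case-insensitive prefix match against a hypothetical name field looks like this; any valid Mongo regular expression works in the regex argument:

{
    docsByRegex(collectionName: "fruit", fieldToSearch: "name", regex: "^ap")
}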

Finally, I’ll wrap this up with a query by field. The completed schema now includes all of these queries.

type Query {
    hello: String
    collections: [String]
    docsByCollectionName(collectionName: String!): [String]
    docsBySearchString(collectionName: String!, fieldToSearch: String!, searchString: String!): [String]
    docsByRegex(collectionName: String!, fieldToSearch: String!, regex: String!): [String]
    docsFindByField(collectionName: String!, fieldToFind: String!): [String]
}

Flipping over to the Java side I’ve got this for implementation.

@QueryMapping
public List<String> docsFindByField(@Argument String collectionName, @Argument String fieldToFind) {
    List<String> documents = new ArrayList<>();

    MongoCollection<Document> collection = mongoTemplate.getCollection(collectionName);

    // Matches any document where the field is present, regardless of its value.
    Document query = new Document(fieldToFind, new Document("$exists", true));

    FindIterable<Document> result = collection.find(query);

    for (Document doc : result) {
        documents.add(doc.toString());
    }

    return documents;
}
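
Usage mirrors the other queries; this one returns every document that has the given field at all, whatever its value (names hypothetical):

{
    docsFindByField(collectionName: "fruit", fieldToFind: "color")
}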

Summary

In summary, I’ve now got a GraphQL API that queries against this Mongo database. As the database’s collections change or new collections are added, I can easily query against those and get the new documents. In addition, in regular Mongo fashion, if the document schemas change I can simply query against them in whatever particular way they’ve changed and get results.

Coming Soon!

The next steps in this project will be to clean up and appropriately refactor with abstractions that help move the project to the next stage. For example, refactoring the singular controller class. Adding a service class that handles the database connection and retrieval, making the API controller database independent, could be a good refactoring; a rough sketch of that idea follows. There are many other things I could do, and many I should do. But for now, this is a functional API of basic capability, and I’ll work up the refactor and additional GraphQL API additions in another post.
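
Here’s that sketch of the service-layer direction (these names are hypothetical, not code from the project yet); the controller would depend on an interface instead of MongoTemplate directly:

// In its own file: the database-agnostic contract the controller depends on.
public interface DocumentQueryService {
    List<String> collections();
    List<String> docsByCollectionName(String collectionName);
}

// The controller now knows nothing about Mongo; a Mongo-backed implementation
// of DocumentQueryService gets injected by Spring. (Imports as in the existing
// QueryResolver.)
@Controller
public class QueryResolver {

    private final DocumentQueryService queryService;

    public QueryResolver(DocumentQueryService queryService) {
        this.queryService = queryService;
    }

    @QueryMapping
    public List<String> collections() {
        return queryService.collections();
    }

    @QueryMapping
    public List<String> docsByCollectionName(@Argument String collectionName) {
        return queryService.docsByCollectionName(collectionName);
    }
}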

Top 3 Ways to Make Sausage with MongoDB & Java

I’ve been working with Java a ton this year, more so than in previous years, so I decided to put together the top three Java libraries for accessing MongoDB that I’ve been using. That top 3 list shapes up like this.

  1. MongoDB Java Driver: This is the official Java driver provided by MongoDB. It allows Java applications to connect to MongoDB and work with data. The driver supports synchronous and asynchronous interaction with MongoDB and provides a rich set of features for database operations.
    • Key Methods:
      • MongoClients.create(): To create a new client connection to the database.
      • MongoDatabase.getCollection(): To access a collection from the database.
      • MongoCollection.find(): To find documents within a collection.
  2. Morphia: Morphia is an Object-Document Mapper (ODM) for MongoDB and Java. It provides a higher-level, object-oriented API to interact with MongoDB, and maps Java objects to MongoDB documents.
    • Key Methods:
      • Datastore.createQuery(): To create a query for the type of entity you want to retrieve.
      • Datastore.save(): To save an entity to the database.
      • Query.asList(): To execute a query and get the results as a list.
  3. Spring Data MongoDB: Part of the larger Spring Data project, Spring Data MongoDB provides integration with MongoDB to work with the data as easily as if it were a relational database. It’s a popular choice for Spring-based applications.
    • Key Methods:
      • MongoRepository.findAll(): To find all documents in a collection.
      • MongoRepository.save(): To save a given entity.
      • MongoRepository.findById(): To find a document by its ID.

These libraries offer comprehensive methods for connecting to and working with MongoDB from Java applications. They are widely used in the Java community and are supported by a large number of developers and organizations. Let’s dive deeper with each, and also more specifically talk about some of their respective query methods.
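
As a minimal sketch of those first key methods from the official driver (the connection string, database, and collection names are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class DriverExample {
    public static void main(String[] args) {
        // MongoClient is Closeable, so try-with-resources cleans up the connection pool.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase database = client.getDatabase("test");
            MongoCollection<Document> collection = database.getCollection("fruit");

            // find() is lazy; first() executes the query and returns one document (or null).
            Document first = collection.find().first();
            System.out.println(first);
        }
    }
}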

Continue reading “Top 3 Ways to Make Sausage with MongoDB & Java”

Fruit and Snakes: Frequent Mutative Mongo User Database with Python

Recently I had a singular mission to build a GraphQL API against a Mongo database where the idea is, one could query the underlying collections, documents, and fields with the assumption that users would be adding or possibly removing said collections, documents, and fields as they needed.

That sounds straightforward enough, but before even getting started with the GraphQL API I really needed some type of environment that would mimic this process. That is what this article is about: creating a test bed that meets these criteria.

The Mongo Database & Environment

First thing I did was set up a new Python environment using virtualenv. I wrote about that a bit in the past; if you want to dig into it deeper, the post is available here.

virtualenv fruit_schema_watcher

Next up I created a git repo with git init, then added a README.md, LICENSE (MIT), and .gitignore file. The next obvious thing was the need for a Mongo database! I went to cracking on a docker-compose file, which formed up to look like this.

version: '3.1'

services:
  mongo:
    image: mongo:latest
    container_name: mongodb_container
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: examplepass
    volumes:
      - mongo-data:/data/db

volumes:
  mongo-data:
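
With that file in place, standing the database up is a single command (assuming a Docker install with the compose plugin):

docker compose up -d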

With that server running, I went ahead and manually created a database called test. I’d just do all the work from here on out in that particular database.

Continue reading “Fruit and Snakes: Frequent Mutative Mongo User Database with Python”

Shortlist of Database as a Service Providers

Some top database-as-a-service providers for various open source databases like MariaDB, PostgreSQL, MongoDB, Apache Cassandra, Redis, Elasticsearch, and Neo4j:

  1. MariaDB:
  2. PostgreSQL:
  3. MongoDB:
  4. Apache Cassandra:
  5. Redis:
  6. Elasticsearch:
  7. Neo4j (Graph Database):

Sorry Database Nerds, Nobody Actually Gives a Shit…

So I’ve been in more than a few conversations about data structures, various academic conversations, and other notions about where and how data should be stored. I’ve been on projects, and managed projects, that involve teams of people determining how to manage data so that other people can simply not manage data. They want to focus on business use, not the data mechanisms underneath. The root of everything around databases really boils down to a single thing: how can we store X and retrieve X. Nobody actually trying to get business done or change the world is going to dig into the data storage mechanisms if they don’t have to. To summarize,

nobody actually gives a shit…

At least nobody does until the database breaks, or somebody has to be hired to manage or tune queries, or some other problem comes up. In an ideal world we could just put data into the ether and have it come back when we ask for it. Unfortunately we have to keep caring for where the data is, how it’s stored, the schema (even in schema-less systems you still need to know the schema of the data at some point; it’s just another abstraction to push off dealing with the database), how to back up and recover, data gravity, proximity, and a host of other concerns. Wouldn’t it be cool if we could just work on our app or business? Wouldn’t it be nice to just, well, focus on things we actually give a shit about?

Managed Data Systems!

The whole *aaS and PaaS world has been pushing to simplify operations to the point that the primary concern, if not the only concern, is the business itself. This is a pretty big step in many ways, and holds a lot of hope and promise around fixing the data gravity, proximity, management, and related concerns. One provider of services that has an interesting start in the NoSQL realm is Orchestrate.io. I’ll have more about them in the future, as I’ll actually be hacking on some code against their platform. They’re currently solving a number of the mentioned issues. Which is great, a solid starting point that takes us past the draconian nature of the old approach to NoSQL and relational databases in general.

There have been some others, such as Mongo Labs, that have created a sort of DBaaS. This however doesn’t fill the gap that Orchestrate.io is filling. So far almost every *aaS database solution has merely been a single type of database that a developer can just throw data at in a single kind of way. Not really flexible, and really only abstracting some manual work, not providing much additional value add around using the actual data. Orchestrate.io is bridging these together with search, replication, and other features to provide a platform on which multiple options are available via the API. Key value, geo, time series, and others are all coming together for them nicely. Having all the options actually creates real value add, versus just providing one single way to do one thing.

Intelligent Data Systems?

After checking out and interviewing Orchestrate.io recently, I’ve stumbled into a few other ideas that would be perfect for them, or the open source community, to take a stab at. What would happen if the systems storing the data knew where to put things? What would be the case for providing an intelligent indexing policy or architecture at the schema design decision layer, the area where a person usually must intervene? Could it be done?

A decision tier that scans the data and makes decisions on how to revamp the way it is stored: against a key value, geo, time series, or other method. Could it be done in real time? Would it have to go through some type of processing system? The options for implementing something like this are numerous, but this leaves a lot of space for providing value add around the data by reducing the complexity of this decision making.

Imagine you have key value data that needs to be associative based on graph principles, and that you must store in a highly available system with pertinent real-time data provided based on those graph relations. A decision layer, to create an intelligent data system, could monitor the data and determine the frequent query paths against it. If the data is growing old, it could move data from real-time to archival storage via the key value store. Other decisions could be made to push data segments up into a cache tier or some other mechanism to provide realtime graph connections to client queries. These are all decisions that would otherwise need to be made by somebody working on the data, but could be put into a set of rules to allow re-allocation of the data via automated mechanisms into better storage options. Why keep old data that isn’t queried in the active in-memory graph store? Push it to the distributed key store. Why keep the graph data on drive when it can be in memory with correlated keys in an in-memory key value store, backed by an on-drive key value store? All valid decisions, all becoming better understood day by day. It’s about time some of this decision process started to be automated.

What are your thoughts? Pro-intelligent data systems or anti-intelligent data systems? Think it’ll work or is it the wrong approach? Maybe the system should approach some other zenith or axiom point to become truly abstracted and transparent?