DataLoader for GraphQL Implementations

A popular library used in GraphQL implementations is called DataLoader, and in many ways the name is descriptive of its purpose. As described in the repo for the JavaScript/Node.js implementation:

“DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.”

DataLoader solves the N+1 problem that otherwise requires a resolver to make multiple individual requests to a database (or another data source, e.g. a separate API), resulting in inefficient and slow data retrieval.

A DataLoader serves as a batching and caching layer that combines multiple requests into a single request. It groups together identical requests and executes them more efficiently, thus minimizing the number of database or API round trips.
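As a quick sketch of that batching behavior, using the dataloader package (the batch function here is hypothetical, purely for illustration):

const DataLoader = require("dataloader");

// Hypothetical batch function: receives every key collected during one tick
const userLoader = new DataLoader(async (ids) => {
  console.log("one batched call for ids:", ids);
  return ids.map((id) => ({ id, name: `user-${id}` }));
});

// Two separate load calls in the same tick coalesce into a single batch
Promise.all([userLoader.load(1), userLoader.load(2)]).then((users) =>
  console.log(users)
);
// Logs: one batched call for ids: [ 1, 2 ]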

DataLoader Operation:

  1. Create a new instance of DataLoader, specifying a batch loading function. This function defines how to load the data for a given set of keys.
  2. Instead of fetching related data directly, the resolver iterates through the collection and adds the keys for the data to be fetched to the DataLoader instance.
  3. The DataLoader collects the keys, deduplicates them, and executes the batch loading function once for the whole set.
  4. Once the batch is executed, DataLoader returns the results, associating them with their respective keys.
  5. The resolver can then access the response data and resolve the fields or relationships as needed.

DataLoader also caches the results of previous requests, so if the same key is requested again DataLoader retrieves it from the cache instead of making another request. This caching further improves performance and reduces redundant fetching.
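A minimal sketch of that cache behavior, again with a hypothetical batch function:

const DataLoader = require("dataloader");

// Hypothetical batch function for illustrating the per-key cache
const loader = new DataLoader(async (keys) => {
  console.log("fetching:", keys);
  return keys.map((key) => ({ key }));
});

(async () => {
  await loader.load("a"); // fetching: [ 'a' ]
  await loader.load("a"); // cache hit, no fetch
  loader.clear("a");      // evict "a" from the cache
  await loader.load("a"); // fetching: [ 'a' ] again
})();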

DataLoader Implementation Examples

JavaScript & Node.js

The following is a basic DataLoader implementation for a GraphQL server built with Apollo Server.

const { ApolloServer, gql } = require("apollo-server");
const DataLoader = require("dataloader"); // dataloader exports the class directly

// Simulated data source
const db = {
  users: [
    { id: 1, name: "John" },
    { id: 2, name: "Jane" },
  ],
  posts: [
    { id: 1, userId: 1, title: "Post 1" },
    { id: 2, userId: 2, title: "Post 2" },
    { id: 3, userId: 1, title: "Post 3" },
  ],
};

// Simulated asynchronous data loader function
const batchPostsByUserIds = async (userIds) => {
  console.log("Fetching posts for user ids:", userIds);
  const posts = db.posts.filter((post) => userIds.includes(post.userId));
  return userIds.map((userId) => posts.filter((post) => post.userId === userId));
};

// Create a DataLoader instance (in a real app, create one per request so
// the cache isn't shared across requests)
const postsLoader = new DataLoader(batchPostsByUserIds);

const resolvers = {
  Query: {
    getUserById: (_, { id }) => {
      // GraphQL ID arguments arrive as strings, so coerce before comparing
      return db.users.find((user) => user.id === Number(id));
    },
  },
  User: {
    posts: (user) => {
      // Use DataLoader to load posts for the user
      return postsLoader.load(user.id);
    },
  },
};

// Define the GraphQL schema
const typeDefs = gql`
  type User {
    id: ID!
    name: String!
    posts: [Post]
  }

  type Post {
    id: ID!
    title: String!
  }

  type Query {
    getUserById(id: ID!): User
  }
`;

// Create Apollo Server instance
const server = new ApolloServer({ typeDefs, resolvers });

// Start the server
server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

In this example I create a DataLoader instance, postsLoader, using the DataLoader class from the dataloader package. I define a batch loading function, batchPostsByUserIds, that takes an array of user IDs and retrieves the corresponding posts for each user from the db.posts array. The function returns an array of arrays, where each sub-array contains the posts for a specific user.

In the User resolver I use the load method of DataLoader to load the posts for a user. The load method handles batching and caching behind the scenes, ensuring that redundant requests are minimized and results are cached for subsequent requests.

When the GraphQL server receives a query for the posts field of a User, the DataLoader automatically batches the requests for multiple users and executes the batch loading function to retrieve the posts.
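For example, a query like the following resolves a user’s posts through the loader (using an id from the simulated data above):

query {
  getUserById(id: 1) {
    name
    posts {
      title
    }
  }
}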

This example demonstrates a very basic implementation of DataLoader in a GraphQL server. In a real-world scenario there would, of course, be a number of additional capabilities and implementation details to work out for your particular situation.

Spring Boot Java Implementation

To further the kinds of examples, the following is a Spring Boot example.

First add the dependencies.

<dependencies>
  <!-- GraphQL for Spring Boot -->
  <dependency>
    <groupId>com.graphql-java</groupId>
    <artifactId>graphql-spring-boot-starter</artifactId>
    <version>5.0.2</version>
  </dependency>
  
  <!-- DataLoader -->
  <dependency>
    <groupId>org.dataloader</groupId>
    <artifactId>dataloader</artifactId>
    <version>3.4.0</version>
  </dependency>
</dependencies>

Next create the components and configure DataLoader.

import com.coxautodev.graphql.tools.GraphQLResolver;
import com.graphql.spring.boot.context.GraphQLContext;
import graphql.Scalars;
import graphql.schema.GraphQLList;
import graphql.schema.GraphQLObjectType;
import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderRegistry;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

@SpringBootApplication
public class DataLoaderExampleApplication {

  // Simulated data source
  private static class Db {
    List<User> users = List.of(
        new User(1, "John"),
        new User(2, "Jane")
    );

    List<Post> posts = List.of(
        new Post(1, 1, "Post 1"),
        new Post(2, 2, "Post 2"),
        new Post(3, 1, "Post 3")
    );
  }

  // User class
  private static class User {
    private final int id;
    private final String name;

    User(int id, String name) {
      this.id = id;
      this.name = name;
    }

    int getId() {
      return id;
    }

    String getName() {
      return name;
    }
  }

  // Post class
  private static class Post {
    private final int id;
    private final int userId;
    private final String title;

    Post(int id, int userId, String title) {
      this.id = id;
      this.userId = userId;
      this.title = title;
    }

    int getId() {
      return id;
    }

    int getUserId() {
      return userId;
    }

    String getTitle() {
      return title;
    }
  }

  // DataLoader batch loading function
  private static class BatchPostsByUserIds implements BatchLoader<Integer, List<Post>> {
    private final Db db;

    BatchPostsByUserIds(Db db) {
      this.db = db;
    }

    @Override
    public CompletionStage<List<List<Post>>> load(List<Integer> userIds) {
      System.out.println("Fetching posts for user ids: " + userIds);
      List<List<Post>> result = userIds.stream()
          .map(userId -> db.posts.stream()
              .filter(post -> post.getUserId() == userId)
              .collect(Collectors.toList()))
          .collect(Collectors.toList());
      return CompletableFuture.completedFuture(result);
    }
  }

  // GraphQL resolver
  private static class UserResolver implements GraphQLResolver<User> {
    private final DataLoader<Integer, List<Post>> postsDataLoader;

    UserResolver(DataLoader<Integer, List<Post>> postsDataLoader) {
      this.postsDataLoader = postsDataLoader;
    }

    List<Post> getPosts(User user) {
      // load() only completes once the DataLoader is dispatched; in the
      // graphql-java stack, instrumentation normally handles dispatching,
      // otherwise join() here would block indefinitely.
      return postsDataLoader.load(user.getId()).join();
    }
  }

  // GraphQL configuration
  @Bean
  public GraphQLSchemaProvider graphQLSchemaProvider() {
    return (graphQLSchemaBuilder, environment) -> {
      // Define the GraphQL schema
      // Post must be defined before User, since User's posts field references it
      GraphQLObjectType postObjectType = GraphQLObjectType.newObject()
          .name("Post")
          .field(field -> field.name("id").type(Scalars.GraphQLInt))
          .field(field -> field.name("title").type(Scalars.GraphQLString))
          .build();

      GraphQLObjectType userObjectType = GraphQLObjectType.newObject()
          .name("User")
          .field(field -> field.name("id").type(Scalars.GraphQLInt))
          .field(field -> field.name("name").type(Scalars.GraphQLString))
          .field(field -> field.name("posts").type(new GraphQLList(postObjectType)))
          .build();

      GraphQLObjectType queryObjectType = GraphQLObjectType.newObject()
          .name("Query")
          .field(field -> field.name("getUserById")
              .type(userObjectType)
              .argument(arg -> arg.name("id").type(Scalars.GraphQLInt))
              .dataFetcher(environment -> {
                // Retrieve the requested user ID
                int userId = environment.getArgument("id");
                // Fetch the user by ID from the data source
                Db db = new Db();
                return db.users.stream()
                    .filter(user -> user.getId() == userId)
                    .findFirst()
                    .orElse(null);
              }))
          .build();

      return graphQLSchemaBuilder.query(queryObjectType).build();
    };
  }

  // DataLoader registry bean
  @Bean
  public DataLoaderRegistry dataLoaderRegistry() {
    DataLoaderRegistry dataLoaderRegistry = new DataLoaderRegistry();
    Db db = new Db();
    dataLoaderRegistry.register("postsDataLoader", DataLoader.newDataLoader(new BatchPostsByUserIds(db)));
    return dataLoaderRegistry;
  }

  // GraphQL context builder
  @Bean
  public GraphQLContext.Builder graphQLContextBuilder(DataLoaderRegistry dataLoaderRegistry) {
    return new GraphQLContext.Builder().dataLoaderRegistry(dataLoaderRegistry);
  }

  public static void main(String[] args) {
    SpringApplication.run(DataLoaderExampleApplication.class, args);
  }
}

In this example I define the Db class as a simulated data source with users and posts lists. I create a BatchPostsByUserIds class that implements the BatchLoader interface from DataLoader for batch loading posts based on user IDs.

The UserResolver class is a GraphQL resolver that uses the postsDataLoader to load posts for a specific user.

For the configuration I define the schema using GraphQLSchemaProvider, create GraphQLObjectType definitions for User and Post, and build a Query object type with a data fetcher for the getUserById field.

The dataLoaderRegistry bean registers the postsDataLoader with the DataLoader registry.

This implementation will efficiently batch and cache requests for loading posts based on user IDs.
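One caveat worth noting: with the Java dataloader, queued load calls only execute when the loader is dispatched (in a GraphQL server, instrumentation typically does this per execution). A standalone sketch of that mechanic, reusing the Db and BatchPostsByUserIds classes above:

// Standalone sketch: dispatch() fires the collected batch
DataLoader<Integer, List<Post>> loader =
    DataLoader.newDataLoader(new BatchPostsByUserIds(new Db()));

CompletableFuture<List<Post>> johnsPosts = loader.load(1);
CompletableFuture<List<Post>> janesPosts = loader.load(2);

loader.dispatch(); // executes the batch function once with keys [1, 2]

System.out.println(johnsPosts.join());
System.out.println(janesPosts.join());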

Other GraphQL Standards, Practices, Patterns, & Related Posts

Single Responsibility Principle for GraphQL API Resolvers

The Single Responsibility Principle (SRP) states that a class or module should have only one reason to change. It emphasizes the importance of keeping modules or components focused on a single task, reducing their complexity, and increasing maintainability.

SRP defined.

In GraphQL API development, maintaining code quality and scalability is of utmost importance. A powerful principle that can help achieve these goals when developing your API’s resolvers is the Single Responsibility Principle (SRP). I’m not always a die-hard when it comes to SRP – there are always situations that may need to skip out on some of the concept – but in general it helps tremendously over time.

By adhering to the SRP, coders can more easily avoid the pitfalls of large monolithic resolvers that end up doing spurious processing outside of their intended scope. Let’s explore the SRP a bit, and I’ll provide three practical examples of how to implement some simple SRP use with GraphQL.

When applying SRP in GraphQL the aim is to ensure that each resolver handles a specific data type or field, thereby avoiding scope bloat and convoluted resolvers that handle unrelated responsibilities.

  1. User Resolvers:
    • Imagine a scenario where a GraphQL schema includes a User type with fields like id, name, email, and posts. Instead of writing a single resolver for the User type that fetches and processes all of the data, we can adopt SRP by creating separate resolvers for each field. For instance, we would have resolvers.getUserById to fetch user details, resolvers.getUserName to retrieve the respective user’s name, and resolvers.getUserPosts to fetch the user’s posts. In doing so we keep each resolver focused on a specific field and in turn keep the codebase simplified.
  2. Product Resolvers:
    • Another example might be a product object within an e-commerce application. It would contain fields like id, name, price, and reviews. With SRP we’d have resolvers for resolvers.getProductById, resolvers.getProductName, resolvers.getProductPrice, and resolvers.getProductReviews. The naming, guided by SRP, describes what each of these functions does and what one can expect in response. This, again, makes the codebase dramatically easier to maintain over time.
  3. Comment Resolvers:
    • Last example: imagine a blog, with a comment type consisting of id, content, and author. This would break out to resolvers.getCommentById, resolvers.getCommentContent, and resolvers.getCommentAuthor. This adheres to SRP and keeps things simple, just like the previous two examples.

Prerequisite: The examples below assume the apollo-server and graphql packages are installed and available.
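If they aren’t installed yet, both can be added with npm:

npm install apollo-server graphql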

User Resolvers Example

A more thorough working example of the user resolvers described above would look something like this. I’ve included the data in a variable to act as the database, but the premise that there would be an underlying database should be obvious.

// Assuming you have a database or data source that provides user information
const db = {
  users: [
    { id: 1, name: "John Doe", email: "john@example.com", posts: [1, 2] },
    { id: 2, name: "Jane Smith", email: "jane@example.com", posts: [3] },
  ],
  posts: [
    { id: 1, title: "GraphQL Basics", content: "Introduction to GraphQL" },
    { id: 2, title: "GraphQL Advanced", content: "Advanced GraphQL techniques" },
    { id: 3, title: "GraphQL Best Practices", content: "Tips for GraphQL development" },
  ],
};

const resolvers = {
  Query: {
    getUserById: (_, { id }) => {
      // GraphQL ID arguments arrive as strings, so coerce before comparing
      return db.users.find((user) => user.id === Number(id));
    },
  },
  User: {
    name: (user) => {
      return user.name;
    },
    posts: (user) => {
      return db.posts.filter((post) => user.posts.includes(post.id));
    },
  },
};

// Assuming you have a GraphQL server setup with Apollo Server
const { ApolloServer, gql } = require("apollo-server");

const typeDefs = gql`
  type User {
    id: ID!
    name: String!
    email: String!
    posts: [Post!]!
  }

  type Post {
    id: ID!
    title: String!
    content: String!
  }

  type Query {
    getUserById(id: ID!): User
  }
`;

const server = new ApolloServer({ typeDefs, resolvers });

server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

In this code we have the db object acting as our database that we’ll interact with. The resolvers and the GraphQL schema are then included inline to show the relationship of the data and how it maps to the GraphQL types. For this example I’m also, for simplicity, using Apollo Server.

Product Resolvers Example

I’ve also included an example of the product resolvers. It’s very similar, but has some minor nuance to show how it would be coded up. For this example, to draw more context, I’ve added an authors table/entity and respective fields for authors as related to reviews.

// Assuming you have a database or data source that provides product, review, and author information
const db = {
  products: [
    { id: 1, name: "Product 1", price: 19.99, reviews: [1, 2] },
    { id: 2, name: "Product 2", price: 29.99, reviews: [3] },
  ],
  reviews: [
    { id: 1, rating: 4, comment: "Great product!", authorId: 1 },
    { id: 2, rating: 5, comment: "Excellent quality!", authorId: 2 },
    { id: 3, rating: 3, comment: "Average product.", authorId: 1 },
  ],
  authors: [
    { id: 1, name: "John Doe", karmaPoints: 100, details: "Product enthusiast" },
    { id: 2, name: "Jane Smith", karmaPoints: 150, details: "Tech lover" },
  ],
};

const resolvers = {
  Query: {
    getProductById: (_, { id }) => {
      // GraphQL ID arguments arrive as strings, so coerce before comparing
      return db.products.find((product) => product.id === Number(id));
    },
  },
  Product: {
    name: (product) => {
      return product.name;
    },
    price: (product) => {
      return product.price;
    },
    reviews: (product) => {
      return db.reviews.filter((review) => product.reviews.includes(review.id));
    },
  },
  Review: {
    author: (review) => {
      return db.authors.find((author) => author.id === review.authorId);
    },
  },
};

// Assuming you have a GraphQL server setup with Apollo Server
const { ApolloServer, gql } = require("apollo-server");

const typeDefs = gql`
  type Product {
    id: ID!
    name: String!
    price: Float!
    reviews: [Review!]!
  }

  type Review {
    id: ID!
    rating: Int!
    comment: String!
    author: Author!
  }

  type Author {
    id: ID!
    name: String!
    karmaPoints: Int!
    details: String!
  }

  type Query {
    getProductById(id: ID!): Product
  }
`;

const server = new ApolloServer({ typeDefs, resolvers });

server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

Comment Resolvers Example

Starting right off with this example, here’s what the code would look like.

// Assuming you have a database or data source that provides comment and author information
const db = {
  comments: [
    { id: 1, content: "Great post!", authorId: 1 },
    { id: 2, content: "Nice article!", authorId: 2 },
  ],
  authors: [
    { id: 1, name: "John Doe", karmaPoints: 100, details: "Product enthusiast" },
    { id: 2, name: "Jane Smith", karmaPoints: 150, details: "Tech lover" },
  ],
};

const resolvers = {
  Query: {
    getCommentById: (_, { id }) => {
      // GraphQL ID arguments arrive as strings, so coerce before comparing
      return db.comments.find((comment) => comment.id === Number(id));
    },
  },
  Comment: {
    content: (comment) => {
      return comment.content;
    },
    author: (comment) => {
      return db.authors.find((author) => author.id === comment.authorId);
    },
  },
};

// Assuming you have a GraphQL server setup with Apollo Server
const { ApolloServer, gql } = require("apollo-server");

const typeDefs = gql`
  type Comment {
    id: ID!
    content: String!
    author: Author!
  }

  type Author {
    id: ID!
    name: String!
    karmaPoints: Int!
    details: String!
  }

  type Query {
    getCommentById(id: ID!): Comment
  }
`;

const server = new ApolloServer({ typeDefs, resolvers });

server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

The concept of the Single Responsibility Principle (SRP) and the JavaScript code examples above demonstrate its application in GraphQL resolvers. The SRP advocates keeping code modules focused on a specific data type or field, avoiding large, monolithic resolvers that handle multiple unrelated responsibilities. By adhering to the SRP, software developers can build software that is modular, maintainable, and easier to understand. By dividing functionality into smaller, well-defined units, developers can enhance code reusability, improve testability, and promote better collaboration among team members. Embracing the SRP helps create codebases that are more scalable, extensible, and adaptable to changing requirements, ultimately leading to higher-quality software solutions.

Other GraphQL Standards, Practices, Patterns, & Related Posts

The Fastest Way to Build a Quick Starter App with Express.js

Over the years I’ve used Express.js many times as a quick getting-started example app. Since I often reference it, I wanted to provide a short post that shows exactly what I do 99.9% of the time to start one of these quick Express.js reference apps. I’ve detailed in this post how to get started with Express.js the fastest way I know. There is one prerequisite: I’m assuming you’ve already got Node.js installed. With that in mind, check out my installation suggestions for Node.js if you need to get that installed still. The other thing is you’ll need to have git installed. On macOS and Linux git is most likely installed already; if you’re on Windows I’ll leave that googling exercise up to you.

Create a directory and navigate into the directory.

mkdir quick-start-express
cd quick-start-express

Now in that directory execute the following commands. Note, npx is available as of Node.js 8.2.0 (it ships with npm 5.2.0).

npx express-generator
npm install

Inside that directory, you’ll now have an Express.js skeleton app set up to run, with the dependencies downloaded via npm install. On macOS or Linux run the following command to start the web app.

DEBUG=quick-start-express:* npm start

If you’re on Windows run the following command.

set DEBUG=quick-start-express:* & npm start

That’s it, one of the quickest ways to get a Node.js site up and running to start developing against!

If you’d like to dig in a bit deeper, here’s a great follow-up post on creating APIs with Express. Give it a read; it’ll give you some great next steps to try out!

Cheers, and happy thrashing code!

Cassandra Datacenter & Racks

The previous post in this series is Distributed Database Things to Know: Consistent Hashing.

Let’s talk about the analogy of Apache Cassandra Datacenters & Racks to actual datacenters and racks. I kind of enjoy the use of the terms datacenter and rack to describe architectural elements of Cassandra. However, as time moves on, the relationship between these terms and why they were chosen can become obscured.

Take for instance: a datacenter could just be a cloud provider, an actual physical datacenter location, a zone in Azure, or a region in some other provider. What a Datacenter in Cassandra parlance actually is can vary, but the origins of why it’s called a Datacenter remain the same. The elements of racks can also vary, but the origins likewise remain the same.

Origins: Racks & Datacenters?

Let’s cover the actual things in this industry we call datacenter and racks first, unrelated to Apache Cassandra terms.

Racks: The easiest way to describe a physical rack is to show pictures of datacenter racks via the ole’ Google images.

[Image: datacenter racks]

A rack is something that is located in a datacenter, or even just someone’s garage in some odd scenarios. Ya know, if somebody wants serious hardware to work with. The rack then holds a number of servers, often of various kinds, within the rack itself. As you can see from the images above, there’s a wide range of these racks.

Datacenter: Again, the easiest way to describe a datacenter is to just look at a bunch of pictures of datacenters, albeit you see lots of racks again. But really, that’s what a datacenter is: a building that has lots and lots of racks.

[Image: a datacenter full of racks]

However, in Apache Cassandra (and respectively DataStax Enterprise products) a datacenter and a rack do not directly correlate to a physical rack or datacenter. The idea is more of an abstraction than a hard mapping to the physical realm. In turn it is better to think of datacenters and racks as a way to structure and organize your DataStax Enterprise or Apache Cassandra architecture. From a tree perspective of organizing your cluster, think of things in this hierarchy.

  • Cluster
    • Datacenter(s)
      • Rack(s)
        • Server(s)
          • Node (vnode)

Apache Cassandra Datacenter

An Apache Cassandra Datacenter is a group of nodes, related and configured within a cluster for replication purposes. Setting up a specific set of related nodes into a datacenter helps to reduce latency, prevent transactions from being impacted by other workloads, and related effects. The replication factor can also be set up to write to multiple datacenters, providing additional flexibility in architectural design and organization (a CQL sketch of this follows the node type list below). One specific element of datacenters to note is that they usually contain only one node type:

Depending on the replication factor, data can be written to multiple datacenters. Datacenters must never span physical locations. Each datacenter usually contains only one node type. The node types are:

  • Transactional: Previously referred to as a Cassandra node.
  • DSE Graph: A graph database for managing, analyzing, and searching highly-connected data.
  • DSE Analytics: Integration with Apache Spark.
  • DSE Search: Integration with Apache Solr. Previously referred to as a Solr node.
  • DSE SearchAnalytics: DSE Search queries within DSE Analytics jobs.
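To illustrate the multi-datacenter replication mentioned above, here’s a minimal CQL sketch of a keyspace that sets a per-datacenter replication factor via NetworkTopologyStrategy; the keyspace and datacenter names are hypothetical.

CREATE KEYSPACE example_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_east': 3,
    'dc_west': 2
  };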

Apache Cassandra Racks

An Apache Cassandra Rack is a grouped set of servers. The architecture of Cassandra uses racks so that no replica is stored redundantly inside a single rack, ensuring that replicas are spread across different racks in case one rack goes down. Within a datacenter there can be multiple racks with multiple servers, as the hierarchy shown above would dictate.

To determine where data goes within a rack or sets of racks, Apache Cassandra uses what is referred to as a snitch. A snitch determines which racks and datacenter a particular node belongs to, and by respect of that, determines where the replicas of data will end up. The snitch informing this replication strategy can take numerous forms; some examples include the following (a configuration snippet follows the list):

  • SimpleSnitch – this snitch treats order as proximity. It is primarily used only in single-datacenter deployments.
  • Dynamic Snitching – the dynamic snitch monitors read latencies to avoid reading from hosts that have slowed down.
  • RackInferringSnitch – Proximity is determined by rack and datacenter, assumed corresponding to 3rd and 2nd octet of each node’s IP address. This particular snitch is often used as an example for writing a custom snitch class since it isn’t particularly useful unless it happens to match one’s deployment conventions.
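For reference, the snitch is selected with the endpoint_snitch setting in cassandra.yaml. A minimal excerpt, using the SimpleSnitch noted above:

# cassandra.yaml (excerpt) - selects the snitch implementation for this node
endpoint_snitch: SimpleSnitch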

In the future I’ll outline a few more snitches, cover how some of them work in more specific detail, and get into a whole selection of other topics. Be sure to subscribe to the blog, the ole’ RSS feed works great too, and follow @CompositeCode for blog updates. For discourse and hot takes follow me @Adron.

Distributed Database Things to Know Series

  1. Consistent Hashing
  2. Apache Cassandra Datacenter & Racks (this post)


The Method I Use Setting Up a Dev Machine for Node.js

UPDATED: April 4th, 2019 and again on March 17th, 2022

It seems every few months the setup of whatever tech stack gets tweaked a bit. This is a collection of information I used to set up my box recently. First off, for the development box I always use nvm, as it is routine to need a different version of Node.js installed for various repositories and such. The best article I’ve found that is super up to date for Ubuntu 18.04 is Digital Ocean’s article (kind of typical for them to have the best article, as their blog is exceptionally good). In it, I’ve noticed the specific installation of nvm has changed since I last worked with it some months ago.
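For reference, the basic nvm flow looks like this (the version number is just illustrative):

# Install a specific Node.js version and switch the current shell to it
nvm install 16
nvm use 16
# List installed versions
nvm ls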
