GraphQL Nested Queries, Relationships, and Practices for Different Data Sources

When building a GraphQL API with nested queries and relationships – specifically when you’re using a relational database – it’s important to follow best practices to attain efficient and performant data retrieval while preventing overly nested queries. From the GraphQL API perspective, here are some practices to follow:

  1. Use GraphQL Fragments: Fragments allow you to define reusable sets of fields that can be included in multiple queries. This helps avoid duplicating nested fields and keeps queries concise and readable.
  2. Resolve Nested Data Efficiently: Use efficient data fetching techniques to resolve nested data. Techniques like batch loading and data loaders can help avoid the N+1 query problem, where multiple database queries are triggered for each item in a list.
  3. Limit Depth of Nested Queries: Consider setting a maximum allowed depth for nested queries. Some servers expose this via configuration, and in most language stacks the GraphQL libraries provide depth-limiting features (a minimal sketch follows this list). This helps prevent clients from making excessively deep queries that would, for example, pull four, five, or more tables into a single database query and cause performance issues.
  4. Pagination: For lists of data, use pagination to limit the amount of data returned in a single query. This prevents queries from becoming overly large and ensures efficient data retrieval.
  5. Use Aliases: Aliases allow clients to request the same field multiple times with different arguments. This can help reduce nesting by fetching data for related entities in a single query.
  6. Avoid Deep Nesting: Strive to keep your GraphQL queries shallow and avoid excessive nesting. If a query becomes too nested, it may be an indication that the schema design needs improvement.
  7. Encourage Specific Queries: Instead of relying solely on generic queries, encourage clients to use specific queries tailored to their needs. This can prevent unnecessary data retrieval and reduce the chance of overly nested queries.
  8. Provide Field Arguments: Offer field arguments to allow clients to customize the shape of the data they retrieve. This way, clients can request only the data they need, reducing the risk of getting overly nested responses.
  9. Use @defer and @stream: Some GraphQL servers support the @defer and @stream directives (still working their way through the spec process) for deferred and streamed responses. Using these features gives you more fine-grained control over data retrieval and prevents unnecessary waiting for nested data.
  10. Educate API Consumers: If you are building a public API, provide clear documentation and examples on how to use the API efficiently. Educate API consumers on best practices for querying data and avoiding overly nested queries.
  11. Performance Testing: Conduct performance testing on your GraphQL API to identify potential bottlenecks and areas of improvement. This can help you optimize the data fetching process and avoid performance issues due to nested queries.
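
To make a couple of these concrete – fragments (1) and depth limiting (3) – here is a minimal sketch using Apollo Server and the graphql-depth-limit package. The schema, resolvers, and depth value below are made up purely for illustration.

const { ApolloServer, gql } = require("apollo-server");
const depthLimit = require("graphql-depth-limit");

// Hypothetical schema with a nestable Author <-> Post relationship.
const typeDefs = gql`
  type Author {
    id: ID!
    name: String!
    posts: [Post]
  }

  type Post {
    id: ID!
    title: String!
    author: Author
  }

  type Query {
    posts: [Post]
  }
`;

const resolvers = {
  Query: {
    posts: () => [], // stubbed out for the sketch
  },
};

// Reject queries nested more than 5 levels deep before execution begins.
const server = new ApolloServer({
  typeDefs,
  resolvers,
  validationRules: [depthLimit(5)],
});

server.listen().then(({ url }) => console.log(`Server running at ${url}`));

A client can then keep its own queries concise with a fragment instead of repeating nested selections:

query {
  posts {
    ...PostFields
    author {
      name
      posts {
        ...PostFields
      }
    }
  }
}

fragment PostFields on Post {
  id
  title
}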

By following these practices, you can ensure that your GraphQL API provides a smooth and efficient experience for clients, while also preventing the negative impact of overly nested queries on server performance.

But what about situations where you’re building a GraphQL API that isn’t going to be built on a relational database? Well, you’re in luck, because I’ve done this more than once and I’ve got a few patterns you can use to help ensure your services stay up to snuff.

Apache Cassandra & MongoDB

When you’re using databases like Apache Cassandra (a wide-column store) or MongoDB (a document-oriented database), there are some additional concerns related to nested queries and data modeling that should be taken into account. MongoDB, for example, can nest data within the document itself – and that nesting can run deep, depending on how the data is stored and modeled in the underlying BSON (Binary JSON). This adds complexity, and you need to understand the data being queried to realize the implications of querying it from something like GraphQL.

  1. Data Modeling for Query Support: Unlike relational databases, Cassandra and MongoDB do not support complex JOIN operations, making it essential to design the data model to support the required queries efficiently. This may involve denormalizing data and duplicating information to facilitate query patterns.
  2. No Transactions: Both Cassandra and MongoDB are NoSQL databases and do not support full ACID transactions across multiple documents or rows. As a result, handling complex nested queries across multiple entities may require careful consideration of eventual consistency and data integrity.
  3. Data Duplication for Performance: To optimize queries, you may need to denormalize and duplicate data, leading to increased storage requirements. Balancing query performance with storage efficiency becomes crucial in such cases.
  4. Aggregation Pipeline (MongoDB): When using MongoDB, the Aggregation Pipeline can be powerful for handling complex data processing and nested queries (a brief sketch follows this list). Understanding and leveraging the aggregation framework effectively can be essential for optimal performance.
  5. Limitations on Nested Arrays: While both databases support nested data structures (arrays or maps), deeply nested arrays can become challenging to query efficiently. Be cautious when modeling highly nested structures, as it can lead to performance issues.
  6. Data Distribution (Cassandra): In Cassandra, data is distributed across nodes based on the partition key. Designing a proper partitioning strategy is crucial to avoid hotspots and ensure even data distribution for queries.
  7. Secondary Indexes (Cassandra): In Cassandra, using secondary indexes to query nested data can be inefficient. It’s generally recommended to design the schema to support the required queries without relying heavily on secondary indexes.
  8. Data Access Patterns: Understand the common access patterns of your application and design the data model accordingly. The database schema should cater to the specific needs of the queries your application will perform most frequently.
  9. Avoiding Unbounded Queries: In NoSQL databases, unbounded queries can lead to performance issues. Consider using pagination or other query optimizations to limit the amount of data retrieved in a single query.
  10. Sharding and Replication: Both Cassandra and MongoDB are designed to scale horizontally. Consider the implications of sharding and replication when dealing with nested queries, as they can impact query performance and data consistency.
  11. Query Modeling: Model your queries to take advantage of database-specific features, like secondary indexes, compound keys, or materialized views, to optimize performance for specific access patterns.
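
As a rough illustration of the aggregation pipeline point above, the following sketch batches nested post lookups into a single MongoDB round trip using the official Node.js driver. The collection and field names (posts, userId) are hypothetical and would need to match your own model.

const { MongoClient } = require("mongodb");

// Load posts for many user ids in one aggregation instead of one query per user.
async function batchPostsByUserIds(db, userIds) {
  const grouped = await db
    .collection("posts")
    .aggregate([
      { $match: { userId: { $in: userIds } } },
      { $group: { _id: "$userId", posts: { $push: "$$ROOT" } } },
    ])
    .toArray();

  // Return results in the same order as the incoming keys (the DataLoader contract).
  const byUser = new Map(grouped.map((group) => [group._id, group.posts]));
  return userIds.map((id) => byUser.get(id) || []);
}

// Usage sketch:
// const client = await MongoClient.connect("mongodb://localhost:27017");
// const postsPerUser = await batchPostsByUserIds(client.db("blog"), [1, 2, 3]);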

In conclusion, when you’re using databases like Apache Cassandra or MongoDB, their flexibility and scalability demand careful consideration of data modeling and query design to handle nested queries efficiently. The modeling work can often be more extensive than with a relational database, but the advantages compound because of the very nature of the underlying systems. By understanding these database limitations and optimizing the data model to suit the application’s query patterns, you can make the most of these NoSQL databases while mitigating potential performance bottlenecks.

Elasticsearch

Elasticsearch – important to note that it isn’t a database, but more specifically a search engine with its own distributed storage capabilities – introduces a whole new realm of considerations. Here are a few I’ve bumped into over the years of implementing GraphQL APIs on engines like Elasticsearch.

  1. Data Indexing: Elasticsearch requires data to be indexed before it can be searched. Designing a proper indexing strategy is crucial to ensure that the data is organized and optimized for search queries, including nested queries.
  2. Nested Documents: Elasticsearch supports nested documents, allowing for complex data structures. However, keep in mind that nested queries can be more resource-intensive than regular queries, so optimizing the data model to minimize unnecessary nesting is important (a query sketch follows this list).
  3. Query Complexity: Complex nested queries in Elasticsearch can result in more processing overhead. Strive to keep your queries as simple as possible to improve search performance.
  4. Document Size: Elasticsearch performs best with reasonably sized documents. If your documents are too large or too nested, it can negatively impact performance. Consider flattening nested data if possible.
  5. Index Mapping: Define explicit mappings for your Elasticsearch indices to specify how fields should be indexed and queried. This can help optimize query performance and avoid unexpected behavior.
  6. Filter vs. Query Context: Understand the difference between filter context and query context in Elasticsearch queries. Filters are more efficient for simple binary decisions, while queries are better for scoring and relevance.
  7. Aggregations: Elasticsearch provides powerful aggregation capabilities to analyze and summarize data. However, complex aggregations can be resource-intensive, so use them judiciously.
  8. Scoring and Relevance: Elasticsearch uses scoring algorithms to rank search results based on relevance. Ensure that your queries and data model align with the desired relevance of search results.
  9. Pagination and Sorting: Plan for efficient pagination and sorting of search results. Avoid deep pagination, as it can lead to performance issues.
  10. Sharding and Replication: Elasticsearch is a distributed system that uses sharding and replication to achieve scalability and fault tolerance. Be mindful of the impact of sharding and replication on query performance and data consistency.
  11. Tuning Index Settings: Elasticsearch provides various index-level settings that can affect search performance. Tuning these settings based on your application’s needs can significantly impact query execution times.
  12. Data Modeling for Search: Design the data model in a way that aligns with the search use cases of your application. Consider the types of queries you will be performing frequently and optimize the data model accordingly.
  13. Cluster Health and Monitoring: Keep an eye on the cluster health and performance metrics. Monitor and optimize the performance of your Elasticsearch cluster regularly.
  14. Indexing and Search Performance Trade-offs: The indexing and search performance of Elasticsearch can be influenced by various factors. Understanding the trade-offs between indexing speed and query performance is crucial when designing your application.
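
For the nested documents point above, this is roughly what a resolver-side nested query looks like with the official Elasticsearch JavaScript client. It assumes an 8.x @elastic/elasticsearch client and an index where reviews is mapped as a nested field – the index and field names are made up for illustration.

const { Client } = require("@elastic/elasticsearch");

const client = new Client({ node: "http://localhost:9200" });

// Query inside the nested "reviews" documents and return matching parent documents.
async function searchProductsByReviewText(text) {
  const result = await client.search({
    index: "products",
    query: {
      nested: {
        path: "reviews",
        query: { match: { "reviews.text": text } },
      },
    },
  });
  return result.hits.hits.map((hit) => hit._source);
}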

Apache Kafka What?

Finally, there is Apache Kafka, which comes up every now and again. Even though I haven’t implemented a GraphQL API on Kafka yet, it’s been done and I’ve been privy to the implications. Here are a few best practices I’ve picked up for implementing against Kafka.

  1. Data Synchronization: Decide on the data synchronization approach between Kafka and your GraphQL API. Will your GraphQL API act as a producer, a consumer, or both? Plan how data flows between the two systems to maintain consistency (a consumer sketch follows this list).
  2. Message Format: Define a standardized message format for data exchanged between Kafka and the GraphQL API. This format should be easily interpretable by both systems and include all necessary information for processing.
  3. Schema Evolution: Consider how schema changes in Kafka messages are handled by the GraphQL API. Plan for backward and forward compatibility to avoid breaking the API when message schemas evolve.
  4. Consumer Groups: When consuming data from Kafka, decide on appropriate consumer group configurations to manage the processing of messages efficiently and in parallel.
  5. Event Deduplication: Ensure that your GraphQL API can handle duplicate events from Kafka gracefully to avoid processing the same data multiple times.
  6. Error Handling: Implement robust error handling and retry mechanisms when processing Kafka messages. Handle failures gracefully and avoid data loss.
  7. Message Ordering: Be aware that Kafka does not guarantee strict message ordering across different partitions. Consider how this might impact the ordering of data processed by the GraphQL API.
  8. Throttling and Backpressure: Plan for throttling and backpressure mechanisms to control the rate at which data is consumed from Kafka to prevent overwhelming the GraphQL API with incoming messages.
  9. Security: Secure your Kafka system and the GraphQL API to prevent unauthorized access. Use appropriate authentication and authorization mechanisms to protect data integrity and confidentiality.
  10. Performance Optimization: Optimize the performance of your Kafka consumer and GraphQL API to handle high loads efficiently. Consider batching messages and implementing caching mechanisms when applicable.
  11. Monitoring and Logging: Implement monitoring and logging for both Kafka and the GraphQL API. Track message processing times, error rates, and system health to identify and resolve potential issues.
  12. Integration Testing: Conduct integration testing to ensure seamless communication between Kafka and the GraphQL API. Test different scenarios, such as handling delayed messages and high loads, to validate the system’s behavior.
  13. Versioning and Compatibility: Plan for versioning in both Kafka messages and GraphQL schema. This helps maintain compatibility and allows for smooth changes in both systems over time.
  14. Infrastructure Scalability: Design your Kafka and GraphQL systems with scalability in mind to handle future growth and increased data volumes.
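
To ground a few of these – data synchronization, consumer groups, and message format – here is a hedged sketch of a Kafka consumer feeding GraphQL subscriptions. It assumes the kafkajs and graphql-subscriptions packages, and the topic, group id, and field names are invented for illustration.

const { Kafka } = require("kafkajs");
const { PubSub } = require("graphql-subscriptions");

const pubsub = new PubSub();
const kafka = new Kafka({ clientId: "graphql-api", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "graphql-api-group" });

// Bridge Kafka messages into GraphQL subscription events.
async function startOrderConsumer() {
  await consumer.connect();
  await consumer.subscribe({ topic: "orders", fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const order = JSON.parse(message.value.toString());
      pubsub.publish("ORDER_CREATED", { orderCreated: order });
    },
  });
}

// Matching subscription resolver on the GraphQL side.
const resolvers = {
  Subscription: {
    orderCreated: {
      subscribe: () => pubsub.asyncIterator(["ORDER_CREATED"]),
    },
  },
};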

Summary

Alright, that’s a boat load of practices for the top databases I’ve worked with to implement GraphQL against. I have tons more to add, but that’s enough detail for a single post! Suffice it to say, GraphQL can provide extensive capabilities with these various data sources.

DataLoader for GraphQL Implementations

A popular library used in GraphQL implementations is called DataLoader, and in many ways the name is descriptive of its purpose. As described in the JavaScript repo for the Node.js implementation:

“DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.”

DataLoader solves the N+1 problem, where a resolver otherwise has to make multiple individual requests to a database (or other data source, i.e. another API), resulting in inefficient and slow data retrieval.

A DataLoader serves as a batching and caching layer that combines multiple requests into a single request, grouping identical requests together and executing them more efficiently, thus minimizing the number of database or API round trips.

DataLoader Operation:

  1. Create a new instance of DataLoader, specifying a batch loading function. This function defines how to load the data for a given set of keys.
  2. The resolver, instead of fetching the related data directly for each item in a collection, adds the keys for the data to be fetched to the DataLoader instance.
  3. The DataLoader collects the keys, deduplicates them, and executes the batch loading function once for the whole set.
  4. Once the batch is executed, DataLoader returns the results, associating them with their respective keys.
  5. The resolver can then access the response data and resolve the fields or relationships as needed.

DataLoader also caches the results of previous requests, so if the same key is requested again, DataLoader retrieves it from the cache instead of making another request. This caching further improves performance and reduces redundant fetching.

DataLoader Implementation Examples

JavaScript & Node.js

The following is a basic implementation of DataLoader for GraphQL using Apollo Server.

const { ApolloServer, gql } = require("apollo-server");
const DataLoader = require("dataloader"); // dataloader exports the class directly

// Simulated data source
const db = {
  users: [
    { id: 1, name: "John" },
    { id: 2, name: "Jane" },
  ],
  posts: [
    { id: 1, userId: 1, title: "Post 1" },
    { id: 2, userId: 2, title: "Post 2" },
    { id: 3, userId: 1, title: "Post 3" },
  ],
};

// Simulated asynchronous data loader function
const batchPostsByUserIds = async (userIds) => {
  console.log("Fetching posts for user ids:", userIds);
  const posts = db.posts.filter((post) => userIds.includes(post.userId));
  return userIds.map((userId) => posts.filter((post) => post.userId === userId));
};

// Create a DataLoader instance
const postsLoader = new DataLoader(batchPostsByUserIds);

const resolvers = {
  Query: {
    getUserById: (_, { id }) => {
      // ID arguments arrive as strings, so normalize before comparing.
      return db.users.find((user) => String(user.id) === String(id));
    },
  },
  User: {
    posts: (user) => {
      // Use DataLoader to load posts for the user
      return postsLoader.load(user.id);
    },
  },
};

// Define the GraphQL schema
const typeDefs = gql`
  type User {
    id: ID!
    name: String!
    posts: [Post]
  }

  type Post {
    id: ID!
    title: String!
  }

  type Query {
    getUserById(id: ID!): User
  }
`;

// Create Apollo Server instance
const server = new ApolloServer({ typeDefs, resolvers });

// Start the server
server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

In this example I created a DataLoader instance, postsLoader, using the DataLoader class from the dataloader package. I defined a batch loading function, batchPostsByUserIds, that takes an array of user IDs and retrieves the corresponding posts for each user from the db.posts array. The function returns an array of arrays, where each sub-array contains the posts for a specific user.

In the User resolver I use the load method of DataLoader to load the posts for a user. The load method handles batching and caching behind the scenes, ensuring that redundant requests are minimized and results are cached for subsequent requests.

When the GraphQL server receives a query for the posts field of a User, the DataLoader automatically batches the requests for multiple users and executes the batch loading function to retrieve the posts.

This example demonstrates a very basic implementation of DataLoader in a GraphQL server. In a real-world scenario there would of course be a number of additional capabilities and implementation details that you’d need to work on for your particular situation.
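
One refinement worth noting, as a sketch rather than a rule: DataLoader instances are usually created per request – for example in Apollo Server’s context function – so the cache doesn’t leak results across requests or users. Reusing the typeDefs, resolvers, and batch function from the example above, that would look roughly like this:

// Per-request loaders via the context function.
const server = new ApolloServer({
  typeDefs,
  resolvers,
  context: () => ({
    loaders: {
      posts: new DataLoader(batchPostsByUserIds),
    },
  }),
});

// The User.posts resolver then reads the loader from context instead of module scope:
// posts: (user, _args, { loaders }) => loaders.posts.load(user.id),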

Spring Boot Java Implementation

Just furthering the kinds of examples, the following is a Spring Boot example.

First add the dependencies.

<dependencies>
  <!-- GraphQL for Spring Boot -->
  <dependency>
    <groupId>com.graphql-java</groupId>
    <artifactId>graphql-spring-boot-starter</artifactId>
    <version>5.0.2</version>
  </dependency>
  
  <!-- DataLoader -->
  <dependency>
    <groupId>org.dataloader</groupId>
    <artifactId>dataloader</artifactId>
    <version>3.4.0</version>
  </dependency>
</dependencies>

Next create the components and configure DataLoader.

import com.graphql.spring.boot.context.GraphQLContext;
import graphql.Scalars;
import graphql.schema.GraphQLList;
import graphql.schema.GraphQLObjectType;
import graphql.servlet.context.DefaultGraphQLServletContext;
import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderRegistry;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.web.context.request.WebRequest;

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

// Note: GraphQLResolver and GraphQLSchemaProvider are expected to come from the
// GraphQL Spring Boot starter / graphql-java-tools; the exact packages vary by
// starter version, so imports for them are omitted here.

@SpringBootApplication
public class DataLoaderExampleApplication {

  // Simulated data source
  private static class Db {
    List<User> users = List.of(
        new User(1, "John"),
        new User(2, "Jane")
    );

    List<Post> posts = List.of(
        new Post(1, 1, "Post 1"),
        new Post(2, 2, "Post 2"),
        new Post(3, 1, "Post 3")
    );
  }

  // User class
  private static class User {
    private final int id;
    private final String name;

    User(int id, String name) {
      this.id = id;
      this.name = name;
    }

    int getId() {
      return id;
    }

    String getName() {
      return name;
    }
  }

  // Post class
  private static class Post {
    private final int id;
    private final int userId;
    private final String title;

    Post(int id, int userId, String title) {
      this.id = id;
      this.userId = userId;
      this.title = title;
    }

    int getId() {
      return id;
    }

    int getUserId() {
      return userId;
    }

    String getTitle() {
      return title;
    }
  }

  // DataLoader batch loading function
  private static class BatchPostsByUserIds implements BatchLoader<Integer, List<Post>> {
    private final Db db;

    BatchPostsByUserIds(Db db) {
      this.db = db;
    }

    @Override
    public CompletionStage<List<List<Post>>> load(List<Integer> userIds) {
      System.out.println("Fetching posts for user ids: " + userIds);
      List<List<Post>> result = userIds.stream()
          .map(userId -> db.posts.stream()
              .filter(post -> post.getUserId() == userId)
              .collect(Collectors.toList()))
          .collect(Collectors.toList());
      return CompletableFuture.completedFuture(result);
    }
  }

  // GraphQL resolver
  private static class UserResolver implements GraphQLResolver<User> {
    private final DataLoader<Integer, List<Post>> postsDataLoader;

    UserResolver(DataLoader<Integer, List<Post>> postsDataLoader) {
      this.postsDataLoader = postsDataLoader;
    }

    List<Post> getPosts(User user) {
      return postsDataLoader.load(user.getId()).join();
    }
  }

  // GraphQL configuration
  @Bean
  public GraphQLSchemaProvider graphQLSchemaProvider() {
    return (graphQLSchemaBuilder, environment) -> {
      // Define the GraphQL schema
      // Define Post first so the User.posts field can reference it.
      GraphQLObjectType postObjectType = GraphQLObjectType.newObject()
          .name("Post")
          .field(field -> field.name("id").type(Scalars.GraphQLInt))
          .field(field -> field.name("title").type(Scalars.GraphQLString))
          .build();

      GraphQLObjectType userObjectType = GraphQLObjectType.newObject()
          .name("User")
          .field(field -> field.name("id").type(Scalars.GraphQLInt))
          .field(field -> field.name("name").type(Scalars.GraphQLString))
          .field(field -> field.name("posts").type(new GraphQLList(postObjectType)))
          .build();

      GraphQLObjectType queryObjectType = GraphQLObjectType.newObject()
          .name("Query")
          .field(field -> field.name("getUserById")
              .type(userObjectType)
              .argument(arg -> arg.name("id").type(Scalars.GraphQLInt))
              .dataFetcher(environment -> {
                // Retrieve the requested user ID
                int userId = environment.getArgument("id");
                // Fetch the user by ID from the data source
                Db db = new Db();
                return db.users.stream()
                    .filter(user -> user.getId() == userId)
                    .findFirst()
                    .orElse(null);
              }))
          .build();

      return graphQLSchemaBuilder.query(queryObjectType).build();
    };
  }

  // DataLoader registry bean
  @Bean
  public DataLoaderRegistry dataLoaderRegistry() {
    DataLoaderRegistry dataLoaderRegistry = new DataLoaderRegistry();
    Db db = new Db();
    dataLoaderRegistry.register("postsDataLoader", DataLoader.newDataLoader(new BatchPostsByUserIds(db)));
    return dataLoaderRegistry;
  }

  // GraphQL context builder
  @Bean
  public GraphQLContext.Builder graphQLContextBuilder(DataLoaderRegistry dataLoaderRegistry) {
    return new GraphQLContext.Builder().dataLoaderRegistry(dataLoaderRegistry);
  }

  public static void main(String[] args) {
    SpringApplication.run(DataLoaderExampleApplication.class, args);
  }
}

In this example I define the Db class as a simulated data source with users and posts lists. I create a BatchPostsByUserIds class that implements the BatchLoader interface from DataLoader for batch loading of posts based on user IDs.

The UserResolver class is a GraphQL resolver that uses the postsDataLoader to load posts for a specific user.

For the configuration I define the schema using GraphQLSchemaProvider and create GraphQLObjectType for User and Post, and Query object type with a resolver for the getUserById field.

The dataLoaderRegistry bean registers the postsDataLoader with the DataLoader registry.

This implementation will efficiently batch and cache requests for loading posts based on user IDs.
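
To see the batching in action with either implementation, a query that selects posts for more than one user via aliases should result in a single batched load for both user IDs – assuming the schema shapes shown in the examples above:

query {
  john: getUserById(id: 1) {
    name
    posts {
      title
    }
  }
  jane: getUserById(id: 2) {
    name
    posts {
      title
    }
  }
}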

Other GraphQL Standards, Practices, Patterns, & Related Posts

GraphQL Error Handling

The following post is based on some of the common error handling techniques I’ve seen in use when implementing GraphQL APIs. The examples include:

  1. Error Objects in the Response
  2. Union Types
  3. Middleware
  4. Custom Error Types
  5. Extensions
  6. Bubbling & Partial Results

To elaborate, a basic definition of each of these follows with a slightly deeper dive into the details of each example.

Error Objects in the Response

GraphQL allows you to define an error object structure within the response payload. When an error occurs during the execution of a GraphQL query, you can include relevant error information such as error codes, messages, and additional data in the response. This approach ensures that clients receive detailed error information and can handle errors appropriately.

It’s important to note that the approaches to error handling in GraphQL can vary depending on the specific GraphQL implementation or framework being used. These approaches are not mutually exclusive and can be combined to fit the needs of a particular application or organization. An example of an error object in a GraphQL response (shown here as JSON) might look like this:

{
  "data": null,
  "errors": [
    {
      "message": "Invalid argument value",
      "locations": [
        {
          "line": 3,
          "column": 7
        }
      ],
      "path": [
        "user",
        "name"
      ],
      "extensions": {
        "code": "INVALID_ARGUMENT",
        "details": {
          "minLength": 5,
          "maxLength": 20
        }
      }
    }
  ]
}

In this example, the response object has a data field set to null, indicating that there was an error during the execution of the query. The errors field is an array containing an object representing the specific error that occurred. The error object includes the following properties:

  • message: A human-readable error message describing the issue.
  • locations: An array indicating the location of the error within the GraphQL query. Each location object contains the line and column where the error occurred.
  • path: An array representing the field path that caused the error. It helps identify the specific field that generated the error.
  • extensions: An optional field that can include additional information about the error. In this example, it includes the code field with a custom error code (INVALID_ARGUMENT) and a details object with specific details related to the error.

Please note that the structure and specific properties of error objects can vary depending on the GraphQL server implementation or framework being used. The example provided above showcases a common structure used to convey error information in a GraphQL response.

Union Types for Errors

GraphQL supports union types, which allow you to define a type that can represent multiple possible types. You can leverage this feature to create a union type that includes both successful responses and error responses. By defining such a type, clients can anticipate and handle errors as part of the normal response flow.

In GraphQL, a union type allows you to define a type that can represent multiple possible types. It’s a way to indicate that a field in a response can have different types of values. This concept is useful for error handling when you want to include both successful responses and error responses in the same field. To create a union type for errors, you can define a new GraphQL type that represents an error and include it as one of the possible types within the union type. This allows the field to return either a successful response or an error response, depending on the situation. Here’s an example to illustrate this concept further:

union ResponseType = SuccessResponse | ErrorResponse

type SuccessResponse {
  data: String
}

type ErrorResponse {
  error: String
}

In this example, the ResponseType is a union type that can represent either a SuccessResponse or an ErrorResponse. The SuccessResponse type has a data field that holds the successful response data, while the ErrorResponse type has an error field that contains the error message. Now, let’s say you make a GraphQL query and receive a response in JavaScript using this union type:

{
  "data": {
    "responseField": {
      "__typename": "SuccessResponse",
      "data": "Some data"
    }
  }
}

In this example response, the responseField returns a SuccessResponse object. The __typename field indicates the specific type of the returned value; here it is set to "SuccessResponse", along with the data field containing the successful response data. Now, let’s consider an example where an error occurs:

{
  "data": {
    "responseField": {
      "__typename": "ErrorResponse",
      "error": "An error occurred"
    }
  }
}

In this case, the responseField returns an ErrorResponse object. The __typename field is set to "ErrorResponse", and the error field contains the error message. By utilizing a union type for errors, the client can anticipate the possible response types and handle both successful responses and error responses accordingly. It provides a unified way to structure and handle different types of responses within the same field.
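
On the server side the union also needs a type resolver so the server knows which concrete type – and therefore which __typename – to return. Here is a minimal sketch for Apollo Server, assuming the schema exposes responseField: ResponseType on Query; fetchSomeData is a hypothetical data-source call.

const resolvers = {
  ResponseType: {
    // Map a plain result object to one of the union's concrete types.
    __resolveType(obj) {
      return obj.error ? "ErrorResponse" : "SuccessResponse";
    },
  },
  Query: {
    responseField: async () => {
      try {
        const data = await fetchSomeData(); // hypothetical call to a data source
        return { data };
      } catch (err) {
        return { error: err.message };
      }
    },
  },
};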

Error Middleware

Middleware functions can be used in GraphQL servers to intercept and handle errors before they reach the client. Error middleware can perform tasks such as logging errors, transforming error messages, or enriching error data. It provides a centralized way to handle errors and can be customized based on specific application requirements.

Error middleware in the context of GraphQL refers to a mechanism where middleware functions intercept and handle errors raised during execution before they reach the client. It allows you to centralize error handling logic and perform tasks such as logging errors, transforming error messages, or enriching error data. Error middleware sits in the request pipeline around the execution of the resolver functions, providing an opportunity to handle errors at a higher level.

In JavaScript, when implementing error middleware for a GraphQL server, you can use middleware functions provided by frameworks such as Express or Apollo Server. These middleware functions are executed in the order they are registered, allowing you to define custom error handling logic. Here’s an example of how error middleware could be implemented in JavaScript using Express:

const express = require('express');
const { ApolloServer, gql } = require('apollo-server-express');

// Define your GraphQL schema
const typeDefs = gql`
  type Query {
    hello: String
  }
`;

// Define your resolvers
const resolvers = {
  Query: {
    hello: () => {
      throw new Error('Something went wrong!');
    },
  },
};

// Create an ApolloServer instance
const server = new ApolloServer({ typeDefs, resolvers });

// Create an Express application
const app = express();

// Apply the Apollo Server middleware to the Express app
server.applyMiddleware({ app });

// Register error middleware last, so Express routes errors from earlier middleware here
app.use((err, req, res, next) => {
  // Handle the error and send a custom error response
  res.status(500).json({ message: 'Internal Server Error' });
});

// Start the server
app.listen({ port: 4000 }, () => {
  console.log(`Server running at http://localhost:4000${server.graphqlPath}`);
});

In this example, we define a simple GraphQL schema with a single hello query that always throws an error. The error middleware function is registered using app.use() in Express. It takes four parameters: err, req, res, and next. When an error occurs during the execution of a resolver, the error middleware is invoked with the error object (err), the request object (req), the response object (res), and the next function. Inside the error middleware function, you can handle the error as per your requirements. In this example, we simply send a custom error response with a status code of 500 and a JSON payload containing an error message. By using error middleware, you can implement custom error handling logic, such as logging errors to a central system, translating error messages based on the client’s preferred language, or modifying the error response structure. This approach helps centralize error handling and keeps the resolver functions focused on their core responsibilities.

Custom Error Types

GraphQL allows you to define custom scalar types, object types, and enum types. Similarly, you can define custom error types that encapsulate specific error scenarios in your application. By utilizing custom error types, you can provide more structured error responses, including standardized fields like error codes, error messages, and additional metadata.

Custom error types in GraphQL refer to defining your own error-specific types that encapsulate specific error scenarios in your application. By creating custom error types, you can provide more structured error responses, including standardized fields like error codes, error messages, and additional metadata. To define a custom error type in GraphQL, you can declare an Error interface in your schema (GraphQL does not ship with a built-in Error interface) and create object types that implement it. The implementing types share the interface’s fields and can add custom fields and metadata specific to your application’s error handling needs. Here is an example to illustrate the concept of custom error types in GraphQL:

scalar JSON

interface Error {
  code: String!
  message: String!
}

type CustomError implements Error {
  code: String!
  message: String!
  additionalData: JSON
}

type Query {
  getUser(id: ID!): User
}

type User {
  id: ID!
  name: String!
}

In this example, we define a custom error type called CustomError, which implements the Error interface declared in the schema. The CustomError type includes fields such as code, message, and additionalData. These fields provide standardized information about the error, such as an error code, an error message, and any additional data that might be relevant to the error. Now, let’s consider an example of implementing a custom error type in JavaScript using a resolver function:

const { ApolloServer, gql, ApolloError } = require('apollo-server');

// Define your GraphQL schema
const typeDefs = gql`
  type Query {
    getUser(id: ID!): User
  }

  type User {
    id: ID!
    name: String!
  }
`;

// Define your resolvers
const resolvers = {
  Query: {
    getUser: (_, { id }) => {
      if (id !== '1') {
        throw new ApolloError('User not found', 'USER_NOT_FOUND', {
          invalidId: id,
        });
      }

      return { id: '1', name: 'John Doe' };
    },
  },
};

// Create an ApolloServer instance
const server = new ApolloServer({ typeDefs, resolvers });

// Start the server
server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

In this example, we define a resolver for the getUser query. If the provided id is not '1', we throw an ApolloError with a custom error message and additional metadata (invalidId). ApolloError is a pre-defined error class provided by Apollo Server that allows you to create custom errors. By throwing a custom error, we can leverage the error handling mechanisms in GraphQL and ensure that the client receives structured error responses. The client can then handle these errors based on the provided error code, message, and additional data. Using custom error types helps maintain consistency in error responses, allows for better error categorization, and provides a clear structure for conveying error information to clients consuming your GraphQL API.

Error Extensions

GraphQL allows extensions to be added to the response payload. You can leverage this feature to include additional information with error responses. For example, you can include debugging information, stack traces, or links to relevant documentation within the response extensions. This approach enhances the debugging experience and provides developers with valuable context when troubleshooting issues.

In the context of GraphQL, error extensions refer to a mechanism that allows you to include additional information or metadata with error responses. It extends the standard error response by providing a way to attach custom fields or data to the error object. Error extensions are particularly useful for enriching the error response with debugging information, stack traces, or links to relevant documentation. When an error occurs during the execution of a GraphQL query, you can include an extensions field within the error object to provide additional data specific to that error. This field can contain any JSON-serializable data, allowing you to customize the error response with relevant information for debugging or error handling purposes. Here’s an example to illustrate the concept of error extensions in GraphQL:

{
  "data": null,
  "errors": [
    {
      "message": "Invalid argument value",
      "locations": [
        {
          "line": 3,
          "column": 7
        }
      ],
      "path": [
        "user",
        "name"
      ],
      "extensions": {
        "code": "INVALID_ARGUMENT",
        "details": {
          "minLength": 5,
          "maxLength": 20
        }
      }
    }
  ]
}

In this example, the error response includes an extensions field within the error object. The extensions field contains custom data related to the error, such as an error code (code) and specific details (details) about the error, such as the minimum and maximum length allowed for the argument value. Now, let’s consider an example of implementing error extensions in JavaScript:


const { ApolloServer, gql } = require('apollo-server');

// Define your GraphQL schema
const typeDefs = gql`
  type Query {
    getUser(id: ID!): User
  }

  type User {
    id: ID!
    name: String!
  }
`;

// Define your resolvers
const resolvers = {
  Query: {
    getUser: (_, { id }) => {
      if (id !== '1') {
        const error = new Error('User not found');
        error.extensions = {
          code: 'USER_NOT_FOUND',
          invalidId: id,
        };
        throw error;
      }

      return { id: '1', name: 'John Doe' };
    },
  },
};

// Create an ApolloServer instance
const server = new ApolloServer({ typeDefs, resolvers });

// Start the server
server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

In this example, within the resolver function for the getUser query, we create a custom error using the Error class. We then attach the error extensions by assigning a custom extensions object to the error.extensions property. In this case, the extensions include an error code (code) and the invalidId that caused the error. By utilizing error extensions, you can enrich the error response with custom fields or metadata that provides additional context to clients consuming your GraphQL API. Clients can access and utilize these extensions to enhance error handling, error logging, or for implementing specific error-related behaviors in their applications.

Error Bubbling and Partial Results

GraphQL supports error bubbling, which means that even if errors occur during the execution of a query, the server can continue executing the remaining parts of the query and return a partial result. This allows clients to receive as much data as possible while still being aware of the occurred errors. By leveraging this behavior, clients can handle partial results gracefully and make informed decisions based on the available data.

Error bubbling refers to the propagation of errors through the GraphQL resolver chain. When an error occurs in a resolver, it can be propagated up to higher-level resolvers or the root resolver. This allows for a hierarchical error handling approach, where errors can be caught and processed at different levels of the resolver hierarchy. By bubbling up errors, you can handle and modify the error response based on the specific context or requirements of each resolver.

Partial results, in the context of GraphQL, refer to the concept of returning a mixture of successfully resolved data and errors in a single response. When executing a GraphQL query, if an error occurs during the resolution of a field, it doesn’t necessarily mean the entire query execution should fail. Partial results allow you to still return the successfully resolved data while indicating the presence of errors in the response. This enables clients to process and display the available data while being aware of any errors that occurred during the query execution. Here’s an example in JavaScript to demonstrate error bubbling and partial results in GraphQL:


const { ApolloServer, gql } = require('apollo-server');

// Define your GraphQL schema
const typeDefs = gql`
  type Query {
    user(id: ID!): User
  }

  type User {
    id: ID!
    name: String!
    email: String!
    posts: [Post]
  }

  type Post {
    id: ID!
    title: String!
    content: String!
  }
`;

// Define your resolvers
const resolvers = {
  Query: {
    user: (_, { id }) => {
      if (id !== '1') {
        throw new Error('User not found');
      }

      return {
        id: '1',
        name: 'John Doe',
        email: 'john@example.com',
        posts: [
          { id: '1', title: 'First Post', content: 'This is the first post' },
          { id: '2', title: 'Second Post', content: 'This is the second post' },
        ],
      };
    },
  },
  User: {
    posts: (user) => {
      if (user.id !== '1') {
        throw new Error('User ID not found');
      }

      return user.posts;
    },
  },
};

// Create an ApolloServer instance
const server = new ApolloServer({ typeDefs, resolvers });

// Start the server
server.listen().then(({ url }) => {
  console.log(`Server running at ${url}`);
});

In this example, we have a GraphQL schema with a user query that retrieves user information and their associated posts. The user resolver throws an error if the provided id is not '1'. Similarly, the posts resolver for the User type throws an error if the user ID is not '1'. When executing a query like this:

query {
  user(id: "1") {
    id
    name
    email
    posts {
      id
      title
      content
    }
  }
}

The user resolver executes successfully and returns the user data along with the associated posts. However, if the id provided is not '1', an error is thrown and propagated up the resolver chain. The error is then included in the response, indicating the specific error that occurred during the resolution of the field. This demonstrates error bubbling, as the error from the inner resolver propagates up to the parent resolver and eventually to the root resolver. It allows for handling errors at different levels and providing a response that includes both successfully resolved data and error information. Partial results come into play when an error occurs during the resolution of a specific field. In this example, if the posts resolver throws, the response can still contain the successfully resolved user fields (id, name, and email) with posts set to null, indicating a partial result. Clients can handle the available data while being aware of the error in the response.
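
For reference, a partial-result response along those lines would look roughly like this – the nullable posts field is nulled out and the error is reported with its path, while the rest of the user data comes through:

{
  "data": {
    "user": {
      "id": "1",
      "name": "John Doe",
      "email": "john@example.com",
      "posts": null
    }
  },
  "errors": [
    {
      "message": "User ID not found",
      "path": ["user", "posts"]
    }
  ]
}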

Other GraphQL Standards, Practices, Patterns, & Related Posts

A Hasura Quick Start with Remote Schema, Remote Joins

I’ve been building GraphQL APIs for a number of years now – alongside RESTful, gRPC, XML, and other API styles I won’t even bring up right now – and so far GraphQL APIs have been great to work with. The libraries in different languages, from .NET’s Hot Chocolate, Go’s graphql-go, and Apollo’s JavaScript based tooling and servers, to Java’s GraphQL for Spring, have worked great.

Sometimes you’re in the fortunate situation where you’re using PostgreSQL or SQL Server, or other supported database for a tool like Hasura. Being able to get a full GraphQL (with REST options too) API running in seconds is pretty impressive. From a development perspective it is a massive boost. As Hasura adds more database connectors as they have with Snowflake and Amazon Athena, the server and tooling becomes even more powerful.

With that I wanted to show an N+1 demo, where N is day 1 with Hasura. The idea is: what do you do immediately after you get a sample service running with Hasura? How do you integrate it with other services – or, more specifically, how do you integrate your Hasura API alongside APIs you’ve written yourself, such as an enterprise GraphQL for Spring based API running against Mongo or another data source? This repo is the basis for several demonstration repositories I am building that will show how you can set up – generally for local development – Hasura + X API with Y language stack.

This is the Hasura quick start repository here, with migrations and metadata for a local setup. The first demonstration repo for a peripheral GraphQL API will be a Spring based API in this repository. The following steps will get the quick start repository up and running.

  1. Clone this repo git clone git@github.com:Adron/hasura-quick-start.git.
  2. From the root (where the docker-compose.yml file is located) execute docker compose up -d.
  3. Navigate into the hasura directory.
  4. Execute hasura metadata apply, then hasura migrate apply, and then hasura metadata apply. Just do it, it’s a strange workflow thing.
  5. From that same `hasura` directory, execute hasura console.

These steps are demonstrated in this video from 48 seconds.

What do you get once deployed?

The following are some of the core capabilities of Hasura and showcase what you can get up and running in a matter of seconds, even when you start from a completely empty database! First off you’ll find the database now has 3 tables along with their pertinent schema built out in PostgreSQL and available via Hasura, as shown here under the Data tab of the console.

I also created a schema diagram just to provide a visual of how these tables are designed.

For the remote schema, the Spring API, the following steps will get it cloned and running locally.

  1. Clone this repo git clone git@github.com:Adron/hasura-spring-boot-graphql.git.
  2. Execute ./gradlew build to get the jar file built. It will then be located in the build/libs directory of the project.
  3. Next, build the Docker image locally with docker build -t adron/hasura-spring-boot-graphql .
  4. Now you can either start this container with docker compose up -d using the docker-compose.yml in the project or you can run the image with Docker specifically with docker run -p 8081:8080 adron/hasura-spring-boot-graphql.

For a walkthrough of getting the Spring API running, check out 2:28 onward in this video.

Now both of these instances are running locally and you can test each out respectively, but not specifically together. I’ll probably write up another post on how to get services that spin up separately to run together for localized development. However, with the way things are set up in the two repos, it’s as if one team is the Hasura team building a GraphQL API and another is a Spring Java GraphQL API team, and they’re working autonomously of each other just based on the contract of the APIs themselves.

Remote Schema

With that being the scenario, I’ve deployed the Spring API out remotely so that I could show how to put together a remote schema connection and then a remote join query, i.e. nested query in GraphQL speak, across these two APIs.

To add the remote schema, click on the remote schemas tab on the console. Add a name (1), then the URI (2), and optionally if needed add appropriate headers (3) or forward all headers from client requests.

Once that’s added, navigate to the relationships tab of the new remote schema and click on add. Then for this example, select remote database (1), then add a name (4) (Customer in the example) and then for type choose object (3) (per the example).

Then scroll down on that console screen and choose sales_data (1) and default, public, and users (2) under the reference database, schema, and table. Next up choose the source field (3) and reference column (4).

Once added it will look like this in the console.

This creates a relationship that allows nested queries against these sources with GraphQL. If it were a single contiguous database, the schema would look like this. I’ve color coded the sales_data table red to signify it is the table we know is in another database (or, specifically, provided via another hosted API). However, as stated, in a single database the relationships would now look like this. The relationship, though, isn’t in a database – it’s stored in the Hasura metadata between users and sales_data.

Now writing a query across this data would shape up like this. Because of the way the relationship was drawn via the remote schema, the path to get the nested object Customer (2) for the sales data is to start with the sales_data (1) entity, as shown.

sales_data {
  sales_number
  updated_at
  Customer {
    name
  }
}

Now we want to add more details about the particular customer like their email and details. To do this we’ll utilize another nesting level within this query that delves into relationships that are in the PostgreSQL database itself.

sales_data {
  sales_number
  updated_at
  Customer {
    name
    emails {
      email
    }
    details {
      details
    }
  }
}

With this, the nested emails (3) and details (4) will be provided via foreign key relationships to the users table’s primary key in the underlying database, made available by Hasura’s relationships in metadata.

Boom! That’s it. Pretty easy setup if the databases and APIs have Hasura available to connect them in this way. Otherwise, this is a huge challenge to develop against if you’re using solely a tech stack like Apollo, Spring Boot, or Hot Chocolate. Often something along the lines of federation, with more complexity, would come into play. But more on that later; I’ve got a piece coming on federation, stitching, remote schemas, and gateways – among various ways – to get multiple GraphQL, or GraphQL and RESTful, APIs together into a singular, or singularly managed, API endpoint.

Hope that was useful, if you’ve got comments, questions, or curiosities let me know in the comments here, or pop over to the video and leave a comment there.

References:

The full video of setup and how the remote schema & joins work in Hasura.

Terrazura – A Build Out of an Azure based, Hasura GraphQL API on Postgres

I created this repo https://github.com/Adron/terrazura​ during a live stream on my Twitch Thrashing Code Channel 🀘 at 10am on the 30th of December, 2020. The VOD is now available on my YouTube Thrashing Code Channel https://youtube.com/thrashingcode​. A rough as hell year, but wanted to wrap it up with some solid content. In this stream I tackled a ton of specifics, in detail about getting Hasura deployed in Azure, Postgres backed, a database schema designed and created, using database schema migrations, and all sorts of tips n’ tricks along the way. 3 hours of solid how to get shit done material!

For live streams, check out and follow at https://www.twitch.tv/thrashingcode​ πŸ‘ŠπŸ» or for VOD viewing check out https://youtube.com/thrashingcode

A point in coding during the video!

02:49​ – Shout out to the stream sponsor, Azure, and links to some collateral material.
14:50​ – In this first segment, I start but run into some troubleshooting needs around the provider versions for Terraform in regards to Azure. You can skip this part unless you want to see what issue I ran into.
18:24​ – Since I ran into issues with the current version of Terraform I had installed, at this time I show a quick upgrade to the latest version.
27:22​ – After upgrading, I fight through trial and error execution of Terraform until I finally get the right combination of provider and Terraform versions.
27:53​ – Adding the first Terraform resource, the Azure resource group.
29:47​ – Azure Portal oddness, just to take note of if/when you’re working through this. Workaround later in the stream.
32:00​ – Adding the Postgres server resource.
44:43​ – In this segment I switched over to Jetbrain’s Intellij to do the rest of the work. I also tweak the IDE to re-add the plugin for the material design themes and icons. If you use this IDE, it’s very much IMHO worth getting this to switch between themes.
59:32​ – After getting leveled-up with the IDE, I wrap up the #Postgres​ server resource and #terraform​ apply the overall set of resources. At this point I also move forward with the infrastructure as code, with emphasis on additive changes to the immutable infrastructure, using terraform apply and minimizing any terraform destroy use.
1:02:07​ – At this time, I try figuring out the portal issue with az logout and logging back in with az login to Azure. Still no resources shown but…
1:08:47​ – eventually I realize I have to use the hack solution of pasting the subscription ID into the Azure portal to get resources for the particular subscription account, which seems highly counterintuitive since it’s the ONLY account. 🧐
1:22:54​ – The next thing I setup, now that I have variables that need passed in on every terraform execution, I add a script to do this for me.
1:29:35​ – Next up is adding the database to the database server and firewall rule. Also we get to see Jetbrains #Intellij​ HCL plugin introspection at work adding required properties to the firewall resource! A really useful feature.
1:38:24​ – Next up, creating the Azure container to deploy our Hasura GraphQL API for #Postgres​ to!
1:51:42​ – BAM! API Server is done and launched! I’ve got a live #GraphQL​ API up and running in Azure and we’re ready to start building a data model!
1:56:22​ – In this segment I show how to turn off the public facing console and shift one’s development workflow to the local Hasura console working against – local OR your live dev environment.
1:58:29​ – Next segment I get into schema migrations, initializing a directory structure for Hasura CLI use, and metadata, migrations, and related data. Including an update to the latest CLI so you can see how to do that, after a run into a slight glitch. 😬
2:23:02​ – I also shift over to dbdiagram to graphically build out some of the schema via their markdown, then use the SQL export option for #postgres​ combined with Hasura’s option to execute plain ole SQL via migrations…
2:31:48​ – Getting a bit more in depth in this segment, I delve through – via the Hasura console – to build out relationships between the tables and data so the graphql queries can introspect accordingly.
2:40:30​ – Next segment, graphql time! I show some of the options of what is available immediately for queries and mutations via the console.
2:50:36​ – Then some more details about metadata. I’m going to do a stream with further details, since I was a little fuzzy on some of those details myself, in the very very near future. However a good introduction to what the metadata does for the #graphql​ API.
2:59:07​ – Then as a wrap up to all of this… I nuke EVERYTHING and deploy it all out to Azure again inclusive of schema migrations, metadata, etc. 🀘🏻
3:16:30​ – Final segment, I add some data to the database and get into a few basic queries and mutations in #graphql​ via the #graphiql​ console interface in #Hasura​.