Engineering API Governance

I’ve written this post to take a detailed look at what an API Governance role, or simply the API Governance function, centers on.

Why does this role exist?

An API Governance function or role would be responsible for overseeing the governance of APIs within an organization. It would center around ensuring that APIs are developed, implemented, and maintained according to established standards, best practices, and business requirements.

Here are the primary reasons why this type of work and role exist:

  1. Growing Importance of APIs: APIs have become the backbone of modern software development and integration. They enable different applications and systems to communicate and share data seamlessly. As organizations increasingly adopt microservices architecture and cloud-based solutions, the number and complexity of APIs they use grow significantly.
  2. Consistency and Standardization: In large organizations with multiple development teams and projects, maintaining consistency in API design, implementation, and usage becomes challenging. An API governance role ensures that APIs adhere to standardized guidelines, leading to better interoperability and reusability.
  3. Security and Compliance: APIs expose access points to an organization’s systems and data. Ensuring the security of these access points is critical to prevent data breaches, unauthorized access, and other security risks. An API governance role focuses on implementing robust security measures and compliance with relevant regulations.
  4. Efficient Collaboration: When different teams create APIs independently, they may not be aware of existing solutions or may duplicate efforts. An API governance function facilitates collaboration, knowledge sharing, and reuse of APIs, leading to more efficient development processes.
  5. Scalability and Performance: Proper governance includes performance monitoring and optimization of APIs. This helps identify bottlenecks, improve response times, and ensure that APIs can handle increasing workloads as the organization grows.
  6. User Experience: APIs are used by developers, both internally and externally, to build applications. A well-governed API ecosystem includes clear and comprehensive documentation, which enhances the developer experience and enables faster integration with APIs.
  7. Risk Mitigation: By setting up a structured governance process, organizations can identify and address potential risks associated with APIs early on, reducing the chances of costly issues arising in production.
  8. API Lifecycle Management: APIs have a lifecycle that includes planning, development, versioning, and deprecation. Proper governance ensures that APIs are maintained and updated according to their lifecycle stages, preventing the accumulation of outdated or obsolete APIs.
  9. Compliance with Business Strategy: API governance aligns API development and management with the organization’s overall business strategy, ensuring that APIs support the company’s goals and objectives effectively.
  10. Adoption of Best Practices: An API governance role stays updated with industry trends and best practices, continuously improving the organization’s API strategy and development processes.

Engineering API Governance Primary Functions

The API Governance function or role is responsible for overseeing and managing the governance of Application Programming Interfaces (APIs) within an organization. It ensures that APIs are developed, implemented, and maintained according to established standards, best practices, and business requirements. Here are some key aspects of an API Governance function or role:

  1. Defining API Standards: The API Governance team establishes and documents API design standards, naming conventions, versioning practices, security guidelines, documentation requirements, and other relevant policies. These standards promote consistency and facilitate seamless integration among different APIs.
  2. API Lifecycle Management: The API Governance function oversees the entire lifecycle of APIs. This includes planning and design, development, testing, deployment, monitoring, version control, and eventual retirement or deprecation when necessary.
  3. Security and Compliance: Ensuring the security of APIs is a critical aspect of the API Governance role. The team establishes security protocols, access controls, authentication mechanisms, and data protection measures to safeguard APIs and the systems they interact with. Additionally, they ensure compliance with relevant industry regulations and data privacy laws.
  4. Documentation and Communication: API Governance teams create comprehensive and easily accessible documentation for APIs. This documentation helps internal and external developers understand how to use the APIs effectively and provides details about their capabilities, limitations, and potential use cases.
  5. Monitoring and Performance: The API Governance function sets up monitoring systems to track API usage, performance metrics, and error rates. This data helps identify potential issues, bottlenecks, and areas for improvement. It allows the team to optimize APIs to meet service-level requirements and user expectations.
  6. Collaboration and Coordination: The API Governance role involves collaborating with various teams, including development, product management, security, and operations. Effective coordination ensures that APIs are aligned with business objectives and developed in a way that meets the needs of different stakeholders.
  7. Review and Approval Process: The API Governance team establishes a review and approval process for new APIs and changes to existing ones. This process ensures that APIs meet the required standards, security measures, and compliance before they are deployed.
  8. Education and Training: The API Governance function may conduct training sessions for developers and other stakeholders to promote best practices in API development, usage, and maintenance.
  9. Adoption of API Management Tools: API Governance teams often utilize API management tools to facilitate API governance tasks. These tools can assist in monitoring, versioning, security, analytics, and documentation management.
  10. Continuous Improvement: The API Governance function remains updated with industry trends and best practices to continuously improve the organization’s API strategy and governance approach.

In a large enough organization, each of these functions can become its own staffed role. More often, though, a single API Governance staff member is tasked with all of them and needs to prioritize, delegate, and organize the work across the organization.

In following posts (which I’ll link here once they’re posted) I’ll write up some scenarios from working as an individual contributor and as a leader (i.e. managing staff), as well as from hiring for API Governance roles. I’ll get into the nitty-gritty of how process, practice, and patterns can be used to elaborate on things like education, adoption of API management tools, continuous improvement, and the many other functions.


GraphQL Nested Queries, Relationships, and Different Data Sources Practices

When building a GraphQL API with nested queries and relationships – specifically when you’re using a relational database – it’s important to follow best practices to attain efficient and performant data retrieval while preventing overly nested queries. From the GraphQL API perspective, here are some practices to follow:

  1. Use GraphQL Fragments: Fragments allow you to define reusable sets of fields that can be included in multiple queries. This helps avoid duplicating nested fields and keeps queries concise and readable.
  2. Resolve Nested Data Efficiently: Use efficient data fetching techniques to resolve nested data. Techniques like batch loading and data loaders can help avoid the N+1 query problem, where multiple database queries are triggered for each item in a list.
  3. Limit Depth of Nested Queries: Consider setting a maximum allowed depth for nested queries. Some tools expose this as a configuration setting, and most GraphQL libraries across language stacks offer a way to enforce such a limit. This helps prevent clients from issuing excessively deep queries that hurt performance, for example a single request that ends up joining four, five, or more tables in the database.
  4. Pagination: For lists of data, use pagination to limit the amount of data returned in a single query. This prevents queries from becoming overly large and ensures efficient data retrieval.
  5. Use Aliases: Aliases allow clients to request the same field multiple times with different arguments. This lets a client fetch several variations of related data in a single query instead of sending multiple requests.
  6. Avoid Deep Nesting: Strive to keep your GraphQL queries shallow and avoid excessive nesting. If a query becomes too nested, it may be an indication that the schema design needs improvement.
  7. Encourage Specific Queries: Instead of relying solely on generic queries, encourage clients to use specific queries tailored to their needs. This can prevent unnecessary data retrieval and reduce the chance of overly nested queries.
  8. Provide Field Arguments: Offer field arguments to allow clients to customize the shape of the data they retrieve. This way, clients can request only the data they need, reducing the risk of getting overly nested responses.
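  9. Use @defer and @stream: A number of GraphQL servers and clients support deferred and streamed responses through the @defer and @stream directives, though support varies by implementation, so check what your stack provides. Where they are available, these features give more fine-grained control over data retrieval and prevent unnecessary waiting for nested data.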
  10. Educate API Consumers: If you are building a public API, provide clear documentation and examples on how to use the API efficiently. Educate API consumers on best practices for querying data and avoiding overly nested queries.
  11. Performance Testing: Conduct performance testing on your GraphQL API to identify potential bottlenecks and areas of improvement. This can help you optimize the data fetching process and avoid performance issues due to nested queries.
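
To make the N+1 problem from point 2 concrete, here is a minimal sketch in plain Python. The in-memory tables, the FakeDb class, and the resolver function names are all hypothetical stand-ins rather than any particular GraphQL library’s API; the point is only to compare how many queries each strategy issues.

```python
# Hypothetical in-memory "tables" standing in for a relational database.
AUTHORS = {1: "Ada", 2: "Grace"}
POSTS = [
    {"id": 10, "author_id": 1, "title": "Intro"},
    {"id": 11, "author_id": 2, "title": "Compilers"},
    {"id": 12, "author_id": 1, "title": "Engines"},
]


class FakeDb:
    """Counts queries so the two resolution strategies can be compared."""

    def __init__(self):
        self.query_count = 0

    def all_posts(self):
        self.query_count += 1
        return list(POSTS)

    def author_by_id(self, author_id):
        self.query_count += 1
        return AUTHORS[author_id]

    def authors_by_ids(self, author_ids):
        # One round trip, comparable to WHERE id IN (...).
        self.query_count += 1
        return {i: AUTHORS[i] for i in author_ids}


def resolve_naive(db):
    # One query for the list, then one more per post: the N+1 pattern.
    return [(p["title"], db.author_by_id(p["author_id"])) for p in db.all_posts()]


def resolve_batched(db):
    # Collect the distinct keys first, then load them all in a single query.
    posts = db.all_posts()
    authors = db.authors_by_ids(sorted({p["author_id"] for p in posts}))
    return [(p["title"], authors[p["author_id"]]) for p in posts]
```

Both functions return the same data, but the naive version issues one query per post on top of the list query, while the batched version always issues two. Data loader libraries automate this key-collecting step per request.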

By following these practices, you can ensure that your GraphQL API provides a smooth and efficient experience for clients, while also preventing the negative impact of overly nested queries on server performance.
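
The depth limit from point 3 can be sketched in a few lines. This assumes the query’s selection set has already been parsed into nested dictionaries (a simplification; a real server would walk the parsed query document from its GraphQL library), and the function names are hypothetical.

```python
# Assumed representation: leaf fields map to None, fields with
# sub-selections map to another dict.
EXAMPLE_SELECTION = {
    "user": {
        "name": None,
        "account": {"associatedAccount": {"title": None}},
    }
}


def selection_depth(selection):
    """Return the nesting depth of a selection set given as nested dicts."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def enforce_max_depth(selection, max_depth):
    """Reject the selection before execution if it nests too deeply."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth
```

The example selection nests four levels deep (user, account, associatedAccount, title), so a limit of 3 rejects it before any resolver runs.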

But what about situations where you’re building a GraphQL API that isn’t going to be built on a relational database? Well, you’re in luck, because I’ve done this more than once and I’ve got a few patterns you can use to help ensure your services stay up to snuff.

Apache Cassandra & MongoDB

When you’re using databases like Apache Cassandra (a wide-column store) or MongoDB (a document-oriented database), there are some additional concerns related to nested queries and data modeling that should be taken into account. For example, a MongoDB document can itself contain nesting, and that nesting can go deep depending on how the data is stored and modeled in the underlying BSON (Binary JSON). This adds complexity, and you need to understand the data being queried to realize the implications of querying it from something like GraphQL.

  1. Data Modeling for Query Support: Unlike relational databases, Cassandra and MongoDB do not support complex JOIN operations, making it essential to design the data model to support the required queries efficiently. This may involve denormalizing data and duplicating information to facilitate query patterns.
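  2. Limited Transactions: Cassandra does not offer general-purpose ACID transactions across multiple rows or partitions; its lightweight transactions cover only compare-and-set operations within a single partition. MongoDB has supported multi-document ACID transactions since version 4.0, but they carry a performance cost and are best used sparingly. As a result, handling complex nested queries across multiple entities may require careful consideration of eventual consistency and data integrity.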
  3. Data Duplication for Performance: To optimize queries, you may need to denormalize and duplicate data, leading to increased storage requirements. Balancing query performance with storage efficiency becomes crucial in such cases.
  4. Aggregation Pipeline (MongoDB): When using MongoDB, the Aggregation Pipeline can be powerful for handling complex data processing and nested queries. Understanding and leveraging the aggregation framework effectively can be essential for optimal performance.
  5. Limitations on Nested Arrays: While both databases support nested data structures (arrays or maps), deeply nested arrays can become challenging to query efficiently. Be cautious when modeling highly nested structures, as it can lead to performance issues.
  6. Data Distribution (Cassandra): In Cassandra, data is distributed across nodes based on the partition key. Designing a proper partitioning strategy is crucial to avoid hotspots and ensure even data distribution for queries.
  7. Secondary Indexes (Cassandra): In Cassandra, using secondary indexes to query nested data can be inefficient. It’s generally recommended to design the schema to support the required queries without relying heavily on secondary indexes.
  8. Data Access Patterns: Understand the common access patterns of your application and design the data model accordingly. The database schema should cater to the specific needs of the queries your application will perform most frequently.
  9. Avoiding Unbounded Queries: In NoSQL databases, unbounded queries can lead to performance issues. Consider using pagination or other query optimizations to limit the amount of data retrieved in a single query.
  10. Sharding and Replication: Both Cassandra and MongoDB are designed to scale horizontally. Consider the implications of sharding and replication when dealing with nested queries, as they can impact query performance and data consistency.
  11. Query Modeling: Model your queries to take advantage of database-specific features, like secondary indexes, compound keys, or materialized views, to optimize performance for specific access patterns.
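
As a rough illustration of modeling for the query (points 1, 3, and 9), the sketch below builds a denormalized, query-shaped view in plain Python. The record shapes and function names are hypothetical, and a dictionary stands in for a table partitioned by user_id; no Cassandra or MongoDB driver is involved.

```python
from collections import defaultdict

# Hypothetical normalized source records.
USERS = {"u1": {"name": "Ada"}, "u2": {"name": "Grace"}}
ORDERS = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u2", "total": 12},
    {"order_id": "o3", "user_id": "u1", "total": 8},
]


def build_orders_by_user(users, orders):
    """Build a denormalized view keyed by user_id.

    The user's name is duplicated onto every order row so that a read
    needs only a single key lookup and no join.
    """
    view = defaultdict(list)
    for order in orders:
        row = dict(order, user_name=users[order["user_id"]]["name"])
        view[order["user_id"]].append(row)
    return dict(view)


ORDERS_BY_USER = build_orders_by_user(USERS, ORDERS)


def orders_for_user(user_id, limit=10):
    # Bounded read from one "partition", so the query is never unbounded.
    return ORDERS_BY_USER.get(user_id, [])[:limit]
```

The trade-off is the one described above: the user name is stored once per order, which costs storage and requires updating every copy if the name changes, in exchange for a join-free read.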

In conclusion, when you’re using databases like Apache Cassandra or MongoDB, the flexibility and scalability they offer come at the price of careful data modeling and query design to handle nested queries efficiently. The complexity can often be more extensive than that of a relational database, but the advantages can be compounded by the very nature of the underlying systems. By understanding these database limitations and optimizing the data model to suit the application’s query patterns, you can make the most of these NoSQL databases while mitigating potential performance bottlenecks.

Elasticsearch

Elasticsearch, it’s important to note, is not a database but a search engine with distributed storage capabilities, and that introduces a whole new realm of considerations. Here are a few I’ve bumped into over the years of implementing GraphQL APIs on engines like Elasticsearch.

  1. Data Indexing: Elasticsearch requires data to be indexed before it can be searched. Designing a proper indexing strategy is crucial to ensure that the data is organized and optimized for search queries, including nested queries.
  2. Nested Documents: Elasticsearch supports nested documents, allowing for complex data structures. However, keep in mind that nested queries can be more resource-intensive than regular queries, so optimizing the data model to minimize unnecessary nesting is important.
  3. Query Complexity: Complex nested queries in Elasticsearch can result in more processing overhead. Strive to keep your queries as simple as possible to improve search performance.
  4. Document Size: Elasticsearch performs best with reasonably sized documents. If your documents are too large or too nested, it can negatively impact performance. Consider flattening nested data if possible.
  5. Index Mapping: Define explicit mappings for your Elasticsearch indices to specify how fields should be indexed and queried. This can help optimize query performance and avoid unexpected behavior.
  6. Filter vs. Query Context: Understand the difference between filter context and query context in Elasticsearch queries. Filters are more efficient for simple binary decisions, while queries are better for scoring and relevance.
  7. Aggregations: Elasticsearch provides powerful aggregation capabilities to analyze and summarize data. However, complex aggregations can be resource-intensive, so use them judiciously.
  8. Scoring and Relevance: Elasticsearch uses scoring algorithms to rank search results based on relevance. Ensure that your queries and data model align with the desired relevance of search results.
  9. Pagination and Sorting: Plan for efficient pagination and sorting of search results. Avoid deep pagination, as it can lead to performance issues.
  10. Sharding and Replication: Elasticsearch is a distributed system that uses sharding and replication to achieve scalability and fault tolerance. Be mindful of the impact of sharding and replication on query performance and data consistency.
  11. Tuning Index Settings: Elasticsearch provides various index-level settings that can affect search performance. Tuning these settings based on your application’s needs can significantly impact query execution times.
  12. Data Modeling for Search: Design the data model in a way that aligns with the search use cases of your application. Consider the types of queries you will be performing frequently and optimize the data model accordingly.
  13. Cluster Health and Monitoring: Keep an eye on the cluster health and performance metrics. Monitor and optimize the performance of your Elasticsearch cluster regularly.
  14. Indexing and Search Performance Trade-offs: The indexing and search performance of Elasticsearch can be influenced by various factors. Understanding the trade-offs between indexing speed and query performance is crucial when designing your application.
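
The filter versus query context distinction from point 6 is easiest to see in a request body. The sketch below builds an Elasticsearch bool query as a plain Python dictionary; the field names title and status and the function name are assumptions for illustration, and no client library is used.

```python
def build_search_body(text, status, page_size=10):
    """Build an Elasticsearch bool query body as a plain dict.

    Clauses under "must" run in query context and contribute to the
    relevance score. Clauses under "filter" run in filter context: they
    only include or exclude documents, are not scored, and frequently
    used filters can be cached by Elasticsearch.
    """
    return {
        "size": page_size,  # keep result pages bounded
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"term": {"status": status}}],
            }
        },
    }
```

A GraphQL resolver would typically map free-text arguments to the must clause and exact-match arguments (status, type, owner) to the filter clause.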

Apache Kafka What?

Finally, there is Apache Kafka, which comes up every now and again. Even though I haven’t implemented a GraphQL API on Kafka yet, it’s been done and I’ve been privy to the implications. Here are a few best practices I’ve picked up for implementing against Kafka.

  1. Data Synchronization: Decide on the data synchronization approach between Kafka and your GraphQL API. Will your GraphQL API act as a producer, a consumer, or both? Plan how data flows between the two systems to maintain consistency.
  2. Message Format: Define a standardized message format for data exchanged between Kafka and the GraphQL API. This format should be easily interpretable by both systems and include all necessary information for processing.
  3. Schema Evolution: Consider how schema changes in Kafka messages are handled by the GraphQL API. Plan for backward and forward compatibility to avoid breaking the API when message schemas evolve.
  4. Consumer Groups: When consuming data from Kafka, decide on appropriate consumer group configurations to manage the processing of messages efficiently and in parallel.
  5. Event Deduplication: Ensure that your GraphQL API can handle duplicate events from Kafka gracefully to avoid processing the same data multiple times.
  6. Error Handling: Implement robust error handling and retry mechanisms when processing Kafka messages. Handle failures gracefully and avoid data loss.
  7. Message Ordering: Be aware that Kafka does not guarantee strict message ordering across different partitions. Consider how this might impact the ordering of data processed by the GraphQL API.
  8. Throttling and Backpressure: Plan for throttling and backpressure mechanisms to control the rate at which data is consumed from Kafka to prevent overwhelming the GraphQL API with incoming messages.
  9. Security: Secure your Kafka system and the GraphQL API to prevent unauthorized access. Use appropriate authentication and authorization mechanisms to protect data integrity and confidentiality.
  10. Performance Optimization: Optimize the performance of your Kafka consumer and GraphQL API to handle high loads efficiently. Consider batching messages and implementing caching mechanisms when applicable.
  11. Monitoring and Logging: Implement monitoring and logging for both Kafka and the GraphQL API. Track message processing times, error rates, and system health to identify and resolve potential issues.
  12. Integration Testing: Conduct integration testing to ensure seamless communication between Kafka and the GraphQL API. Test different scenarios, such as handling delayed messages and high loads, to validate the system’s behavior.
  13. Versioning and Compatibility: Plan for versioning in both Kafka messages and GraphQL schema. This helps maintain compatibility and allows for smooth changes in both systems over time.
  14. Infrastructure Scalability: Design your Kafka and GraphQL systems with scalability in mind to handle future growth and increased data volumes.
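
Event deduplication (point 5) is one of the easier items to sketch. The example below assumes each message carries a producer-assigned event_id, which is an assumption about the message format rather than something Kafka provides by itself, and it uses an in-memory set where a real service would need a durable store. No Kafka client is involved; the handler just receives dictionaries.

```python
class DedupingHandler:
    """Applies each event at most once, keyed by its event_id."""

    def __init__(self):
        self.seen_ids = set()  # hypothetical; a real service needs durable storage
        self.totals = {}

    def handle(self, event):
        """Apply the event and return True, or return False for a duplicate."""
        event_id = event["event_id"]
        if event_id in self.seen_ids:
            return False  # duplicate delivery, skip without side effects
        account = event["account"]
        self.totals[account] = self.totals.get(account, 0) + event["amount"]
        self.seen_ids.add(event_id)
        return True
```

With this in place, a redelivered message (for example after a consumer restart before its offset was committed) leaves the totals unchanged.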

Summary

Alright, that’s a boatload of practices for the top data sources I’ve worked with to implement GraphQL against. I have tons more to add, but that’s enough detail for a single post! Suffice it to say, GraphQL can provide extensive capabilities with these various data sources.

GraphQL Schema Standards, Patterns, & Practices

When planning a GraphQL schema design, choosing appropriate type names and casing conventions is essential for creating a clear and consistent API. Although there is no industry-wide standard, there are some common practices and recommendations that I tend to follow when setting up new schema and related project assets:

  1. Type Names: Use descriptive and meaningful names for GraphQL types. Type names should represent the data they hold or the entities they represent. For example, if you have a type to represent a user, name it User.
  2. Nested Objects/Types: When naming nested types the naming can become more complex. In this case it is sometimes important to put a nested object’s type name on the parent type, or vice versa, to signify where it sits within a conceptual structure. For example:
type User {
  id: ID!
  name: String!
  # address1 and address2 share the same nested Address type; the numbered
  # field names make clear they are two different addresses.
  address1: Address!
  address2: Address
  account: Account
}

type Address {
  id: ID!
  address: String!
  # The user this address is related to. If empty, the address isn't related to a specific user.
  userId: ID
}

type Account {
  id: ID!
  title: String!
  # A compound name, to differentiate it clearly from the "account" that a user might have.
  associatedAccount: Account
}
  3. Naming Conventions: Stick to a consistent naming convention throughout your schema. There are two popular conventions that I always stick to. Even more specifically, I tend to follow whatever the database naming convention is, if known, as I’ve detailed in previous posts. The two top choices:
    • PascalCase: Capitalize the first letter of each word, including the first word. For example: User, ProductCategory, OrderDetails.
    • camelCase: Start with a lowercase letter and capitalize the first letter of each subsequent word. For example: user, productCategory, orderDetails.
    Choose the convention that aligns better with your team’s preferences and the overall codebase.
  4. Avoid Abbreviations: Try to avoid abbreviations in type names unless they are widely known and commonly used. Clear and readable type names make it easier for other developers to understand your schema. I mean, abbreviations died out with Visual Basic pre-.NET right? Ok, I guess that hangs on like all upper COBOL but you get the point, just cut out abbreviations in names across the board! 👍🏻
  5. Singular vs. Plural: Use singular names for types representing single entities (e.g., User, Product, Category) and plural names for fields that return collections of those entities (e.g., users, products, categories).
  6. Enumeration Types: For enumeration types (enums), use singular names, and use uppercase letters for the values. For example:
enum UserRole {
  ADMIN
  CUSTOMER
  GUEST
}
  7. Interfaces and Unions: For interfaces and unions, use descriptive and noun-based names that indicate the common traits of the types they include. For example:
interface Vehicle {
  id: ID!
  brand: String!
}

type Car implements Vehicle {
  id: ID!
  brand: String!
  model: String!
}

type Bike implements Vehicle {
  id: ID!
  brand: String!
  color: String!
}
  8. Field Names: Follow similar naming conventions for field names within types. Use descriptive and concise names that indicate the data they represent. For example, prefer firstName over fn or fName.
  9. Boolean Fields: For boolean fields, use names that suggest a yes/no question and use words like is, has, or should. For example: isActive, hasPermission, shouldProcess.

These guidelines have provided a helpful starting point for me, but the most crucial aspect is to maintain consistency across the entire schema and within the team’s development practices. By creating a well-designed and consistently named schema, you make it easier for other developers to understand, maintain, and extend the API efficiently.
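
One way to keep a team consistent is to check names automatically. The following is a minimal sketch in Python of how the conventions above could be expressed as checks; the function names and regular expressions are my own illustrative assumptions, not part of any GraphQL tooling.

```python
import re

PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")          # type names
CAMEL_CASE = re.compile(r"^[a-z][A-Za-z0-9]*$")           # field names
ENUM_VALUE = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")  # enum values
BOOLEAN_PREFIXES = ("is", "has", "should")


def is_type_name(name):
    return bool(PASCAL_CASE.match(name))


def is_field_name(name):
    return bool(CAMEL_CASE.match(name))


def is_enum_value(name):
    return bool(ENUM_VALUE.match(name))


def is_boolean_field_name(name):
    # The prefix must be followed by an uppercase letter, so "isActive"
    # passes but a word that merely starts with "is", like "island", does not.
    return is_field_name(name) and any(
        name.startswith(prefix)
        and len(name) > len(prefix)
        and name[len(prefix)].isupper()
        for prefix in BOOLEAN_PREFIXES
    )
```

Checks like these could run in a schema review step so that naming drift is caught before a schema change ships.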


W. Edwards Deming’s 14 Points

W. Edwards Deming, an electrical engineer, statistician (i.e. an AI expert!), professor, author, lecturer, and management consultant, wrote some extremely prescient and wise things in his day. In the 1930s he became pivotal in showing how statistical analysis could be used to achieve better quality control, and in turn these methods helped immensely in post-World War II Japan as it rebuilt its devastated economy.

Of the many things he wrote, one was a list of 14 points, which people today really ought to give a good, long, comprehensive read. We routinely mess up and fail in so many ways by ignoring, or just being oblivious to, the lessons already learned, lessons Deming would teach us so that we could avoid those failures.

  1. Create a constant purpose toward improvement.
    • Plan for quality in the long term.
    • Resist reacting with short-term solutions.
    • Don’t just do the same things better – find better things to do.
    • Predict and prepare for future challenges, and always have the goal of getting better.
  2. Adopt the new philosophy.
    • Embrace quality throughout the organization.
    • Put your customers’ needs first, rather than react to competitive pressure – and design products and services to meet those needs.
    • Be prepared for a major change in the way business is done. It’s about leading, not simply managing.
    • Create your quality vision, and implement it.
  3. Stop depending on inspections.
    • Inspections are costly and unreliable – and they don’t improve quality, they merely find a lack of quality.
    • Build quality into the process from start to finish.
    • Don’t just find what you did wrong – eliminate the “wrongs” altogether.
    • Use statistical control methods – not physical inspections alone – to prove that the process is working.
  4. Use a single supplier for any one item.
    • Quality relies on consistency – the less variation you have in the input, the less variation you’ll have in the output.
    • Look at suppliers as your partners in quality. Encourage them to spend time improving their own quality – they shouldn’t compete for your business based on price alone.
    • Analyze the total cost to you, not just the initial cost of the product.
    • Use quality statistics to ensure that suppliers meet your quality standards.
  5. Improve constantly and forever.
    • Continuously improve your systems and processes. Deming promoted the Plan-Do-Check-Act approach to process analysis and improvement.
    • Emphasize training and education so everyone can do their jobs better.
    • Use kaizen as a model to reduce waste and to improve productivity, effectiveness, and safety.
  6. Use training on the job.
    • Train for consistency to help reduce variation.
    • Build a foundation of common knowledge.
    • Allow workers to understand their roles in the “big picture.”
    • Encourage staff to learn from one another, and provide a culture and environment for effective teamwork.
  7. Implement leadership.
    • Expect your supervisors and managers to understand their workers and the processes they use.
    • Don’t simply supervise – provide support and resources so that each staff member can do his or her best. Be a coach instead of a policeman.
    • Figure out what each person actually needs to do his or her best.
    • Emphasize the importance of participative management and transformational leadership.
    • Find ways to reach full potential, and don’t just focus on meeting targets and quotas.
  8. Eliminate fear.
    • Allow people to perform at their best by ensuring that they’re not afraid to express ideas or concerns.
    • Let everyone know that the goal is to achieve high quality by doing more things right – and that you’re not interested in blaming people when mistakes happen.
    • Make workers feel valued, and encourage them to look for better ways to do things.
    • Ensure that your leaders are approachable and that they work with teams to act in the company’s best interests.
    • Use open and honest communication to remove fear from the organization.
  9. Break down barriers between departments.
    • Build the “internal customer” concept – recognize that each department or function serves other departments that use their output.
    • Build a shared vision.
    • Use cross-functional teamwork to build understanding and reduce adversarial relationships.
    • Focus on collaboration and consensus instead of compromise.
  10. Get rid of unclear slogans.
    • Let people know exactly what you want – don’t make them guess. “Excellence in service” is short and memorable, but what does it mean? How is it achieved? The message is clearer in a slogan like “You can do better if you try.”
    • Don’t let words and nice-sounding phrases replace effective leadership. Outline your expectations, and then praise people face-to-face for doing good work.
  11. Eliminate management by objectives.
    • Look at how the process is carried out, not just numerical targets. Deming said that production targets encourage high output and low quality.
    • Provide support and resources so that production levels and quality are high and achievable.
    • Measure the process rather than the people behind the process.
  12. Remove barriers to pride of workmanship.
    • Allow everyone to take pride in their work without being rated or compared.
    • Treat workers the same, and don’t make them compete with other workers for monetary or other rewards. Over time, the quality system will naturally raise the level of everyone’s work to an equally high level.
  13. Implement education and self-improvement.
    • Improve the current skills of workers.
    • Encourage people to learn new skills to prepare for future changes and challenges.
    • Build skills to make your workforce more adaptable to change, and better able to find and achieve improvements.
  14. Make “transformation” everyone’s job.
    • Improve your overall organization by having each person take a step toward quality.
    • Analyze each small step, and understand how it fits into the larger picture.
    • Use effective change management principles to introduce the new philosophy and ideas in Deming’s 14 points.

I’ve reposted these here because I’ll be referring back to them in some forthcoming posts and wanted the reference to be easy to find.

GraphQL Pagination with Java Spring Boot’s “Spring for GraphQL”

There are several different approaches for implementing pagination in GraphQL, and specifically with Java Spring Boot. Here are the commonly used patterns for paging in APIs:

  1. Offset Pagination:
    • This pattern uses an offset and limit approach, where you specify the starting offset (number of records to skip) and the maximum number of records to return.
    • Example parameters: offset=0 and limit=10
  2. Cursor-based Pagination:
    • This pattern uses a cursor (typically an encoded value representing a record) to determine the position in the dataset.
    • The cursor can be an ID, a timestamp, or any other value that uniquely identifies a record.
    • Example parameters: cursor=eyJpZCI6MX0= and limit=10
  3. Page-based Pagination:
    • This pattern divides the dataset into pages, each containing a fixed number of records.
    • It uses page numbers to navigate through the dataset, typically with links or metadata indicating the previous, next, and current pages.
    • Example parameters: page=1 and size=10
  4. Time-based Pagination:
    • This pattern uses time-based boundaries, such as start and end timestamps, to fetch records within a specific time range.
    • It is commonly used in scenarios where the dataset is time-ordered, such as logs or social media posts.
    • Example parameters: start_time=1621234567 and end_time=1622345678
  5. Keyset Pagination:
    • This pattern relies on ordering the dataset by one or more columns and using the column values as the paging keys.
    • Each page request includes the last record’s key from the previous page, and the API returns records greater than that key.
    • It provides efficient pagination for large datasets with indexed columns.
    • Example parameters: last_key=12345 and limit=10
  6. Combination of Patterns:
    • You can also combine different pagination patterns based on the requirements of your API and the nature of the data being paginated.
    • For example, you might use cursor-based pagination for real-time updates and keyset pagination for efficient retrieval of large datasets.
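
The cursor in the example parameters above, eyJpZCI6MX0=, is the Base64 encoding of the JSON text {"id":1}. Here is a minimal sketch of producing and reading such a cursor, assuming an opaque cursor that wraps only a numeric ID. The CursorCodecSketch class and its method names are hypothetical, not part of any library:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: wraps a numeric ID in a small JSON payload and Base64-encodes it.
class CursorCodecSketch {

    static String encode(long id) {
        String json = "{\"id\":" + id + "}";
        return Base64.getEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    static long decode(String cursor) {
        String json = new String(Base64.getDecoder().decode(cursor), StandardCharsets.UTF_8);
        // Assumes the payload is exactly {"id":<number>}; a real API would use a JSON parser.
        String digits = json.substring(json.indexOf(':') + 1, json.length() - 1);
        return Long.parseLong(digits.trim());
    }

    public static void main(String[] args) {
        String cursor = encode(1);
        System.out.println(cursor);         // eyJpZCI6MX0=
        System.out.println(decode(cursor)); // 1
    }
}
```

Keeping the cursor opaque like this lets the server change what it encodes later without breaking clients that only pass the value back.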

The type of pattern to use depends on numerous factors like the size of the dataset, ordering requirements, and related performance characteristics. This post doesn’t cover the logic or details needed to determine the type of paging to use, just the options that are available. With that, it’s time to get into paging! 👊🏻

Here’s an example of how you can write – generally – a Java Spring Boot GraphQL API with paging for a “Customer” object:

  1. Set up the project:
    • Create a new Spring Boot project in your preferred IDE.
    • Add the necessary dependencies to your pom.xml file:
      • Spring Boot Starter Web
      • Spring Boot Starter Data JPA
      • GraphQL Java Tools
      • GraphQL Java Spring Boot Starter
  2. Define the Customer entity:
    • Create a new class named Customer with the following fields:
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long customerId;
    private String firstName;
    private String lastName;
    private String customerDetails;
    private Integer customerAccountId;
    private Integer customerSalesId;
    private Long engId;
    private Long forgoId;

    // Constructors, getters, and setters
}
  3. Set up the Customer repository:
  • Create a new interface named CustomerRepository that extends JpaRepository<Customer, Long>.
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface CustomerRepository extends JpaRepository<Customer, Long> {
}
  4. Create the GraphQL schema:
  • Create a new file named schema.graphqls under the resources directory.
  • Define the GraphQL schema with the required types, queries, and mutations:
type Customer {
  customerId: ID!
  firstName: String!
  lastName: String!
  customerDetails: String!
  customerAccountId: Int!
  customerSalesId: Int!
  engId: ID!
  forgoId: ID!
}

type Query {
  getCustomers(page: Int!): [Customer!]!
}

schema {
  query: Query
}
  5. Implement the GraphQL resolver:
  • Create a new class named GraphQLResolver and define the resolver methods.
import com.coxautodev.graphql.tools.GraphQLQueryResolver;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class GraphQLResolver implements GraphQLQueryResolver {
    private final CustomerRepository customerRepository;

    @Autowired
    public GraphQLResolver(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    public List<Customer> getCustomers(int page) {
        int pageSize = 42;
        // PageRequest.of expects a zero-based page index, not a record offset,
        // so convert the one-based page argument directly.
        return customerRepository.findAll(PageRequest.of(page - 1, pageSize)).getContent();
    }
}
  6. Run the application:
    • Run the Spring Boot application.
    • Navigate to http://localhost:8080/graphql to access the GraphQL Playground.
  7. Testing the API:
    • Use the following query in the GraphQL Playground to fetch customers with pagination:
query {
  getCustomers(page: 1) {
    customerId
    firstName
    lastName
    customerDetails
    customerAccountId
    customerSalesId
    engId
    forgoId
  }
}

Replace page: 1 with the desired page number to retrieve different sets of customers.
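
Spring Data's PageRequest.of takes a zero-based page index rather than a record offset, so a one-based page argument needs converting before it is passed in. Here is a small sketch of that arithmetic in plain Java, with no Spring dependency. PageMathSketch and its methods are illustrative names only:

```java
// Hypothetical helper showing how a one-based page number maps to
// a zero-based page index and to the number of records skipped.
class PageMathSketch {

    static int toPageIndex(int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1");
        }
        return page - 1;
    }

    static long toOffset(int page, int pageSize) {
        // Widen to long before multiplying to avoid int overflow on large pages.
        return (long) toPageIndex(page) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(toPageIndex(3));  // 2
        System.out.println(toOffset(3, 42)); // 84
    }
}
```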


Page-based + Caching

Previous Page, Next Page, and Current Page Model

  1. Modify the GraphQL schema:
    • Update the getCustomers query in the schema.graphqls file to include the new pagination fields:
type CustomerConnection {
  pageInfo: PageInfo!
  edges: [CustomerEdge!]!
}

type CustomerEdge {
  cursor: ID!
  node: Customer!
}

type PageInfo {
  startCursor: ID
  endCursor: ID
  hasPreviousPage: Boolean!
  hasNextPage: Boolean!
}

type Query {
  getCustomers(page: Int!): CustomerConnection!
}

schema {
  query: Query
}
  2. Update the GraphQL resolver:
    • Modify the GraphQLResolver class to include the new pagination logic and return the CustomerConnection type:
import com.coxautodev.graphql.tools.GraphQLQueryResolver;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.stream.Collectors;

@Component
public class GraphQLResolver implements GraphQLQueryResolver {
    private final CustomerRepository customerRepository;

    @Autowired
    public GraphQLResolver(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    public CustomerConnection getCustomers(int page) {
        int pageSize = 42;

        // PageRequest.of expects a zero-based page index, not a record offset.
        List<Customer> customers = customerRepository.findAll(PageRequest.of(page - 1, pageSize)).getContent();
        List<CustomerEdge> customerEdges = customers.stream()
                .map(customer -> new CustomerEdge(String.valueOf(customer.getCustomerId()), customer))
                .collect(Collectors.toList());

        boolean hasPreviousPage = page > 1;
        // Heuristic: a full page suggests that more records may follow.
        boolean hasNextPage = customers.size() == pageSize;

        String startCursor = customerEdges.isEmpty() ? null : customerEdges.get(0).getCursor();
        String endCursor = customerEdges.isEmpty() ? null : customerEdges.get(customerEdges.size() - 1).getCursor();

        PageInfo pageInfo = new PageInfo(startCursor, endCursor, hasPreviousPage, hasNextPage);
        return new CustomerConnection(pageInfo, customerEdges);
    }
}
  3. Define additional classes:
    • Create the following additional classes to support the new pagination model:
public class CustomerConnection {
    private final PageInfo pageInfo;
    private final List<CustomerEdge> edges;

    public CustomerConnection(PageInfo pageInfo, List<CustomerEdge> edges) {
        this.pageInfo = pageInfo;
        this.edges = edges;
    }

    public PageInfo getPageInfo() {
        return pageInfo;
    }

    public List<CustomerEdge> getEdges() {
        return edges;
    }
}

public class CustomerEdge {
    private final String cursor;
    private final Customer node;

    public CustomerEdge(String cursor, Customer node) {
        this.cursor = cursor;
        this.node = node;
    }

    public String getCursor() {
        return cursor;
    }

    public Customer getNode() {
        return node;
    }
}

public class PageInfo {
    private final String startCursor;
    private final String endCursor;
    private final boolean hasPreviousPage;
    private final boolean hasNextPage;

    public PageInfo(String startCursor, String endCursor, boolean hasPreviousPage, boolean hasNextPage) {
        this.startCursor = startCursor;
        this.endCursor = endCursor;
        this.hasPreviousPage = hasPreviousPage;
        this.hasNextPage = hasNextPage;
    }

    public String getStartCursor() {
        return startCursor;
    }

    public String getEndCursor() {
        return endCursor;
    }

    public boolean isHasPreviousPage() {
        return hasPreviousPage;
    }

    public boolean isHasNextPage() {
        return hasNextPage;
    }
}
  • Inspect the pageInfo field to access the pagination information.
  • The edges field contains the list of customers with their respective cursors.
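
One caveat with checking customers.size() == pageSize: when the total number of records is an exact multiple of the page size, the last page still reports hasNextPage as true. Here is a sketch of computing both flags from a total count instead, assuming the total is available (for example from Spring Data's Page.getTotalElements()). The class and method names are hypothetical:

```java
// Hypothetical helper: derives the pageInfo flags from a one-based page number,
// the page size, and the total number of matching records.
class PageFlagsSketch {

    static boolean hasPreviousPage(int page) {
        return page > 1;
    }

    static boolean hasNextPage(int page, int pageSize, long totalElements) {
        // Records covered up to and including this page, compared with the total.
        return (long) page * pageSize < totalElements;
    }

    public static void main(String[] args) {
        // 84 records at 42 per page: page 2 is full, yet nothing follows it.
        System.out.println(hasNextPage(1, 42, 84)); // true
        System.out.println(hasNextPage(2, 42, 84)); // false
    }
}
```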

Offset Pagination

  1. Modify the GraphQL resolver:
  • Update the getCustomers method in the GraphQLResolver class to accept parameters for the page size and offset:
import com.coxautodev.graphql.tools.GraphQLQueryResolver;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class GraphQLResolver implements GraphQLQueryResolver {
    private final CustomerRepository customerRepository;

    @Autowired
    public GraphQLResolver(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    public List<Customer> getCustomers(int pageSize, int offset) {
        // PageRequest.of takes a zero-based page index rather than a record offset.
        // This conversion assumes offset is a multiple of pageSize.
        return customerRepository.findAll(PageRequest.of(offset / pageSize, pageSize)).getContent();
    }
}
  2. Update the GraphQL schema:
  • Modify the getCustomers query in the schema.graphqls file to include the additional parameters for page size and offset:
type Query {
  getCustomers(pageSize: Int!, offset: Int!): [Customer!]!
}

schema {
  query: Query
}
  3. Run the application and test the API:
  • Run the Spring Boot application.
  • Use the following query in the GraphQL Playground to fetch customers with offset pagination:
query {
  getCustomers(pageSize: 42, offset: 0) {
    customerId
    firstName
    lastName
    customerDetails
    customerAccountId
    customerSalesId
    engId
    forgoId
  }
}
  • Adjust the values of pageSize and offset as needed to navigate through the dataset.
  • For example, to retrieve the next page, set offset to 42 (assuming pageSize is 42).
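
To make the offset and pageSize semantics concrete, here is a sketch that applies them to an in-memory list rather than a database. OffsetSliceSketch is a hypothetical helper. A real resolver would push the skip and limit down into the query instead of loading everything:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: skips `offset` rows and returns at most `pageSize` rows.
class OffsetSliceSketch {

    static <T> List<T> slice(List<T> rows, int offset, int pageSize) {
        if (offset < 0 || pageSize < 1) {
            throw new IllegalArgumentException("offset must be >= 0 and pageSize >= 1");
        }
        // Clamp both bounds so an offset past the end yields an empty page.
        int from = Math.min(offset, rows.size());
        int to = (int) Math.min((long) from + pageSize, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> rows = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        List<Integer> secondPage = slice(rows, 42, 42);
        System.out.println(secondPage.get(0));                     // 43
        System.out.println(secondPage.get(secondPage.size() - 1)); // 84
    }
}
```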

Page-based Pagination

  1. Modify the GraphQL resolver:
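  • Update the getCustomers method in the GraphQLResolver class to accept parameters for the page number and page size, and to return the CustomerConnection type that the schema declares:
import com.coxautodev.graphql.tools.GraphQLQueryResolver;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.stream.Collectors;

@Component
public class GraphQLResolver implements GraphQLQueryResolver {
    private final CustomerRepository customerRepository;

    @Autowired
    public GraphQLResolver(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    public CustomerConnection getCustomers(int pageNumber, int pageSize) {
        // pageNumber is one-based in the API; Spring Data page indexes are zero-based.
        Page<Customer> page = customerRepository.findAll(PageRequest.of(pageNumber - 1, pageSize));

        // Wrap each customer in an edge, using its ID as the cursor.
        List<CustomerEdge> edges = page.getContent().stream()
                .map(customer -> new CustomerEdge(String.valueOf(customer.getCustomerId()), customer))
                .collect(Collectors.toList());

        String startCursor = edges.isEmpty() ? null : edges.get(0).getCursor();
        String endCursor = edges.isEmpty() ? null : edges.get(edges.size() - 1).getCursor();

        PageInfo pageInfo = new PageInfo(startCursor, endCursor, page.hasPrevious(), page.hasNext());
        return new CustomerConnection(pageInfo, edges);
    }
}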
  2. Update the GraphQL schema:
  • Modify the getCustomers query in the schema.graphqls file to include the additional parameters for page number and page size:
type CustomerConnection {
  pageInfo: PageInfo!
  edges: [CustomerEdge!]!
}

type CustomerEdge {
  cursor: ID!
  node: Customer!
}

type PageInfo {
  startCursor: ID
  endCursor: ID
  hasPreviousPage: Boolean!
  hasNextPage: Boolean!
}

type Query {
  getCustomers(pageNumber: Int!, pageSize: Int!): CustomerConnection!
}

schema {
  query: Query
}
  3. Update the CustomerConnection and PageInfo classes:
  • Modify the CustomerConnection and PageInfo classes to match the updated schema:
import java.util.List;

public class CustomerConnection {
    private final PageInfo pageInfo;
    private final List<CustomerEdge> edges;

    public CustomerConnection(PageInfo pageInfo, List<CustomerEdge> edges) {
        this.pageInfo = pageInfo;
        this.edges = edges;
    }

    public PageInfo getPageInfo() {
        return pageInfo;
    }

    public List<CustomerEdge> getEdges() {
        return edges;
    }
}

public class PageInfo {
    private final String startCursor;
    private final String endCursor;
    private final boolean hasPreviousPage;
    private final boolean hasNextPage;

    public PageInfo(String startCursor, String endCursor, boolean hasPreviousPage, boolean hasNextPage) {
        this.startCursor = startCursor;
        this.endCursor = endCursor;
        this.hasPreviousPage = hasPreviousPage;
        this.hasNextPage = hasNextPage;
    }

    public String getStartCursor() {
        return startCursor;
    }

    public String getEndCursor() {
        return endCursor;
    }

    public boolean isHasPreviousPage() {
        return hasPreviousPage;
    }

    public boolean isHasNextPage() {
        return hasNextPage;
    }
}
  4. Run the application and test the API:
  • Run the Spring Boot application.
  • Use the following query in the GraphQL Playground to fetch customers with page-based pagination:
query {
  getCustomers(pageNumber: 1, pageSize: 42) {
    pageInfo {
      startCursor
      endCursor
      hasPreviousPage
      hasNextPage
    }
    edges {
      cursor
      node {
        customerId
        firstName
        lastName
        customerDetails
        customerAccountId
        customerSalesId
        engId
        forgoId
      }
    }
  }
}
  • Adjust the values of pageNumber and pageSize as needed to navigate through the dataset.
  • The response includes the pageInfo object, which provides information about the current page and pagination state.
  • The edges field contains the list of customers with their respective cursors.

Time-based Pagination

  1. Modify the GraphQL resolver:
  • Update the getCustomers method in the GraphQLResolver class to accept additional parameters for the start time, end time, and page size:
import com.coxautodev.graphql.tools.GraphQLQueryResolver;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.stereotype.Component;

import java.time.LocalDateTime;
import java.util.List;

@Component
public class GraphQLResolver implements GraphQLQueryResolver {
    private final CustomerRepository customerRepository;

    @Autowired
    public GraphQLResolver(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    public List<Customer> getCustomers(String startTime, String endTime, int pageSize) {
        // The schema passes both bounds as ISO 8601 strings, so parse them here.
        LocalDateTime start = LocalDateTime.parse(startTime);
        LocalDateTime end = LocalDateTime.parse(endTime);
        return customerRepository.findByTimeRange(start, end, PageRequest.of(0, pageSize));
    }
}
  2. Update the GraphQL schema:
  • Modify the getCustomers query in the schema.graphqls file to include the additional parameters for the start time, end time, and page size:
type Query {
  getCustomers(startTime: String!, endTime: String!, pageSize: Int!): [Customer!]!
}

schema {
  query: Query
}
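  3. Update the CustomerRepository:
  • Update the CustomerRepository interface to include a method that queries customers within a specified time range. The query assumes the Customer entity has a timestamp field of type LocalDateTime, which would need to be added to the entity defined earlier:
import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

import java.time.LocalDateTime;
import java.util.List;

public interface CustomerRepository extends JpaRepository<Customer, Long> {

    // Ordering by timestamp keeps the paged results deterministic.
    @Query("SELECT c FROM Customer c WHERE c.timestamp >= :startTime AND c.timestamp <= :endTime ORDER BY c.timestamp ASC")
    List<Customer> findByTimeRange(@Param("startTime") LocalDateTime startTime,
                                   @Param("endTime") LocalDateTime endTime,
                                   Pageable pageable);
}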
  4. Run the application and test the API:
  • Run the Spring Boot application.
  • Use the following query in the GraphQL Playground to fetch customers with time-based pagination:
query {
  getCustomers(startTime: "2023-05-01T00:00:00", endTime: "2023-05-17T23:59:59", pageSize: 42) {
    customerId
    firstName
    lastName
    customerDetails
    customerAccountId
    customerSalesId
    engId
    forgoId
  }
}
  • Adjust the values of startTime, endTime, and pageSize as needed to fetch customers within the desired time range.
  • Make sure to provide valid time range values in ISO 8601 format.
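
As a rough illustration of the time-range semantics, the sketch below parses the ISO 8601 strings and filters an in-memory list with inclusive bounds, mirroring the >= and <= in the repository query. The TimeRangeSketch class is purely hypothetical, and plain timestamps stand in for customer records:

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: returns up to pageSize timestamps inside [startTime, endTime].
class TimeRangeSketch {

    static List<LocalDateTime> inRange(List<LocalDateTime> timestamps,
                                       String startTime, String endTime, int pageSize) {
        LocalDateTime start = LocalDateTime.parse(startTime);
        LocalDateTime end = LocalDateTime.parse(endTime);
        return timestamps.stream()
                .filter(t -> !t.isBefore(start) && !t.isAfter(end)) // inclusive on both ends
                .sorted()                                           // oldest first
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LocalDateTime> timestamps = Arrays.asList(
                LocalDateTime.of(2023, 5, 18, 0, 0),
                LocalDateTime.of(2023, 5, 10, 8, 30),
                LocalDateTime.of(2023, 5, 1, 0, 0),
                LocalDateTime.of(2023, 4, 30, 12, 0));
        // Only the two timestamps between May 1 and May 17 are returned.
        System.out.println(inRange(timestamps, "2023-05-01T00:00:00", "2023-05-17T23:59:59", 42));
    }
}
```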

Keyset Pagination

  1. Modify the GraphQL resolver:
  • Update the getCustomers method in the GraphQLResolver class to accept additional parameters for the last key and page size:
import com.coxautodev.graphql.tools.GraphQLQueryResolver;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class GraphQLResolver implements GraphQLQueryResolver {
    private final CustomerRepository customerRepository;

    @Autowired
    public GraphQLResolver(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    public List<Customer> getCustomers(String lastKey, int pageSize) {
        return customerRepository.findNextCustomers(lastKey, pageSize);
    }
}
  2. Update the GraphQL schema:
  • Modify the getCustomers query in the schema.graphqls file to include the additional parameters for the last key and page size:
type Query {
  getCustomers(lastKey: String, pageSize: Int!): [Customer!]!
}

schema {
  query: Query
}
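  3. Update the CustomerRepository:
  • Update the CustomerRepository interface to include a method that queries the next set of customers based on the last key. The query assumes the Customer entity has a sortable String field named key, which would need to be added to the entity defined earlier:
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

import java.util.List;

public interface CustomerRepository extends JpaRepository<Customer, Long> {

    @Query("SELECT c FROM Customer c WHERE c.key > :lastKey ORDER BY c.key ASC")
    List<Customer> findNextCustomers(@Param("lastKey") String lastKey, Pageable pageable);

    // The Pageable caps the result at pageSize rows; without it the query returns every match.
    default List<Customer> findNextCustomers(String lastKey, int pageSize) {
        return findNextCustomers(lastKey, PageRequest.of(0, pageSize));
    }
}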
  4. Run the application and test the API:
  • Run the Spring Boot application.
  • Use the following query in the GraphQL Playground to fetch customers with keyset pagination:
query {
  getCustomers(lastKey: "", pageSize: 42) {
    customerId
    firstName
    lastName
    customerDetails
    customerAccountId
    customerSalesId
    engId
    forgoId
  }
}
  • Adjust the value of pageSize as needed to control the number of records per page.
  • The lastKey parameter is used to retrieve the next set of customers based on the provided key. Initially, use an empty string as the lastKey.
  • Subsequent requests can use the last key value received from the previous response to fetch the next set of customers.
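
Here is a sketch of the keyset idea against an in-memory, ordered set of keys. Each call returns the keys strictly greater than lastKey, and the caller feeds the last key of one page into the next request. It uses numeric keys, like the last_key=12345 example earlier, and treats a null lastKey as "start from the beginning". KeysetSketch is a hypothetical name:

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: seeks past lastKey in a sorted set and takes the next pageSize keys.
class KeysetSketch {

    static List<Long> nextKeys(NavigableSet<Long> keys, Long lastKey, int pageSize) {
        // tailSet(lastKey, false) is the view of keys strictly greater than lastKey.
        NavigableSet<Long> tail = (lastKey == null) ? keys : keys.tailSet(lastKey, false);
        return tail.stream().limit(pageSize).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        NavigableSet<Long> keys = new TreeSet<>(Arrays.asList(10L, 20L, 30L, 40L, 50L));
        List<Long> first = nextKeys(keys, null, 2);
        System.out.println(first);                                              // [10, 20]
        System.out.println(nextKeys(keys, first.get(first.size() - 1), 2));     // [30, 40]
    }
}
```

Because each request seeks directly to the key rather than skipping rows, the cost of fetching a page does not grow with how deep into the dataset the client has paged.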

Alright, that covers all of them. I put these examples together as quickly as possible and had little time to research the latest or greatest ways to implement these pagination patterns specifically with Java Spring Boot. If you’ve got pointers, suggestions, or otherwise, I’d love a critique of my general code slinging in this post. Cheers!

Other GraphQL Standards, Practices, Patterns, & Related Posts