Starting a New Project – Let’s Choose a Tech Stack!

It’s time to start a new project. Because one can never have enough side projects! /s

This particular project I’ll be writing about in this post is derived from the multi-tenant music collector’s database I’ve already started working on. I’ve finally gotten back to it, during a slight break in collecting and music listening, to write up some of my thinking about this particular project.

Stated Objectives For This Application

  1. Personal Reasons: I always like to have side projects that I can make use of myself. Since I’ve recently started collecting music again, and am new to collecting vinyl albums, I wanted a better way to organize all that music and the extensive history, members, songs, lyrics, and related information about the music and artists.
  2. For Everybody: Beyond wanting a well-built application that provides the capabilities I’ve described above, I also want to offer it to others. To that end, I’ll be designing this as a multi-tenant application so that you too, dear reader, can use it for your own music collection once I get it built.
  3. Choose The Tech Stack: I’ll need to write this application in something, obviously, so this post is going to cover my reasoning for the tech stack I’m going to use. The application will be built in three core pieces: the database, the services and middle tier layer, and the user interface. I’ll detail each and cover the reasoning for the stack I’ll choose for each section.

Top 10 GraphQL Anti-patterns IME “The Horror”

While GraphQL provides a flexible and powerful approach to building APIs, there are some common anti-patterns that developers may unintentionally implement when working with GraphQL query resolvers. These anti-patterns – the opposite of yesterday’s top 10 practices – can lead to issues such as performance bottlenecks, security vulnerabilities, or maintenance difficulties. Here are some of the top anti-patterns to avoid:

  1. N+1 Problem: The N+1 problem occurs when resolver functions trigger additional database queries within a loop or for each item in a list. This can result in a large number of database queries, leading to poor performance. Implement data batching techniques using tools like DataLoader to mitigate this issue. To learn more about DataLoader, check out this post.
  2. Over-fetching and Under-fetching: Over-fetching happens when a resolver fetches more data than the client actually needs, resulting in unnecessary data transfer and increased response size. On the other hand, under-fetching occurs when the resolver does not provide enough data to fulfill the client’s request, leading to additional round trips. Design your resolvers carefully to strike the right balance and only fetch the required data.
  3. Resolver Fatigue: Resolver fatigue refers to a scenario where a single GraphQL resolver is responsible for handling a large number of fields or complex logic. This can make the resolver codebase difficult to maintain, understand, and test. Break down your resolvers into smaller, more manageable units to avoid resolver fatigue.
  4. Deep Nesting: GraphQL allows for nested queries, but excessive nesting can lead to performance issues. Deeply nested queries may result in complex resolver logic and multiple database queries. Try to flatten your schema structure and optimize resolver logic to avoid unnecessary complexity.
  5. Lack of Caching: Not implementing caching mechanisms in your resolvers can result in repeated and costly data fetch operations. Introduce caching strategies, such as in-memory caching or distributed caches, to store frequently accessed data and reduce the load on your data sources.
  6. Inefficient Pagination: Pagination is commonly used in GraphQL to handle large datasets. Implementing pagination incorrectly can lead to performance issues and inefficient querying. Use appropriate pagination techniques, like cursor-based pagination, to efficiently retrieve and display data. To read more details on pagination and how it can be applied to GraphQL queries check out this post.
  7. No Rate Limiting: Without proper rate limiting mechanisms, your GraphQL API may be susceptible to abuse and DoS attacks. Implement rate limiting at the resolver or API level to control the number of requests and protect your server resources.
  8. Lack of Input Validation: Failing to validate and sanitize user input can lead to security vulnerabilities, such as SQL injection or unauthorized data access. Validate and sanitize input parameters in your resolvers to prevent these risks.
  9. Monolithic Resolvers: Creating monolithic resolvers that handle multiple unrelated responsibilities can lead to code duplication, reduced reusability, and increased maintenance effort. Follow the single responsibility principle and modularize your resolvers to improve code organization and maintainability.
  10. Insufficient Error Handling: Inadequate error handling in resolvers can result in unhandled exceptions or unclear error messages returned to the client. Implement comprehensive error handling and provide informative error messages to assist client developers in troubleshooting and debugging. For more details on error handling, check out this post.
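
To make the N+1 item above a bit more concrete, here is a minimal sketch of the batching idea behind DataLoader. It is not the DataLoader library itself; the BatchLoader class, the ARTISTS table, and the resolver names are all made up for illustration, and the "database" is just a dictionary.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


class BatchLoader:
    """Collects keys requested during one event-loop pass and loads them with one call."""

    def __init__(self, batch_fn: Callable[[List[int]], Awaitable[Dict[int, str]]]) -> None:
        self._batch_fn = batch_fn
        self._pending: Dict[int, asyncio.Future] = {}
        self._dispatch_task: Optional[asyncio.Task] = None

    def load(self, key: int) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if self._dispatch_task is None:
            # Runs after the resolvers already queued on the loop have asked for their keys.
            self._dispatch_task = loop.create_task(self._dispatch())
        return self._pending[key]

    async def _dispatch(self) -> None:
        pending, self._pending = self._pending, {}
        self._dispatch_task = None
        try:
            rows = await self._batch_fn(list(pending))
        except Exception as error:
            for future in pending.values():
                future.set_exception(error)
            return
        for key, future in pending.items():
            future.set_result(rows.get(key))


ARTISTS = {1: "Rush", 2: "Yes", 3: "Genesis"}  # hypothetical stand-in for a database table


async def run_demo():
    batches: List[List[int]] = []

    async def fetch_artists(ids: List[int]) -> Dict[int, str]:
        batches.append(ids)  # one entry per simulated database round trip
        return {i: ARTISTS[i] for i in ids if i in ARTISTS}

    loader = BatchLoader(fetch_artists)

    async def resolve_artist_name(artist_id: int) -> Optional[str]:
        return await loader.load(artist_id)

    # Four resolver calls (one key repeated) should collapse into a single batch.
    names = await asyncio.gather(*(resolve_artist_name(i) for i in [1, 2, 3, 1]))
    return names, batches
```

Calling asyncio.run(run_demo()) returns the resolved names along with the list of batches, which should hold a single entry. A real loader would typically be created per request so cached results never leak between users.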

By avoiding these anti-patterns and following established best practices, you can enhance the performance, security, and maintainability of your GraphQL query resolvers.

10 Best Practices IMHO for GraphQL

Here are 10 best practices for GraphQL accrued from dozens of GraphQL API implementations:

  1. Keep your schema simple: Design your GraphQL schema with a clear and concise structure. Avoid unnecessary complexity and keep it focused on the specific requirements of your application. One great idea is to implement consistent standards to keep your schema simple; read more about those ideas here.
  2. Think about the client’s needs: GraphQL allows clients to specify their data requirements precisely. Collaborate with the client-side developers to understand their needs and design your schema accordingly, minimizing over-fetching or under-fetching of data.
  3. Version your schema: As your application evolves, consider versioning your GraphQL schema to ensure backward compatibility. This allows you to introduce changes without breaking existing client implementations.
  4. Use precise field names: Choose field names that accurately describe the data they represent. Be consistent with your naming conventions and avoid ambiguity to enhance the readability and maintainability of your schema.
  5. Avoid excessive nesting: While GraphQL supports nested queries, avoid deep nesting of fields as it can lead to performance issues and over-fetching of data. Optimize your schema by flattening nested fields when possible.
  6. Implement proper authentication and authorization: GraphQL does not enforce any specific authentication or authorization mechanisms. It is crucial to implement appropriate security measures to protect your GraphQL API endpoints, such as using authentication tokens, access control rules, and rate limiting.
  7. Implement pagination for large datasets: When dealing with large datasets, use pagination techniques (e.g., cursor-based pagination) to efficiently fetch and display data. This helps in improving performance and reduces the load on both the server and the client. For details on paging patterns and implementation details check out this article.
  8. Utilize data loaders: GraphQL data loaders help optimize data fetching by batching and caching requests. Implement data loaders to avoid the N+1 problem, where multiple database queries are triggered for each item in a list. Check out this post for more details.
  9. Document your schema: Provide comprehensive documentation for your GraphQL schema to assist client developers in understanding the available types, fields, and their usage. Clear documentation and standards promote developer adoption and simplify integration. Check out this post for more details on GraphQL standards.
  10. Monitor and optimize performance: Regularly monitor and analyze the performance of your GraphQL API. Identify and optimize slow-performing queries, implement caching strategies, and leverage tools like Apollo Engine or persisted queries to improve overall performance.
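
As a rough illustration of the cursor-based pagination mentioned in item 7, here is a small sketch over an in-memory list. The ALBUMS data, the "album:<id>" cursor format, and the albums_connection function are assumptions of mine; the edges and pageInfo shape follows the common connection pattern, but your schema may differ.

```python
import base64
from typing import Dict, Optional

# Hypothetical data, already sorted by id (the column the cursor is based on).
ALBUMS = [{"id": i, "title": f"Album {i}"} for i in range(1, 8)]


def encode_cursor(album_id: int) -> str:
    """Turn a row id into an opaque cursor string."""
    return base64.b64encode(f"album:{album_id}".encode()).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the row id from a cursor produced by encode_cursor."""
    return int(base64.b64decode(cursor).decode().split(":", 1)[1])


def albums_connection(first: int, after: Optional[str] = None) -> Dict:
    first = max(1, min(first, 50))  # clamp the page size so a client cannot ask for everything
    last_seen = decode_cursor(after) if after else 0
    # Read one extra row to learn whether another page exists.
    window = [album for album in ALBUMS if album["id"] > last_seen][: first + 1]
    page = window[:first]
    return {
        "edges": [{"cursor": encode_cursor(a["id"]), "node": a} for a in page],
        "pageInfo": {
            "endCursor": encode_cursor(page[-1]["id"]) if page else None,
            "hasNextPage": len(window) > first,
        },
    }
```

A client asks for albums_connection(3), then passes the returned endCursor as the after argument to get the next page. Because the cursor encodes a position rather than an offset, rows inserted earlier in the list do not shift the pages.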

Remember that these practices may vary depending on the specific requirements and context of your GraphQL implementation. It’s always recommended to stay updated with the latest best practices and community guidelines.

GraphQL Nested Queries, Relationships, and Different Data Sources Practices

When building a GraphQL API with nested queries and relationships – specifically when you’re using a relational database – it’s important to follow best practices to attain efficient and performant data retrieval while preventing overly nested queries. From the GraphQL API perspective, here are some practices to follow:

  1. Use GraphQL Fragments: Fragments allow you to define reusable sets of fields that can be included in multiple queries. This helps avoid duplicating nested fields and keeps queries concise and readable.
  2. Resolve Nested Data Efficiently: Use efficient data fetching techniques to resolve nested data. Techniques like batch loading and data loaders can help avoid the N+1 query problem, where multiple database queries are triggered for each item in a list.
  3. Limit Depth of Nested Queries: Consider setting a maximum allowed depth for nested queries. In some tools this can be set via configuration, and in most language stacks the GraphQL libraries offer features to put this limit in place. This helps prevent clients from making excessively deep queries that lead to performance issues, for example a single request that ends up joining four, five, or more tables in the database!
  4. Pagination: For lists of data, use pagination to limit the amount of data returned in a single query. This prevents queries from becoming overly large and ensures efficient data retrieval.
  5. Use Aliases: Aliases allow clients to request the same field multiple times with different arguments. This can help reduce nesting by fetching data for related entities in a single query.
  6. Avoid Deep Nesting: Strive to keep your GraphQL queries shallow and avoid excessive nesting. If a query becomes too nested, it may be an indication that the schema design needs improvement.
  7. Encourage Specific Queries: Instead of relying solely on generic queries, encourage clients to use specific queries tailored to their needs. This can prevent unnecessary data retrieval and reduce the chance of overly nested queries.
  8. Provide Field Arguments: Offer field arguments to allow clients to customize the shape of the data they retrieve. This way, clients can request only the data they need, reducing the risk of getting overly nested responses.
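  9. Use @defer and @stream: Some GraphQL servers and clients support deferred and streamed responses through the @defer and @stream directives. These are proposed additions rather than part of the core specification, so check what your server supports. Where available, they give you more fine-grained control over data retrieval and prevent unnecessary waiting for nested data.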
  10. Educate API Consumers: If you are building a public API, provide clear documentation and examples on how to use the API efficiently. Educate API consumers on best practices for querying data and avoiding overly nested queries.
  11. Performance Testing: Conduct performance testing on your GraphQL API to identify potential bottlenecks and areas of improvement. This can help you optimize the data fetching process and avoid performance issues due to nested queries.
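
For the depth limit in item 3, most GraphQL libraries have a proper validation rule that works on the parsed query. Purely to show the idea, here is a deliberately crude sketch that measures brace nesting in the raw query text. The function names and the limit of 3 are hypothetical, and the approach has known blind spots: input object literals in arguments also use braces, and fragments are not expanded.

```python
def query_depth(query: str) -> int:
    """Return the deepest level of brace nesting, skipping braces inside string literals."""
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


def enforce_depth(query: str, max_depth: int = 3) -> None:
    """Reject a query before execution when it nests deeper than max_depth."""
    depth = query_depth(query)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds the allowed maximum of {max_depth}")


# Hypothetical query: artist -> albums -> tracks, four selection sets deep including the root.
DEEP_QUERY = """
{
  artist(id: 1) {
    albums {
      tracks {
        title
      }
    }
  }
}
"""
```

With the default limit, enforce_depth(DEEP_QUERY) raises a ValueError, while a query that stops at albums passes. In production I would reach for the depth-limiting feature of whichever GraphQL library is in use rather than scanning text like this.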

By following these practices, you can ensure that your GraphQL API provides a smooth and efficient experience for clients, while also preventing the negative impact of overly nested queries on server performance.

But what about situations where you’re building a GraphQL API that isn’t going to be built on a relational database? Well, you’re in luck, because I’ve done this more than once and I’ve got a few patterns you can use to help ensure your services stay up to snuff.

Apache Cassandra & Mongo DB

When you’re using databases like Apache Cassandra (a wide-column store) or MongoDB (a document-oriented database), there are some additional concerns related to nested queries and data modeling that should be taken into account. For example, a MongoDB document can itself contain nesting, and that nesting can go deep depending on how the data is stored and modeled in the underlying BSON (Binary JSON). This adds complexity, and you need to understand the data being queried to realize the implications of querying it from something like GraphQL.

  1. Data Modeling for Query Support: Unlike relational databases, Cassandra and MongoDB do not support complex JOIN operations, making it essential to design the data model to support the required queries efficiently. This may involve denormalizing data and duplicating information to facilitate query patterns.
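  2. Limited Transactions: Cassandra does not generally offer ACID transactions across multiple rows or partitions; its lightweight transactions are compare-and-set operations scoped to a single partition. MongoDB has supported multi-document transactions since version 4.0, but they carry a performance cost and are best used sparingly. As a result, handling complex nested queries across multiple entities may require careful consideration of eventual consistency and data integrity.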
  3. Data Duplication for Performance: To optimize queries, you may need to denormalize and duplicate data, leading to increased storage requirements. Balancing query performance with storage efficiency becomes crucial in such cases.
  4. Aggregation Pipeline (MongoDB): When using MongoDB, the Aggregation Pipeline can be powerful for handling complex data processing and nested queries. Understanding and leveraging the aggregation framework effectively can be essential for optimal performance.
  5. Limitations on Nested Arrays: While both databases support nested data structures (arrays or maps), deeply nested arrays can become challenging to query efficiently. Be cautious when modeling highly nested structures, as it can lead to performance issues.
  6. Data Distribution (Cassandra): In Cassandra, data is distributed across nodes based on the partition key. Designing a proper partitioning strategy is crucial to avoid hotspots and ensure even data distribution for queries.
  7. Secondary Indexes (Cassandra): In Cassandra, using secondary indexes to query nested data can be inefficient. It’s generally recommended to design the schema to support the required queries without relying heavily on secondary indexes.
  8. Data Access Patterns: Understand the common access patterns of your application and design the data model accordingly. The database schema should cater to the specific needs of the queries your application will perform most frequently.
  9. Avoiding Unbounded Queries: In NoSQL databases, unbounded queries can lead to performance issues. Consider using pagination or other query optimizations to limit the amount of data retrieved in a single query.
  10. Sharding and Replication: Both Cassandra and MongoDB are designed to scale horizontally. Consider the implications of sharding and replication when dealing with nested queries, as they can impact query performance and data consistency.
  11. Query Modeling: Model your queries to take advantage of database-specific features, like secondary indexes, compound keys, or materialized views, to optimize performance for specific access patterns.
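
To illustrate the partitioning point in item 6, here is a toy simulation of how the choice of partition key affects data distribution. It is a simplification under stated assumptions: Cassandra’s default partitioner uses Murmur3 tokens and token ranges rather than an MD5 hash modulo the node count, and the node count, tenant names, and play events are invented for the example.

```python
import hashlib
from collections import Counter
from typing import Iterable

NODES = 4  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Map a partition key to a node with a stable hash (a stand-in for token assignment)."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NODES


def spread(partition_keys: Iterable[str]) -> Counter:
    """Count how many rows land on each node for the given keys."""
    return Counter(node_for(key) for key in partition_keys)


# 1000 hypothetical play events from 200 tenants, all on the same day.
plays = [{"tenant": f"tenant-{i % 200}", "day": "2023-08-01"} for i in range(1000)]

# Partitioning by day alone sends every row for that day to a single node (a hotspot).
by_day = spread(play["day"] for play in plays)

# A compound key of tenant plus day spreads the same rows across the cluster.
by_tenant_and_day = spread(f'{play["tenant"]}:{play["day"]}' for play in plays)
```

Comparing the two counters shows all 1000 rows on one node for the day-only key and a spread over several nodes for the compound key. The trade-off is that a query for "everything on this day" now has to touch many partitions, which is why the key should follow the queries you actually run.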

In conclusion, when you’re using databases like Apache Cassandra or MongoDB, their flexibility and scalability demand careful consideration of data modeling and query design to handle nested queries efficiently. The complexity can often be more extensive than that of a relational database, but the advantages can be compounded by the very nature of the underlying systems. By understanding these database limitations and optimizing the data model to suit the application’s query patterns, you can make the most of these NoSQL databases while mitigating potential performance bottlenecks.

Elasticsearch

It’s important to note that Elasticsearch is not a database but, more specifically, a search engine with distributed storage capabilities, and that introduces a whole new realm of considerations. Here are a few I’ve bumped into over the years of implementing GraphQL APIs on engines like Elasticsearch.

  1. Data Indexing: Elasticsearch requires data to be indexed before it can be searched. Designing a proper indexing strategy is crucial to ensure that the data is organized and optimized for search queries, including nested queries.
  2. Nested Documents: Elasticsearch supports nested documents, allowing for complex data structures. However, keep in mind that nested queries can be more resource-intensive than regular queries, so optimizing the data model to minimize unnecessary nesting is important.
  3. Query Complexity: Complex nested queries in Elasticsearch can result in more processing overhead. Strive to keep your queries as simple as possible to improve search performance.
  4. Document Size: Elasticsearch performs best with reasonably sized documents. If your documents are too large or too nested, it can negatively impact performance. Consider flattening nested data if possible.
  5. Index Mapping: Define explicit mappings for your Elasticsearch indices to specify how fields should be indexed and queried. This can help optimize query performance and avoid unexpected behavior.
  6. Filter vs. Query Context: Understand the difference between filter context and query context in Elasticsearch queries. Filters are more efficient for simple binary decisions, while queries are better for scoring and relevance.
  7. Aggregations: Elasticsearch provides powerful aggregation capabilities to analyze and summarize data. However, complex aggregations can be resource-intensive, so use them judiciously.
  8. Scoring and Relevance: Elasticsearch uses scoring algorithms to rank search results based on relevance. Ensure that your queries and data model align with the desired relevance of search results.
  9. Pagination and Sorting: Plan for efficient pagination and sorting of search results. Avoid deep pagination, as it can lead to performance issues.
  10. Sharding and Replication: Elasticsearch is a distributed system that uses sharding and replication to achieve scalability and fault tolerance. Be mindful of the impact of sharding and replication on query performance and data consistency.
  11. Tuning Index Settings: Elasticsearch provides various index-level settings that can affect search performance. Tuning these settings based on your application’s needs can significantly impact query execution times.
  12. Data Modeling for Search: Design the data model in a way that aligns with the search use cases of your application. Consider the types of queries you will be performing frequently and optimize the data model accordingly.
  13. Cluster Health and Monitoring: Keep an eye on the cluster health and performance metrics. Monitor and optimize the performance of your Elasticsearch cluster regularly.
  14. Indexing and Search Performance Trade-offs: The indexing and search performance of Elasticsearch can be influenced by various factors. Understanding the trade-offs between indexing speed and query performance is crucial when designing your application.
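
Here is a small sketch of the filter versus query context distinction from item 6, written as a function that builds a search request body. The bool, must, filter, match, term, range, sort, and search_after keys are standard Elasticsearch query DSL; the field names (title, tenant_id, release_year, id) and the function itself are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_album_search(
    text: str,
    tenant_id: str,
    released_after: int,
    size: int = 20,
    search_after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build a search body that scores on text and narrows with non-scoring filters."""
    body: Dict[str, Any] = {
        "size": size,
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no checks that do not affect scoring.
                "filter": [
                    {"term": {"tenant_id": tenant_id}},
                    {"range": {"release_year": {"gte": released_after}}},
                ],
            }
        },
        # A stable sort with a tiebreaker is needed for search_after paging.
        "sort": [{"release_year": "desc"}, {"id": "asc"}],
    }
    if search_after is not None:
        # Continue after the sort values of the last hit instead of using a deep offset.
        body["search_after"] = search_after
    return body
```

The resulting dictionary can be serialized to JSON and sent to the search endpoint of an index. Passing the sort values of the previous page’s last hit as search_after is one way to sidestep the deep pagination issue from item 9.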

Apache Kafka What?

Finally, there is Apache Kafka, which comes up every now and again. Even though I haven’t implemented a GraphQL API on Kafka yet, it’s been done and I’ve been privy to the implications. Here are a few best practices I’ve picked up for implementing against Kafka.

  1. Data Synchronization: Decide on the data synchronization approach between Kafka and your GraphQL API. Will your GraphQL API act as a producer, a consumer, or both? Plan how data flows between the two systems to maintain consistency.
  2. Message Format: Define a standardized message format for data exchanged between Kafka and the GraphQL API. This format should be easily interpretable by both systems and include all necessary information for processing.
  3. Schema Evolution: Consider how schema changes in Kafka messages are handled by the GraphQL API. Plan for backward and forward compatibility to avoid breaking the API when message schemas evolve.
  4. Consumer Groups: When consuming data from Kafka, decide on appropriate consumer group configurations to manage the processing of messages efficiently and in parallel.
  5. Event Deduplication: Ensure that your GraphQL API can handle duplicate events from Kafka gracefully to avoid processing the same data multiple times.
  6. Error Handling: Implement robust error handling and retry mechanisms when processing Kafka messages. Handle failures gracefully and avoid data loss.
  7. Message Ordering: Be aware that Kafka does not guarantee strict message ordering across different partitions. Consider how this might impact the ordering of data processed by the GraphQL API.
  8. Throttling and Backpressure: Plan for throttling and backpressure mechanisms to control the rate at which data is consumed from Kafka to prevent overwhelming the GraphQL API with incoming messages.
  9. Security: Secure your Kafka system and the GraphQL API to prevent unauthorized access. Use appropriate authentication and authorization mechanisms to protect data integrity and confidentiality.
  10. Performance Optimization: Optimize the performance of your Kafka consumer and GraphQL API to handle high loads efficiently. Consider batching messages and implementing caching mechanisms when applicable.
  11. Monitoring and Logging: Implement monitoring and logging for both Kafka and the GraphQL API. Track message processing times, error rates, and system health to identify and resolve potential issues.
  12. Integration Testing: Conduct integration testing to ensure seamless communication between Kafka and the GraphQL API. Test different scenarios, such as handling delayed messages and high loads, to validate the system’s behavior.
  13. Versioning and Compatibility: Plan for versioning in both Kafka messages and GraphQL schema. This helps maintain compatibility and allows for smooth changes in both systems over time.
  14. Infrastructure Scalability: Design your Kafka and GraphQL systems with scalability in mind to handle future growth and increased data volumes.
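
The event deduplication practice in item 5 can be sketched without any Kafka client at all. The example below assumes each message carries a unique event_id set by the producer, which is my assumption rather than something Kafka provides by itself; the Deduplicator class, the event shape, and the in-memory store are all hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Iterable


class Deduplicator:
    """Remembers the most recent event ids so redelivered messages can be skipped."""

    def __init__(self, capacity: int = 10000) -> None:
        self._seen: "OrderedDict[str, bool]" = OrderedDict()
        self._capacity = capacity

    def first_time(self, event_id: str) -> bool:
        if event_id in self._seen:
            self._seen.move_to_end(event_id)
            return False
        self._seen[event_id] = True
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # forget the oldest id to bound memory use
        return True


def apply_events(events: Iterable[Dict], store: Dict[int, str], dedupe: Deduplicator) -> int:
    """Apply each event to the store once and return how many were actually applied."""
    applied = 0
    for event in events:
        if not dedupe.first_time(event["event_id"]):
            continue  # duplicate delivery, already handled
        store[event["album_id"]] = event["title"]
        applied += 1
    return applied


# Hypothetical messages, with the first one delivered twice.
EVENTS = [
    {"event_id": "e-1", "album_id": 10, "title": "Moving Pictures"},
    {"event_id": "e-2", "album_id": 11, "title": "Fragile"},
    {"event_id": "e-1", "album_id": 10, "title": "Moving Pictures"},
]
```

An in-memory window like this only covers duplicates seen by one running process. If the consumer restarts or runs as several instances, the seen ids would need to live in shared storage, or the write itself should be made idempotent.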

Summary

Alright, that’s a boatload of practices for the top data sources I’ve worked with to implement GraphQL against. I have tons more to add, but that’s enough detail for a single post! Suffice it to say, GraphQL can provide extensive capabilities with these various data sources.

GraphQL Schema Standards, Patterns, & Practices

When planning a GraphQL schema design, choosing appropriate type names and casing conventions is essential for creating a clear and consistent API. While there is no industry-wide standard, there are some common practices and recommendations that I tend to follow when setting up new schema and related project assets:

  1. Type Names: Use descriptive and meaningful names for GraphQL types. Type names should represent the data they hold or the entities they represent. For example, if you have a type to represent a user, name it User.
  2. Nested Objects/Types: When naming nested types, the naming can become more complex. In this case it is sometimes important to put a nested object’s type name on the parent type, or vice versa, to signify where it sits within a conceptual structure. For example:
type User {
  id: ID!
  name: String!
  # address1 and address2 share the same nested Address type; the numeric suffix distinguishes two different addresses.
  address1: Address!
  address2: Address
  account: Account
}

type Address {
  id: ID!
  address: String!
  # The user this address is related to. If null, the address isn't related to a specific user.
  userId: ID
}

type Account {
  id: ID!
  title: String!
  # A compound name, to differentiate it clearly from the "account" field that a user might have.
  associatedAccount: Account
}
  1. Naming Conventions: Stick to a consistent naming convention throughout your schema. There are two popular conventions that I always stick to. More specifically, I tend to follow the database’s naming convention when it is known, like I detailed here and here. The two top choices:
    • PascalCase: Capitalize the first letter of each word, including the first word. For example: User, ProductCategory, OrderDetails.
    • camelCase: Start with a lowercase letter and capitalize the first letter of each subsequent word. For example: user, productCategory, orderDetails. Choose the convention that aligns better with your team’s preferences and the overall codebase.
  2. Avoid Abbreviations: Try to avoid abbreviations in type names unless they are widely known and commonly used. Clear and readable type names make it easier for other developers to understand your schema. I mean, abbreviations died out with Visual Basic pre-.NET right? Ok, I guess that hangs on like all upper COBOL but you get the point, just cut out abbreviations in names across the board! 👍🏻
  3. Singular vs. Plural: Use singular names for types representing single entities (e.g., User, Product, Category) and plural names for types representing collections of those entities (e.g., users, products, categories).
  4. Enumeration Types: For enumeration types (enums), use singular names, and use uppercase letters for the values. For example:
enum UserRole {
  ADMIN
  CUSTOMER
  GUEST
}
  1. Interfaces and Unions: For interfaces and unions, use descriptive and noun-based names that indicate the common traits of the types they include. For example:
interface Vehicle {
  id: ID!
  brand: String!
}

type Car implements Vehicle {
  id: ID!
  brand: String!
  model: String!
}

type Bike implements Vehicle {
  id: ID!
  brand: String!
  color: String!
}
  1. Field Names: Follow similar naming conventions for field names within types. Use descriptive and concise names that indicate the data they represent. For example, prefer firstName over fn or fName.
  2. Boolean Fields: For boolean fields, use names that suggest a yes/no question and use words like is, has, or should. For example: isActive, hasPermission, shouldProcess.
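
If you want to enforce conventions like these automatically, a few regular expressions go a long way. The sketch below is one possible reading of the rules above, not a standard tool; the function names and the exact patterns are my own, and they are intentionally strict (for instance, an all-caps acronym such as ID would not pass the type name check).

```python
import re

PASCAL_CASE = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)*$")
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]+)*$")
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*$")
BOOLEAN_PREFIXES = ("is", "has", "should")


def is_type_name(name: str) -> bool:
    """PascalCase, as used for object types, interfaces, unions, and enums."""
    return bool(PASCAL_CASE.match(name))


def is_field_name(name: str) -> bool:
    """camelCase, as used for fields and arguments."""
    return bool(CAMEL_CASE.match(name))


def is_enum_value(name: str) -> bool:
    """Uppercase enum values such as ADMIN or SUPER_ADMIN."""
    return bool(UPPER_SNAKE.match(name))


def is_boolean_field_name(name: str) -> bool:
    """camelCase that starts with is, has, or should followed by a capitalized word."""
    return is_field_name(name) and any(
        name.startswith(prefix) and name[len(prefix):len(prefix) + 1].isupper()
        for prefix in BOOLEAN_PREFIXES
    )
```

A check like this could run in a test suite over the names pulled from a schema file, so that naming drift gets caught in review rather than after clients depend on it.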

These guidelines have provided a helpful starting point for me, but the most crucial aspect is to maintain consistency across the entire schema and within the team’s development practices. By creating a well-designed and consistently named schema, you make it easier for other developers to understand, maintain, and extend the API efficiently.

For additional ideas and standards around GraphQL development, check out the following posts I’ve written.