Mar 7, 2022 · By Peter Neumark
READING TIME: 10 min

Combining multiple GraphQL backends with schema stitching

Read about how we use GraphQL schema stitching to provide a single coherent schema for accessing several services with shared types from overlapping GraphQL schemas.

What is Splitgraph?

Splitgraph is building the Unified Data Stack – an integrated and modern solution for working with data without worrying about its infrastructure.

You can try Splitgraph now! Browse the public data catalog to discover 40,000+ repositories to query with SQL, through the web IDE or any Postgres client.

Splitgraph is powered by open standards and simple abstractions, like data images – immutable tables that you can push and pull, or query on the fly.

We're not replacing SQL as our favorite query language anytime soon 😄, but we rely on GraphQL to implement many of Splitgraph's internal APIs. Thanks to GraphQL schema stitching, we were able to add new backend services which extend the set of fields published by older services. Clients don't need to specify which service provides which field; the schema stitching logic determines this automatically.

How we use GraphQL at Splitgraph

Projects like Apollo Client, graphql-playground and PostGraphile have made working with GraphQL a smooth experience, but the declarative query language has proven to be even more important than the tooling. It's given us the flexibility to develop services which can be used independently, but also as components implementing the following unified API:

type Query {
  """Returns a single `Repository` identified
     by its `namespace` and `repositoryName`."""
  metadataRepository(
    namespace: String!,
    repositoryName: String!): Repository
  authorizationRepository(
    namespace: String!,
    repositoryName: String!): Repository
  """Returns a single `User` based on their username."""
  authorizationUser(username: String!): User
  metadataUser(username: String!): User
}

type Repository {
  namespace: String!,
  repositoryName: String!,
  description: String,
  url: String,
  """Lists all users who have permission to read this repository."""
  usersWithReadAccess: [User!]!
}

type User {
  username: String!,
  fullName: String,
  """Lists all repositories this user has permission to read."""
  repositories: [Repository!]!
}

The "dashboard" page which greets users upon login is one of the many clients of this API. It needs to display the current user's name as well as a link to each repository they have access to. A single GraphQL query can fetch the required fields:

query Dashboard($username: String!) {
  metadataUser(username: $username) {
    fullName,
    repositories {
      description,
      url
    }
  }
}

Schema stitching in the gateway service

Two separate services store the data requested by the Dashboard query. The client sends the query to the gateway service, which uses schema stitching to combine the APIs of the backing metadata and authorization services into the unified schema shown above. As the gateway service executes the Dashboard query, it queries the backing services.

Metadata service

The metadata service is powered by PostGraphile and predates the authorization service. Before developing the latter, Splitgraph's internal GraphQL API Schema looked similar to the following:

type Repository {
  namespace: String!
  repositoryName: String!
  description: String
  url: String
}

type User {
  username: String!
  fullName: String
}

type Query {
  metadataRepository(
    namespace: String!,
    repositoryName: String!): Repository
  metadataUser(username: String!): User
}

Authorization service

We built the authorization service based on Ory Keto to keep track of user-repository permissions. The authorization service's schema extends the types introduced by the metadata service:

"""This Repository type is merged with the Repository
   type declared in the metadata service schema."""
type Repository {
  """namespace and repositoryName are the key fields
     by which the repository types in the two
     subschemas can be merged."""
  namespace: String!
  repositoryName: String!
  usersWithReadAccess: [User!]!
}

"""Similar to Repository, User will be merged
   with the metadata service's User type."""
type User {
  """username is the key field of User."""
  username: String!
  repositories: [Repository!]!
}

type Query {
  """authorization service specific fields for retrieving
     User and Repository objects."""
  authorizationRepository(
    namespace: String!,
    repositoryName: String!): Repository
  authorizationUser(username: String!): User
}

What is schema stitching?

The main idea is simple: the different fields of a type may be distributed among several services as long as the corresponding subschemas all use the same type name. By default, the "stitched" schema just combines all the fields of any given type from all subschemas, but accidental type merging can be avoided. Single-service GraphQL schemas operate under the closed world assumption that all fields belonging to a type are declared in the schema. Stitching leads to an open world model where any type may be extended with additional fields declared in a new subschema.
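
As a rough illustration of the default field-combining behavior, the sketch below models each subschema as a plain map of type names to fields and unions them. The `TypeMap` shape and the `stitchTypes` helper are assumptions made up for this example; they are not part of graphql-tools, which operates on real `GraphQLSchema` objects.

```typescript
// Hypothetical, simplified model of a subschema: type name -> field name -> field type.
type TypeMap = Record<string, Record<string, string>>;

// Field sets mirroring the two services described in this post.
const metadataTypes: TypeMap = {
  Repository: {
    namespace: "String!",
    repositoryName: "String!",
    description: "String",
    url: "String",
  },
  User: { username: "String!", fullName: "String" },
};

const authorizationTypes: TypeMap = {
  Repository: {
    namespace: "String!",
    repositoryName: "String!",
    usersWithReadAccess: "[User!]!",
  },
  User: { username: "String!", repositories: "[Repository!]!" },
};

// Combine the fields of same-named types from every subschema.
// A field declared with two different types is reported rather than overwritten.
function stitchTypes(...subschemas: TypeMap[]): TypeMap {
  const stitched: TypeMap = {};
  for (const subschema of subschemas) {
    for (const [typeName, fields] of Object.entries(subschema)) {
      const target = (stitched[typeName] ??= {});
      for (const [fieldName, fieldType] of Object.entries(fields)) {
        const existing = target[fieldName];
        if (existing !== undefined && existing !== fieldType) {
          throw new Error(`Conflicting types for ${typeName}.${fieldName}`);
        }
        target[fieldName] = fieldType;
      }
    }
  }
  return stitched;
}

const stitched = stitchTypes(metadataTypes, authorizationTypes);
// The stitched User type carries username, fullName and repositories.
console.log(Object.keys(stitched["User"] ?? {}));
```

The key fields (`username`, `namespace`, `repositoryName`) appear in both inputs with identical types, which is what lets the union succeed without a conflict.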

At first, the doubled fields in the Query type may seem superfluous. They have the same return types after all: User and Repository. Why they're necessary becomes apparent as soon as we examine how the Dashboard query is processed by the gateway service.

Querying stitched schemas

The gateway service uses the merging flow to execute queries and delegate to the other services. With the proper subschema configuration, the schema stitching code can determine which service should be queried for a particular field.

Consider the steps required to compute results for the Dashboard query for user mrDorp.

  1. The client submits the Dashboard query to the gateway service. The query selects the fullName and repositories fields on the result of metadataUser.

  2. Since metadataUser comes from the metadata service's subschema, the gateway queries it for the fullName field.

    metadataUser(username: "mrDorp") {
      __typename
      fullName
      username
    }
    

    The username field is added implicitly since it's the key field used to join the User types in the two services. __typename is always added by the schema stitching code. The response to the query is:

    {
      "data": {
        "metadataUser": {
          "__typename": "User",
          "username": "mrDorp",
          "fullName": "Ralph Dorp"
        }
      }
    }
    
  3. The repositories field of the User type belongs to the authorization service, so the gateway queries it next. The subschema configuration specifies that the top-level authorizationUser field may be queried to get the User fields declared in the service's subschema.

    authorizationUser(username: "mrDorp") {
      __typename
      username
      repositories {
        __typename
        namespace
        repositoryName
      }
    }
    

    Note that the Dashboard query selected the description and url fields of each Repository object, but these fields are declared by the metadata service schema. The request headed for the authorization service can therefore only select the Repository key fields. Just as in the previous query, the gateway service implicitly adds the username key field to the selection set. The response is the following:

    {
      "data": {
        "authorizationUser": {
          "__typename": "User",
          "username": "mrDorp",
          "repositories": [
            {
              "__typename": "Repository",
              "namespace": "austintexas-gov",
              "repositoryName": "austin-high-school-graduation-rates-xeb7-q8v3"
            },
            {
              "__typename": "Repository",
              "namespace": "bts-gov",
              "repositoryName": "county-transportation-profiles-qdmf-cxm3"
            }
          ]
        }
      }
    }
    
  4. Having obtained the repositories' key fields, the gateway may query additional Repository fields from the metadata service. It consults the stitching configuration for the top-level field to query - in this case metadataRepository.

    metadataRepository(
      namespace: "austintexas-gov",
      repositoryName: "austin-high-school-graduation-rates-xeb7-q8v3") {
        __typename
        namespace
        repositoryName
        description
        url
    }
    
    metadataRepository(
      namespace: "bts-gov",
      repositoryName: "county-transportation-profiles-qdmf-cxm3") {
        __typename
        namespace
        repositoryName
        description
        url
    }
    

    Just as with the query for the authorizationUser field, the key fields (namespace and repositoryName) and __typename are implicitly added to the query to allow merging of result objects. Unfortunately, the schema as shown here does not allow the two queries above to be combined, so each repository's metadata fields are queried separately. (We created a PostGraphile plugin using DataLoader to batch such lookups into a single GraphQL query; drop us a line if you're interested in using it.) The metadata service responds with:

    {
      "data": {
        "metadataRepository": {
          "__typename": "Repository",
          "namespace": "austintexas-gov",
          "repositoryName": "austin-high-school-graduation-rates-xeb7-q8v3",
          "description": "Graduation rates for Austin high schools for years 2012 to 2016 provided by the Texas Education Agency.",
          "url": "https://www.splitgraph.com/austintexas-gov/austin-high-school-graduation-rates-xeb7-q8v3"
        }
      }
    }
    
    {
      "data": {
        "metadataRepository": {
          "__typename": "Repository",
          "namespace": "bts-gov",
          "repositoryName": "county-transportation-profiles-qdmf-cxm3",
          "description": "Profiles of transportation features of U.S. counties",
          "url": "https://www.splitgraph.com/bts-gov/county-transportation-profiles-qdmf-cxm3"
        }
      }
    }
    
  5. The gateway has all the fields required to respond to the Dashboard query. Key fields which weren't selected in the Dashboard query are discarded once the objects have been merged.

    {
      "data": {
        "metadataUser": {
          "__typename": "User",
          "fullName": "Ralph Dorp",
          "repositories": [
            {
              "__typename": "Repository",
              "description": "Graduation rates for Austin high schools for years 2012 to 2016 provided by the Texas Education Agency.",
              "url": "https://www.splitgraph.com/austintexas-gov/austin-high-school-graduation-rates-xeb7-q8v3"
            },
            {
              "__typename": "Repository",
              "description": "Profiles of transportation features of U.S. counties",
              "url": "https://www.splitgraph.com/bts-gov/county-transportation-profiles-qdmf-cxm3"
            }
          ]
        }
      }
    }
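
The merging and pruning in the final step can be sketched roughly as follows. This is a simplified illustration, not the gateway's actual code: `PartialObject`, `keyOf`, `mergeByKey` and `pick` are hypothetical helpers, and the real stitching library performs the equivalent work internally.

```typescript
type PartialObject = Record<string, unknown>;

// Serialize the key field values so partial objects can be matched up.
function keyOf(obj: PartialObject, keyFields: string[]): string {
  return JSON.stringify(keyFields.map((field) => obj[field]));
}

// Merge partial objects that agree on every key field; later parts add fields.
function mergeByKey(parts: PartialObject[], keyFields: string[]): PartialObject[] {
  const merged = new Map<string, PartialObject>();
  for (const part of parts) {
    const key = keyOf(part, keyFields);
    merged.set(key, { ...(merged.get(key) ?? {}), ...part });
  }
  return Array.from(merged.values());
}

// Keep only the fields the client actually selected.
function pick(obj: PartialObject, selected: string[]): PartialObject {
  const result: PartialObject = {};
  for (const field of selected) {
    if (field in obj) result[field] = obj[field];
  }
  return result;
}

const keyFields = ["namespace", "repositoryName"];

// Partial Repository returned by the authorization service (key fields only).
const fromAuthorization: PartialObject[] = [
  { namespace: "bts-gov", repositoryName: "county-transportation-profiles-qdmf-cxm3" },
];

// Partial Repository returned by the metadata service.
const fromMetadata: PartialObject[] = [
  {
    namespace: "bts-gov",
    repositoryName: "county-transportation-profiles-qdmf-cxm3",
    description: "Profiles of transportation features of U.S. counties",
    url: "https://www.splitgraph.com/bts-gov/county-transportation-profiles-qdmf-cxm3",
  },
];

// Merge on the key fields, then drop everything the Dashboard query did not select.
const repositories = mergeByKey([...fromAuthorization, ...fromMetadata], keyFields)
  .map((repo) => pick(repo, ["description", "url"]));
console.log(repositories);
```

The key fields are needed only to line up the partial objects; once merged, `pick` removes them because the Dashboard query never asked for them.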
    

Configuring subschemas to stitch

While processing the Dashboard query, the gateway had to decide which service to query for each requested field. This is determined by the subschema configuration. Below is the simplified code for the gateway service, which is based on the TypeScript examples for remote schemas and schema stitching:

import { stitchSchemas } from "@graphql-tools/stitch";
import { fetch } from "cross-fetch";
import { print, GraphQLSchema } from "graphql";
import { introspectSchema, wrapSchema } from "@graphql-tools/wrap";
import type { AsyncExecutor } from "@graphql-tools/utils/executor";
import { SubschemaConfig } from "@graphql-tools/delegate";

const authorizationServiceUrl = "http://api.splitgrph.com/authorzation/graphql";
const metadataServiceUrl = "http://api.splitgrph.com/metadata/graphql";

const makeRemoteExecutor: (serviceUrl: string) => AsyncExecutor = (
  serviceUrl: string
) => async ({ document, variables }) => {
  const query = print(document);
  const result = await fetch(serviceUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, variables }),
  });
  return result.json();
};

const defineSubSchema = async (
  serviceUrl: string,
  merge?: SubschemaConfig["merge"]
): Promise<SubschemaConfig> => {
  const executor = makeRemoteExecutor(serviceUrl);
  const schema = wrapSchema({
    schema: await introspectSchema(executor),
    executor,
  });
  return { executor, schema, merge };
};

export const makePublicSchema = async (): Promise<GraphQLSchema> =>
  stitchSchemas({
    subschemas: [
      await defineSubSchema(authorizationServiceUrl, {
        User: {
          fieldName: "authorizationUser",
          selectionSet: "{ username }",
          args: ({ username }) => ({ username }),
        },
        Repository: {
          fieldName: "authorizationRepository",
          selectionSet: "{ namespace repositoryName }",
          args: ({ namespace, repositoryName }) => ({
            namespace,
            repositoryName,
          }),
        },
      }),
      await defineSubSchema(metadataServiceUrl, {
        User: {
          fieldName: "metadataUser",
          selectionSet: "{ username }",
          args: ({ username }) => ({ username }),
        },
        Repository: {
          fieldName: "metadataRepository",
          selectionSet: "{ namespace repositoryName }",
          args: ({ namespace, repositoryName }) => ({
            namespace,
            repositoryName,
          }),
        },
      }),
    ],
  });

The GraphQLSchema object returned by makePublicSchema can be used to create an HTTP GraphQL endpoint.

In order to query a subschema, its URL must be known. Conveniently, a GraphQL query can be used to obtain an endpoint's schema. This is what defineSubSchema() does when it calls introspectSchema(). The object passed to defineSubSchema() is the merge configuration. Consider the fragment:

  await defineSubSchema(authorizationServiceUrl, {
    User: {
      fieldName: "authorizationUser",
      selectionSet: "{ username }",
      args: ({ username }) => ({ username }),
    }

It declares: when fields of the User type are queried which were declared by the authorization service schema, they can be obtained by passing the username field of the pre-existing User object as the argument to the top-level authorizationUser field.

This was employed in step 3 of the query process described earlier, when the User.repositories field was merged with the User object obtained from the metadata service in the previous step.

fieldName specifies the field of the top-level Query type to consult. selectionSet defines the set of fields to be selected from the existing User object to get authorizationUser's arguments, typically the key field or fields. selectionSet could contain fields not yet available on the current User instance. In such a case, an additional subschema query would be made before the field referred to by fieldName is queried.

The args function transforms the existing fields on the User object to be used as arguments to authorizationUser. In most cases, this is the identity function, but it can be useful for things like string to number conversion, especially when stitching schemas for APIs outside of our control. For example, one could extend the types in the official Shopify GraphQL Schema with custom fields routed to an internal service.
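
To make the string-to-number case concrete, here is a minimal sketch of a merge entry whose args function converts a key field. Everything named here is hypothetical: the `Product` use case, the `inventoryProduct(id: Int!)` field, and the local `MergeEntrySketch` interface, which only stands in for the three properties discussed above rather than the library's real configuration type.

```typescript
// Local structural stand-in for one entry of the merge configuration;
// only the three properties discussed in this post are modeled.
interface MergeEntrySketch {
  fieldName: string;
  selectionSet: string;
  args: (original: Record<string, unknown>) => Record<string, unknown>;
}

// Hypothetical: an internal service exposes `inventoryProduct(id: Int!)`,
// while the existing Product object carries its id as a string such as "42".
const productMerge: MergeEntrySketch = {
  fieldName: "inventoryProduct",
  selectionSet: "{ id }",
  args: ({ id }) => {
    const numericId = Number.parseInt(String(id), 10);
    if (Number.isNaN(numericId)) {
      throw new Error(`Cannot convert id "${String(id)}" to a number`);
    }
    return { id: numericId };
  },
};

console.log(productMerge.args({ id: "42" })); // { id: 42 }
```

Failing loudly on an unparseable id is a design choice for this sketch; a gateway could instead return null for the merged fields.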

It would also be possible to start by querying the authorization service:

query Dashboard2($username: String!) {
  authorizationUser(username: $username) {
    fullName,
    repositories {
      description,
      url
    }
  }
}

The steps for executing the Dashboard2 query would be:

  1. Query the authorization service for the username key field and the repositories field.
  2. Merge the User.fullName field into the existing User object instance by querying metadataUser.
  3. Merge url and description fields with the existing Repository objects using metadataRepository.
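
The routing decision behind these steps can be sketched as a lookup from each selected field to the top-level Query field that serves it. The `fieldOwner` and `entryField` tables and the `planType` helper below are assumptions written for this illustration; in the real gateway this information is derived from the introspected subschemas and the merge configuration, and the sketch does not model the order of the requests.

```typescript
type Service = "metadata" | "authorization";

// Which service declares each non-key field (key fields exist in both services).
const fieldOwner: Record<string, Service> = {
  "User.fullName": "metadata",
  "User.repositories": "authorization",
  "Repository.description": "metadata",
  "Repository.url": "metadata",
  "Repository.usersWithReadAccess": "authorization",
};

// Top-level Query field used to reach a type in a given service,
// matching the fieldName values of the merge configuration above.
const entryField: Record<Service, Record<string, string>> = {
  metadata: { User: "metadataUser", Repository: "metadataRepository" },
  authorization: { User: "authorizationUser", Repository: "authorizationRepository" },
};

// Group the selected fields of one type by the Query field that must be called.
function planType(typeName: string, selected: string[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const field of selected) {
    const owner = fieldOwner[`${typeName}.${field}`];
    if (owner === undefined) {
      throw new Error(`Unknown field ${typeName}.${field}`);
    }
    const queryField = entryField[owner][typeName];
    if (queryField === undefined) {
      throw new Error(`No entry point for ${typeName} in the ${owner} service`);
    }
    plan.set(queryField, (plan.get(queryField) ?? []).concat(field));
  }
  return plan;
}

// Both Dashboard and Dashboard2 select the same fields, so the same
// services are consulted; only the starting point differs.
const userPlan = planType("User", ["fullName", "repositories"]);
const repositoryPlan = planType("Repository", ["description", "url"]);
console.log(userPlan, repositoryPlan);
```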

Conclusion

Schema stitching has enabled us to extend an existing service's GraphQL API with new fields served by a new service. The "stitching" is seamless in the sense that clients don't need to know which field belongs to which service's subschema. We evolved our API without affecting earlier queries.
