Jun 19, 2023 · By Grzegorz Rozdzialik
READING TIME: 14 min

Keeping Apollo Cache up-to-date after mutations

A discussion of approaches to keep Apollo cache consistent with the API data after invoking GraphQL mutations.

Apollo Client is a JavaScript GraphQL client commonly used in React applications. It facilitates communication with a GraphQL backend in a convenient way and enhances user experience thanks to caching.

In this article, we go over ways to keep the Apollo Cache up-to-date after successful data mutations.

Apollo's normalized cache and its benefits

One of Apollo's features is a normalized cache. When configured right, it detects situations where 2 different query results returned parts of the same entity, and merges those results into a single object in memory.

This means that when there is an overlap between 2 queries, the result of the later query can update the values read by the earlier query, if those fields happened to have changed in the meantime.

Figure: Example data fetching when using Apollo Client. The result of the second query is merged with the result of the first query. This lets the component that triggered the first query receive the latest data fetched by the second query.

Notice how the UserComponent updated and read the latest age (pointed to by the blue arrow) that was fetched by a query initiated by a different component (UserAgeComponent). This is thanks to Apollo's knowledge that both queries refer to the same object in memory, so updates propagate to all readers, provided the query fetch policy permits using the cache.
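
For illustration, the queries behind those two components could look something like this sketch (the operation and field names are made up for this example, not taken from a real schema):

import { gql } from '@apollo/client';

// Query used by UserComponent: fetches the user's name and age.
const USER_DETAILS_QUERY = gql`
  query UserDetails {
    user(id: "1") {
      id
      name
      age
    }
  }
`;

// Query used by UserAgeComponent: overlaps with the query above.
// When its result arrives, the shared User object in the cache is updated,
// and UserComponent re-renders with the fresh age.
const USER_AGE_QUERY = gql`
  query UserAge {
    user(id: "1") {
      id
      age
    }
  }
`;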

Normalization also means that multiple different fields can refer to the same object in memory.

Figure: A graph of cached entities showing a user Alice who likes two bands: CacheLovers (with 2 songs) and CFowl (with one song). CFowl is also Alice's favorite band.

The band CFowl is referenced both in the list of Alice's likedBands, and as her favoriteBand. Apollo can identify that it is, in fact, the same band, and thus stores that as a single object in memory, rather than two different objects that just happen to look the same.
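
Conceptually, the normalized cache for this example could be sketched as the following plain object (the cache IDs, type names, and fields are illustrative, not what Apollo or Splitgraph actually produce):

// Both likedBands and favoriteBand point at the same 'Band:CFowl' entry
// instead of duplicating it.
const normalizedCacheSketch = {
  'User:alice': {
    __typename: 'User',
    name: 'Alice',
    likedBands: [{ __ref: 'Band:CacheLovers' }, { __ref: 'Band:CFowl' }],
    favoriteBand: { __ref: 'Band:CFowl' },
  },
  'Band:CacheLovers': { __typename: 'Band', name: 'CacheLovers', songs: ['song-a', 'song-b'] },
  'Band:CFowl': { __typename: 'Band', name: 'CFowl', songs: ['song-c'] },
};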

When CFowl releases a new song, if Apollo Cache learns about it, all components that subscribed to that information will get updated, regardless of how Apollo found out about the songs property having changed.

Such a normalized cache is useful when building user interfaces. We often display the same data in multiple places and benefit from storing the freshest data for components to use, ensuring UI consistency.

Note that for normalization to work, we need to tell Apollo how to identify objects of a given type, so it can check if 2 objects should be treated as the same entity conceptually. The documentation covers that in great detail.

In the case of Splitgraph's GraphQL API, most of the time we can rely on the nodeId field of objects, but we could have also used the same fields that we use for the selectionSet during schema-stitching (for example, cache User objects based on their username).
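
A minimal sketch of what such a configuration could look like (the exact type names are illustrative):

import { InMemoryCache } from '@apollo/client';

const cache = new InMemoryCache({
  typePolicies: {
    // Identify objects by their nodeId field.
    Repository: { keyFields: ['nodeId'] },
    SeafowlTable: { keyFields: ['nodeId'] },
    // Or key a type on the same fields used for the schema-stitching
    // selectionSet, e.g. users by their username.
    User: { keyFields: ['username'] },
  },
});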

If you are interested in finding out more about how Apollo Cache works, we recommend this guide from the authors of the apollo-augmented-hooks library.

Keeping the cache up-to-date

All would be well, except for the fact that most applications are not read-only. Users can change the content of the application and expect those changes to propagate throughout the UI.

As we all know, cache invalidation is one of the 2 hard problems in computer science (the others being naming things and off-by-1 errors, pun intended).

More concretely, when a mutation successfully finishes, we should update the cache so components display the updated data. Apollo's documentation describes the available approaches:

  1. Including modified data in the mutation's response
  2. Updating the cache directly in the frontend
  3. Refetching queries

Let's go over these and see their pros and cons.

Including modified data in the mutation's response

Each GraphQL mutation exposes information back to the caller. Mutation responses are merged into the cache the same way query responses are.

This means that for simple mutations that update existing objects, it should be enough to return the updated data and let Apollo merge it into the normalized cache.

For example, when updating the description of a repository, notice how the description property is included in the repository in the mutation response:

mutation UpdateRepositoryDescription(
  $namespace: String!
  $repository: String!
  $description: String!
) {
  updateRepositoryDescription(
    namespace: $namespace
    repository: $repository
    description: $description
  ) {
    repository {
      id
      description
    }
  }
}

This approach requires no additional network requests, since the updated data is included in the response to the original mutation request. There is also no extra work required by the application - the cache is updated automatically.

It is not the Holy Grail, though. It is difficult to express insertions or deletions in this way.

For example, when creating a new repository, the mutation response would have to contain all the possible lists in which the repository could appear. The situation is similar when deleting repositories. This could easily lead to overfetching, since the UI most likely does not need all these lists. It just needs to update the lists it already had in the cache.

Updates to lists are easier to express using the other approaches to Apollo cache updates. Let's move on to them now.

Updating the cache directly in the frontend

When executing a mutation in Apollo, we can supply an update function. It will be called with the mutation response and can modify the normalized data in the cache in any way we want. For example, this lets us add or remove objects from lists.
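
As a sketch, an update function that appends a newly created repository to a cached list could look like this (CREATE_REPOSITORY and the repositories root field are assumptions for illustration, not the exact Splitgraph API):

await client.mutate({
  mutation: CREATE_REPOSITORY, // hypothetical mutation document
  variables: { namespace: 'my-namespace', repository: 'my-repo' },
  update(cache, { data }) {
    const newRepository = data?.createRepository?.repository;
    if (!newRepository) {
      return;
    }

    // Append a reference to the new repository to the cached list field.
    cache.modify({
      fields: {
        repositories(existingRefs = [], { toReference }) {
          return [...existingRefs, toReference(newRepository)];
        },
      },
    });
  },
});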

The function runs in the browser and does not require any additional network calls, which makes it fast (unless you do lots of computations in your update function, but still, it is synchronous).

Again, not the Holy Grail. The syntax for cache updates is quite verbose. We also run the risk of updating the cache in a way that differs from what the API would return. Moreover, we need to modify all the data that the mutation could have changed. This increases the risk of displaying inconsistent data.

Let's see if these problems could be solved by refetching queries.

Refetching queries

The third way of updating the cached data that Apollo supports is refetching queries. After a mutation is done, Apollo can refetch a set of queries to read the latest data.

This approach results in the highest consistency with the API, since we refresh the data directly from the source. The cost is that there are additional network requests involved, which use up the user's bandwidth, and take time, during which the user sees stale information.

There are two main ways to specify which queries to refetch, each with its upsides and downsides.

Manually specify queries to refetch

The most basic way is to manually specify the queries to refetch in the refetchQueries array. It initially requires very little work, since you only need to specify query names, or you can refetch active or all queries (using the apolloClient.refetchQueries function).
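
A sketch of both variants, with made-up operation names:

// Refetch specific queries, referenced by their operation names.
await client.mutate({
  mutation: DELETE_REPOSITORY, // hypothetical mutation document
  variables: { namespace: 'my-namespace', repository: 'my-repo' },
  refetchQueries: ['GetRepositories', 'GetNamespaceOverview'],
});

// Or refetch every active query, regardless of what the mutation touched.
await client.refetchQueries({ include: 'active' });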

The ease of use comes at a maintenance cost:

  • You introduce coupling between the mutation and individual queries that will be refetched. When adding a new query (for example, in a new component), it needs to be added to this list of queries to refetch.
  • There is a risk of overfetching. These queries will always be refetched, even if they are no longer referencing the modified fields.

Both of these downsides make this solution hard to maintain. To achieve full consistency, each mutation needs to be reviewed when adding, modifying, or removing GraphQL queries. This adds coupling between components that normally should not know about each other.

All in all, it is easy to start with this approach of manually specifying queries to refetch, and we recommend switching to the other solution (described below) as the application grows.

Mark cached fields as invalid

The other approach is a twist of the earlier approach of updating the cache in the frontend. The difference is that instead of modifying the cached values in-place, we mark them as invalid. Apollo will then refetch queries that included these fields in their previous results.

await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      // `apple` is an object previously returned by some query;
      // cache.identify computes its ID in the normalized cache.
      id: cache.identify(apple),
      fields: {
        // Returning INVALIDATE marks the `color` field as invalid, so
        // active queries that read it are refetched.
        color(_value, { INVALIDATE }) {
          return INVALIDATE;
        },
      },
    });
  },
});

This approach scales better to more queries and mutations, and it does not introduce additional coupling between components. As long as the updateCache function correctly approximates the effects of the mutation, Apollo will refetch all queries that returned the invalidated fields.

The ease of use of this method depends on the type of mutation. Let's see that with an example. Consider the following GraphQL response:

{
  "data": {
    "seafowlDatabase": {
      "nodeId": "WyJzZWFmb3dsX2RhdGFiYXNlIiwiR2VsaW8iXQ==",
      "name": "my-app",
      "schemas": [
        {
          "nodeId": "WyJzZWFmb3dsX3NjaGVtYSIsIkdlbGlvIiwiR2VsaW8iXQ==",
          "name": "maps",
          "tables": [
            {
              "nodeId": "WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJHZWxpbyIsImNpdGllcyJd",
              "name": "cities",
              "__typename": "SeafowlTable"
            },
            {
              "nodeId": "WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJhYmMiLCIxMiJd",
              "name": "lakes",
              "__typename": "SeafowlTable"
            }
          ],
          "__typename": "SeafowlSchema"
        }
      ],
      "__typename": "SeafowlDatabase"
    }
  }
}

It represents a Seafowl database with its contents. It is used in the sidebar of the Splitgraph Console.

Updating an existing object

Let's suppose we want to rename the lakes table to waters and assume that the API cannot include the updated object in the mutation response. Thus, we send a mutation:

mutation UpdateTableName {
  updateTableName(
    databaseName: "my-app"
    schemaName: "maps"
    oldTableName: "lakes"
    newTableName: "waters"
  ) {
    mutationId
  }
}

Then, after the mutation is done, we can refetch all queries that referenced the name field of that affected table:

await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(lakesTable),
      fields: {
        name: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });
  },
});

This seems like overkill, because we could have updated the cache directly instead of refetching the queries. Still, refetching queries after updates of existing objects is easy, assuming we can cache.identify the lakesTable.
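
As a side note, if the component only holds the table's identifying fields rather than a full query result, the cache ID can be computed from a minimal object literal (a sketch, assuming SeafowlTable objects are keyed by their nodeId):

// Compute the cache ID of the lakes table from its identifying fields.
const lakesTableCacheId = cache.identify({
  __typename: 'SeafowlTable',
  nodeId: 'WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJhYmMiLCIxMiJd',
});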

Deleting an existing object

Let's consider a different type of mutation. Instead of updating the table name, we want to delete the lakes table altogether. We send the following mutation:

mutation DeleteTable {
  deleteTable(databaseName: "my-app", schemaName: "maps", tableName: "lakes") {
    mutationId
  }
}

Again, after the mutation is done, we can refetch all queries that referenced any field of the deleted table:

await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(table),
      // NOTE: notice `fields` is a function, not an object.
      // This way we invalidate ALL table fields without
      // knowing their names.
      fields: (_value, { INVALIDATE }) => INVALIDATE,
    });
  },
});

This is just right. Apollo finds which queries referenced that deleted table, and it refetches them.

Note that returning the modified objects (in this case, SeafowlSchemas that contained the deleted lakes table) in the mutation response would not scale well. The deleted table could have appeared in many queries, and be returned by various parents. For example, the lakes table could have been also returned by a query that returns the latest table in the entire Seafowl instance:

{
  "data": {
    "latestTableInInstance": {
      "nodeId": "WyJzZWFmb3dsX3RhYmxlIiwiR2VsaW8iLCJhYmMiLCIxMiJd",
      "name": "lakes",
      "__typename": "SeafowlTable"
    }
  }
}

The deleteTable mutation result would have to contain both the updated maps SeafowlSchema and this latestTableInInstance top-level query. Whether to include one or the other is not known statically, because it depends on what is in the Apollo cache at the time of sending the mutation. Thus, letting Apollo determine the queries to refetch is perfect for mutations that delete objects.

Adding new objects

Mutations that add new objects are the most difficult to represent as cache updates. Apollo does not know in which queries the added object could appear. Thus, we need to determine all possible parents of the added object, identify them in the cache, and invalidate the field that references the added object.

To see that in practice, let's imagine we add a new mountains SeafowlTable using the following mutation:

mutation CreateTable($sourceSQLQuery: String!) {
  createTable(
    databaseName: "my-app"
    schemaName: "maps"
    tableName: "mountains"
    sourceSQLQuery: $sourceSQLQuery
  ) {
    mutationId
  }
}

Now it is our job to identify all objects that could include that mountains table and tell Apollo to refetch queries that reference them. This is a hard problem, because we do not know what the IDs of these parent objects could be. In our example, we need to refetch:

  1. The maps SeafowlSchema
  2. The latestTableInInstance query

Let's try that:

await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      // TODO: somehow get access to `mapsSeafowlSchema`
      id: cache.identify(mapsSeafowlSchema),
      fields: {
        tables: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });

    // Invalidate the top-level query
    cache.modify({
      fields: {
        latestTableInInstance: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });
  },
});

Doing it right requires having access to the mapsSeafowlSchema object (the parent of the newly-added table) so we can identify it in the cache. Depending on the structure of the code, this may not be trivial if the component triggering the mutation did not require access to the schema. For example, it could have only known the names of the database (my-app) and the SeafowlSchema (maps) in which to create the table.

To identify the parent SeafowlSchema of the added table, the component may first need to fetch that SeafowlSchema from the API (which could be served from the cache, in which case this is almost instant, but that is not guaranteed), and only then invalidate it, so that Apollo also refetches the other affected queries.
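
One possible shape of that flow, as a sketch (GET_SEAFOWL_SCHEMA and the seafowlSchema response field are assumptions, not necessarily the real API):

// 1. Fetch (possibly from the cache) the parent schema to learn its identity.
const { data } = await client.query({
  query: GET_SEAFOWL_SCHEMA, // hypothetical query document
  variables: { databaseName: 'my-app', schemaName: 'maps' },
});

// 2. Invalidate its `tables` field so that dependent queries are refetched.
await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(data.seafowlSchema),
      fields: {
        tables: (_value, { INVALIDATE }) => INVALIDATE,
      },
    });
  },
});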

What about other types in the GraphQL schema that can return this new table? They also need to be considered to achieve full consistency. Moreover, when the GraphQL schema changes, the updateCache functions need to be re-evaluated, since there could now be more fields that need to be invalidated when adding a new table.

All of the above means that invalidating the cache after adding new objects in a mutation is a difficult problem to solve correctly.

Caveat with invalidating fields that do not have an active query

There is a small caveat when there is no active query that returned the invalidated field. Invalidating a field will only refetch active queries that reference that field.

Consider the following scenario:

  1. Component A renders and fetches a query.
  2. The query result is cached.
  3. Component A is unmounted (and, thus, the query is unsubscribed).
  4. One field from the query result is invalidated. There are no active queries that returned that field, so no queries are refetched.
  5. Component A is rendered again. It reads the query result from cache. It does not refetch the query, even though it was marked as invalid in an earlier step.

Not refetching the queries despite data being invalidated is a problem that has been reported on GitHub. The workaround involves using the DELETE sentinel value instead of INVALIDATE during cache modifications. This way the field is deleted from the cache and Apollo is forced to fetch the query, as it will not return partial data (unless you tell it to).
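
Applied to the earlier deleteTable example, that workaround could look like this:

await client.refetchQueries({
  updateCache(cache) {
    cache.modify({
      id: cache.identify(table),
      // DELETE removes the fields from the cache entirely instead of only
      // marking them invalid, so the next component that needs this data
      // triggers a network request even if no query is active right now.
      fields: (_value, { DELETE }) => DELETE,
    });
  },
});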

Comparison

Let's compare the presented ways of keeping Apollo cache up-to-date. We will assess these based on:

  • ease of implementation - how hard is it to implement at first,
  • ease of maintenance - how hard is it to make sure it continues to work as the application evolves,
  • applicability - does it work for all types of mutations,
  • user experience (UX) - does the change feel instant,
  • risk of overfetching - how likely it is to send more requests than necessary and fetch data not needed by the UI,
  • risk of inconsistency - how likely it is to end up with an inconsistent cache (values different than held by the API) despite doing the technique in question.
| Technique | Ease of implementation | Ease of maintenance | Applicability | UX | Risk of overfetching | Risk of inconsistency |
| --- | --- | --- | --- | --- | --- | --- |
| Including modified data in the mutation's response | Easy (if the API exposes the modified data) | Easy | Only updates of existing objects | Instant | Low | Low |
| Updating the cache directly in the frontend | Difficult | Easy if done correctly from the start [1] | All types of mutations | Instant | None | High [4] |
| Refetching queries - manually specifying queries to refetch | Easy | Difficult [2] | All types of mutations | Delayed update [3] | High | Low |
| Refetching queries - marking cached fields as invalid | Easy for updates and deletes, difficult for insertions | Easy if done correctly from the start [1] | All types of mutations | Delayed update [3] | Low | Low |

Conclusion

Apollo offers many ways of keeping its cache consistent with the backend data. There is no single approach that works best in every scenario. Our recommendation is:

  • for updates, include the modified data in the mutation's response. This requires the least work, leads to the best UX, and is easy to maintain.

  • for deletions, mark the fields of the deleted object as invalid.

    When pressed for time, start by manually specifying queries to refetch, and gradually move towards implementing the updateCache function that invalidates fields.

  • for insertions, aim to mark the fields of parent objects as invalid. If that turns out to be difficult, use a manual list of queries to refetch and review it periodically.

    This amortizes the maintenance and implementation costs.

All in all, cache invalidation remains a hard problem, and Apollo is no exception.


  1. the update function needs to correctly approximate the changes done by the mutation. Then it only needs to change if the GraphQL schema changes.
  2. the list of queries to refetch needs to be reviewed each time a GraphQL query is changed, added, or removed.
  3. requires a round-trip to the server.
  4. the frontend cache update logic must match the backend's one.