(docs) - Improve Normalized Caching introduction doc (#1371)

Mirror: The highly customizable and versatile GraphQL client with which you add on features like normalized caching as you grow.

* (docs) - Add new "Normalized Caching" introduction

* (docs) - Add more custom keys details to "Normalized Caching"

* (docs) - Add introductory manual updates/resolvers section

authored by kitten.sh and committed by GitHub 5 years ago 3812623d 1de153b6

+390 -66

2 changed files

docs/assets/query-document-info.png

This is a binary file and will not be displayed.

+390 -66

docs/graphcache/normalized-caching.md

@@ -5,16 +5,56 @@
 
 # Normalized Caching
 
-With _Graphcache_ all data is stored in a normalized data structure. It automatically uses
-`__typename` information and `id` fields on entities to create a normalized table of data. Since
-GraphQL deals with connected data in a tree structure, each entity may link to other entities or
-even lists of entities, which we call "links". The scalar fields on entities like numbers, strings,
-etc is what we call "records."
+In GraphQL, like its name suggests, we create schemas that express the relational nature of our
+data. When we create and query against a `Query` type we walk a graph that starts at the root
+`Query` type and walks through relational types. Rather than querying for normalized data, in
+GraphQL our queries request a specific shape of denormalized data, a view into our relational data
+that can be re-normalized automatically.
+
+As the GraphQL API walks our query documents it may read from a relational database and _entities_
+and scalar values are copied into a JSON document that matches our query document. The type
+information of our entities isn't lost however. A query document may still ask the GraphQL API about
+what entity it's dealing with using the `__typename` field, which dynamically introspects an
+entity's type. This means that GraphQL clients can automatically re-normalize data as results come
+back from the API by using the `__typename` field and keyable fields like an `id` or `_id` field,
+which are already common conventions in GraphQL schemas. In other words, normalized caches can build
+up a relational database of tables in-memory for our application.
+
+For our apps normalized caches can enable more sophisticated use-cases, where different API requests
+update data in other parts of the app and automatically update data in our cache as we query our
+GraphQL API. Normalized caches can essentially keep the UI of our applications up-to-date when
+relational data is detected across multiple queries, mutations, or subscriptions.
+
+## Normalizing Relational Data
+
+As previously mentioned, a GraphQL schema creates a tree of types where our application's data
+always starts from the `Query` root type and is modified by other data that's incoming from either a
+selection on `Mutation` or `Subscription`. All data that we query from the `Query` type will contain
+relations between "entities", JSON objects that are hierarchical.
 
-Instead of storing query results as whole documents, like `urql` does with [its default "Document
-Caching"](../concepts/document-caching.md), _Graphcache_ flattens all data it receives automatically.
-If we looked at doing this manually on the following piece of data, we'd separate each object into a
-list of key-value entries per entity.
+A normalized cache seeks to turn this denormalized JSON blob back into a relational data structure,
+which stores all entities by a key that can be looked up directly. Since GraphQL documents give the
+API a strict specification on how it traverses a schema, the JSON data that the cache receives from
+the API will always match the GraphQL query document that has been used to query this data.
+A common misconception is that normalized caches in GraphQL store data by the query document somehow,
+however, the only thing a normalized cache cares about is that it can use our GraphQL query documents
+to walk the structure of the JSON data it received from the API.
+
+```graphql
+{
+  __typename
+  todo(id: 1) {
+    __typename
+    id
+    title
+    author {
+      __typename
+      id
+      name
+    }
+  }
+}
+```
 
 ```json
 {
@@ -32,93 +72,377 @@
 }
 ```
 
-The above would look like the following once we normalized the data:
+Above, we see an example of a GraphQL query document and a corresponding JSON result from a GraphQL
+API. In GraphQL, we never lose access to the underlying types of the data. Normalized caches can
+ask for the `__typename` field in selection sets automatically and will find out which type a JSON
+object corresponds to.
+
+Generally, a normalized cache must do one of two things with a query document like the above:
+
+- It must be able to walk the query document and JSON data of the result and cache the data,
+  normalizing it in the process and storing it in relational tables.
+- It must later be able to walk the query document and recreate this JSON data just by reading data
+  from its cache, by reading entries from its in-memory relational tables.
+
+While the normalized cache can't know the exact type of each field, thanks to the GraphQL query
+language it can make a couple of assumptions. The normalized cache can walk the query document. Each
+field that has no selection set (like `title` in the above example) must be a "record", a field that
+may only be set to a scalar. Each field that does have a selection set must be another "entity" or a
+list of "entities". The latter fields with selection sets are our relations between entities, like a
+foreign key in relational databases.
+Furthermore, the normalized cache can then read the `__typename` field on related entities. This is
+called _Type Name Introspection_ and is how it finds out about the types of each entity.
+From the above document we can assume the following relations:
+
+- `Query.todo(id: 1)` → `Todo`
+- `Todo.author` → `Author`
+
+However, this isn't quite enough yet to store the relations from GraphQL results. The normalized
+cache must also generate primary keys for each entity so that it can store them in table-like data
+structures. This is for instance why [Relay
+enforces](https://relay.dev/docs/en/graphql-server-specification.html#object-identification) that
+each entity must have an `id` field. This allows it to assume that there's an obvious primary key
+for each entity it may query. Instead, `urql`'s Graphcache and Apollo assume that there _may_ be an
+`id` or `_id` field in a given selection set. If Graphcache can't find these two fields it'll issue
+a warning, however a custom `keys` configuration may be used to generate custom keys for a given
+type. With this logic the normalized cache will actually create the following "links" between its
+relational data:
+
+- `"Query"`, `.todo(id: 1)` → `"Todo:1"`
+- `"Todo:1"`, `.author` → `"Author:1"`
 
-```json
+As we can see, the `Query` root type itself has a constant key of `"Query"`. All relational data
+originates here, since the GraphQL schema is a graph and, like a tree, all selections on a GraphQL
+query document originate from it.
+Internally, the normalized cache now stores field values on entities by their primary keys. The
+above can also be said or written as:
+
+- The `Query` entity's `todo` field with `{"id": 1}` arguments points to the `Todo:1` entity.
+- The `Todo:1` entity's `author` field points to the `Author:1` entity.
+
+In Graphcache, these "links" are stored in a nested structure per-entity. "Records" are kept
+separate from this relational data.
+
+![Normalization is based on types, keys, and relations. This information can all be inferred from
+the query document.](../assets/query-document-info.png)
+
+## Storing Normalized Data
+
+At its core, normalizing data means that we take individual fields and store them in a table. In our
+case we store all values of fields in a dictionary of their primary key, generated from an ID or
+other key and type name, and the field’s name and arguments, if it has any.
+
+| Primary Key            | Field                                           | Value                    |
+| ---------------------- | ----------------------------------------------- | ------------------------ |
+| Type name and ID (Key) | Field name (not alias) and optionally arguments | Scalar value or relation |
+
+To reiterate we have three pieces of information that are stored in tables:
+
+- The entity's key can be derived from its type name via the `__typename` field and a keyable field.
+  By default _Graphcache_ will check the `id` and `_id` fields, however this is configurable.
+- The field's name (like `todo`) and optional arguments. If the field has any arguments then we can
+  normalize it by JSON stringifying the arguments, making sure that the JSON key is stable by
+  sorting its keys.
+- Lastly, we may store relations as either `null`, a primary key that refers to another entity, or a
+  list of such. For storing "records" we can store the scalars in a separate table.
+
+In _Graphcache_ the data structure for these tables looks a little like the following, where each
+entity has a record from fields to other entity keys:
+
+```js
 {
-  "Query": {
-    "todo": "Todo:1" // link
-  },
-  "Todo:1": {
-    "__typename": "Todo",
-    "id": "1", // record
-    "title": "implement graphcache", // record
-    "author": "Author:1" // link
-  },
-  "Author:1": {
-    "__typename": "Author",
-    "id": "1", // record
-    "name": "urql-team" // record
+  links: Map {
+    'Query': Record {
+      'todo({"id":1})': 'Todo:1'
+    },
+    'Todo:1': Record {
+      'author': 'Author:1'
+    },
+    'Author:1': Record { },
+  }
+}
+```
+
+We can see how the normalized cache is now able to traverse a GraphQL query by starting on the
+`Query` entity and retrieve relations for other fields.
+To retrieve "records" which are all fields with scalar values and no selection sets, _Graphcache_
+keeps a second table around with an identical structure. This table only contains scalar values,
+which keeps our non-relational data away from our "links":
+
+```js
+{
+  records: Map {
+    'Query': Record {
+      '__typename': 'Query'
+    },
+    'Todo:1': Record {
+      '__typename': 'Todo',
+      'id': 1,
+      'title': 'implement graphcache'
+    },
+    'Author:1': Record {
+      '__typename': 'Author',
+      'id': 1,
+      'name': 'urql-team'
+    },
   }
 }
 ```
 
 This is very similar to how we'd go about creating a state management store manually, except that
-_Graphcache_ can use the GraphQL document and the `__typename` field to perform this normalization
-automatically.
+_Graphcache_ can use the GraphQL document to perform this normalization automatically.
 
-The interesting part of normalization starts when we read from the cache instead of writing to it.
-Multiple results may refer to the same piece of data — a mutation for instance may update `"Todo:1"`
-later on in the app. This would automatically cause _Graphcache_ to update any related queries in
-the entire app, because all references to each each entity are shared.
+What we gain from this normalization is that we have a data structure that we can both read from and
+write to, to reproduce the API results for GraphQL query documents. Any mutation or subscription can
+also be written to this data structure. Once _Graphcache_ finds a keyable entity in their results
+it's written to its relational table which may update other queries in our application.
+Similarly queries may share data between one another which means that they effectively share
+entities using this approach and can update one another.
+In other words, once we have a primary key like `"Todo:1"` we may find this primary key again in
+other entities in other GraphQL results.
 
-## Terminology
+## Custom Keys and Non-Keyable Entities
 
-A few terms that will be used throughout the _Graphcache_ documentation that are important to understand in order to get a full understanding.
+In the above introduction we've learned that while _Graphcache_ doesn't enforce `id` fields on each
+entity, it checks for the `id` and `_id` fields by default. There are many situations in which
+entities may either not have a key field or have different keys.
 
-- **Entity**, this is an object for which the cache can generate a key, like `Todo:1`.
-- **Record**, this is a property that relate to an entity, in the above case this would be `title`, ...
-  internally these will be represented as `Todo:1.title`.
-- **Link**, This is the connection between entities or the base `Query` field, this will link an entity key (ex: `Query`/`Todo:1`) to a single or an array
-  of keys
+As _Graphcache_ traverses JSON data and a GraphQL query document to write data to the cache you may
+see a warning from it along the lines of ["Invalid key: [...] No key could be generated for the data
+at this field."](./errors.md/#15-invalid-key) _Graphcache_ has many warnings like these that attempt
+to detect undesirable behaviour and helps us to update our configuration or queries accordingly.
 
-## Key Generation
+In the simplest cases, we may simply have forgotten to add the `id` field to the selection set of
+our GraphQL query document. However, what if the field is instead called `uuid` and our query looks
+accordingly different?
 
-As we saw in the previous example, by default _Graphcache_ will attempt to generate a key by
-combining the `__typename` of a piece of data with the `id` or `_id` fields, if they're present. For
-instance, `{ __typename: 'Author', id: 1 }` becomes `"Author:1"`.
+```graphql
+{
+  item {
+    uuid
+  }
+}
+```
 
-_Graphcache_ will log a warning when these fields weren't requested as part of a query's selection
-set or aren't present in the data. This can be useful if we forget to include them in our queries.
-In general, _Graphcache_ will always output warnings in development when it assumes that something
-went wrong.
+In the above selection set we have an `item` field that has a `uuid` field rather than an `id`
+field. This means that _Graphcache_ won't automatically be able to generate a primary key for this
+entity. Instead, we have to help it generate a key by passing it a custom `keys` config:
+
+```js
+cacheExchange({
+  keys: {
+    Item: data => data.uuid,
+  },
+});
+```
+
+We may add a function as an entry to the `keys` configuration. The property here, `"Item"` must be
+the typename of the entity for which we're generating a key. The function may return an arbitrarily
+generated key. So for our `item` field, which in our example schema gives us an `Item` entity, we
+can create a `keys` configuration entry that creates a key from the `uuid` field rather than the
+`id` field.
+
+This also raises a question, **what does _Graphcache_ do with unkeyable data by default? And, what
+if my data has no key?**<br />
+This special case is what we call "embedded data". Not all types in a GraphQL schema will have
+keyable fields and some types may just abstract data without themselves being relational. They may
+be "edges", entities that have a field pointing to other entities that simply connect two entities,
+or data types like a `GeoJson` or `Image` type.
+
+In these cases, where the normalized cache encounters unkeyable types, it will create an embedded
+key by using the parent's primary key and combining it with the field key. This means that
+"embedded entities" are only reachable from a specific field on their parent entities. They're
+globally unique and aren't strictly speaking relational data.
 
-However, in your schema you may have types that don't have an `id` or `_id` field, say maybe some
-types have a `key` field instead. In such cases the custom `keys` configuration comes into play
+```graphql
+{
+  __typename
+  todo(id: 1) {
+    id
+    image {
+      url
+      width
+      height
+    }
+  }
+}
+```
 
-Let's look at an example. Say we have a set of todos each with a `__typename`
-of `Todo`, but instead of identifying on `id` or `_id` we want to identify
-each record by its `name`:
+In the above example we're querying an `Image` type on a `Todo`. This imaginary `Image` type has no
+key because the image is embedded data and will only ever be associated to this `Todo`. In other
+words, the API's schema doesn't consider it necessary to have a primary key field for this type.
+Maybe it doesn't even have an ID in our backend's database. We _could_ assign this type an imaginary
+key (maybe based on the `url`) but in fact if it's not shared data it wouldn't make much sense to
+do so.
+
+When _Graphcache_ attempts to store this entity it will issue the previously mentioned warning.
+Internally, it'll then generate an embedded key for this entity based on the parent entity. If
+the parent entity's key is `Todo:1` then the embedded key for our `Image` will become
+`Todo:1.image`. This is also how this entity will be stored internally by _Graphcache_:
 
 ```js
-import { cacheExchange } from '@urql/exchange-graphcache';
+{
+  records: Map {
+    'Todo:1.image': Record {
+      '__typename': 'Image',
+      'url': '...',
+      'width': 1024,
+      'height': 768
+    },
+  }
+}
+```
+
+This doesn't however mute the warning that _Graphcache_ outputs, since it believes we may have made a
+mistake. The warning itself gives us advice on how to mute it:
 
-const cache = cacheExchange({
+> If this is intentional, create a keys config for `Image` that always returns null.
+
+Meaning, that we can add an entry to our `keys` config for our non-keyable type that explicitly
+returns `null`, which tells _Graphcache_ that the entity has no key:
+
+```js
+cacheExchange({
   keys: {
-    Todo: data => data.name,
+    Image: () => null,
   },
 });
 ```
 
-This will cause our cache to generate a key from `__typename` and `name` instead if an entity's type
-is `Todo`.
+## Non-Automatic Relations and Updates
 
-Similarly some pieces of data shouldn't be normalized at all. If _Graphcache_ can't find the `id` or
-`_id` fields it will log a warning and _embed the data_ instead. Embedding the data means that it
-won't be normalized because the generated key is `null` and will instead only be referenced by the
-parent entity.
+While _Graphcache_ is able to store and update our entities in an in-memory relational data
+structure, which keeps the same entities in singular unique locations, a GraphQL API may make a lot
+of implicit changes to the relations of data as it runs or have trivial relations that our cache
+doesn't need to see to resolve. Like with the `keys` config, we have two more configuration options
+to combat this: `resolvers` and `updates`.
 
-You can force this behaviour and silence the warning by making a `keys` function that returns `null`
-immediately. This can be useful for types that aren't globally unique, like a `GeoPoint`:
+### Manually resolving entities
+
+Some fields in our configuration can be resolved without checking the GraphQL API for relations. The
+`resolvers` config allows us to create a list of client-side resolvers where we can read from the
+cache directly as _Graphcache_ creates a local GraphQL result from its cached data.
+
+```graphql
+{
+  todo(id: 1) {
+    id
+  }
+}
+```
+
+Previously we've looked at the above query to illustrate how data from a GraphQL API may be written
+to _Graphcache_'s relational data structure to store the links and entities in a result against this
+GraphQL query document. However, it may be possible for another query to have already written this
+`Todo` entity to the cache. So, **how do we resolve a relation manually?**
+
+In such a case, _Graphcache_ may have seen and stored the `Todo` entity but isn't aware of the
+relation between `Query.todo({"id":1})` and the `Todo:1` entity. However, we can tell _Graphcache_
+which entity it should look for when it accesses the `Query.todo` field by creating a resolver for
+it:
 
 ```js
-const myGraphCache = cacheExchange({
-  keys: {
-    GeoPoint: () => null,
+cacheExchange({
+  resolvers: {
+    Query: {
+      todo(parent, args, cache, info) {
+        return { __typename: 'Todo', id: args.id };
+      },
+    },
   },
 });
 ```
 
-### Reading on
+A resolver is a function that's similar to [GraphQL.js' resolvers on the
+server-side](https://www.graphql-tools.com/docs/resolvers/). They receive the parent data, the
+field's arguments, access to _Graphcache_'s cached data, and an `info` object. [The entire function
+signature and more explanations can be found in the API docs.](../api/graphcache.md#resolvers-option)
+Since it can access the field's arguments from the GraphQL query document, we can return a partial
+`Todo` entity. As long as this
+object is keyable, it will tell _Graphcache_ what the key of the returned entity is. In other words,
+we've told it how to get to a `Todo` from the `Query.todo` field.
+
+This mechanism is immensely more powerful than this example. We have two other use-cases that
+resolvers may be used for:
 
-[On the next page we'll learn about "Computed queries".](./computed-queries.md)
+- Resolvers can be applied to fields with records, which means that it can be used to change or
+  transform scalar values. For instance, we can update a string or parse a `Date` right inside a
+  resolver.
+- Resolvers can return deeply nested results, which will be layered on top of the in-memory
+  relational cached data of _Graphcache_, which means that it can emulate infinite pagination and
+  other complex behaviour.
+
+[Read more about local resolvers on the following page, "Custom Queries".](./custom-queries.md)
+
+### Manual cache updates
+
+While `resolvers`, as shown above, operate while _Graphcache_ is reading from its in-memory cache,
+`updates` are a configuration option that operate while _Graphcache_ is writing to its cached data.
+Specifically, these functions can be used to add more updates onto what a `Mutation` or
+`Subscription` may automatically update.
+
+As stated before, a GraphQL schema's data may undergo a lot of implicit changes when we send it a
+`Mutation` or `Subscription`. A new item that we create may for instance manipulate a completely
+different item or even a list. Often mutations and subscriptions alter relations that their
+selection sets wouldn't necessarily see. Since mutations and subscriptions operate on a different
+root type, rather than the `Query` root type, we often need to update links in the rest of our data
+when a mutation is executed.
+
+```graphql
+query TodosList {
+  todos {
+    id
+    title
+  }
+}
+
+mutation AddTodo($title: String!) {
+  addTodo(title: $title) {
+    id
+    title
+  }
+}
+```
+
+In a simple example, like the one above, we have a list of todos in a query and create a new todo
+using the `Mutation.addTodo` mutation field. When the mutation is executed and we get the result
+back, _Graphcache_ already writes the `Todo` item to its normalized cache. However, we also want to
+add the new `Todo` item to the list on `Query.todos`:
+
+```js
+import { gql } from '@urql/core';
+
+cacheExchange({
+  updates: {
+    Mutation: {
+      addTodo(result, args, cache, info) {
+        const query = gql`
+          {
+            todos {
+              id
+            }
+          }
+        `;
+        cache.updateQuery({ query }, data => {
+          data.todos.push(result.addTodo);
+          return data;
+        });
+      },
+    },
+  },
+});
+```
+
+In this code example we can first see that the signature of the `updates` entry is very similar to
+the one of `resolvers`. However, we're seeing the `cache` in use for the first time. The `cache`
+object (as [documented in the API docs](../api/graphcache.md#cache)) gives us
+access to _Graphcache_'s mechanisms directly. Not only can we resolve data using it, we can directly
+start sub-queries or sub-writes manually. These are full normalized cache runs inside other runs. In
+this case we're calling `cache.updateQuery` on a list of `Todo` items while the `Mutation` that
+added the `Todo` is already being written to the cache.
+
+As we can see, we may perform manual changes inside of `updates` functions, which can be used to
+affect other parts of the cache (like `Query.todos` here) beyond the automatic updates that a
+normalized cache is expected to perform.
+
+[Read more about creating custom updates on the "Custom Updates" page.](./custom-updates.md)
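
As a reading aid for the document added in this commit, the write and read steps it describes (walk the selection, store scalars as "records", store relations as "links", fall back to an embedded key for unkeyable entities) can be sketched in plain JavaScript. This is a minimal sketch under stated assumptions, not Graphcache's implementation: the selection is a hand-written array of `{ name, args, selection }` objects instead of a parsed GraphQL document, and every helper name (`stableStringify`, `fieldKeyOf`, `keyOf`, `writeSelection`, `readSelection`) as well as the example image URL is hypothetical. Prefixing a custom key with the type name is also an assumption of this sketch.

```javascript
// Minimal sketch of normalizing a result into "links" and "records" tables.
// All helper names here are hypothetical; this is not Graphcache's code.

// Stringify arguments with sorted keys so equal arguments give equal field keys.
const stableStringify = value => {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map(key => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
};

// Field key: the field name plus its stringified arguments, if it has any.
const fieldKeyOf = ({ name, args }) =>
  args && Object.keys(args).length > 0 ? `${name}(${stableStringify(args)})` : name;

// Entity key: type name plus a custom key, `id`, or `_id`; null when unkeyable.
const keyOf = (data, keys) => {
  const custom = keys[data.__typename];
  const id = custom ? custom(data) : data.id ?? data._id;
  return id == null ? null : `${data.__typename}:${id}`;
};

const setField = (table, entityKey, fieldKey, value) => {
  if (!table.has(entityKey)) table.set(entityKey, {});
  table.get(entityKey)[fieldKey] = value;
};

// Store an entity (or a list of entities) and return the key(s) that link to it.
const writeLink = (store, embeddedKey, selection, value, keys) => {
  if (value == null) return null;
  if (Array.isArray(value)) {
    return value.map((item, index) =>
      writeLink(store, `${embeddedKey}.${index}`, selection, item, keys)
    );
  }
  // Unkeyable entities fall back to an embedded key derived from the parent.
  const entityKey = keyOf(value, keys) || embeddedKey;
  writeSelection(store, entityKey, selection, value, keys);
  return entityKey;
};

// Fields without a selection set are records; fields with one are links.
const writeSelection = (store, entityKey, selection, data, keys) => {
  for (const field of selection) {
    const fieldKey = fieldKeyOf(field);
    const value = data[field.name];
    if (!field.selection) {
      setField(store.records, entityKey, fieldKey, value);
    } else {
      const embeddedKey = `${entityKey}.${fieldKey}`;
      setField(store.links, entityKey, fieldKey, writeLink(store, embeddedKey, field.selection, value, keys));
    }
  }
};

// Read the same selection back from the tables to rebuild the JSON result.
const readSelection = (store, entityKey, selection) => {
  const data = {};
  for (const field of selection) {
    const fieldKey = fieldKeyOf(field);
    if (!field.selection) {
      data[field.name] = (store.records.get(entityKey) || {})[fieldKey];
    } else {
      const readLink = key => {
        if (Array.isArray(key)) return key.map(readLink);
        return key == null ? null : readSelection(store, key, field.selection);
      };
      data[field.name] = readLink((store.links.get(entityKey) || {})[fieldKey]);
    }
  }
  return data;
};

// Usage: the todo query from the document, with an embedded `image` added.
const scalars = (...names) => names.map(name => ({ name }));
const selection = [
  { name: '__typename' },
  {
    name: 'todo',
    args: { id: 1 },
    selection: [
      ...scalars('__typename', 'id', 'title'),
      { name: 'author', selection: scalars('__typename', 'id', 'name') },
      { name: 'image', selection: scalars('__typename', 'url') },
    ],
  },
];
const result = {
  __typename: 'Query',
  todo: {
    __typename: 'Todo',
    id: 1,
    title: 'implement graphcache',
    author: { __typename: 'Author', id: 1, name: 'urql-team' },
    image: { __typename: 'Image', url: 'https://example.com/todo.png' },
  },
};
const store = { links: new Map(), records: new Map() };
writeSelection(store, 'Query', selection, result, { Image: () => null });
console.log(store.links, store.records);
```

In this sketch the `Image` entity ends up under the embedded key `Todo:1.image` because its `keys` entry returns `null`, while `Todo` and `Author` are stored under keys built from their type name and `id`. Keeping `links` and `records` in two separate maps mirrors the split the document describes; a real cache would additionally need to handle aliases, fragments, and missing fields, which are left out here.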