# JSON Pointer Tutorial

This tutorial introduces JSON Pointer as defined in [RFC 6901](https://www.rfc-editor.org/rfc/rfc6901), and demonstrates the `jsont-pointer` OCaml library through interactive examples.

## What is JSON Pointer?

From RFC 6901, Section 1:

> JSON Pointer defines a string syntax for identifying a specific value
> within a JavaScript Object Notation (JSON) document.

In other words, JSON Pointer is an addressing scheme for locating values inside a JSON structure. Think of it like a filesystem path, but for JSON documents instead of files.

For example, given this JSON document:

```json
{
  "users": [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
  ]
}
```

The JSON Pointer `/users/0/name` refers to the string `"Alice"`.

In OCaml, this is represented by the `Jsont_pointer.t` type - a sequence of navigation steps from the document root to a target value.

## Syntax: Reference Tokens

RFC 6901, Section 3 defines the syntax:

> A JSON Pointer is a Unicode string containing a sequence of zero or more
> reference tokens, each prefixed by a '/' (%x2F) character.

The grammar is elegantly simple:

```
json-pointer    = *( "/" reference-token )
reference-token = *( unescaped / escaped )
```

This means:

- The empty string `""` is a valid pointer (it refers to the whole document)
- Every non-empty pointer starts with `/`
- Everything between `/` characters is a "reference token"

Let's see this in action. We can parse pointers and see their structure:

```sh
$ jsonpp parse ""
OK: []
```

The empty pointer has no reference tokens - it points to the root.

```sh
$ jsonpp parse "/foo"
OK: [Mem:foo]
```

The pointer `/foo` has one token: `foo`. Since it's not a number, it's interpreted as an object member name (`Mem`).

```sh
$ jsonpp parse "/foo/0"
OK: [Mem:foo, Nth:0]
```

Here we have two tokens: `foo` (a member name) and `0` (interpreted as an array index `Nth`).

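
The splitting step behind these examples can be sketched in a few lines. This is only an illustration of the grammar, not the library's parser: `split_tokens` is a hypothetical name, it returns the raw (still escaped) tokens, and signalling a malformed pointer with `None` is an assumption made here for brevity.

```ocaml
(* Split a JSON Pointer string into its raw (still escaped) reference tokens.
   Hypothetical helper: returns [None] when the string is non-empty but does
   not start with '/'. *)
let split_tokens s =
  if s = "" then Some []
  else if s.[0] <> '/' then None
  else
    (* Drop the leading '/', then split on the remaining separators. *)
    Some (String.split_on_char '/' (String.sub s 1 (String.length s - 1)))
```

Under this sketch the empty string yields no tokens, while `"/"` yields a single empty token, which is exactly the distinction the grammar draws.
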
```sh
$ jsonpp parse "/foo/bar/baz"
OK: [Mem:foo, Mem:bar, Mem:baz]
```

Multiple tokens navigate deeper into nested structures.

### The Index Type

Each reference token becomes an `Index.t` value in the library:

```ocaml
type t =
  | Mem of string  (* Object member access *)
  | Nth of int     (* Array index access *)
  | End            (* The special "-" marker for append operations *)
```

The `Mem` variant holds the **unescaped** member name - you work with the actual key string (like `"a/b"`) and the library handles any escaping needed for the JSON Pointer string representation.

### Invalid Syntax

What happens if a pointer doesn't start with `/`?

```sh
$ jsonpp parse "foo"
ERROR: Invalid JSON Pointer: must be empty or start with '/': foo
```

The RFC is strict: non-empty pointers MUST start with `/`.

## Evaluation: Navigating JSON

Now we come to the heart of JSON Pointer: evaluation. RFC 6901, Section 4 describes how a pointer is resolved against a JSON document:

> Evaluation of a JSON Pointer begins with a reference to the root value
> of a JSON document and completes with a reference to some value within
> the document. Each reference token in the JSON Pointer is evaluated
> sequentially.

In the library, this is the `Jsont_pointer.get` function:

```ocaml
val get : t -> Jsont.json -> Jsont.json
```

Let's use the example JSON document from RFC 6901, Section 5:

```sh
$ cat rfc6901_example.json
{
  "foo": ["bar", "baz"],
  "": 0,
  "a/b": 1,
  "c%d": 2,
  "e^f": 3,
  "g|h": 4,
  "i\\j": 5,
  "k\"l": 6,
  " ": 7,
  "m~n": 8
}
```

This document is carefully constructed to exercise various edge cases!

### The Root Pointer

```sh
$ jsonpp eval rfc6901_example.json ""
OK: {"foo":["bar","baz"],"":0,"a/b":1,"c%d":2,"e^f":3,"g|h":4,"i\\j":5,"k\"l":6," ":7,"m~n":8}
```

The empty pointer returns the whole document. In OCaml, this is `Jsont_pointer.root`:

```ocaml
val root : t
(** The empty pointer that references the whole document. *)
```

### Object Member Access

```sh
$ jsonpp eval rfc6901_example.json "/foo"
OK: ["bar","baz"]
```

`/foo` accesses the member named `foo`, which is an array.

### Array Index Access

```sh
$ jsonpp eval rfc6901_example.json "/foo/0"
OK: "bar"
```

`/foo/0` first goes to `foo`, then accesses index 0 of the array.

```sh
$ jsonpp eval rfc6901_example.json "/foo/1"
OK: "baz"
```

Index 1 gives us the second element.

### Empty String as Key

JSON allows empty strings as object keys:

```sh
$ jsonpp eval rfc6901_example.json "/"
OK: 0
```

The pointer `/` has one token: the empty string. This accesses the member with an empty name.

### Keys with Special Characters

The RFC example includes keys with `/` and `~` characters:

```sh
$ jsonpp eval rfc6901_example.json "/a~1b"
OK: 1
```

The token `a~1b` refers to the key `a/b`. We'll explain this escaping [below](#escaping-special-characters).

```sh
$ jsonpp eval rfc6901_example.json "/m~0n"
OK: 8
```

The token `m~0n` refers to the key `m~n`.

**Important**: When using the OCaml library programmatically, you don't need to worry about escaping. The `Index.Mem` variant holds the literal key name:

```ocaml
(* To access the key "a/b", just use the literal string *)
let pointer = Jsont_pointer.make [Mem "a/b"]

(* The library escapes it when converting to string *)
let s = Jsont_pointer.to_string pointer  (* "/a~1b" *)
```

### Other Special Characters (No Escaping Needed)

Most characters don't need escaping in JSON Pointer strings:

```sh
$ jsonpp eval rfc6901_example.json "/c%d"
OK: 2
```

```sh
$ jsonpp eval rfc6901_example.json "/e^f"
OK: 3
```

```sh
$ jsonpp eval rfc6901_example.json "/g|h"
OK: 4
```

```sh
$ jsonpp eval rfc6901_example.json "/ "
OK: 7
```

Even a space is a valid key character!

### Error Conditions

What happens when we try to access something that doesn't exist?

```sh
$ jsonpp eval rfc6901_example.json "/nonexistent"
ERROR: JSON Pointer: member 'nonexistent' not found
File "-":
```

Or an out-of-bounds array index:

```sh
$ jsonpp eval rfc6901_example.json "/foo/99"
ERROR: JSON Pointer: index 99 out of bounds (array has 2 elements)
File "-":
```

Or try to index into a non-container:

```sh
$ jsonpp eval rfc6901_example.json "/foo/0/invalid"
ERROR: JSON Pointer: cannot index into string with 'invalid'
File "-":
```

The library provides both exception-raising and result-returning variants:

```ocaml
val get : t -> Jsont.json -> Jsont.json
val get_result : t -> Jsont.json -> (Jsont.json, Jsont.Error.t) result
val find : t -> Jsont.json -> Jsont.json option
```

### Array Index Rules

RFC 6901 has specific rules for array indices. Section 4 states:

> characters comprised of digits [...] that represent an unsigned base-10
> integer value, making the new referenced value the array element with
> the zero-based index identified by the token

And importantly:

> note that leading zeros are not allowed

```sh
$ jsonpp parse "/foo/0"
OK: [Mem:foo, Nth:0]
```

Zero itself is fine.

```sh
$ jsonpp parse "/foo/01"
OK: [Mem:foo, Mem:01]
```

But `01` has a leading zero, so it's NOT treated as an array index - it becomes a member name instead. This protects against accidental octal interpretation.

## The End-of-Array Marker: `-`

RFC 6901, Section 4 introduces a special token:

> exactly the single character "-", making the new referenced value the
> (nonexistent) member after the last array element.

This is primarily useful for JSON Patch operations (RFC 6902). Let's see how it parses:

```sh
$ jsonpp parse "/foo/-"
OK: [Mem:foo, End]
```

The `-` is recognized as a special `End` index.

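
The index rules and the `-` marker described above can be summarised as a small classifier. This is a sketch under stated assumptions, not the library's code: the `index` type is a local copy of the `Index.t` variants so the block stands alone, `classify` and `is_array_index` are hypothetical names, and treating digit strings that overflow `int` as member names is a choice made here, not something the source specifies.

```ocaml
(* Local stand-in for the library's Index.t, repeated so the sketch is
   self-contained. *)
type index = Mem of string | Nth of int | End

(* True when [s] is "0", or a non-empty run of digits with no leading zero. *)
let is_array_index s =
  let n = String.length s in
  let rec digits i =
    i >= n || (s.[i] >= '0' && s.[i] <= '9' && digits (i + 1))
  in
  n > 0 && digits 0 && (n = 1 || s.[0] <> '0')

(* Classify one already-unescaped reference token. *)
let classify token =
  if token = "-" then End
  else if is_array_index token then
    match int_of_string_opt token with
    | Some n -> Nth n
    | None -> Mem token (* assumption: digits too large for [int] *)
  else Mem token
```

With these rules `classify "0"` is `Nth 0`, `classify "01"` is `Mem "01"`, and `classify "-"` is `End`, matching the parse output shown above.
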
However, you cannot evaluate a pointer containing `-` because it refers to a position that doesn't exist:

```sh
$ jsonpp eval rfc6901_example.json "/foo/-"
ERROR: JSON Pointer: '-' (end marker) refers to nonexistent array element
File "-":
```

The RFC explains this:

> Note that the use of the "-" character to index an array will always
> result in such an error condition because by definition it refers to
> a nonexistent array element.

But we'll see later that `-` is very useful for mutation operations!

## Mutation Operations

While RFC 6901 defines JSON Pointer for read-only access, RFC 6902 (JSON Patch) uses JSON Pointer for modifications. The `jsont-pointer` library provides these operations.

### Add

The `add` operation inserts a value at a location:

```sh
$ jsonpp add '{"foo":"bar"}' '/baz' '"qux"'
{"foo":"bar","baz":"qux"}
```

In OCaml:

```ocaml
val add : t -> Jsont.json -> value:Jsont.json -> Jsont.json
```

For arrays, `add` inserts BEFORE the specified index:

```sh
$ jsonpp add '{"foo":["a","b"]}' '/foo/1' '"X"'
{"foo":["a","X","b"]}
```

This is where the `-` marker shines - it appends to the end:

```sh
$ jsonpp add '{"foo":["a","b"]}' '/foo/-' '"c"'
{"foo":["a","b","c"]}
```

### Remove

The `remove` operation deletes a value:

```sh
$ jsonpp remove '{"foo":"bar","baz":"qux"}' '/baz'
{"foo":"bar"}
```

For arrays, it removes and shifts:

```sh
$ jsonpp remove '{"foo":["a","b","c"]}' '/foo/1'
{"foo":["a","c"]}
```

### Replace

The `replace` operation updates an existing value:

```sh
$ jsonpp replace '{"foo":"bar"}' '/foo' '"baz"'
{"foo":"baz"}
```

Unlike `add`, `replace` requires the target to already exist:

```sh
$ jsonpp replace '{"foo":"bar"}' '/nonexistent' '"value"'
ERROR: JSON Pointer: member 'nonexistent' not found
File "-":
```

### Move

The `move` operation relocates a value:

```sh
$ jsonpp move '{"foo":{"bar":"baz"},"qux":{}}' '/foo/bar' '/qux/thud'
{"foo":{},"qux":{"thud":"baz"}}
```

### Copy

The `copy` operation duplicates a value:

```sh
$ jsonpp copy '{"foo":{"bar":"baz"}}' '/foo/bar' '/foo/qux'
{"foo":{"bar":"baz","qux":"baz"}}
```

### Test

The `test` operation verifies a value (useful in JSON Patch):

```sh
$ jsonpp test '{"foo":"bar"}' '/foo' '"bar"'
true
```

```sh
$ jsonpp test '{"foo":"bar"}' '/foo' '"baz"'
false
```

## Escaping Special Characters

RFC 6901, Section 3 explains the escaping rules:

> Because the characters '\~' (%x7E) and '/' (%x2F) have special meanings
> in JSON Pointer, '\~' needs to be encoded as '\~0' and '/' needs to be
> encoded as '\~1' when these characters appear in a reference token.

Why these specific characters?

- `/` separates tokens, so it must be escaped inside a token
- `~` is the escape character itself, so it must also be escaped

The escape sequences are:

- `~0` represents `~` (tilde)
- `~1` represents `/` (forward slash)

### The Library Handles Escaping Automatically

**Important**: When using `jsont-pointer` programmatically, you rarely need to think about escaping. The `Index.Mem` variant stores unescaped strings, and escaping happens automatically during serialization:

```ocaml
(* Create a pointer to key "a/b" - no escaping needed *)
let p = Jsont_pointer.make [Mem "a/b"]

(* Serialize to string - escaping happens automatically *)
let s = Jsont_pointer.to_string p  (* Returns "/a~1b" *)

(* Parse from string - unescaping happens automatically *)
let p' = Jsont_pointer.of_string "/a~1b"
(* p' contains [Mem "a/b"] - the unescaped key *)
```

The `Token` module exposes the escaping functions if you need them:

```ocaml
module Token : sig
  val escape : string -> string    (* "a/b" -> "a~1b" *)
  val unescape : string -> string  (* "a~1b" -> "a/b" *)
end
```

### Escaping in Action

Let's see escaping with the CLI tool:

```sh
$ jsonpp escape "hello"
hello
```

No special characters, no escaping needed.

```sh
$ jsonpp escape "a/b"
a~1b
```

The `/` becomes `~1`.

```sh
$ jsonpp escape "a~b"
a~0b
```

The `~` becomes `~0`.

```sh
$ jsonpp escape "~/"
~0~1
```

Both characters are escaped.

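
The two escape sequences are small enough to sketch directly. The functions below are stand-alone illustrations and not the library's `Token` implementation; in particular, having `unescape` return an `option` is an assumption made here (the `Token.unescape` signature shown earlier returns a plain `string`). The inverse is written as a single left-to-right scan, so each `~` is decoded exactly once.

```ocaml
(* Escape a raw key for use as a reference token: '~' -> "~0", '/' -> "~1". *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '~' -> Buffer.add_string b "~0"
      | '/' -> Buffer.add_string b "~1"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Inverse of [escape]. Returns [None] when a '~' is not followed by
   '0' or '1', including a '~' at the very end of the string. *)
let unescape s =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i >= n then Some (Buffer.contents b)
    else
      match s.[i] with
      | '~' when i + 1 < n && s.[i + 1] = '0' ->
          Buffer.add_char b '~';
          go (i + 2)
      | '~' when i + 1 < n && s.[i + 1] = '1' ->
          Buffer.add_char b '/';
          go (i + 2)
      | '~' -> None
      | c ->
          Buffer.add_char b c;
          go (i + 1)
  in
  go 0
```

Here `escape "~/"` gives `"~0~1"`, as in the CLI output above, and `unescape (escape k)` returns `Some k` for any key `k`.
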
### Unescaping

And the reverse process:

```sh
$ jsonpp unescape "a~1b"
OK: a/b
```

```sh
$ jsonpp unescape "a~0b"
OK: a~b
```

### The Order Matters!

RFC 6901, Section 4 is careful to specify the unescaping order:

> Evaluation of each reference token begins by decoding any escaped
> character sequence. This is performed by first transforming any
> occurrence of the sequence '~1' to '/', and then transforming any
> occurrence of the sequence '~0' to '~'. By performing the substitutions
> in this order, an implementation avoids the error of turning '~01' first
> into '~1' and then into '/', which would be incorrect (the string '~01'
> correctly becomes '~1' after transformation).

Let's verify this tricky case:

```sh
$ jsonpp unescape "~01"
OK: ~1
```

If we unescaped `~0` first, `~01` would become `~1`, which would then become `/`. But that's wrong! The sequence `~01` should become the literal string `~1` (a tilde followed by the digit one).

Invalid escape sequences are rejected:

```sh
$ jsonpp unescape "~2"
ERROR: Invalid JSON Pointer: invalid escape sequence ~2
```

```sh
$ jsonpp unescape "hello~"
ERROR: Invalid JSON Pointer: incomplete escape sequence at end
```

## URI Fragment Encoding

JSON Pointers can be embedded in URIs. RFC 6901, Section 6 explains:

> A JSON Pointer can be represented in a URI fragment identifier by
> encoding it into octets using UTF-8, while percent-encoding those
> characters not allowed by the fragment rule in RFC 3986.

This adds percent-encoding on top of the `~0`/`~1` escaping:

```sh
$ jsonpp uri-fragment "/foo"
OK: /foo -> /foo
```

Simple pointers often don't need percent-encoding.

```sh
$ jsonpp uri-fragment "/a~1b"
OK: /a~1b -> /a~1b
```

The `~1` escape stays as-is (it's valid in URI fragments).

```sh
$ jsonpp uri-fragment "/c%d"
OK: /c%d -> /c%25d
```

The `%` character must be percent-encoded as `%25` in URIs!

```sh
$ jsonpp uri-fragment "/ "
OK: / -> /%20
```

Spaces become `%20`.

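
The percent-encoding step shown in these examples can be sketched as follows. This is a minimal illustration based on the `fragment` rule of RFC 3986, assuming the input is an already `~`-escaped pointer string held as UTF-8 bytes; `is_fragment_char` and `to_fragment` are hypothetical names, and the library's own encoder may differ in which characters it chooses to encode.

```ocaml
(* Characters RFC 3986 allows unencoded in a fragment:
   unreserved / sub-delims / ":" / "@" / "/" / "?". *)
let is_fragment_char = function
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' -> true
  | '-' | '.' | '_' | '~' -> true
  | '!' | '$' | '&' | '\'' | '(' | ')' | '*' | '+' | ',' | ';' | '=' -> true
  | ':' | '@' | '/' | '?' -> true
  | _ -> false

(* Percent-encode every other byte of the pointer string. *)
let to_fragment pointer =
  let b = Buffer.create (String.length pointer) in
  String.iter
    (fun c ->
      if is_fragment_char c then Buffer.add_char b c
      else Buffer.add_string b (Printf.sprintf "%%%02X" (Char.code c)))
    pointer;
  Buffer.contents b
```

Because `~` and `/` are both allowed in fragments, the `~0`/`~1` escapes pass through untouched, while `%` and the space are rewritten to `%25` and `%20`.
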
The library provides functions for URI fragment encoding:

```ocaml
val to_uri_fragment : t -> string
val of_uri_fragment : string -> t
val jsont_uri_fragment : t Jsont.t
```

Here's the RFC example showing the URI fragment forms:

| JSON Pointer | URI Fragment | Value |
|-------------|-------------|-------|
| `""` | `#` | whole document |
| `"/foo"` | `#/foo` | `["bar", "baz"]` |
| `"/foo/0"` | `#/foo/0` | `"bar"` |
| `"/"` | `#/` | `0` |
| `"/a~1b"` | `#/a~1b` | `1` |
| `"/c%d"` | `#/c%25d` | `2` |
| `"/ "` | `#/%20` | `7` |
| `"/m~0n"` | `#/m~0n` | `8` |

## Deeply Nested Structures

JSON Pointer handles arbitrarily deep nesting:

```sh
$ jsonpp eval rfc6901_example.json "/foo/0"
OK: "bar"
```

For deeper structures, just add more path segments. With nested objects:

```sh
$ jsonpp add '{"a":{"b":{"c":"d"}}}' '/a/b/x' '"y"'
{"a":{"b":{"c":"d","x":"y"}}}
```

With nested arrays:

```sh
$ jsonpp add '{"arr":[[1,2],[3,4]]}' '/arr/0/1' '99'
{"arr":[[1,99,2],[3,4]]}
```

## Jsont Integration

The library integrates with the `Jsont` codec system for typed access:

```ocaml
(* Codec for JSON Pointers as JSON strings *)
val jsont : t Jsont.t

(* Query combinators *)
val path : ?absent:'a -> t -> 'a Jsont.t -> 'a Jsont.t
val set_path : ?allow_absent:bool -> 'a Jsont.t -> t -> 'a -> Jsont.json Jsont.t
val update_path : ?absent:'a -> t -> 'a Jsont.t -> Jsont.json Jsont.t
val delete_path : ?allow_absent:bool -> t -> Jsont.json Jsont.t
```

These allow you to use JSON Pointers with typed codecs rather than raw `Jsont.json` values.

## Summary

JSON Pointer (RFC 6901) provides a simple but powerful way to address values within JSON documents:

1. **Syntax**: Pointers are strings of `/`-separated reference tokens
2. **Escaping**: Use `~0` for `~` and `~1` for `/` in tokens (handled automatically by the library)
3. **Evaluation**: Tokens navigate through objects (by key) and arrays (by index)
4. **URI Encoding**: Pointers can be percent-encoded for use in URIs
5. **Mutations**: Combined with JSON Patch (RFC 6902), pointers enable structured updates

The `jsont-pointer` library implements all of this with type-safe OCaml interfaces, integration with the `jsont` codec system, and proper error handling for malformed pointers and missing values.