RFC 6901 JSON Pointer implementation in OCaml using jsont

JSON Pointer Tutorial#

This tutorial introduces JSON Pointer as defined in RFC 6901, and demonstrates the jsont-pointer OCaml library through interactive examples.

Setup#

First, let's set up our environment with helper functions:

# open Jsont_pointer;;
# #install_printer Jsont_pointer_top.printer;;
# #install_printer Jsont_pointer_top.json_printer;;
# #install_printer Jsont_pointer_top.error_printer;;
# let parse_json s =
    match Jsont_bytesrw.decode_string Jsont.json s with
    | Ok json -> json
    | Error e -> failwith e;;
val parse_json : string -> Jsont.json = <fun>

What is JSON Pointer?#

From RFC 6901, Section 1:

JSON Pointer defines a string syntax for identifying a specific value within a JavaScript Object Notation (JSON) document.

In other words, JSON Pointer is an addressing scheme for locating values inside a JSON structure. Think of it like a filesystem path, but for JSON documents instead of files.

For example, given this JSON document:

# let users_json = parse_json {|{
    "users": [
      {"name": "Alice", "age": 30},
      {"name": "Bob", "age": 25}
    ]
  }|};;
val users_json : Jsont.json =
  {"users":[{"name":"Alice","age":30},{"name":"Bob","age":25}]}

The JSON Pointer /users/0/name refers to the string "Alice":

# let ptr = of_string "/users/0/name";;
val ptr : t = [`Mem "users"; `Nth 0; `Mem "name"]
# get ptr users_json;;
- : Jsont.json = "Alice"

In OCaml, this is represented by the Jsont_pointer.t type - a sequence of navigation steps from the document root to a target value.

Syntax: Reference Tokens#

RFC 6901, Section 3 defines the syntax:

A JSON Pointer is a Unicode string containing a sequence of zero or more reference tokens, each prefixed by a '/' (%x2F) character.

The grammar is elegantly simple:

json-pointer    = *( "/" reference-token )
reference-token = *( unescaped / escaped )

This means:

  • The empty string "" is a valid pointer (it refers to the whole document)
  • Every non-empty pointer starts with /
  • Everything between / characters is a "reference token"

Let's see this in action:

# of_string "";;
- : t = []

The empty pointer has no reference tokens - it points to the root.

# of_string "/foo";;
- : t = [`Mem "foo"]

The pointer /foo has one token: foo. Since it's not a number, it's interpreted as an object member name (Mem).

# of_string "/foo/0";;
- : t = [`Mem "foo"; `Nth 0]

Here we have two tokens: foo (a member name) and 0 (interpreted as an array index Nth).

# of_string "/foo/bar/baz";;
- : t = [`Mem "foo"; `Mem "bar"; `Mem "baz"]

Multiple tokens navigate deeper into nested structures.
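To make the grammar concrete, here is a minimal standalone sketch of the first parsing step: splitting a pointer string into its raw, still-escaped reference tokens. It uses only the OCaml standard library. The name split_tokens is hypothetical and this is not how jsont-pointer itself is written; it only mirrors the rules listed above.

```ocaml
(* Standalone sketch, NOT the jsont-pointer implementation.
   [split_tokens s] returns the raw (still escaped) reference tokens of [s],
   or an error if [s] is non-empty and does not start with '/'. *)
let split_tokens (s : string) : (string list, string) result =
  if s = "" then Ok [] (* the empty pointer has no tokens *)
  else if s.[0] <> '/' then
    Error ("must be empty or start with '/': " ^ s)
  else
    (* Drop the leading '/', then split on the remaining separators. *)
    Ok (String.split_on_char '/' (String.sub s 1 (String.length s - 1)))
```

Under this sketch, "/" yields a single empty token and "/a//b" yields the three tokens "a", "" and "b", which matches the rule that everything between / characters is a token.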

The Index Type#

Each reference token becomes an Index.t value in the library:

type t = [
  | `Mem of string   (* Object member access *)
  | `Nth of int      (* Array index access *)
  | `End             (* The special "-" marker for append operations *)
]

The Mem variant holds the unescaped member name - you work with the actual key string (like "a/b") and the library handles any escaping needed for the JSON Pointer string representation.

Invalid Syntax#

What happens if a pointer doesn't start with /?

# of_string "foo";;
Exception:
Jsont.Error Invalid JSON Pointer: must be empty or start with '/': foo.

The RFC is strict: non-empty pointers MUST start with /.

For safer parsing, use of_string_result:

# of_string_result "foo";;
- : (t, string) result =
Error "Invalid JSON Pointer: must be empty or start with '/': foo"
# of_string_result "/valid";;
- : (t, string) result = Ok [`Mem "valid"]

Evaluation: Navigating JSON#

Now we come to the heart of JSON Pointer: evaluation. RFC 6901, Section 4 describes how a pointer is resolved against a JSON document:

Evaluation of a JSON Pointer begins with a reference to the root value of a JSON document and completes with a reference to some value within the document. Each reference token in the JSON Pointer is evaluated sequentially.

Let's use the example JSON document from RFC 6901, Section 5:

# let rfc_example = parse_json {|{
    "foo": ["bar", "baz"],
    "": 0,
    "a/b": 1,
    "c%d": 2,
    "e^f": 3,
    "g|h": 4,
    "i\\j": 5,
    "k\"l": 6,
    " ": 7,
    "m~n": 8
  }|};;
val rfc_example : Jsont.json =
  {"foo":["bar","baz"],"":0,"a/b":1,"c%d":2,"e^f":3,"g|h":4,"i\\j":5,"k\"l":6," ":7,"m~n":8}

This document is carefully constructed to exercise various edge cases!

The Root Pointer#

# get root rfc_example ;;
- : Jsont.json =
{"foo":["bar","baz"],"":0,"a/b":1,"c%d":2,"e^f":3,"g|h":4,"i\\j":5,"k\"l":6," ":7,"m~n":8}

The empty pointer (root) returns the whole document.

Object Member Access#

# get (of_string "/foo") rfc_example ;;
- : Jsont.json = ["bar","baz"]

/foo accesses the member named foo, which is an array.

Array Index Access#

# get (of_string "/foo/0") rfc_example ;;
- : Jsont.json = "bar"
# get (of_string "/foo/1") rfc_example ;;
- : Jsont.json = "baz"

/foo/0 first goes to foo, then accesses index 0 of the array.

Empty String as Key#

JSON allows empty strings as object keys:

# get (of_string "/") rfc_example ;;
- : Jsont.json = 0

The pointer / has one token: the empty string. This accesses the member with an empty name.

Keys with Special Characters#

The RFC example includes keys with / and ~ characters:

# get (of_string "/a~1b") rfc_example ;;
- : Jsont.json = 1

The token a~1b refers to the key a/b. We'll explain this escaping below.

# get (of_string "/m~0n") rfc_example ;;
- : Jsont.json = 8

The token m~0n refers to the key m~n.

Important: When using the OCaml library programmatically, you don't need to worry about escaping. The Mem variant holds the literal key name:

# let slash_ptr = make [`Mem "a/b"];;
val slash_ptr : t = [`Mem "a/b"]
# to_string slash_ptr;;
- : string = "/a~1b"
# get slash_ptr rfc_example ;;
- : Jsont.json = 1

The library escapes it when converting to string.

Other Special Characters (No Escaping Needed)#

Most characters don't need escaping in JSON Pointer strings:

# get (of_string "/c%d") rfc_example ;;
- : Jsont.json = 2
# get (of_string "/e^f") rfc_example ;;
- : Jsont.json = 3
# get (of_string "/g|h") rfc_example ;;
- : Jsont.json = 4
# get (of_string "/ ") rfc_example ;;
- : Jsont.json = 7

Even a space is a valid key character!

Error Conditions#

What happens when we try to access something that doesn't exist?

# get_result (of_string "/nonexistent") rfc_example;;
- : (Jsont.json, Jsont.Error.t) result =
Error JSON Pointer: member 'nonexistent' not found
File "-":
# find (of_string "/nonexistent") rfc_example;;
- : Jsont.json option = None

Or an out-of-bounds array index:

# find (of_string "/foo/99") rfc_example;;
- : Jsont.json option = None

Or try to index into a non-container:

# find (of_string "/foo/0/invalid") rfc_example;;
- : Jsont.json option = None

The library provides exception-raising, result-returning, and option-returning variants:

val get : t -> Jsont.json -> Jsont.json
val get_result : t -> Jsont.json -> (Jsont.json, Jsont.Error.t) result
val find : t -> Jsont.json -> Jsont.json option
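The evaluation rules can be sketched in a few lines over a toy JSON type. Everything here is an assumption made for illustration: the json type, the index type and the function find_sketch are hypothetical stand-ins rather than the Jsont or jsont-pointer definitions. The sketch returns an option, in the spirit of find.

```ocaml
(* Standalone sketch on a toy JSON type, NOT the Jsont.json type. *)
type json =
  | Null
  | Bool of bool
  | Number of float
  | String of string
  | Array of json list
  | Object of (string * json) list

type index = [ `Mem of string | `Nth of int | `End ]

(* Walk the tokens from the root; any failing step yields None. *)
let rec find_sketch (ptr : index list) (doc : json) : json option =
  match ptr, doc with
  | [], _ -> Some doc (* no tokens left: this is the target *)
  | `Mem k :: rest, Object members ->
    Option.bind (List.assoc_opt k members) (find_sketch rest)
  | `Nth i :: rest, Object members ->
    (* RFC 6901: on an object, a numeric token is just a member name. *)
    Option.bind (List.assoc_opt (string_of_int i) members) (find_sketch rest)
  | `Nth i :: rest, Array items when i >= 0 ->
    Option.bind (List.nth_opt items i) (find_sketch rest)
  | _ -> None (* `End, non-containers, and type mismatches *)
```

The final catch-all case covers the three failures shown above (missing member, out-of-bounds index, indexing into a non-container) as well as the "-" token, which never refers to an existing element.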

Array Index Rules#

RFC 6901 has specific rules for array indices. Section 4 states:

characters comprised of digits [...] that represent an unsigned base-10 integer value, making the new referenced value the array element with the zero-based index identified by the token

And importantly:

note that leading zeros are not allowed

# of_string "/foo/0";;
- : t = [`Mem "foo"; `Nth 0]

Zero itself is fine.

# of_string "/foo/01";;
- : t = [`Mem "foo"; `Mem "01"]

But 01 has a leading zero, so it's NOT treated as an array index - it becomes a member name instead. This protects against accidental octal interpretation.
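A minimal sketch of this classification, assuming the behaviour shown in the examples above (a token with a leading zero falls back to a member name). The function classify and its helpers are hypothetical names and use only the standard library; the library's own parser may differ in details such as overflow handling.

```ocaml
(* Standalone sketch, NOT the jsont-pointer parser. *)
type index = [ `Mem of string | `Nth of int | `End ]

let is_digit c = c >= '0' && c <= '9'

(* True when [s] is non-empty and consists only of ASCII digits. *)
let all_digits (s : string) : bool =
  let rec go i = i >= String.length s || (is_digit s.[i] && go (i + 1)) in
  s <> "" && go 0

(* Classify one unescaped reference token. *)
let classify (token : string) : index =
  if token = "-" then `End
  else if all_digits token && (String.length token = 1 || token.[0] <> '0')
  then (
    match int_of_string_opt token with
    | Some i -> `Nth i
    | None -> `Mem token (* too large for an OCaml int *))
  else `Mem token
```

Checking for digits before calling int_of_string_opt matters: on its own, that function also accepts forms such as "0x1A" or "1_000", which are not valid array indices in RFC 6901.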

The End-of-Array Marker: -#

RFC 6901, Section 4 introduces a special token:

exactly the single character "-", making the new referenced value the (nonexistent) member after the last array element.

This is primarily useful for JSON Patch operations (RFC 6902). Let's see how it parses:

# of_string "/foo/-";;
- : t = [`Mem "foo"; `End]

The - is recognized as a special End index.

However, you cannot evaluate a pointer containing - because it refers to a position that doesn't exist:

# find (of_string "/foo/-") rfc_example;;
- : Jsont.json option = None

The RFC explains this:

Note that the use of the "-" character to index an array will always result in such an error condition because by definition it refers to a nonexistent array element.

But we'll see later that - is very useful for mutation operations!

Mutation Operations#

While RFC 6901 defines JSON Pointer for read-only access, RFC 6902 (JSON Patch) uses JSON Pointer for modifications. The jsont-pointer library provides these operations.

Add#

The add operation inserts a value at a location:

# let obj = parse_json {|{"foo":"bar"}|};;
val obj : Jsont.json = {"foo":"bar"}
# add (of_string "/baz") obj ~value:(Jsont.Json.string "qux")
  ;;
- : Jsont.json = {"foo":"bar","baz":"qux"}

For arrays, add inserts BEFORE the specified index:

# let arr_obj = parse_json {|{"foo":["a","b"]}|};;
val arr_obj : Jsont.json = {"foo":["a","b"]}
# add (of_string "/foo/1") arr_obj ~value:(Jsont.Json.string "X")
  ;;
- : Jsont.json = {"foo":["a","X","b"]}

This is where the - marker shines - it appends to the end:

# add (of_string "/foo/-") arr_obj ~value:(Jsont.Json.string "c")
  ;;
- : Jsont.json = {"foo":["a","b","c"]}
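The array behaviour of add can be sketched on a plain OCaml list. This is an illustration under stated assumptions: insert and insert_at are hypothetical helpers, and returning None for an out-of-range index is a choice made for the sketch (RFC 6902 only says the index must not exceed the number of elements).

```ocaml
(* Standalone sketch on plain lists, NOT the jsont-pointer implementation. *)

(* Insert [x] before position [i]; [i] may equal the list length. *)
let rec insert_at (i : int) (x : 'a) (items : 'a list) : 'a list option =
  match i, items with
  | 0, _ -> Some (x :: items)
  | _, [] -> None (* index negative or past the end *)
  | _, y :: rest -> Option.map (fun tl -> y :: tl) (insert_at (i - 1) x rest)

(* `End appends; `Nth i inserts before index i. *)
let insert (pos : [ `Nth of int | `End ]) (x : 'a) (items : 'a list)
  : 'a list option =
  match pos with
  | `End -> Some (items @ [ x ])
  | `Nth i -> insert_at i x items
```

In this sketch `Nth with an index equal to the list length behaves like `End, which is why "-" is a convenience rather than a necessity: it lets a patch append without knowing the current length.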

Remove#

The remove operation deletes a value:

# let two_fields = parse_json {|{"foo":"bar","baz":"qux"}|};;
val two_fields : Jsont.json = {"foo":"bar","baz":"qux"}
# remove (of_string "/baz") two_fields ;;
- : Jsont.json = {"foo":"bar"}

For arrays, it removes and shifts:

# let three_elem = parse_json {|{"foo":["a","b","c"]}|};;
val three_elem : Jsont.json = {"foo":["a","b","c"]}
# remove (of_string "/foo/1") three_elem ;;
- : Jsont.json = {"foo":["a","c"]}

Replace#

The replace operation updates an existing value:

# replace (of_string "/foo") obj ~value:(Jsont.Json.string "baz")
  ;;
- : Jsont.json = {"foo":"baz"}

Unlike add, replace requires the target to already exist. Attempting to replace a nonexistent path raises an error.
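The difference between the two operations on object members can be sketched with an association list standing in for a JSON object. The helper names are hypothetical, and the sketch signals a missing target with None where the library raises an error.

```ocaml
(* Standalone sketch on association lists, NOT the jsont-pointer implementation. *)

(* Overwrite the value stored under [k], keeping member order. *)
let set k v members =
  List.map (fun (k', v') -> if k' = k then (k', v) else (k', v')) members

(* add: create the member if absent, overwrite it if present (RFC 6902). *)
let add_member k v members =
  if List.mem_assoc k members then set k v members else members @ [ (k, v) ]

(* replace: only succeeds when the member already exists. *)
let replace_member k v members =
  if List.mem_assoc k members then Some (set k v members) else None
```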

Move#

The move operation relocates a value:

# let nested = parse_json {|{"foo":{"bar":"baz"},"qux":{}}|};;
val nested : Jsont.json = {"foo":{"bar":"baz"},"qux":{}}
# move ~from:(of_string "/foo/bar") ~path:(of_string "/qux/thud") nested
  ;;
- : Jsont.json = {"foo":{},"qux":{"thud":"baz"}}

Copy#

The copy operation duplicates a value:

# let to_copy = parse_json {|{"foo":{"bar":"baz"}}|};;
val to_copy : Jsont.json = {"foo":{"bar":"baz"}}
# copy ~from:(of_string "/foo/bar") ~path:(of_string "/foo/qux") to_copy
  ;;
- : Jsont.json = {"foo":{"bar":"baz","qux":"baz"}}

Test#

The test operation verifies a value (useful in JSON Patch):

# test (of_string "/foo") obj ~expected:(Jsont.Json.string "bar");;
- : bool = true
# test (of_string "/foo") obj ~expected:(Jsont.Json.string "wrong");;
- : bool = false

Escaping Special Characters#

RFC 6901, Section 3 explains the escaping rules:

Because the characters '~' (%x7E) and '/' (%x2F) have special meanings in JSON Pointer, '~' needs to be encoded as '~0' and '/' needs to be encoded as '~1' when these characters appear in a reference token.

Why these specific characters?

  • / separates tokens, so it must be escaped inside a token
  • ~ is the escape character itself, so it must also be escaped

The escape sequences are:

  • ~0 represents ~ (tilde)
  • ~1 represents / (forward slash)

The Library Handles Escaping Automatically#

Important: When using jsont-pointer programmatically, you rarely need to think about escaping. The Mem variant stores unescaped strings, and escaping happens automatically during serialization:

# let p = make [`Mem "a/b"];;
val p : t = [`Mem "a/b"]
# to_string p;;
- : string = "/a~1b"
# of_string "/a~1b";;
- : t = [`Mem "a/b"]

Escaping in Action#

The Token module exposes the escaping functions:

# Token.escape "hello";;
- : string = "hello"
# Token.escape "a/b";;
- : string = "a~1b"
# Token.escape "a~b";;
- : string = "a~0b"
# Token.escape "~/";;
- : string = "~0~1"
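For intuition, an escaping function with the same observable results as the examples above can be written in a single pass over the token. This is a standalone sketch with the hypothetical name escape_token, not the source of Token.escape.

```ocaml
(* Standalone sketch, NOT the source of Token.escape. *)
let escape_token (token : string) : string =
  let buf = Buffer.create (String.length token) in
  String.iter
    (function
      | '~' -> Buffer.add_string buf "~0" (* escape the escape character *)
      | '/' -> Buffer.add_string buf "~1" (* escape the token separator *)
      | c -> Buffer.add_char buf c)
    token;
  Buffer.contents buf
```

Because each input character is handled exactly once, there is no risk of escaping a ~ that the function itself just produced.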

Unescaping#

And the reverse process:

# Token.unescape "a~1b";;
- : string = "a/b"
# Token.unescape "a~0b";;
- : string = "a~b"

The Order Matters!#

RFC 6901, Section 4 is careful to specify the unescaping order:

Evaluation of each reference token begins by decoding any escaped character sequence. This is performed by first transforming any occurrence of the sequence '~1' to '/', and then transforming any occurrence of the sequence '~0' to '~'. By performing the substitutions in this order, an implementation avoids the error of turning '~01' first into '~1' and then into '/', which would be incorrect (the string '~01' correctly becomes '~1' after transformation).

Let's verify this tricky case:

# Token.unescape "~01";;
- : string = "~1"

If we unescaped ~0 first, ~01 would become ~1, which would then become /. But that's wrong! The sequence ~01 should become the literal string ~1 (a tilde followed by the digit one).
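One way to sidestep the ordering question entirely is to decode in a single left-to-right pass, consuming each escape sequence as a unit. The sketch below does that; for well-formed tokens it gives the same results as the RFC's two-step substitution. The name unescape_token is hypothetical, and rejecting a stray ~ with an Error is an assumption of the sketch, not a statement about how Token.unescape treats malformed input.

```ocaml
(* Standalone sketch, NOT the source of Token.unescape. *)
let unescape_token (token : string) : (string, string) result =
  let n = String.length token in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then Ok (Buffer.contents buf)
    else if token.[i] <> '~' then begin
      Buffer.add_char buf token.[i];
      go (i + 1)
    end
    else if i + 1 < n && token.[i + 1] = '0' then begin
      Buffer.add_char buf '~'; (* "~0" decodes to "~" *)
      go (i + 2)
    end
    else if i + 1 < n && token.[i + 1] = '1' then begin
      Buffer.add_char buf '/'; (* "~1" decodes to "/" *)
      go (i + 2)
    end
    else Error (Printf.sprintf "invalid escape at offset %d" i)
  in
  go 0
```

On "~01" the pass consumes "~0" as one unit and emits ~, then copies the 1 unchanged, so the decoded output is never rescanned for further escapes.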

URI Fragment Encoding#

JSON Pointers can be embedded in URIs. RFC 6901, Section 6 explains:

A JSON Pointer can be represented in a URI fragment identifier by encoding it into octets using UTF-8, while percent-encoding those characters not allowed by the fragment rule in RFC 3986.

This adds percent-encoding on top of the ~0/~1 escaping:

# to_uri_fragment (of_string "/foo");;
- : string = "/foo"
# to_uri_fragment (of_string "/a~1b");;
- : string = "/a~1b"
# to_uri_fragment (of_string "/c%d");;
- : string = "/c%25d"
# to_uri_fragment (of_string "/ ");;
- : string = "/%20"

The % character must be percent-encoded as %25 in URIs, and spaces become %20.
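The percent-encoding step can be sketched as follows, assuming the input is the already ~-escaped pointer string in UTF-8 and that the allowed set is exactly the fragment rule of RFC 3986 (unreserved, sub-delims, and the characters ':', '@', '/' and '?'). The names fragment_safe and percent_encode are hypothetical; to_uri_fragment may make different choices for characters not shown in the examples above.

```ocaml
(* Standalone sketch, NOT the source of to_uri_fragment. *)

(* Characters that RFC 3986 allows unencoded in a fragment. *)
let fragment_safe (c : char) : bool =
  match c with
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' -> true
  | '-' | '.' | '_' | '~' -> true (* unreserved *)
  | '!' | '$' | '&' | '\'' | '(' | ')' | '*' | '+' | ',' | ';' | '=' ->
    true (* sub-delims *)
  | ':' | '@' | '/' | '?' -> true
  | _ -> false

(* Percent-encode every byte outside the allowed set as %XX (uppercase hex). *)
let percent_encode (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if fragment_safe c then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf
```

Note that ~ is in the unreserved set, so the ~0 and ~1 escapes pass through unchanged, while % is not, which is why "/c%d" becomes "/c%25d".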

Here's the RFC example showing the URI fragment forms:

JSON Pointer    URI Fragment    Value
""              #               whole document
"/foo"          #/foo           ["bar", "baz"]
"/foo/0"        #/foo/0         "bar"
"/"             #/              0
"/a~1b"         #/a~1b          1
"/c%d"          #/c%25d         2
"/ "            #/%20           7
"/m~0n"         #/m~0n          8

Building Pointers Programmatically#

Instead of parsing strings, you can build pointers from indices:

# let port_ptr = make [`Mem "database"; `Mem "port"];;
val port_ptr : t = [`Mem "database"; `Mem "port"]
# to_string port_ptr;;
- : string = "/database/port"

For array access, use Nth:

# let first_feature_ptr = make [`Mem "features"; `Nth 0];;
val first_feature_ptr : t = [`Mem "features"; `Nth 0]
# to_string first_feature_ptr;;
- : string = "/features/0"
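Serialization is the mirror image of parsing: each index becomes one token, member names are escaped, and every token is prefixed with /. The sketch below treats a pointer as a plain list of indices, which is an assumption about the representation made for illustration; to_string_sketch and token_of_index are hypothetical names.

```ocaml
(* Standalone sketch, NOT the source of to_string. *)
type index = [ `Mem of string | `Nth of int | `End ]

(* Escape '~' and '/' inside a member name. *)
let escape_token (token : string) : string =
  let buf = Buffer.create (String.length token) in
  String.iter
    (function
      | '~' -> Buffer.add_string buf "~0"
      | '/' -> Buffer.add_string buf "~1"
      | c -> Buffer.add_char buf c)
    token;
  Buffer.contents buf

(* Render one index as its reference token. *)
let token_of_index : index -> string = function
  | `Mem name -> escape_token name
  | `Nth i -> string_of_int i
  | `End -> "-"

(* Prefix every token with '/'; the empty list gives the empty pointer. *)
let to_string_sketch (ptr : index list) : string =
  String.concat "" (List.map (fun ix -> "/" ^ token_of_index ix) ptr)
```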

Pointer Navigation#

You can build pointers incrementally using append:

# let db_ptr = of_string "/database";;
val db_ptr : t = [`Mem "database"]
# let creds_ptr = append db_ptr (`Mem "credentials");;
val creds_ptr : t = [`Mem "database"; `Mem "credentials"]
# let user_ptr = append creds_ptr (`Mem "username");;
val user_ptr : t = [`Mem "database"; `Mem "credentials"; `Mem "username"]
# to_string user_ptr;;
- : string = "/database/credentials/username"

Or concatenate two pointers:

# let base = of_string "/api/v1";;
val base : t = [`Mem "api"; `Mem "v1"]
# let endpoint = of_string "/users/0";;
val endpoint : t = [`Mem "users"; `Nth 0]
# to_string (concat base endpoint);;
- : string = "/api/v1/users/0"

Jsont Integration#

The library integrates with the Jsont codec system, allowing you to combine JSON Pointer navigation with typed decoding. This is powerful because you can point to a location in a JSON document and decode it directly to an OCaml type.

# let config_json = parse_json {|{
    "database": {
      "host": "localhost",
      "port": 5432,
      "credentials": {"username": "admin", "password": "secret"}
    },
    "features": ["auth", "logging", "metrics"]
  }|};;
val config_json : Jsont.json =
  {"database":{"host":"localhost","port":5432,"credentials":{"username":"admin","password":"secret"}},"features":["auth","logging","metrics"]}

Typed Access with path#

The path combinator combines pointer navigation with typed decoding:

# let db_host =
    Jsont.Json.decode
      (path (of_string "/database/host") Jsont.string)
      config_json
    |> Result.get_ok;;
val db_host : string = "localhost"
# let db_port =
    Jsont.Json.decode
      (path (of_string "/database/port") Jsont.int)
      config_json
    |> Result.get_ok;;
val db_port : int = 5432

Extract a list of strings:

# let features =
    Jsont.Json.decode
      (path (of_string "/features") Jsont.(list string))
      config_json
    |> Result.get_ok;;
val features : string list = ["auth"; "logging"; "metrics"]

Default Values with ~absent#

Use ~absent to provide a default when a path doesn't exist:

# let timeout =
    Jsont.Json.decode
      (path ~absent:30 (of_string "/database/timeout") Jsont.int)
      config_json
    |> Result.get_ok;;
val timeout : int = 30

Nested Path Extraction#

You can extract values from deeply nested structures:

# let org_json = parse_json {|{
    "organization": {
      "owner": {"name": "Alice", "email": "alice@example.com", "age": 35},
      "members": [{"name": "Bob", "email": "bob@example.com", "age": 28}]
    }
  }|};;
val org_json : Jsont.json =
  {"organization":{"owner":{"name":"Alice","email":"alice@example.com","age":35},"members":[{"name":"Bob","email":"bob@example.com","age":28}]}}
# Jsont.Json.decode
    (path (of_string "/organization/owner/name") Jsont.string)
    org_json
  |> Result.get_ok;;
- : string = "Alice"
# Jsont.Json.decode
    (path (of_string "/organization/members/0/age") Jsont.int)
    org_json
  |> Result.get_ok;;
- : int = 28

Comparison: Raw vs Typed Access#

Raw access requires pattern matching:

# let raw_port =
    match get (of_string "/database/port") config_json with
    | Jsont.Number (f, _) -> int_of_float f
    | _ -> failwith "expected number";;
val raw_port : int = 5432

Typed access is cleaner and type-safe:

# let typed_port =
    Jsont.Json.decode
      (path (of_string "/database/port") Jsont.int)
      config_json
    |> Result.get_ok;;
val typed_port : int = 5432

The typed approach catches mismatches at decode time with clear errors.

Summary#

JSON Pointer (RFC 6901) provides a simple but powerful way to address values within JSON documents:

  1. Syntax: Pointers are strings of /-separated reference tokens
  2. Escaping: Use ~0 for ~ and ~1 for / in tokens (handled automatically by the library)
  3. Evaluation: Tokens navigate through objects (by key) and arrays (by index)
  4. URI Encoding: Pointers can be percent-encoded for use in URIs
  5. Mutations: Combined with JSON Patch (RFC 6902), pointers enable structured updates

The jsont-pointer library implements all of this with type-safe OCaml interfaces, integration with the jsont codec system, and proper error handling for malformed pointers and missing values.