+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Jacquard is a suite of Rust crates for the AT Protocol (atproto/Bluesky). The project emphasizes spec-compliant, validated, performant baseline types with minimal boilerplate. Key design goals:
+
+- Validated AT Protocol types including typed at:// URIs
+- Custom lexicon extension support
+- Lexicon `Value` type for working with unknown atproto data (dag-cbor or json)
+- Using as much or as little of the crates as needed
+
+## Workspace Structure
+
+This is a Cargo workspace with several crates:
+
+- **jacquard-common**: Core AT Protocol types (DIDs, handles, at-URIs, NSIDs, TIDs, CIDs, etc.) and the `CowStr` type for efficient string handling
+- **jacquard-lexicon**: Lexicon parsing and Rust code generation from lexicon schemas
+- **jacquard-api**: Generated API bindings (currently empty/in development)
+- **jacquard-derive**: Derive macros for lexicon structures
+- **jacquard**: Main binary (currently minimal)
+
+## Development Commands
+
+### Using Nix (preferred)
+```bash
+# Enter dev shell
+nix develop
+
+# Build
+nix build
+
+# Run
+nix develop -c cargo run
+```
+
+### Using Cargo/Just
+```bash
+# Build
+cargo build
+
+# Run tests
+cargo test
+
+# Run specific test
+cargo test <test_name>
+
+# Run specific package tests
+cargo test -p <package_name>
+
+# Run
+cargo run
+
+# Auto-recompile and run
+just watch [ARGS]
+
+# Format and lint all
+just pre-commit-all
+```
+
+## String Type Pattern
+
+The codebase uses a consistent pattern for validated string types. Each type should have:
+
+### Constructors
+- `new()`: Construct from a string slice with appropriate lifetime (borrows)
+- `new_owned()`: Construct from `impl AsRef<str>`, taking ownership
+- `new_static()`: Construct from `&'static str` using `SmolStr`/`CowStr`'s static constructor (no allocation)
+- `raw()`: Same as `new()` but panics instead of returning `Result`
+- `unchecked()`: Same as `new()` but doesn't validate (marked `unsafe`)
+- `as_str()`: Return string slice
+
+### Traits
+All string types should implement:
+- `Serialize` + `Deserialize` (custom impl for latter, sometimes for former)
+- `FromStr`, `Display`
+- `Debug`, `PartialEq`, `Eq`, `Hash`, `Clone`
+- `From<T> for String`, `CowStr`, `SmolStr`
+- `From<String>`, `From<CowStr>`, `From<SmolStr>`, or `TryFrom` if likely to fail
+- `AsRef<str>`
+- `Deref` with `Target = str` (usually)
+
+### Implementation Details
+- Use `#[repr(transparent)]` when possible (exception: at-uri type and components)
+- Use `SmolStr` directly as inner type if most instances will be under 24 bytes
+- Use `CowStr` for longer strings to allow borrowing from input
+- Implement `IntoStatic` trait to take ownership of string types
+
+## Code Style
+
+- Avoid comments for self-documenting code
+- Comments should not detail fixes when refactoring
+- Professional writing within source code and comments only
+- Prioritize long-term maintainability over implementation speed
+
+## Testing
+
+- Write test cases for all critical code
+- Tests can be run per-package or workspace-wide
+- Use `cargo test <name>` to run specific tests
+- Current test coverage: 89 tests in jacquard-common
+
+## Current State & Next Steps
+
+### Completed
+- ✅ Comprehensive validation tests for all core string types (handle, DID, NSID, TID, record key, AT-URI, datetime, language, identifier)
+- ✅ Validated implementations against AT Protocol specs and TypeScript reference implementation
+- ✅ String type interface standardization (Language now has `new_static()`, Datetime has full conversion traits)
+- ✅ Data serialization: Full serialize/deserialize for `Data<'_>`, `Array`, `Object` with format-specific handling (JSON vs CBOR)
+- ✅ CidLink wrapper type with automatic `{"$link": "cid"}` serialization in JSON
+- ✅ Integration test with real Bluesky thread data validates round-trip correctness
+
+### Next Steps
+1. **Lexicon Code Generation**: Begin work on lexicon-to-Rust code generation now that core types are stable
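
The constructor set described under "String Type Pattern" in CLAUDE.md can be sketched roughly as follows. This is a minimal illustration, not code from the repository: `Label`, `InvalidLabel`, and `is_valid` are hypothetical names, the validation rule is made up, and `std::borrow::Cow` stands in for the crate's `CowStr`.

```rust
use std::borrow::Cow;

/// Hypothetical validated string type; `Cow` is a stand-in for `CowStr`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
#[repr(transparent)]
pub struct Label<'a>(Cow<'a, str>);

/// Hypothetical error type returned when validation fails.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidLabel;

/// Made-up rule for illustration: 1 to 63 ASCII alphanumerics or hyphens.
fn is_valid(s: &str) -> bool {
    !s.is_empty()
        && s.len() <= 63
        && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
}

impl<'a> Label<'a> {
    /// Fallible constructor, validates, borrows from the input.
    pub fn new(s: &'a str) -> Result<Self, InvalidLabel> {
        if is_valid(s) {
            Ok(Self(Cow::Borrowed(s)))
        } else {
            Err(InvalidLabel)
        }
    }

    /// Fallible constructor, validates, copies into an owned value.
    pub fn new_owned(s: impl AsRef<str>) -> Result<Label<'static>, InvalidLabel> {
        let s = s.as_ref();
        if is_valid(s) {
            Ok(Label(Cow::Owned(s.to_owned())))
        } else {
            Err(InvalidLabel)
        }
    }

    /// Fallible constructor for static strings; borrows, so no allocation.
    pub fn new_static(s: &'static str) -> Result<Label<'static>, InvalidLabel> {
        Label::new(s)
    }

    /// Panics on invalid input instead of returning `Result`.
    pub fn raw(s: &'a str) -> Self {
        Self::new(s).expect("valid label")
    }

    /// Skips validation; the caller is responsible for the invariant.
    pub unsafe fn unchecked(s: &'a str) -> Self {
        Self(Cow::Borrowed(s))
    }

    /// Returns the inner string slice.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

With this shape, `Label::new("abc-1")` borrows from its argument, while `Label::new_owned(String::from("abc"))` yields a value that no longer depends on the input's lifetime.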
···
 #[repr(transparent)]
 pub struct Did<'d>(CowStr<'d>);

+/// Regex for DID validation per AT Protocol spec.
+///
+/// Note: This regex allows `%` in the identifier but prevents DIDs from ending with `:` or `%`.
+/// It does NOT validate that percent-encoding is well-formed (i.e., `%XX` where XX are hex digits).
+/// This matches the behavior of the official TypeScript implementation, which also does not
+/// enforce percent-encoding validity at validation time. While the spec states "percent sign
+/// must be followed by two hex characters," this is treated as a best practice rather than
+/// a hard validation requirement.
 pub static DID_REGEX: LazyLock<Regex> =
     LazyLock::new(|| Regex::new(r"^did:[a-z]+:[a-zA-Z0-9._:%-]*[a-zA-Z0-9._-]$").unwrap());

···
         self.as_str()
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn valid_dids() {
+        assert!(Did::new("did:plc:abc123").is_ok());
+        assert!(Did::new("did:web:example.com").is_ok());
+        assert!(Did::new("did:method:val_ue").is_ok());
+        assert!(Did::new("did:method:val-ue").is_ok());
+        assert!(Did::new("did:method:val.ue").is_ok());
+        assert!(Did::new("did:method:val%20ue").is_ok());
+    }
+
+    #[test]
+    fn prefix_stripping() {
+        assert_eq!(Did::new("at://did:plc:foo").unwrap().as_str(), "did:plc:foo");
+        assert_eq!(Did::new("did:plc:foo").unwrap().as_str(), "did:plc:foo");
+    }
+
+    #[test]
+    fn must_start_with_did() {
+        assert!(Did::new("DID:plc:foo").is_err());
+        assert!(Did::new("plc:foo").is_err());
+        assert!(Did::new("foo").is_err());
+    }
+
+    #[test]
+    fn method_must_be_lowercase() {
+        assert!(Did::new("did:plc:foo").is_ok());
+        assert!(Did::new("did:PLC:foo").is_err());
+        assert!(Did::new("did:Plc:foo").is_err());
+    }
+
+    #[test]
+    fn cannot_end_with_colon_or_percent() {
+        assert!(Did::new("did:plc:foo:").is_err());
+        assert!(Did::new("did:plc:foo%").is_err());
+        assert!(Did::new("did:plc:foo:bar").is_ok());
+    }
+
+    #[test]
+    fn max_length() {
+        let valid_2048 = format!("did:plc:{}", "a".repeat(2048 - 8));
+        assert_eq!(valid_2048.len(), 2048);
+        assert!(Did::new(&valid_2048).is_ok());
+
+        let too_long_2049 = format!("did:plc:{}", "a".repeat(2049 - 8));
+        assert_eq!(too_long_2049.len(), 2049);
+        assert!(Did::new(&too_long_2049).is_err());
+    }
+
+    #[test]
+    fn allowed_characters() {
+        assert!(Did::new("did:method:abc123").is_ok());
+        assert!(Did::new("did:method:ABC123").is_ok());
+        assert!(Did::new("did:method:a_b_c").is_ok());
+        assert!(Did::new("did:method:a-b-c").is_ok());
+        assert!(Did::new("did:method:a.b.c").is_ok());
+        assert!(Did::new("did:method:a:b:c").is_ok());
+    }
+
+    #[test]
+    fn disallowed_characters() {
+        assert!(Did::new("did:method:a b").is_err());
+        assert!(Did::new("did:method:a@b").is_err());
+        assert!(Did::new("did:method:a#b").is_err());
+        assert!(Did::new("did:method:a?b").is_err());
+    }
+
+    #[test]
+    fn percent_encoding() {
+        // Valid percent encoding
+        assert!(Did::new("did:method:foo%20bar").is_ok());
+        assert!(Did::new("did:method:foo%2Fbar").is_ok());
+
+        // DIDs cannot end with %
+        assert!(Did::new("did:method:foo%").is_err());
+
+        // IMPORTANT: The regex does NOT validate that percent-encoding is well-formed.
+        // This matches the TypeScript reference implementation's behavior.
+        // While the spec says "percent sign must be followed by two hex characters",
+        // implementations treat this as a best practice, not a hard validation requirement.
+        // Thus, malformed percent encoding like %2x is accepted by the regex.
+        assert!(Did::new("did:method:foo%2x").is_ok());
+        assert!(Did::new("did:method:foo%ZZ").is_ok());
+    }
+}
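
The doc comment and the `percent_encoding` test note that `DID_REGEX` deliberately does not check that each `%` is followed by two hex digits. For callers who want the stricter reading of the spec, an extra check layered on top of `Did::new` might look like the sketch below. The function name `percent_encoding_is_well_formed` is hypothetical and is not part of this change.

```rust
/// Returns true when every `%` in `s` is followed by two ASCII hex digits.
/// Hypothetical helper; the PR itself does not enforce this.
pub fn percent_encoding_is_well_formed(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Look at the two bytes after the percent sign, if present.
            match (bytes.get(i + 1), bytes.get(i + 2)) {
                (Some(hi), Some(lo)) if hi.is_ascii_hexdigit() && lo.is_ascii_hexdigit() => {
                    i += 3;
                }
                _ => return false,
            }
        } else {
            i += 1;
        }
    }
    true
}
```

Under this check, `did:method:foo%2Fbar` passes while `did:method:foo%2x` and `did:method:foo%ZZ`, which the regex accepts, would be rejected.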
+120-26
crates/jacquard-common/src/types/handle.rs
···
 impl<'h> Handle<'h> {
     /// Fallible constructor, validates, borrows from input
     ///
-    /// Accepts (and strips) preceding '@' if present
+    /// Accepts (and strips) preceding '@' or 'at://' if present
     pub fn new(handle: &'h str) -> Result<Self, AtStrError> {
-        let handle = handle
+        let stripped = handle
             .strip_prefix("at://")
-            .unwrap_or(handle)
-            .strip_prefix('@')
+            .or_else(|| handle.strip_prefix('@'))
             .unwrap_or(handle);
-        if handle.len() > 253 {
-            Err(AtStrError::too_long("handle", handle, 253, handle.len()))
-        } else if !HANDLE_REGEX.is_match(handle) {
+
+        if stripped.len() > 253 {
+            Err(AtStrError::too_long("handle", stripped, 253, stripped.len()))
+        } else if !HANDLE_REGEX.is_match(stripped) {
             Err(AtStrError::regex(
                 "handle",
-                handle,
+                stripped,
                 SmolStr::new_static("invalid"),
             ))
-        } else if ends_with(handle, DISALLOWED_TLDS) {
-            Err(AtStrError::disallowed("handle", handle, DISALLOWED_TLDS))
+        } else if ends_with(stripped, DISALLOWED_TLDS) {
+            Err(AtStrError::disallowed("handle", stripped, DISALLOWED_TLDS))
         } else {
-            Ok(Self(CowStr::Borrowed(handle)))
+            Ok(Self(CowStr::Borrowed(stripped)))
         }
     }

     /// Fallible constructor, validates, takes ownership
     pub fn new_owned(handle: impl AsRef<str>) -> Result<Self, AtStrError> {
         let handle = handle.as_ref();
-        let handle = handle
+        let stripped = handle
             .strip_prefix("at://")
-            .unwrap_or(handle)
-            .strip_prefix('@')
+            .or_else(|| handle.strip_prefix('@'))
             .unwrap_or(handle);
+        let handle = stripped;
         if handle.len() > 253 {
             Err(AtStrError::too_long("handle", handle, 253, handle.len()))
         } else if !HANDLE_REGEX.is_match(handle) {
···

     /// Fallible constructor, validates, doesn't allocate
     pub fn new_static(handle: &'static str) -> Result<Self, AtStrError> {
-        let handle = handle
+        let stripped = handle
             .strip_prefix("at://")
-            .unwrap_or(handle)
-            .strip_prefix('@')
+            .or_else(|| handle.strip_prefix('@'))
             .unwrap_or(handle);
+        let handle = stripped;
         if handle.len() > 253 {
             Err(AtStrError::too_long("handle", handle, 253, handle.len()))
         } else if !HANDLE_REGEX.is_match(handle) {
···
     /// or API values you know are valid (rather than using serde), this is the one to use.
     /// The From<String> and From<CowStr> impls use the same logic.
     ///
-    /// Accepts (and strips) preceding '@' if present
+    /// Accepts (and strips) preceding '@' or 'at://' if present
     pub fn raw(handle: &'h str) -> Self {
-        let handle = handle
+        let stripped = handle
             .strip_prefix("at://")
-            .unwrap_or(handle)
-            .strip_prefix('@')
+            .or_else(|| handle.strip_prefix('@'))
             .unwrap_or(handle);
+        let handle = stripped;
         if handle.len() > 253 {
             panic!("handle too long")
         } else if !HANDLE_REGEX.is_match(handle) {
···
     /// Infallible constructor for when you *know* the string is a valid handle.
     /// Marked unsafe because responsibility for upholding the invariant is on the developer.
     ///
-    /// Accepts (and strips) preceding '@' if present
+    /// Accepts (and strips) preceding '@' or 'at://' if present
     pub unsafe fn unchecked(handle: &'h str) -> Self {
-        let handle = handle
+        let stripped = handle
             .strip_prefix("at://")
-            .unwrap_or(handle)
-            .strip_prefix('@')
+            .or_else(|| handle.strip_prefix('@'))
             .unwrap_or(handle);
-        Self(CowStr::Borrowed(handle))
+        Self(CowStr::Borrowed(stripped))
     }

     pub fn as_str(&self) -> &str {
···
         self.as_str()
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn valid_handles() {
+        assert!(Handle::new("alice.test").is_ok());
+        assert!(Handle::new("foo.bsky.social").is_ok());
+        assert!(Handle::new("a.b.c.d.e").is_ok());
+        assert!(Handle::new("a1.b2.c3").is_ok());
+        assert!(Handle::new("name-with-dash.com").is_ok());
+    }
+
+    #[test]
+    fn prefix_stripping() {
+        assert_eq!(Handle::new("@alice.test").unwrap().as_str(), "alice.test");
+        assert_eq!(Handle::new("at://alice.test").unwrap().as_str(), "alice.test");
+        assert_eq!(Handle::new("alice.test").unwrap().as_str(), "alice.test");
+    }
+
+    #[test]
+    fn max_length() {
+        // 253 chars: three 63-char segments + one 61-char segment + 3 dots = 253
+        let s1 = format!("a{}a", "b".repeat(61)); // 63
+        let s2 = format!("c{}c", "d".repeat(61)); // 63
+        let s3 = format!("e{}e", "f".repeat(61)); // 63
+        let s4 = format!("g{}g", "h".repeat(59)); // 61
+        let valid_253 = format!("{}.{}.{}.{}", s1, s2, s3, s4);
+        assert_eq!(valid_253.len(), 253);
+        assert!(Handle::new(&valid_253).is_ok());
+
+        // 254 chars: make last segment 62 chars
+        let s4_long = format!("g{}g", "h".repeat(60)); // 62
+        let too_long_254 = format!("{}.{}.{}.{}", s1, s2, s3, s4_long);
+        assert_eq!(too_long_254.len(), 254);
+        assert!(Handle::new(&too_long_254).is_err());
+    }
+
+    #[test]
+    fn segment_length_constraints() {
+        let valid_63_char_segment = format!("{}.com", "a".repeat(63));
+        assert!(Handle::new(&valid_63_char_segment).is_ok());
+
+        let too_long_64_char_segment = format!("{}.com", "a".repeat(64));
+        assert!(Handle::new(&too_long_64_char_segment).is_err());
+    }
+
+    #[test]
+    fn hyphen_placement() {
+        assert!(Handle::new("valid-label.com").is_ok());
+        assert!(Handle::new("-nope.com").is_err());
+        assert!(Handle::new("nope-.com").is_err());
+    }
+
+    #[test]
+    fn tld_must_start_with_letter() {
+        assert!(Handle::new("foo.bar").is_ok());
+        assert!(Handle::new("foo.9bar").is_err());
+    }
+
+    #[test]
+    fn disallowed_tlds() {
+        assert!(Handle::new("foo.local").is_err());
+        assert!(Handle::new("foo.localhost").is_err());
+        assert!(Handle::new("foo.arpa").is_err());
+        assert!(Handle::new("foo.invalid").is_err());
+        assert!(Handle::new("foo.internal").is_err());
+        assert!(Handle::new("foo.example").is_err());
+        assert!(Handle::new("foo.alt").is_err());
+        assert!(Handle::new("foo.onion").is_err());
+    }
+
+    #[test]
+    fn minimum_segments() {
+        assert!(Handle::new("a.b").is_ok());
+        assert!(Handle::new("a").is_err());
+        assert!(Handle::new("com").is_err());
+    }
+
+    #[test]
+    fn invalid_characters() {
+        assert!(Handle::new("foo!bar.com").is_err());
+        assert!(Handle::new("foo_bar.com").is_err());
+        assert!(Handle::new("foo bar.com").is_err());
+        assert!(Handle::new("foo@bar.com").is_err());
+    }
+
+    #[test]
+    fn empty_segments() {
+        assert!(Handle::new("foo..com").is_err());
+        assert!(Handle::new(".foo.com").is_err());
+        assert!(Handle::new("foo.com.").is_err());
+    }
+}
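
The prefix-stripping change in this file is easier to follow in isolation. The sketch below copies the two expressions from the diff into standalone functions; the names `strip_chained` and `strip_either` are hypothetical labels for the removed and added versions. In the removed chain the final `unwrap_or(handle)` falls back to the original input, so a successful `at://` strip is discarded whenever no `@` follows it.

```rust
/// Removed version: the second `unwrap_or` falls back to the ORIGINAL input,
/// so "at://alice.test" comes back unchanged.
pub fn strip_chained(handle: &str) -> &str {
    handle
        .strip_prefix("at://")
        .unwrap_or(handle)
        .strip_prefix('@')
        .unwrap_or(handle)
}

/// Added version: strips "at://" if present, otherwise tries "@",
/// otherwise returns the input as-is.
pub fn strip_either(handle: &str) -> &str {
    handle
        .strip_prefix("at://")
        .or_else(|| handle.strip_prefix('@'))
        .unwrap_or(handle)
}
```

One behavioral difference worth noting: the added version strips at most one prefix, so an input such as `at://@alice.test` keeps its `@` and would then be left to the regex check, whereas the removed chain happened to strip both.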
+37
crates/jacquard-common/src/types/ident.rs
···
         }
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn parses_did() {
+        let ident = AtIdentifier::new("did:plc:foo").unwrap();
+        assert!(matches!(ident, AtIdentifier::Did(_)));
+        assert_eq!(ident.as_str(), "did:plc:foo");
+    }
+
+    #[test]
+    fn parses_handle() {
+        let ident = AtIdentifier::new("alice.test").unwrap();
+        assert!(matches!(ident, AtIdentifier::Handle(_)));
+        assert_eq!(ident.as_str(), "alice.test");
+    }
+
+    #[test]
+    fn did_takes_precedence() {
+        // DID is tried first, so valid DIDs are parsed as DIDs
+        let ident = AtIdentifier::new("did:web:alice.test").unwrap();
+        assert!(matches!(ident, AtIdentifier::Did(_)));
+    }
+
+    #[test]
+    fn from_types() {
+        let did = Did::new("did:plc:foo").unwrap();
+        let ident: AtIdentifier = did.into();
+        assert!(matches!(ident, AtIdentifier::Did(_)));
+
+        let handle = Handle::new("alice.test").unwrap();
+        let ident: AtIdentifier = handle.into();
+        assert!(matches!(ident, AtIdentifier::Handle(_)));
+    }
+}
+35-3
crates/jacquard-common/src/types/language.rs
···
         T: AsRef<str> + ?Sized,
     {
         let tag = langtag::LangTag::new(lang)?;
-        Ok(Language(SmolStr::new_inline(tag.as_str())))
+        Ok(Language(SmolStr::new(tag.as_str())))
+    }
+
+    /// Parses an IETF language tag from a static string.
+    pub fn new_static(lang: &'static str) -> Result<Self, langtag::InvalidLangTag<&'static str>> {
+        let tag = langtag::LangTag::new(lang)?;
+        Ok(Language(SmolStr::new_static(tag.as_str())))
     }

     /// Infallible constructor for when you *know* the string is a valid IETF language tag.
···
     pub fn raw(lang: impl AsRef<str>) -> Self {
         let lang = lang.as_ref();
         let tag = langtag::LangTag::new(lang).expect("valid IETF language tag");
-        Language(SmolStr::new_inline(tag.as_str()))
+        Language(SmolStr::new(tag.as_str()))
     }

     /// Infallible constructor for when you *know* the string is a valid IETF language tag.
     /// Marked unsafe because responsibility for upholding the invariant is on the developer.
     pub unsafe fn unchecked(lang: impl AsRef<str>) -> Self {
         let lang = lang.as_ref();
-        Self(SmolStr::new_inline(lang))
+        Self(SmolStr::new(lang))
     }

     /// Returns the LANG as a string slice.
···
         self.as_str()
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn valid_language_tags() {
+        assert!(Language::new("en").is_ok());
+        assert!(Language::new("en-US").is_ok());
+        assert!(Language::new("zh-Hans").is_ok());
+        assert!(Language::new("es-419").is_ok());
+    }
+
+    #[test]
+    fn case_insensitive_but_preserves() {
+        let lang = Language::new("en-US").unwrap();
+        assert_eq!(lang.as_str(), "en-US");
+    }
+
+    #[test]
+    fn invalid_tags() {
+        assert!(Language::new("").is_err());
+        assert!(Language::new("not_a_tag").is_err());
+        assert!(Language::new("123").is_err());
+    }
+}
146146+}