# plcbundle-rs High-Level API Design

## Overview

The `plcbundle-rs` library provides a unified, high-level API through the `BundleManager` type. This API is designed to be:

- **Consistent**: Same patterns across all operations
- **Efficient**: Minimal allocations, streaming where possible
- **FFI-friendly**: Clean C bindings that map naturally to Go
- **CLI-first**: The Rust CLI uses only public high-level APIs

## Quick Reference: All Methods

```rust
// === Initialization ===
BundleManager::new<O: IntoManagerOptions>(directory: PathBuf, options: O) -> Result<Self>

// === Repository Setup ===
init_repository(directory: PathBuf, origin: String) -> Result<()> // Creates plc_bundles.json

// === Bundle Loading ===
load_bundle(num: u32, options: LoadOptions) -> Result<LoadResult>
get_bundle_info(num: u32, flags: InfoFlags) -> Result<BundleInfo>

// === Single Operation Access ===
get_operation_raw(bundle_num: u32, position: usize) -> Result<String>
get_operation(bundle_num: u32, position: usize) -> Result<Operation>
get_operation_with_stats(bundle_num: u32, position: usize) -> Result<OperationResult>

// === Batch Operations ===
get_operations_batch(requests: Vec<OperationRequest>) -> Result<Vec<Operation>>
get_operations_range(start: u32, end: u32, filter: Option<OperationFilter>) -> RangeIterator

// === Query & Export ===
query(spec: QuerySpec) -> QueryIterator
export(spec: ExportSpec) -> ExportIterator
export_to_writer<W: Write>(spec: ExportSpec, writer: W) -> Result<ExportStats>

// === DID Operations ===
get_did_operations(did: &str) -> Result<Vec<Operation>>
resolve_did(did: &str) -> Result<DIDDocument>
batch_resolve_dids(dids: Vec<String>) -> Result<HashMap<String, Vec<Operation>>>
sample_random_dids(count: usize, seed: Option<u64>) -> Result<Vec<String>>

// === Verification ===
verify_bundle(num: u32, spec: VerifySpec) -> Result<VerifyResult>
verify_chain(spec: ChainVerifySpec) -> Result<ChainVerifyResult>

// === Rollback ===
rollback_plan(spec: RollbackSpec) -> Result<RollbackPlan>
rollback(spec: RollbackSpec) -> Result<RollbackResult>

// === Repository Cleanup ===
clean() -> Result<CleanResult>

// === Performance & Caching ===
prefetch_bundles(nums: Vec<u32>) -> Result<()>
warm_up(spec: WarmUpSpec) -> Result<()>
clear_caches()

// === DID Index ===
build_did_index(flush_interval: u32, progress_cb: Option<F>, num_threads: Option<usize>, interrupted: Option<Arc<AtomicBool>>) -> Result<RebuildStats>
verify_did_index(verbose: bool, flush_interval: u32, full: bool, progress_callback: Option<F>) -> Result<did_index::VerifyResult>
repair_did_index(num_threads: usize, flush_interval: u32, progress_callback: Option<F>) -> Result<did_index::RepairResult>
get_did_index_stats() -> HashMap<String, serde_json::Value>

// === Sync Operations (Async) ===
sync_next_bundle(client: &PLCClient) -> Result<u32>
sync_once(client: &PLCClient) -> Result<usize>

// === Remote Access (Async) ===
fetch_remote_index(target: &str) -> Result<Index>
fetch_remote_bundle(base_url: &str, bundle_num: u32) -> Result<Vec<Operation>>
fetch_remote_operation(base_url: &str, bundle_num: u32, position: usize) -> Result<String>

// === Server API Methods ===
get_plc_origin() -> String
stream_bundle_raw(bundle_num: u32) -> Result<File>
stream_bundle_decompressed(bundle_num: u32) -> Result<Box<dyn Read + Send>>
get_current_cursor() -> u64
resolve_handle_or_did(input: &str) -> Result<(String, u64)>
get_resolver_stats() -> HashMap<String, serde_json::Value>
get_handle_resolver_base_url() -> Option<String>
get_did_index() -> Arc<RwLock<did_index::Manager>>

// === Mempool Operations ===
get_mempool_stats() -> Result<MempoolStats>
get_mempool_operations() -> Result<Vec<Operation>>
add_to_mempool(ops: Vec<Operation>) -> Result<usize>
clear_mempool() -> Result<()>

// === Observability ===
get_stats() -> ManagerStats
get_last_bundle() -> u32
directory() -> &PathBuf
```

---

## 11. Sync & Repository Management

### Repository Initialization

```rust
// Standalone function (used by CLI init command)
pub fn init_repository(directory: PathBuf, origin: String) -> Result<()>
```

**Purpose**: Initialize a new PLC bundle repository (like `git init`).

**What it does:**
- Creates the directory if it doesn't exist
- Creates an empty `plc_bundles.json` index file
- Sets the origin identifier

**CLI Usage:**
```bash
plcbundle init
plcbundle init /path/to/bundles --origin my-node
```

### Sync from PLC Directory

```rust
pub async fn sync_next_bundle(&mut self, client: &PLCClient) -> Result<u32>
```

**Purpose**: Fetch operations from the PLC directory and create the next bundle.

**What it does:**
1. Gets boundary CIDs from the last bundle (prevents duplicates)
2. Fetches operations from PLC until the mempool holds 10,000 operations
3. Deduplicates using boundary CID logic
4. Creates and saves the bundle
5. Returns the bundle number
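
The fetch-and-fill part of these steps can be sketched as a loop over pages. This is a minimal, synchronous sketch rather than the library's implementation: `Op`, `fill_mempool`, `PAGE_SIZE`, and the `fetch(after, count)` closure are hypothetical names, and the page fetcher is assumed to return operations with `created_at >= after`, oldest first.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the library's operation type (assumption).
#[derive(Clone, Debug, PartialEq)]
pub struct Op {
    pub cid: String,
    pub created_at: String, // RFC 3339 timestamp in one fixed format
}

/// Hypothetical page size for a single fetch.
const PAGE_SIZE: usize = 1000;

/// Fill `mempool` until it holds `bundle_size` operations or the source
/// has nothing new. CIDs in `prev_boundary` are treated as already stored.
pub fn fill_mempool<F>(
    mempool: &mut Vec<Op>,
    prev_boundary: &HashSet<String>,
    mut after: String,
    bundle_size: usize,
    mut fetch: F,
) where
    F: FnMut(&str, usize) -> Vec<Op>,
{
    // Everything we have already accepted, seeded with the boundary CIDs.
    let mut seen: HashSet<String> = prev_boundary.clone();
    seen.extend(mempool.iter().map(|op| op.cid.clone()));

    while mempool.len() < bundle_size {
        let page = fetch(after.as_str(), PAGE_SIZE);
        let mut added = 0usize;
        for op in page {
            if mempool.len() == bundle_size {
                break;
            }
            // `insert` returns false for CIDs we have seen before.
            if seen.insert(op.cid.clone()) {
                after = op.created_at.clone();
                mempool.push(op);
                added += 1;
            }
        }
        if added == 0 {
            break; // caught up: the page held nothing new
        }
    }
}
```

Stopping when a page adds nothing keeps the loop from spinning once the source is exhausted; a real client would also need to handle network errors and rate limits.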

```rust
pub async fn sync_once(&mut self, client: &PLCClient) -> Result<usize>
```

**Purpose**: Sync until caught up with the PLC directory (like `git fetch`).

**Returns**: Number of bundles synced

**CLI Usage:**
```bash
# Sync once
plcbundle sync

# Continuous daemon
plcbundle sync --continuous --interval 30s

# Max bundles
plcbundle sync --max-bundles 10
```

### PLC Client

```rust
pub struct PLCClient {
    // Private fields
}

impl PLCClient {
    pub fn new(base_url: impl Into<String>) -> Result<Self>
    pub async fn fetch_operations(&self, after: &str, count: usize) -> Result<Vec<PLCOperation>>
}
```

**Features:**
- Rate limiting (90 requests/minute)
- Automatic retries with exponential backoff
- 429 (rate limit) handling
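
To make the timing behind these features concrete, here is a small sketch of the arithmetic only. The function names, the base and cap values a caller passes in, and the choice to also retry 5xx responses are assumptions for illustration; the source only states the 90 requests/minute budget, exponential backoff, and 429 handling.

```rust
use std::time::Duration;

/// Minimum spacing between requests for a per-minute budget.
/// Panics if `requests_per_minute` is zero.
pub fn min_request_interval(requests_per_minute: u32) -> Duration {
    Duration::from_secs(60) / requests_per_minute
}

/// Exponential backoff: `base * 2^attempt`, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.saturating_pow(attempt);
    match base.checked_mul(factor) {
        Some(delay) => delay.min(max),
        None => max,
    }
}

/// Whether a response status should be retried.
/// 429 comes from the feature list; retrying 5xx is an assumption.
pub fn is_retryable(status: u16) -> bool {
    status == 429 || (500..600).contains(&status)
}
```

With a 90 requests/minute budget the spacing works out to roughly two thirds of a second between requests.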

### Mempool Operations

```rust
pub fn get_mempool_stats(&self) -> Result<MempoolStats>
```

**Purpose**: Get current mempool statistics.

```rust
pub struct MempoolStats {
    pub count: usize,
    pub can_create_bundle: bool, // count >= 10,000
    pub target_bundle: u32,
    pub min_timestamp: DateTime<Utc>,
    pub first_time: Option<DateTime<Utc>>,
    pub last_time: Option<DateTime<Utc>>,
    pub size_bytes: Option<usize>,
    pub did_count: Option<usize>,
}
```

```rust
pub fn get_mempool_operations(&self) -> Result<Vec<Operation>>
pub fn add_to_mempool(&self, ops: Vec<Operation>) -> Result<usize>
pub fn clear_mempool(&self) -> Result<()>
```

**CLI Usage:**
```bash
plcbundle mempool status
plcbundle mempool dump
plcbundle mempool clear
```

### Boundary CID Deduplication

Critical helper functions for preventing duplicates across bundle boundaries:

```rust
pub fn get_boundary_cids(operations: &[Operation]) -> HashSet<String>
```

**Purpose**: Extract CIDs that share the same timestamp as the last operation.

```rust
pub fn strip_boundary_duplicates(
    operations: Vec<Operation>,
    prev_boundary: &HashSet<String>
) -> Vec<Operation>
```

**Purpose**: Remove operations that were in the previous bundle's boundary.

**Why this matters**: Operations can have identical timestamps. When a bundle is created, its last N operations might share a timestamp, and the next fetch (which resumes from that timestamp) might return those same operations again. Without boundary deduplication they would be stored twice, once in each bundle.
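
A minimal sketch of how the two helpers could behave, assuming a simplified `Operation` with only the `cid` and `created_at` fields and operations sorted oldest first. The bodies are illustrative, not the library's actual code.

```rust
use std::collections::HashSet;

/// Reduced stand-in for the library's `Operation` (assumption).
#[derive(Clone, Debug, PartialEq)]
pub struct Operation {
    pub cid: Option<String>,
    pub created_at: String,
}

/// Collect the CIDs of the trailing operations that share the
/// timestamp of the last operation.
pub fn get_boundary_cids(operations: &[Operation]) -> HashSet<String> {
    let last_time = match operations.last() {
        Some(op) => &op.created_at,
        None => return HashSet::new(),
    };
    operations
        .iter()
        .rev()
        .take_while(|op| &op.created_at == last_time)
        .filter_map(|op| op.cid.clone())
        .collect()
}

/// Drop operations whose CID was already part of the previous boundary.
/// Operations without a CID are kept, since they cannot be matched.
pub fn strip_boundary_duplicates(
    operations: Vec<Operation>,
    prev_boundary: &HashSet<String>,
) -> Vec<Operation> {
    operations
        .into_iter()
        .filter(|op| match &op.cid {
            Some(cid) => !prev_boundary.contains(cid),
            None => true,
        })
        .collect()
}
```

For example, if a bundle ends with two operations at the same timestamp, both CIDs form the boundary, and a later fetch that starts at that timestamp has them removed before its new operations reach the mempool.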

---

## Design Principles

1. **Single Entry Point**: All operations go through `BundleManager`
2. **Options Pattern**: Complex operations use dedicated option structs
3. **Result Types**: Operations return structured result types, not raw tuples
4. **Streaming by Default**: Use iterators for large datasets
5. **No Direct File Access**: CLI never opens bundle files directly

## API Structure

### Core Manager

```rust
pub struct BundleManager {
    // Private fields
}

impl BundleManager {
    /// Create a new BundleManager with options
    ///
    /// # Examples
    ///
    /// ```rust
    /// use plcbundle::{BundleManager, ManagerOptions};
    /// use std::path::PathBuf;
    ///
    /// // With default options
    /// let manager = BundleManager::new(PathBuf::from("."), ())?;
    ///
    /// // With custom options
    /// let options = ManagerOptions {
    ///     handle_resolver_url: Some("https://example.com".to_string()),
    ///     preload_mempool: true,
    ///     verbose: true,
    /// };
    /// let manager = BundleManager::new(PathBuf::from("."), options)?;
    /// ```
    pub fn new<O: IntoManagerOptions>(directory: PathBuf, options: O) -> Result<Self>
}
```

#### ManagerOptions

```rust
/// Options for configuring BundleManager initialization
pub struct ManagerOptions {
    /// Optional handle resolver URL for resolving @handle.did identifiers
    pub handle_resolver_url: Option<String>,
    /// Whether to preload the mempool at initialization (for server use)
    pub preload_mempool: bool,
    /// Whether to enable verbose logging
    pub verbose: bool,
}

impl Default for ManagerOptions {
    fn default() -> Self {
        Self {
            handle_resolver_url: None,
            preload_mempool: false,
            verbose: false,
        }
    }
}
```

**Usage:**
- Pass `()` to use default options: `BundleManager::new(dir, ())`
- Pass `ManagerOptions` for custom configuration (including verbose mode)
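
The `IntoManagerOptions` bound is what lets `new` accept either `()` or a `ManagerOptions` value. Its definition is not shown in this document, so the following is one plausible shape: the method name `into_options` and the helper `resolve_options` are assumptions.

```rust
/// Mirrors the struct above; `Default` is derived here for brevity.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ManagerOptions {
    pub handle_resolver_url: Option<String>,
    pub preload_mempool: bool,
    pub verbose: bool,
}

/// Conversion trait: anything that can become a `ManagerOptions`.
pub trait IntoManagerOptions {
    fn into_options(self) -> ManagerOptions;
}

/// `()` stands for "use the defaults".
impl IntoManagerOptions for () {
    fn into_options(self) -> ManagerOptions {
        ManagerOptions::default()
    }
}

/// An explicit options struct is passed through unchanged.
impl IntoManagerOptions for ManagerOptions {
    fn into_options(self) -> ManagerOptions {
        self
    }
}

/// What a constructor could do with its generic `options` argument.
pub fn resolve_options<O: IntoManagerOptions>(options: O) -> ManagerOptions {
    options.into_options()
}
```

This keeps the common call site short (`new(dir, ())`) without adding a second constructor.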

---

## 1. Bundle Loading

### Individual Bundle Loading

```rust
pub fn load_bundle(&self, bundle_num: u32, options: LoadOptions) -> Result<LoadResult>
```

**Purpose**: Load a single bundle with control over parsing, verification, and caching.

**Options:**
```rust
pub struct LoadOptions {
    pub verify_hash: bool,      // Verify bundle integrity
    pub decompress: bool,       // Decompress bundle data
    pub cache: bool,            // Cache in memory
    pub parse_operations: bool, // Parse JSON into Operation structs
}
```

**Result:**
```rust
pub struct LoadResult {
    pub bundle_number: u32,
    pub operations: Vec<Operation>,
    pub metadata: BundleMetadata,
    pub hash: Option<String>,
}
```

**Use Cases:**
- CLI: `plcbundle info --bundle 42`
- CLI: `plcbundle verify --bundle 42`

---

## 2. Operation Access

### Single Operation

```rust
pub fn get_operation(&self, bundle_num: u32, position: usize, options: OperationOptions) -> Result<Operation>
```

**Purpose**: Efficiently retrieve a single operation without loading the entire bundle.

**Options:**
```rust
pub struct OperationOptions {
    pub raw_json: bool, // Return raw JSON string (faster)
    pub parse: bool,    // Parse into Operation struct
}
```

**Use Cases:**
- CLI: `plcbundle op get 42 1337`
- CLI: `plcbundle op show 420000`
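
One way to avoid loading a whole bundle is to stream the decompressed JSONL and stop at the requested line. The sketch below shows only that idea over any `BufRead`; the helper name `read_nth_line` is hypothetical, and in practice the reader would wrap a decompressing stream rather than an in-memory buffer.

```rust
use std::io::{self, BufRead};

/// Return the line at 0-based `position`, or `None` if the input is shorter.
/// Lines before `position` are read and discarded one at a time, so memory
/// use stays proportional to a single line rather than the whole bundle.
pub fn read_nth_line<R: BufRead>(reader: R, position: usize) -> io::Result<Option<String>> {
    for (index, line) in reader.lines().enumerate() {
        // Propagate I/O errors from earlier lines instead of skipping them.
        let line = line?;
        if index == position {
            return Ok(Some(line));
        }
    }
    Ok(None)
}
```

Returning the raw line as a `String` leaves JSON parsing to the caller, which matches the `raw_json` fast path described above.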

### Batch Operations

```rust
pub fn get_operations_batch(&self, requests: Vec<OperationRequest>) -> Result<Vec<Operation>>
```

**Purpose**: Retrieve multiple operations in one call (optimizes file I/O).

```rust
pub struct OperationRequest {
    pub bundle_num: u32,
    pub position: usize,
}
```
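
The I/O saving in a batch call comes from touching each bundle file once. A sketch of the grouping step, under the assumption that results must come back in request order (so each position keeps the index of the request it came from); `group_by_bundle` is a hypothetical helper name.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OperationRequest {
    pub bundle_num: u32,
    pub position: usize,
}

/// Group requests by bundle. Each entry maps a bundle number to a list of
/// `(position, request_index)` pairs sorted by position, so one sequential
/// pass over the bundle can serve every request for it.
pub fn group_by_bundle(requests: &[OperationRequest]) -> BTreeMap<u32, Vec<(usize, usize)>> {
    let mut groups: BTreeMap<u32, Vec<(usize, usize)>> = BTreeMap::new();
    for (request_index, request) in requests.iter().enumerate() {
        groups
            .entry(request.bundle_num)
            .or_default()
            .push((request.position, request_index));
    }
    for positions in groups.values_mut() {
        positions.sort_unstable();
    }
    groups
}
```

After reading, the stored `request_index` lets the caller place each operation back into its original slot.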

### Range Operations

```rust
pub fn get_operations_range(&self, start: u64, end: u64, filter: Option<OperationFilter>) -> Result<RangeIterator>
```

**Purpose**: Stream operations across bundle boundaries by global position.

---

## 3. DID Operations

### DID Lookup

```rust
pub fn get_did_operations(&self, did: &str) -> Result<Vec<Operation>>
```

**Purpose**: Get all operations for a specific DID (requires DID index).

### Batch DID Lookup

```rust
pub fn batch_resolve_dids(&self, dids: Vec<String>) -> Result<HashMap<String, Vec<Operation>>>
```

**Purpose**: Efficiently resolve multiple DIDs in one call.

**Use Cases:**
- Bulk DID resolution
- Identity verification workflows

### Random DID Sampling

```rust
pub fn sample_random_dids(&self, count: usize, seed: Option<u64>) -> Result<Vec<String>>
```

**Purpose**: Retrieve pseudo-random DIDs directly from the DID index without touching bundle files.

**Details:**
- Reads identifiers from memory-mapped shard data (no decompression).
- Deterministic when `seed` is provided; otherwise uses current timestamp.
- Useful for benchmarks, sampling, or quick spot checks.
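
The seed behaviour can be illustrated with a small generator. This sketch picks indices into a list of `total` entries, with replacement; the SplitMix64 generator and the name `sample_indices` are stand-ins chosen for the example, since the document does not say which generator the library uses.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// SplitMix64, a small deterministic generator (illustrative choice).
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

/// Pick `count` indices in `0..total`. The same `seed` always yields the
/// same indices; without a seed the current timestamp is used instead.
pub fn sample_indices(total: usize, count: usize, seed: Option<u64>) -> Vec<usize> {
    if total == 0 {
        return Vec::new();
    }
    let seed = seed.unwrap_or_else(|| {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|elapsed| elapsed.as_nanos() as u64)
            .unwrap_or(0)
    });
    let mut rng = SplitMix64(seed);
    (0..count)
        .map(|_| (rng.next_u64() % total as u64) as usize)
        .collect()
}
```

Passing a fixed seed makes benchmark runs repeatable, which is the main reason to expose it.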

---

## 4. Query & Export (Streaming)

### Query Operations

```rust
pub fn query(&self, spec: QuerySpec) -> Result<QueryIterator>
```

**Purpose**: Execute JMESPath or simple path queries across bundles.

```rust
pub struct QuerySpec {
    pub expression: String,       // JMESPath or simple path
    pub bundles: BundleRange,     // Which bundles to search
    pub mode: QueryMode,          // Auto, Simple, or JMESPath
    pub filter: Option<OperationFilter>,
    pub parallel: Option<usize>,  // Number of threads (0=auto)
    pub batch_size: usize,        // Output batch size
}

pub enum QueryMode {
    Auto,
    Simple,   // Fast path: direct field access
    JMESPath, // Full JMESPath evaluation
}
```

**Iterator:**
```rust
pub struct QueryIterator {
    // Implements Iterator<Item = Result<String>>
}
```

**Use Cases:**
- CLI: `plcbundle query "did" --bundles 1-100`
- CLI: `plcbundle query "operation.type" --mode simple`

### Export Operations

```rust
pub fn export(&self, spec: ExportSpec) -> Result<ExportIterator>
```

**Purpose**: Export operations in various formats with filtering.

```rust
pub struct ExportSpec {
    pub bundles: BundleRange,
    pub format: ExportFormat,            // JSONL only
    pub filter: Option<OperationFilter>,
    pub count: Option<usize>,            // Limit number of operations
    pub after_timestamp: Option<String>, // Filter by timestamp
}

pub enum ExportFormat {
    Jsonl,
}

pub enum BundleRange {
    Single(u32),
    Range(u32, u32),
    List(Vec<u32>),
    All,
}
```

**Use Cases:**
- CLI: `plcbundle export --range 1-100 --format jsonl`
- CLI: `plcbundle export --all --after 2024-01-01T00:00:00Z`
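
`BundleRange` has to be turned into concrete bundle numbers before any file is read. The sketch below shows one way to do that. It assumes bundles are numbered from 1, that `Range(start, end)` is inclusive (as `--range 1-100` suggests), and that numbers outside the repository are silently dropped; a real implementation might return an error instead. `resolve_bundles` is a hypothetical name.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum BundleRange {
    Single(u32),
    Range(u32, u32),
    List(Vec<u32>),
    All,
}

/// Expand a range into the bundle numbers that exist in a repository
/// whose newest bundle is `last_bundle`.
pub fn resolve_bundles(range: &BundleRange, last_bundle: u32) -> Vec<u32> {
    // A bundle exists if it lies in 1..=last_bundle.
    let exists = |n: &u32| *n >= 1 && *n <= last_bundle;
    match range {
        BundleRange::Single(n) => std::iter::once(*n).filter(exists).collect(),
        BundleRange::Range(start, end) => (*start..=*end).filter(exists).collect(),
        BundleRange::List(nums) => nums.iter().copied().filter(exists).collect(),
        BundleRange::All => (1..=last_bundle).collect(),
    }
}
```

`List` keeps the caller's order, which matters when an export should follow a specific sequence.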

### Export to Writer

```rust
pub fn export_to_writer<W: Write>(&self, spec: ExportSpec, writer: W) -> Result<ExportStats>
```

**Purpose**: Export directly to a writer (file, stdout, network).

```rust
pub struct ExportStats {
    pub records_written: u64,
    pub bytes_written: u64,
    pub bundles_processed: u32,
    pub duration: Duration,
}
```

**Use Cases:**
- FFI: Stream to Go `io.Writer`
- Direct file writing without buffering
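
The writer-based export boils down to streaming one JSON record per line while counting what was written. A minimal sketch, assuming records arrive as already-serialized JSON strings; `write_jsonl` and `WriteStats` are hypothetical names, and `WriteStats` carries only the subset of `ExportStats` that this sketch can fill in.

```rust
use std::io::{self, Write};
use std::time::{Duration, Instant};

/// Subset of the statistics above that a plain writer loop can track.
#[derive(Debug)]
pub struct WriteStats {
    pub records_written: u64,
    pub bytes_written: u64,
    pub duration: Duration,
}

/// Write each record followed by a newline, stopping after `limit` records.
pub fn write_jsonl<W, I>(records: I, mut writer: W, limit: Option<usize>) -> io::Result<WriteStats>
where
    W: Write,
    I: IntoIterator<Item = String>,
{
    let started = Instant::now();
    let mut records_written = 0u64;
    let mut bytes_written = 0u64;
    for line in records.into_iter().take(limit.unwrap_or(usize::MAX)) {
        writer.write_all(line.as_bytes())?;
        writer.write_all(b"\n")?;
        records_written += 1;
        bytes_written += line.len() as u64 + 1; // +1 for the newline
    }
    writer.flush()?;
    Ok(WriteStats {
        records_written,
        bytes_written,
        duration: started.elapsed(),
    })
}
```

Because the function is generic over `Write`, the same loop serves a file, stdout, or an in-memory buffer in tests.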

---

## 5. Verification

### Bundle Verification

```rust
pub fn verify_bundle(&self, bundle_num: u32, spec: VerifySpec) -> Result<VerifyResult>
```

**Purpose**: Verify bundle integrity with various checks.

```rust
pub struct VerifySpec {
    pub check_hash: bool,
    pub check_compression: bool,
    pub check_json: bool,
    pub parallel: bool,
}

pub struct VerifyResult {
    pub valid: bool,
    pub hash_match: Option<bool>,
    pub compression_ok: Option<bool>,
    pub json_valid: Option<bool>,
    pub errors: Vec<String>,
}
```

### Chain Verification

```rust
pub fn verify_chain(&self, spec: ChainVerifySpec) -> Result<ChainVerifyResult>
```

**Purpose**: Verify chain continuity and prev pointers.

```rust
pub struct ChainVerifySpec {
    pub bundles: BundleRange,
    pub check_prev_links: bool,
    pub check_timestamps: bool,
}

pub struct ChainVerifyResult {
    pub valid: bool,
    pub bundles_checked: u32,
    pub chain_breaks: Vec<ChainBreak>,
}

pub struct ChainBreak {
    pub bundle_num: u32,
    pub did: String,
    pub reason: String,
}
```

**Use Cases:**
- CLI: `plcbundle verify --chain --bundles 1-100`

---

## 6. Bundle Information

```rust
pub fn get_bundle_info(&self, bundle_num: u32, flags: InfoFlags) -> Result<BundleInfo>
```

**Purpose**: Get metadata and statistics for a bundle.

```rust
pub struct InfoFlags {
    pub include_dids: bool,
    pub include_types: bool,
    pub include_timeline: bool,
}

pub struct BundleInfo {
    pub bundle_number: u32,
    pub operation_count: u32,
    pub did_count: Option<u32>,
    pub compressed_size: u64,
    pub uncompressed_size: u64,
    pub start_time: String,
    pub end_time: String,
    pub hash: String,
    pub operation_types: Option<HashMap<String, u32>>,
}
```

**Use Cases:**
- CLI: `plcbundle info --bundle 42 --detailed`

---

## 7. Rollback Operations

### Plan Rollback

```rust
pub fn rollback_plan(&self, spec: RollbackSpec) -> Result<RollbackPlan>
```

**Purpose**: Preview what would be rolled back (dry-run).

```rust
pub struct RollbackSpec {
    pub target_bundle: u32,
    pub keep_index: bool,
}

pub struct RollbackPlan {
    pub bundles_to_remove: Vec<u32>,
    pub total_size: u64,
    pub operations_affected: u64,
}
```

### Execute Rollback

```rust
pub fn rollback(&self, spec: RollbackSpec) -> Result<RollbackResult>
```

**Purpose**: Execute the rollback operation.

```rust
pub struct RollbackResult {
    pub bundles_removed: Vec<u32>,
    pub bytes_freed: u64,
    pub success: bool,
}
```

---

## 7.5. Repository Cleanup

### Clean Temporary Files

```rust
pub fn clean(&self) -> Result<CleanResult>
```

**Purpose**: Remove all temporary files from the repository.

**What it does:**
- Removes all `.tmp` files from the repository root directory (e.g., `plc_bundles.json.tmp`)
- Removes temporary files from the DID index directory `.plcbundle/` (e.g., `config.json.tmp`)
- Removes temporary shard files from `.plcbundle/shards/` (e.g., `00.tmp`, `01.tmp`, etc.)

**Result:**
```rust
pub struct CleanResult {
    pub files_removed: usize,
    pub bytes_freed: u64,
    pub errors: Option<Vec<String>>,
}
```

**Use Cases:**
- CLI: `plcbundle clean`
- Cleanup after interrupted operations
- Maintenance tasks

**Note**: Temporary files are normally cleaned up automatically during atomic write operations. This method is useful for cleaning up leftover files from interrupted operations.

---

## 8. Performance & Caching

### Cache Management

```rust
pub fn prefetch_bundles(&self, bundle_nums: Vec<u32>) -> Result<()>
```

**Purpose**: Preload bundles into cache for faster access.

```rust
pub fn warm_up(&self, spec: WarmUpSpec) -> Result<()>
```

**Purpose**: Warm up caches with intelligent prefetching.

```rust
pub struct WarmUpSpec {
    pub bundles: BundleRange,
    pub strategy: WarmUpStrategy,
}

pub enum WarmUpStrategy {
    Sequential, // Load in order
    Parallel,   // Load concurrently
    Adaptive,   // Based on access patterns
}
```

```rust
pub fn clear_caches(&self)
```

**Purpose**: Clear all in-memory caches.

---

## 9. DID Index Management

### Build Index

```rust
pub fn build_did_index<F>(
    &self,
    flush_interval: u32,
    progress_cb: Option<F>,
    num_threads: Option<usize>,
    interrupted: Option<Arc<AtomicBool>>,
) -> Result<RebuildStats>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)
```

**Purpose**: Build or rebuild the DID → operations index from scratch.

**Parameters:**
- `flush_interval`: Flush to disk every N bundles (0 = only at end)
- `progress_cb`: Optional progress callback
- `num_threads`: Number of threads (None = auto-detect)
- `interrupted`: Optional flag to check for cancellation
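
How these parameters interact can be sketched with a single-threaded loop. It is an illustration only: `run_with_progress` and `RunSummary` are hypothetical names, bundles are represented by their byte sizes, and the actual indexing and flushing are reduced to counters (`num_threads` is left out).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq)]
pub struct RunSummary {
    pub bundles_processed: u32,
    pub flushes: u32,
}

/// Walk over bundles, honouring the cancellation flag, the flush interval
/// (0 = flush only at the end), and the optional progress callback.
pub fn run_with_progress<F>(
    bundle_sizes: &[u64],
    flush_interval: u32,
    progress_cb: Option<F>,
    interrupted: Option<Arc<AtomicBool>>,
) -> RunSummary
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)
{
    let total = bundle_sizes.len() as u32;
    let total_bytes: u64 = bundle_sizes.iter().sum();
    let mut bytes_processed = 0u64;
    let mut processed = 0u32;
    let mut flushes = 0u32;

    for size in bundle_sizes {
        // Check for cancellation before starting the next bundle.
        if interrupted.as_ref().map_or(false, |flag| flag.load(Ordering::Relaxed)) {
            break;
        }
        // (Indexing of the bundle would happen here.)
        bytes_processed += *size;
        processed += 1;
        if flush_interval > 0 && processed % flush_interval == 0 {
            flushes += 1;
        }
        if let Some(cb) = progress_cb.as_ref() {
            cb(processed, total, bytes_processed, total_bytes);
        }
    }
    // Final flush for work that has not been flushed yet.
    if processed > 0 && (flush_interval == 0 || processed % flush_interval != 0) {
        flushes += 1;
    }
    RunSummary { bundles_processed: processed, flushes }
}
```

A caller that does not want progress reporting has to name a callback type for `None`, for example `None::<fn(u32, u32, u64, u64)>`.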

**Result:**
```rust
pub struct RebuildStats {
    pub bundles_processed: u32,
    pub operations_indexed: u64,
    pub unique_dids: usize,
    pub duration: Duration,
}
```

### Verify Index

```rust
pub fn verify_did_index<F>(
    &self,
    verbose: bool,
    flush_interval: u32,
    full: bool,
    progress_callback: Option<F>,
) -> Result<did_index::VerifyResult>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)
```

**Purpose**: Verify DID index integrity. Performs standard checks by default, or full verification (rebuild and compare) if `full` is true.

**Parameters:**
- `verbose`: Enable verbose logging
- `flush_interval`: Flush interval for full rebuild (if `full` is true)
- `full`: If true, rebuilds the index in a temp directory and compares it with the existing one
- `progress_callback`: Optional progress callback for full verification

**Result:**
```rust
pub struct VerifyResult {
    pub errors: usize,
    pub warnings: usize,
    pub missing_base_shards: usize,
    pub missing_delta_segments: usize,
    pub shards_checked: usize,
    pub segments_checked: usize,
    pub error_categories: Vec<(String, usize)>,
    pub index_last_bundle: u32,
    pub last_bundle_in_repo: u32,
}
```

**Use Cases:**
- CLI: `plcbundle index verify` (standard check)
- CLI: `plcbundle index verify --full` (full rebuild and compare)
- Server: Startup integrity check (call with `full=false` and check `missing_base_shards`/`missing_delta_segments`)

### Repair Index

```rust
pub fn repair_did_index<F>(
    &self,
    num_threads: usize,
    flush_interval: u32,
    progress_callback: Option<F>,
) -> Result<did_index::RepairResult>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)
```

**Purpose**: Intelligently repair the DID index by:
- Rebuilding missing delta segments (if base shards are intact)
- Performing an incremental update (if < 1000 bundles behind)
- Performing a full rebuild (if > 1000 bundles behind or base shards corrupted)
- Compacting delta segments (if > 50 segments)
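
The decision rules in this list can be written down as a small planning function. This is a sketch under assumptions: `IndexHealth`, `RepairAction`, and `plan_repair` are hypothetical names, a full rebuild is assumed to make the other steps unnecessary, and being exactly 1000 bundles behind (which the list leaves open) is treated as an incremental update.

```rust
#[derive(Debug, PartialEq)]
pub enum RepairAction {
    RebuildDeltaSegments,
    IncrementalUpdate,
    FullRebuild,
    CompactSegments,
}

/// Inputs a repair decision could be based on (hypothetical).
pub struct IndexHealth {
    pub base_shards_intact: bool,
    pub missing_delta_segments: usize,
    pub bundles_behind: u32,
    pub delta_segments: usize,
}

/// Choose the repair steps for the given index state.
pub fn plan_repair(health: &IndexHealth) -> Vec<RepairAction> {
    // Corrupted base shards or a large gap: start over.
    if !health.base_shards_intact || health.bundles_behind > 1000 {
        return vec![RepairAction::FullRebuild];
    }
    let mut actions = Vec::new();
    if health.missing_delta_segments > 0 {
        actions.push(RepairAction::RebuildDeltaSegments);
    }
    if health.bundles_behind > 0 {
        actions.push(RepairAction::IncrementalUpdate);
    }
    if health.delta_segments > 50 {
        actions.push(RepairAction::CompactSegments);
    }
    actions
}
```

An empty result means the index needs no repair.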

**Parameters:**
- `num_threads`: Number of threads (0 = auto-detect)
- `flush_interval`: Flush to disk every N bundles (0 = only at end)
- `progress_callback`: Optional progress callback

**Result:**
```rust
pub struct RepairResult {
    pub repaired: bool,
    pub compacted: bool,
    pub bundles_processed: u32,
    pub segments_rebuilt: usize,
}
```

**Use Cases:**
- CLI: `plcbundle index repair`
- Maintenance: Fix corrupted or incomplete index
- Recovery: Restore index after data loss

### Index Statistics

```rust
pub fn get_did_index_stats(&self) -> HashMap<String, serde_json::Value>
```

**Purpose**: Get comprehensive DID index statistics.

**Returns**: JSON-compatible map with fields like:
- `exists`: Whether index exists
- `total_dids`: Total number of unique DIDs
- `last_bundle`: Last bundle indexed
- `delta_segments`: Number of delta segments
- `shard_count`: Number of shards (should be 256)

**Use Cases:**
- CLI: `plcbundle index status`
- Server: Status endpoint
- Monitoring: Health checks

---

## 10. Server API Methods

The following methods are designed for the HTTP server component. They provide streaming, cursor tracking, and resolver functionality.

### Streaming Bundle Data

```rust
/// Stream bundle raw (compressed) data
pub fn stream_bundle_raw(&self, bundle_num: u32) -> Result<std::fs::File>

/// Stream bundle decompressed (JSONL) data
pub fn stream_bundle_decompressed(&self, bundle_num: u32) -> Result<Box<dyn std::io::Read + Send>>
```

**Purpose**: Efficient streaming of bundle data for HTTP responses.

**Use Cases:**
- Server: `/data/:number` endpoint (raw compressed)
- Server: `/jsonl/:number` endpoint (decompressed JSONL)
- WebSocket: Streaming operations to clients

### Cursor & Position Tracking

```rust
/// Get current cursor (global position of last operation)
/// Cursor = (last_bundle * 10000) + mempool_ops_count
pub fn get_current_cursor(&self) -> u64
```

**Purpose**: Track the global position in the operation stream for WebSocket streaming.

**Use Cases:**
- WebSocket: `/ws?cursor=N` to resume from a specific position
- Server: Status endpoint showing current position

#### Global Position Mapping

- Positions are 0-indexed per bundle (`0` to `9,999`).
- Global position formula: `global = ((bundle - 1) × 10,000) + position`.
- Conversion back: `bundle = floor(global / 10,000) + 1`, `position = global % 10,000`.
- Examples:
  - `global 0 → bundle 1, position 0`
  - `global 10000 → bundle 2, position 0`
  - `global 88410345 → bundle 8842, position 345`
- Shorthand: Numbers below `10000` are treated as positions in bundle 1.
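
The mapping and the cursor formula translate directly into code. The function names and the `BUNDLE_SIZE` constant below are illustrative, not part of the documented API.

```rust
/// Operations per bundle, as used throughout this document.
pub const BUNDLE_SIZE: u64 = 10_000;

/// Global position of `position` within `bundle` (bundles start at 1).
/// Returns `None` for bundle 0 or a position outside the bundle.
pub fn to_global(bundle: u32, position: usize) -> Option<u64> {
    if bundle == 0 || position as u64 >= BUNDLE_SIZE {
        return None;
    }
    Some((bundle as u64 - 1) * BUNDLE_SIZE + position as u64)
}

/// Inverse mapping: global position to `(bundle, position)`.
pub fn from_global(global: u64) -> (u32, usize) {
    ((global / BUNDLE_SIZE + 1) as u32, (global % BUNDLE_SIZE) as usize)
}

/// Cursor formula from the doc comment on `get_current_cursor`.
pub fn current_cursor(last_bundle: u32, mempool_ops: usize) -> u64 {
    last_bundle as u64 * BUNDLE_SIZE + mempool_ops as u64
}
```

Note that with this numbering, a cursor computed for `last_bundle = 2` and 5 mempool operations maps back to bundle 3, position 5, that is, the next free slot after the operations held in the mempool.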

### Handle & DID Resolution

```rust
/// Resolve handle to DID or validate DID format
/// Returns (did, handle_resolve_time_ms)
pub fn resolve_handle_or_did(&self, input: &str) -> Result<(String, u64)>

/// Get handle resolver base URL (if configured)
pub fn get_handle_resolver_base_url(&self) -> Option<String>

/// Get resolver statistics
pub fn get_resolver_stats(&self) -> HashMap<String, serde_json::Value>
```

**Purpose**: Support handle-to-DID resolution and resolver metrics.

**Use Cases:**
- Server: `/:did` endpoints (accepts both DIDs and handles)
- Server: `/status` endpoint (resolver stats)
- Server: `/debug/resolver` endpoint

### DID Index Access

```rust
/// Get DID index manager (for stats and direct access)
pub fn get_did_index(&self) -> Arc<RwLock<did_index::Manager>>
```

**Purpose**: Direct access to DID index for server endpoints.

**Use Cases:**
- Server: `/debug/didindex` endpoint
- Server: `/status` endpoint (DID index stats)

### PLC Origin

```rust
/// Get PLC origin from index
pub fn get_plc_origin(&self) -> String
```

**Purpose**: Get the origin identifier for the repository.

**Use Cases:**
- Server: Root endpoint (display origin)
- Server: `/status` endpoint

---

## 11. Observability

### Manager Statistics

```rust
pub fn get_stats(&self) -> ManagerStats
```

**Purpose**: Get runtime statistics for monitoring.

```rust
pub struct ManagerStats {
    pub cache_hits: u64,
    pub cache_misses: u64,
    pub bundles_loaded: u64,
    pub operations_read: u64,
    pub bytes_read: u64,
}
```

---

## Common Types

### Operation Filter

```rust
pub struct OperationFilter {
    pub did: Option<String>,
    pub operation_type: Option<String>,
    pub time_range: Option<(String, String)>,
    pub include_nullified: bool,
}
```

Used across query, export, and range operations.
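
A sketch of how such a filter could be applied to one operation. The semantics are assumptions, not taken from the library: every set field must match, `time_range` is inclusive, and timestamps share one RFC 3339 UTC format so that string comparison orders them correctly. `OperationSummary` and `matches` are hypothetical names; the summary holds just the fields the filter inspects.

```rust
pub struct OperationFilter {
    pub did: Option<String>,
    pub operation_type: Option<String>,
    pub time_range: Option<(String, String)>,
    pub include_nullified: bool,
}

/// Fields of an operation that the filter looks at (hypothetical).
pub struct OperationSummary {
    pub did: String,
    pub operation_type: String,
    pub nullified: bool,
    pub created_at: String,
}

/// True when `op` passes every criterion set on `filter`.
pub fn matches(filter: &OperationFilter, op: &OperationSummary) -> bool {
    if op.nullified && !filter.include_nullified {
        return false;
    }
    if let Some(did) = &filter.did {
        if *did != op.did {
            return false;
        }
    }
    if let Some(kind) = &filter.operation_type {
        if *kind != op.operation_type {
            return false;
        }
    }
    if let Some((start, end)) = &filter.time_range {
        let ts = op.created_at.as_str();
        // Lexicographic comparison; valid only for identically formatted timestamps.
        if ts < start.as_str() || ts > end.as_str() {
            return false;
        }
    }
    true
}
```

A filter with all fields unset and `include_nullified: false` therefore passes every operation that has not been nullified.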

### Operation

```rust
pub struct Operation {
    pub did: String,
    pub operation: serde_json::Value,
    pub cid: Option<String>,
    pub nullified: bool,
    pub created_at: String,
    // Optional: raw_json field for zero-copy access
}
```

---

## CLI Usage Examples

All CLI commands use only the high-level API:

```bash
# Uses: get_operation()
plcbundle op get 42 1337

# Uses: query()
plcbundle query "did" --bundles 1-100

# Uses: export_to_writer()
plcbundle export --range 1-100 -o output.jsonl

# Uses: get_bundle_info()
plcbundle info --bundle 42

# Uses: verify_bundle()
plcbundle verify --bundle 42 --checksums

# Uses: get_did_operations()
plcbundle did-lookup did:plc:xyz123
```

---

## FFI Mapping

The high-level API maps cleanly to C/Go:

### C Bindings

```c
// Direct 1:1 mapping
// Note: C bindings use default options (equivalent to passing () in Rust)
CBundleManager* bundle_manager_new(const char* path);
int bundle_manager_load_bundle(CBundleManager* mgr, uint32_t num, CLoadOptions* opts, CLoadResult* result);
int bundle_manager_get_operation(CBundleManager* mgr, uint32_t bundle, size_t pos, COperation* out);
int bundle_manager_export(CBundleManager* mgr, CExportSpec* spec, ExportCallback cb, void* user_data);
```

### Go Bindings

```go
type BundleManager struct { /* ... */ }

func (m *BundleManager) LoadBundle(num uint32, opts LoadOptions) (*LoadResult, error)
func (m *BundleManager) GetOperation(bundle uint32, position int, opts OperationOptions) (*Operation, error)
func (m *BundleManager) Query(spec QuerySpec) (*QueryIterator, error)
func (m *BundleManager) Export(spec ExportSpec, opts ExportOptions) (*ExportStats, error)
```

---

## Implementation Status

### ✅ Implemented
- `load_bundle()`
- `get_bundle_info()`
- `query()`
- `export()` / `export_to_writer()`
- `verify_bundle()`
- `verify_chain()`
- `get_stats()`
- `build_did_index()`
- `verify_did_index()`
- `repair_did_index()`

### 🚧 Needs Refactoring
- `get_operation()` - Currently in CLI, should be in `BundleManager`
- `get_operations_batch()` - Not yet implemented
- `get_operations_range()` - Partially implemented

### 📋 To Implement
- `get_did_operations()` - Uses DID index
- `batch_resolve_dids()` - Batch DID lookup
- `prefetch_bundles()` - Cache warming
- `warm_up()` - Intelligent prefetching

---

## Migration Plan

### Phase 1: Move Operation Access to Manager ✅ PRIORITY

```rust
// Add to BundleManager
impl BundleManager {
    /// Get a single operation efficiently (reads only the needed line)
    pub fn get_operation(&self, bundle_num: u32, position: usize) -> Result<OperationData>

    /// Get operation with raw JSON (zero-copy)
    pub fn get_operation_raw(&self, bundle_num: u32, position: usize) -> Result<String>
}

pub struct OperationData {
    pub raw_json: String,          // Original JSON from file
    pub parsed: Option<Operation>, // Lazy parsing
}
```

### Phase 2: Update CLI to Use API

```rust
// Before (direct file access):
let file = std::fs::File::open(bundle_path)?;
let decoder = zstd::Decoder::new(file)?;
// ... manual line reading

// After (high-level API):
let raw_json = manager.get_operation_raw(bundle_num, position)?;
println!("{}", raw_json);
```

### Phase 3: Add to FFI/Go

Once in `BundleManager`, automatically available to FFI:

```c
int bundle_manager_get_operation_raw(
    CBundleManager* mgr,
    uint32_t bundle_num,
    size_t position,
    char** out_json,
    size_t* out_len
);
```

```go
func (m *BundleManager) GetOperationRaw(bundle uint32, position int) (string, error)
```
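
On the Rust side, the `char** out_json` / `size_t* out_len` pair in the C signature above implies handing ownership of a string to the caller and taking it back later. The sketch below shows that hand-off in isolation; `export_string` and `free_string` are hypothetical helpers, not the crate's actual FFI functions, and a real exported symbol would also need a no-mangle attribute.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

/// Copy `json` into a NUL-terminated C string and pass it out.
/// Returns 0 on success, -1 for null out-pointers or interior NUL bytes.
///
/// # Safety
/// `out_json` and `out_len` must each be null or valid for writes.
pub unsafe fn export_string(json: &str, out_json: *mut *mut c_char, out_len: *mut usize) -> c_int {
    if out_json.is_null() || out_len.is_null() {
        return -1;
    }
    let c_string = match CString::new(json) {
        Ok(s) => s,
        Err(_) => return -1,
    };
    unsafe {
        *out_len = c_string.as_bytes().len(); // length without the NUL
        *out_json = c_string.into_raw(); // caller now owns the buffer
    }
    0
}

/// Give a string produced by `export_string` back to Rust to be freed.
///
/// # Safety
/// `ptr` must be null or a pointer obtained from `export_string`
/// that has not been freed yet.
pub unsafe extern "C" fn free_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        unsafe { drop(CString::from_raw(ptr)) };
    }
}
```

The matching free function matters because memory allocated by Rust should be released by Rust, not by the C or Go caller's allocator.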

---

## Design Benefits

1. **Single Source of Truth**: All functionality in `BundleManager`
2. **Testable**: High-level API is easy to unit test
3. **Consistent**: Same patterns across Rust, C, and Go
4. **Maintainable**: Changes in one place affect all consumers
5. **Efficient**: API designed for performance (streaming, lazy parsing)
6. **Safe**: No direct file access by consumers

---

## Next Steps

1. ✅ Add `get_operation()` / `get_operation_raw()` to `BundleManager`
2. ✅ Update CLI `op get` to use new API
3. ✅ Add FFI bindings for operation access
4. ⏭️ Implement `get_operations_batch()`
5. ⏭️ Implement DID lookup operations
6. ⏭️ Add cache warming APIs
