High-performance implementation of plcbundle written in Rust
# plcbundle-rs High-Level API Design

## Overview

The `plcbundle-rs` library provides a unified, high-level API through the `BundleManager` type. This API is designed to be:

- **Consistent**: Same patterns across all operations
- **Efficient**: Minimal allocations, streaming where possible
- **FFI-friendly**: Clean C bindings that map naturally to Go
- **CLI-first**: The Rust CLI uses only public high-level APIs

## Quick Reference: All Methods

```rust
// === Initialization ===
BundleManager::new<O: IntoManagerOptions>(directory: PathBuf, options: O) -> Result<Self>

// === Repository Setup ===
init_repository(directory: PathBuf, origin: String) -> Result<()>  // Creates plc_bundles.json

// === Bundle Loading ===
load_bundle(num: u32, options: LoadOptions) -> Result<LoadResult>
get_bundle_info(num: u32, flags: InfoFlags) -> Result<BundleInfo>

// === Single Operation Access ===
get_operation_raw(bundle_num: u32, position: usize) -> Result<String>
get_operation(bundle_num: u32, position: usize) -> Result<Operation>
get_operation_with_stats(bundle_num: u32, position: usize) -> Result<OperationResult>

// === Batch Operations ===
get_operations_batch(requests: Vec<OperationRequest>) -> Result<Vec<Operation>>
get_operations_range(start: u32, end: u32, filter: Option<OperationFilter>) -> RangeIterator

// === Query & Export ===
query(spec: QuerySpec) -> QueryIterator
export(spec: ExportSpec) -> ExportIterator
export_to_writer<W: Write>(spec: ExportSpec, writer: W) -> Result<ExportStats>

// === DID Operations ===
get_did_operations(did: &str) -> Result<Vec<Operation>>
resolve_did(did: &str) -> Result<DIDDocument>
batch_resolve_dids(dids: Vec<String>) -> Result<HashMap<String, Vec<Operation>>>
sample_random_dids(count: usize, seed: Option<u64>) -> Result<Vec<String>>

// === Verification ===
verify_bundle(num: u32, spec: VerifySpec) -> Result<VerifyResult>
verify_chain(spec: ChainVerifySpec) -> Result<ChainVerifyResult>

// === Rollback ===
rollback_plan(spec: RollbackSpec) -> Result<RollbackPlan>
rollback(spec: RollbackSpec) -> Result<RollbackResult>

// === Repository Cleanup ===
clean() -> Result<CleanResult>

// === Performance & Caching ===
prefetch_bundles(nums: Vec<u32>) -> Result<()>
warm_up(spec: WarmUpSpec) -> Result<()>
clear_caches()

// === DID Index ===
build_did_index(flush_interval: u32, progress_cb: Option<F>, num_threads: Option<usize>, interrupted: Option<Arc<AtomicBool>>) -> Result<RebuildStats>
verify_did_index(verbose: bool, flush_interval: u32, full: bool, progress_callback: Option<F>) -> Result<did_index::VerifyResult>
repair_did_index(num_threads: usize, flush_interval: u32, progress_callback: Option<F>) -> Result<did_index::RepairResult>
get_did_index_stats() -> HashMap<String, serde_json::Value>

// === Sync Operations (Async) ===
sync_next_bundle(client: &PLCClient) -> Result<u32>
sync_once(client: &PLCClient) -> Result<usize>

// === Remote Access (Async) ===
fetch_remote_index(target: &str) -> Result<Index>
fetch_remote_bundle(base_url: &str, bundle_num: u32) -> Result<Vec<Operation>>
fetch_remote_operation(base_url: &str, bundle_num: u32, position: usize) -> Result<String>

// === Server API Methods ===
get_plc_origin() -> String
stream_bundle_raw(bundle_num: u32) -> Result<File>
stream_bundle_decompressed(bundle_num: u32) -> Result<Box<dyn Read + Send>>
get_current_cursor() -> u64
resolve_handle_or_did(input: &str) -> Result<(String, u64)>
get_resolver_stats() -> HashMap<String, serde_json::Value>
get_handle_resolver_base_url() -> Option<String>
get_did_index() -> Arc<RwLock<did_index::Manager>>

// === Mempool Operations ===
get_mempool_stats() -> Result<MempoolStats>
get_mempool_operations() -> Result<Vec<Operation>>
add_to_mempool(ops: Vec<Operation>) -> Result<usize>
clear_mempool() -> Result<()>

// === Observability ===
get_stats() -> ManagerStats
get_last_bundle() -> u32
directory() -> &PathBuf
```

---

## 11. Sync & Repository Management

### Repository Initialization

```rust
// Standalone function (used by CLI init command)
pub fn init_repository(directory: PathBuf, origin: String) -> Result<()>
```

**Purpose**: Initialize a new PLC bundle repository (like `git init`).

**What it does:**
- Creates directory if it doesn't exist
- Creates empty `plc_bundles.json` index file
- Sets origin identifier

**CLI Usage:**
```bash
plcbundle init
plcbundle init /path/to/bundles --origin my-node
```

### Sync from PLC Directory

```rust
pub async fn sync_next_bundle(&mut self, client: &PLCClient) -> Result<u32>
```

**Purpose**: Fetch operations from PLC directory and create next bundle.

**What it does:**
1. Gets boundary CIDs from last bundle (prevents duplicates)
2. Fetches operations from PLC until mempool has 10,000
3. Deduplicates using boundary CID logic
4. Creates and saves bundle
5. Returns bundle number

```rust
pub async fn sync_once(&mut self, client: &PLCClient) -> Result<usize>
```

**Purpose**: Sync until caught up with PLC directory (like `git fetch`).

**Returns**: Number of bundles synced

**CLI Usage:**
```bash
# Sync once
plcbundle sync

# Continuous daemon
plcbundle sync --continuous --interval 30s

# Max bundles
plcbundle sync --max-bundles 10
```

### PLC Client

```rust
pub struct PLCClient {
    // Private fields
}

impl PLCClient {
    pub fn new(base_url: impl Into<String>) -> Result<Self>
    pub async fn fetch_operations(&self, after: &str, count: usize) -> Result<Vec<PLCOperation>>
}
```

**Features:**
- Rate limiting (90 requests/minute)
- Automatic retries with exponential backoff
- 429 (rate limit) handling

### Mempool Operations

```rust
pub fn get_mempool_stats(&self) -> Result<MempoolStats>
```

**Purpose**: Get current mempool statistics.

```rust
pub struct MempoolStats {
    pub count: usize,
    pub can_create_bundle: bool,  // count >= 10,000
    pub target_bundle: u32,
    pub min_timestamp: DateTime<Utc>,
    pub first_time: Option<DateTime<Utc>>,
    pub last_time: Option<DateTime<Utc>>,
    pub size_bytes: Option<usize>,
    pub did_count: Option<usize>,
}
```

```rust
pub fn get_mempool_operations(&self) -> Result<Vec<Operation>>
pub fn add_to_mempool(&self, ops: Vec<Operation>) -> Result<usize>
pub fn clear_mempool(&self) -> Result<()>
```

**CLI Usage:**
```bash
plcbundle mempool status
plcbundle mempool dump
plcbundle mempool clear
```

### Boundary CID Deduplication

Critical helper functions for preventing duplicates across bundle boundaries:

```rust
pub fn get_boundary_cids(operations: &[Operation]) -> HashSet<String>
```

**Purpose**: Extract CIDs that share the same timestamp as the last operation.
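
The extraction step can be sketched as follows. This is a minimal illustration, not the library's implementation: `Op` is a hypothetical, pared-down stand-in for `Operation`, `boundary_cids` is an illustrative name, and the sketch assumes the slice is already sorted by `created_at`.

```rust
use std::collections::HashSet;

/// Hypothetical, pared-down stand-in for the library's `Operation` type.
#[derive(Debug, Clone)]
struct Op {
    cid: Option<String>,
    created_at: String,
}

/// Collect the CIDs of every trailing operation whose timestamp equals that
/// of the last operation. Assumes `operations` is sorted by `created_at`.
fn boundary_cids(operations: &[Op]) -> HashSet<String> {
    let mut cids = HashSet::new();
    let last_ts = match operations.last() {
        Some(op) => op.created_at.as_str(),
        None => return cids, // empty input has no boundary
    };
    // Walk backwards and stop at the first operation with an earlier timestamp.
    for op in operations.iter().rev() {
        if op.created_at != last_ts {
            break;
        }
        if let Some(cid) = &op.cid {
            cids.insert(cid.clone());
        }
    }
    cids
}
```

The resulting set is what a later fetch would be checked against, so that operations already stored at the end of the previous bundle are not stored twice.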

```rust
pub fn strip_boundary_duplicates(
    operations: Vec<Operation>,
    prev_boundary: &HashSet<String>
) -> Vec<Operation>
```

**Purpose**: Remove operations that were in the previous bundle's boundary.

**Why this matters**: Operations can have identical timestamps. When creating bundles, the last N operations might share a timestamp. The next fetch may return those same operations again, so they must be removed before the next bundle is built.

---

## Design Principles

1. **Single Entry Point**: All operations go through `BundleManager`
2. **Options Pattern**: Complex operations use dedicated option structs
3. **Result Types**: Operations return structured result types, not raw tuples
4. **Streaming by Default**: Use iterators for large datasets
5. **No Direct File Access**: CLI never opens bundle files directly

## API Structure

### Core Manager

```rust
pub struct BundleManager {
    // Private fields
}

impl BundleManager {
    /// Create a new BundleManager with options
    ///
    /// # Examples
    ///
    /// ```rust
    /// use plcbundle::{BundleManager, ManagerOptions};
    /// use std::path::PathBuf;
    ///
    /// // With default options
    /// let manager = BundleManager::new(PathBuf::from("."), ())?;
    ///
    /// // With custom options
    /// let options = ManagerOptions {
    ///     handle_resolver_url: Some("https://example.com".to_string()),
    ///     preload_mempool: true,
    ///     verbose: true,
    /// };
    /// let manager = BundleManager::new(PathBuf::from("."), options)?;
    /// ```
    pub fn new<O: IntoManagerOptions>(directory: PathBuf, options: O) -> Result<Self>
}
```

#### ManagerOptions

```rust
/// Options for configuring BundleManager initialization
pub struct ManagerOptions {
    /// Optional handle resolver URL for resolving @handle.did identifiers
    pub handle_resolver_url: Option<String>,
    /// Whether to preload the mempool at initialization (for server use)
    pub preload_mempool: bool,
    /// Whether to enable verbose logging
    pub verbose: bool,
}

impl Default for ManagerOptions {
    fn default() -> Self {
        Self {
            handle_resolver_url: None,
            preload_mempool: false,
            verbose: false,
        }
    }
}
```

**Usage:**
- Pass `()` to use default options: `BundleManager::new(dir, ())`
- Pass `ManagerOptions` for custom configuration (including verbose mode)

---

## 1. Bundle Loading

### Individual Bundle Loading

```rust
pub fn load_bundle(&self, bundle_num: u32, options: LoadOptions) -> Result<LoadResult>
```

**Purpose**: Load a single bundle with control over parsing, verification, and caching.

**Options:**
```rust
pub struct LoadOptions {
    pub verify_hash: bool,       // Verify bundle integrity
    pub decompress: bool,        // Decompress bundle data
    pub cache: bool,             // Cache in memory
    pub parse_operations: bool,  // Parse JSON into Operation structs
}
```

**Result:**
```rust
pub struct LoadResult {
    pub bundle_number: u32,
    pub operations: Vec<Operation>,
    pub metadata: BundleMetadata,
    pub hash: Option<String>,
}
```

**Use Cases:**
- CLI: `plcbundle info --bundle 42`
- CLI: `plcbundle verify --bundle 42`

---

## 2. Operation Access

### Single Operation

```rust
pub fn get_operation(&self, bundle_num: u32, position: usize, options: OperationOptions) -> Result<Operation>
```

**Purpose**: Efficiently retrieve a single operation without loading the entire bundle.
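
One way to picture this is a reader that stops at the requested line of a decompressed JSONL stream and never materialises the lines after it. The sketch below works on any `BufRead` and uses only the standard library; `read_line_at` is a hypothetical helper name, and how the library actually locates a line inside a compressed bundle is not specified in this document.

```rust
use std::io::{self, BufRead};

/// Return the line at `position` (0-indexed) from a JSONL stream, or
/// `Ok(None)` if the stream has fewer lines. Reading stops at the target
/// line, so later lines are never read or parsed.
fn read_line_at<R: BufRead>(mut reader: R, position: usize) -> io::Result<Option<String>> {
    let mut line = String::new();
    for _ in 0..=position {
        line.clear(); // reuse one buffer instead of allocating per line
        if reader.read_line(&mut line)? == 0 {
            return Ok(None); // stream ended before `position`
        }
    }
    // Drop the trailing line terminator before returning the raw JSON.
    let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
    Ok(Some(trimmed.to_string()))
}
```

Earlier lines are still read (and, for a compressed bundle, decompressed) on the way to the target, but they are neither kept nor parsed.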

**Options:**
```rust
pub struct OperationOptions {
    pub raw_json: bool,  // Return raw JSON string (faster)
    pub parse: bool,     // Parse into Operation struct
}
```

**Use Cases:**
- CLI: `plcbundle op get 42 1337`
- CLI: `plcbundle op show 420000`

### Batch Operations

```rust
pub fn get_operations_batch(&self, requests: Vec<OperationRequest>) -> Result<Vec<Operation>>
```

**Purpose**: Retrieve multiple operations in one call (optimizes file I/O).

```rust
pub struct OperationRequest {
    pub bundle_num: u32,
    pub position: usize,
}
```

### Range Operations

```rust
pub fn get_operations_range(&self, start: u64, end: u64, filter: Option<OperationFilter>) -> Result<RangeIterator>
```

**Purpose**: Stream operations across bundle boundaries by global position.

---

## 3. DID Operations

### DID Lookup

```rust
pub fn get_did_operations(&self, did: &str) -> Result<Vec<Operation>>
```

**Purpose**: Get all operations for a specific DID (requires DID index).

### Batch DID Lookup

```rust
pub fn batch_resolve_dids(&self, dids: Vec<String>) -> Result<HashMap<String, Vec<Operation>>>
```

**Purpose**: Efficiently resolve multiple DIDs in one call.

**Use Cases:**
- Bulk DID resolution
- Identity verification workflows

### Random DID Sampling

```rust
pub fn sample_random_dids(&self, count: usize, seed: Option<u64>) -> Result<Vec<String>>
```

**Purpose**: Retrieve pseudo-random DIDs directly from the DID index without touching bundle files.

**Details:**
- Reads identifiers from memory-mapped shard data (no decompression).
- Deterministic when `seed` is provided; otherwise uses current timestamp.
- Useful for benchmarks, sampling, or quick spot checks.

---

## 4. Query & Export (Streaming)

### Query Operations

```rust
pub fn query(&self, spec: QuerySpec) -> Result<QueryIterator>
```

**Purpose**: Execute JMESPath or simple path queries across bundles.

```rust
pub struct QuerySpec {
    pub expression: String,           // JMESPath or simple path
    pub bundles: BundleRange,         // Which bundles to search
    pub mode: QueryMode,              // Auto, Simple, or JMESPath
    pub filter: Option<OperationFilter>,
    pub parallel: Option<usize>,      // Number of threads (0=auto)
    pub batch_size: usize,            // Output batch size
}

pub enum QueryMode {
    Auto,
    Simple,    // Fast path: direct field access
    JMESPath,  // Full JMESPath evaluation
}
```

**Iterator:**
```rust
pub struct QueryIterator {
    // Implements Iterator<Item = Result<String>>
}
```

**Use Cases:**
- CLI: `plcbundle query "did" --bundles 1-100`
- CLI: `plcbundle query "operation.type" --mode simple`

### Export Operations

```rust
pub fn export(&self, spec: ExportSpec) -> Result<ExportIterator>
```

**Purpose**: Export operations in various formats with filtering.

```rust
pub struct ExportSpec {
    pub bundles: BundleRange,
    pub format: ExportFormat,            // JSONL only
    pub filter: Option<OperationFilter>,
    pub count: Option<usize>,            // Limit number of operations
    pub after_timestamp: Option<String>, // Filter by timestamp
}

pub enum ExportFormat {
    Jsonl,
}

pub enum BundleRange {
    Single(u32),
    Range(u32, u32),
    List(Vec<u32>),
    All,
}
```

**Use Cases:**
- CLI: `plcbundle export --range 1-100 --format jsonl`
- CLI: `plcbundle export --all --after 2024-01-01T00:00:00Z`

### Export with Callback

```rust
pub fn export_to_writer<W: Write>(&self, spec: ExportSpec, writer: W) -> Result<ExportStats>
```

**Purpose**: Export directly to a writer (file, stdout, network).
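
The writer-based shape can be illustrated with a small standard-library sketch: records are streamed to any `Write` implementation, one JSON object per line, while counters are accumulated. `WriteStats` and `write_jsonl` are hypothetical names for illustration and track only two counters; this is not the library's export path.

```rust
use std::io::{self, Write};

/// Hypothetical, reduced stats type (two counters only).
#[derive(Debug, Default, PartialEq)]
struct WriteStats {
    records_written: u64,
    bytes_written: u64,
}

/// Write each record as one JSONL line to `writer`, counting as we go.
fn write_jsonl<W, I>(records: I, mut writer: W) -> io::Result<WriteStats>
where
    W: Write,
    I: IntoIterator<Item = String>,
{
    let mut stats = WriteStats::default();
    for record in records {
        writer.write_all(record.as_bytes())?;
        writer.write_all(b"\n")?;
        stats.records_written += 1;
        stats.bytes_written += record.len() as u64 + 1; // +1 for the newline
    }
    writer.flush()?;
    Ok(stats)
}
```

Because the function is generic over `Write`, the same code can target a `File`, `io::stdout()`, a `TcpStream`, or an in-memory `Vec<u8>` in tests.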

```rust
pub struct ExportStats {
    pub records_written: u64,
    pub bytes_written: u64,
    pub bundles_processed: u32,
    pub duration: Duration,
}
```

**Use Cases:**
- FFI: Stream to Go `io.Writer`
- Direct file writing without buffering

---

## 5. Verification

### Bundle Verification

```rust
pub fn verify_bundle(&self, bundle_num: u32, spec: VerifySpec) -> Result<VerifyResult>
```

**Purpose**: Verify bundle integrity with various checks.

```rust
pub struct VerifySpec {
    pub check_hash: bool,
    pub check_compression: bool,
    pub check_json: bool,
    pub parallel: bool,
}

pub struct VerifyResult {
    pub valid: bool,
    pub hash_match: Option<bool>,
    pub compression_ok: Option<bool>,
    pub json_valid: Option<bool>,
    pub errors: Vec<String>,
}
```

### Chain Verification

```rust
pub fn verify_chain(&self, spec: ChainVerifySpec) -> Result<ChainVerifyResult>
```

**Purpose**: Verify chain continuity and prev pointers.

```rust
pub struct ChainVerifySpec {
    pub bundles: BundleRange,
    pub check_prev_links: bool,
    pub check_timestamps: bool,
}

pub struct ChainVerifyResult {
    pub valid: bool,
    pub bundles_checked: u32,
    pub chain_breaks: Vec<ChainBreak>,
}

pub struct ChainBreak {
    pub bundle_num: u32,
    pub did: String,
    pub reason: String,
}
```

**Use Cases:**
- CLI: `plcbundle verify --chain --bundles 1-100`

---

## 6. Bundle Information

```rust
pub fn get_bundle_info(&self, bundle_num: u32, flags: InfoFlags) -> Result<BundleInfo>
```

**Purpose**: Get metadata and statistics for a bundle.

```rust
pub struct InfoFlags {
    pub include_dids: bool,
    pub include_types: bool,
    pub include_timeline: bool,
}

pub struct BundleInfo {
    pub bundle_number: u32,
    pub operation_count: u32,
    pub did_count: Option<u32>,
    pub compressed_size: u64,
    pub uncompressed_size: u64,
    pub start_time: String,
    pub end_time: String,
    pub hash: String,
    pub operation_types: Option<HashMap<String, u32>>,
}
```

**Use Cases:**
- CLI: `plcbundle info --bundle 42 --detailed`

---

## 7. Rollback Operations

### Plan Rollback

```rust
pub fn rollback_plan(&self, spec: RollbackSpec) -> Result<RollbackPlan>
```

**Purpose**: Preview what would be rolled back (dry-run).

```rust
pub struct RollbackSpec {
    pub target_bundle: u32,
    pub keep_index: bool,
}

pub struct RollbackPlan {
    pub bundles_to_remove: Vec<u32>,
    pub total_size: u64,
    pub operations_affected: u64,
}
```

### Execute Rollback

```rust
pub fn rollback(&self, spec: RollbackSpec) -> Result<RollbackResult>
```

**Purpose**: Execute the rollback operation.

```rust
pub struct RollbackResult {
    pub bundles_removed: Vec<u32>,
    pub bytes_freed: u64,
    pub success: bool,
}
```

---

## 7.5. Repository Cleanup

### Clean Temporary Files

```rust
pub fn clean(&self) -> Result<CleanResult>
```

**Purpose**: Remove all temporary files from the repository.

**What it does:**
- Removes all `.tmp` files from the repository root directory (e.g., `plc_bundles.json.tmp`)
- Removes temporary files from the DID index directory `.plcbundle/` (e.g., `config.json.tmp`)
- Removes temporary shard files from `.plcbundle/shards/` (e.g., `00.tmp`, `01.tmp`, etc.)
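
The per-directory step listed above could look roughly like the following. This is a sketch under assumptions, not the library's code: `remove_tmp_files` is a hypothetical helper that handles a single directory non-recursively, and a caller would invoke it once for each of the three locations named in the list.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remove regular files ending in `.tmp` directly inside `dir` (non-recursive).
/// Returns (files_removed, bytes_freed). A missing directory counts as empty.
fn remove_tmp_files(dir: &Path) -> io::Result<(usize, u64)> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok((0, 0)),
        Err(err) => return Err(err),
    };
    let (mut files_removed, mut bytes_freed) = (0usize, 0u64);
    for entry in entries {
        let entry = entry?;
        let path = entry.path();
        // `plc_bundles.json.tmp` and `00.tmp` both have the extension `tmp`.
        let is_tmp = path.extension().map_or(false, |ext| ext == "tmp");
        let metadata = entry.metadata()?;
        if is_tmp && metadata.is_file() {
            fs::remove_file(&path)?;
            files_removed += 1;
            bytes_freed += metadata.len();
        }
    }
    Ok((files_removed, bytes_freed))
}
```

This version aborts on the first I/O error. A variant that collects error messages into a list and keeps going would be the natural alternative when cleanup should be best-effort.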

**Result:**
```rust
pub struct CleanResult {
    pub files_removed: usize,
    pub bytes_freed: u64,
    pub errors: Option<Vec<String>>,
}
```

**Use Cases:**
- CLI: `plcbundle clean`
- Cleanup after interrupted operations
- Maintenance tasks

**Note**: Temporary files are normally cleaned up automatically during atomic write operations. This method is useful for cleaning up leftover files from interrupted operations.

---

## 8. Performance & Caching

### Cache Management

```rust
pub fn prefetch_bundles(&self, bundle_nums: Vec<u32>) -> Result<()>
```

**Purpose**: Preload bundles into cache for faster access.

```rust
pub fn warm_up(&self, spec: WarmUpSpec) -> Result<()>
```

**Purpose**: Warm up caches with intelligent prefetching.

```rust
pub struct WarmUpSpec {
    pub bundles: BundleRange,
    pub strategy: WarmUpStrategy,
}

pub enum WarmUpStrategy {
    Sequential,  // Load in order
    Parallel,    // Load concurrently
    Adaptive,    // Based on access patterns
}
```

```rust
pub fn clear_caches(&self)
```

**Purpose**: Clear all in-memory caches.

---

## 9. DID Index Management

### Build Index

```rust
pub fn build_did_index<F>(
    &self,
    flush_interval: u32,
    progress_cb: Option<F>,
    num_threads: Option<usize>,
    interrupted: Option<Arc<AtomicBool>>,
) -> Result<RebuildStats>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)
```

**Purpose**: Build or rebuild the DID → operations index from scratch.
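
To make the roles of `flush_interval`, the progress callback, and the `interrupted` flag concrete, here is a single-threaded control-flow sketch. Everything in it is an assumption for illustration: `run_build_loop` is a hypothetical name, the indexing work itself is omitted, the `Send + Sync` bounds are dropped because nothing runs in parallel, and a flush interval of 0 is taken to mean "flush only at the end".

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Walk bundles 1..=total, reporting progress and counting flushes.
/// Returns (bundles_processed, flushes). Indexing itself is elided.
fn run_build_loop<F>(
    total: u32,
    flush_interval: u32,
    progress_cb: Option<F>,
    interrupted: Option<Arc<AtomicBool>>,
) -> (u32, u32)
where
    F: Fn(u32, u32, u64, u64),
{
    let (mut processed, mut flushes) = (0u32, 0u32);
    for current in 1..=total {
        // Stop early if the caller raised the cancellation flag.
        if interrupted.as_ref().map_or(false, |flag| flag.load(Ordering::Relaxed)) {
            break;
        }
        // ... index bundle `current` here (omitted in this sketch) ...
        processed += 1;
        if let Some(cb) = &progress_cb {
            cb(current, total, 0, 0); // byte counters are not tracked here
        }
        if flush_interval > 0 && processed % flush_interval == 0 {
            flushes += 1; // a periodic flush to disk would happen here
        }
    }
    // Final flush for anything not covered by the last periodic flush.
    if flush_interval == 0 || processed % flush_interval != 0 {
        flushes += 1;
    }
    (processed, flushes)
}
```

Checking the flag once per bundle keeps cancellation latency bounded by the time needed to index a single bundle.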

**Parameters:**
- `flush_interval`: Flush to disk every N bundles (0 = only at end)
- `progress_cb`: Optional progress callback
- `num_threads`: Number of threads (None = auto-detect)
- `interrupted`: Optional flag to check for cancellation

**Result:**
```rust
pub struct RebuildStats {
    pub bundles_processed: u32,
    pub operations_indexed: u64,
    pub unique_dids: usize,
    pub duration: Duration,
}
```

### Verify Index

```rust
pub fn verify_did_index<F>(
    &self,
    verbose: bool,
    flush_interval: u32,
    full: bool,
    progress_callback: Option<F>,
) -> Result<did_index::VerifyResult>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)
```

**Purpose**: Verify DID index integrity. Performs standard checks by default, or full verification (rebuild and compare) if `full` is true.

**Parameters:**
- `verbose`: Enable verbose logging
- `flush_interval`: Flush interval for full rebuild (if `full` is true)
- `full`: If true, rebuilds index in temp directory and compares with existing
- `progress_callback`: Optional progress callback for full verification

**Result:**
```rust
pub struct VerifyResult {
    pub errors: usize,
    pub warnings: usize,
    pub missing_base_shards: usize,
    pub missing_delta_segments: usize,
    pub shards_checked: usize,
    pub segments_checked: usize,
    pub error_categories: Vec<(String, usize)>,
    pub index_last_bundle: u32,
    pub last_bundle_in_repo: u32,
}
```

**Use Cases:**
- CLI: `plcbundle index verify` (standard check)
- CLI: `plcbundle index verify --full` (full rebuild and compare)
- Server: Startup integrity check (call with `full=false` and check `missing_base_shards`/`missing_delta_segments`)

### Repair Index

```rust
pub fn repair_did_index<F>(
    &self,
    num_threads: usize,
    flush_interval: u32,
    progress_callback: Option<F>,
) -> Result<did_index::RepairResult>
where
    F: Fn(u32, u32, u64, u64) + Send + Sync, // (current, total, bytes_processed, total_bytes)
```

**Purpose**: Intelligently repair the DID index by:
- Rebuilding missing delta segments (if base shards are intact)
- Performing incremental update (if < 1000 bundles behind)
- Performing full rebuild (if > 1000 bundles behind or base shards corrupted)
- Compacting delta segments (if > 50 segments)

**Parameters:**
- `num_threads`: Number of threads (0 = auto-detect)
- `flush_interval`: Flush to disk every N bundles (0 = only at end)
- `progress_callback`: Optional progress callback

**Result:**
```rust
pub struct RepairResult {
    pub repaired: bool,
    pub compacted: bool,
    pub bundles_processed: u32,
    pub segments_rebuilt: usize,
}
```

**Use Cases:**
- CLI: `plcbundle index repair`
- Maintenance: Fix corrupted or incomplete index
- Recovery: Restore index after data loss

### Index Statistics

```rust
pub fn get_did_index_stats(&self) -> HashMap<String, serde_json::Value>
```

**Purpose**: Get comprehensive DID index statistics.

**Returns**: JSON-compatible map with fields like:
- `exists`: Whether index exists
- `total_dids`: Total number of unique DIDs
- `last_bundle`: Last bundle indexed
- `delta_segments`: Number of delta segments
- `shard_count`: Number of shards (should be 256)

**Use Cases:**
- CLI: `plcbundle index status`
- Server: Status endpoint
- Monitoring: Health checks

---

## 10. Server API Methods

The following methods are specifically designed for use by the HTTP server component. They provide streaming capabilities, cursor tracking, and resolver functionality.

### Streaming Bundle Data

```rust
/// Stream bundle raw (compressed) data
pub fn stream_bundle_raw(&self, bundle_num: u32) -> Result<std::fs::File>

/// Stream bundle decompressed (JSONL) data
pub fn stream_bundle_decompressed(&self, bundle_num: u32) -> Result<Box<dyn std::io::Read + Send>>
```

**Purpose**: Efficient streaming of bundle data for HTTP responses.

**Use Cases:**
- Server: `/data/:number` endpoint (raw compressed)
- Server: `/jsonl/:number` endpoint (decompressed JSONL)
- WebSocket: Streaming operations to clients

### Cursor & Position Tracking

```rust
/// Get current cursor (global position of last operation)
/// Cursor = (last_bundle * 10000) + mempool_ops_count
pub fn get_current_cursor(&self) -> u64
```

**Purpose**: Track the global position in the operation stream for WebSocket streaming.

**Use Cases:**
- WebSocket: `/ws?cursor=N` to resume from a specific position
- Server: Status endpoint showing current position

#### Global Position Mapping

- Positions are 0-indexed per bundle (`0..9,999`).
- Global position formula: `global = ((bundle - 1) × 10,000) + position`.
- Conversion back: `bundle = floor(global / 10,000) + 1`, `position = global % 10,000`.
- Examples:
  - `global 0 → bundle 1, position 0`
  - `global 10000 → bundle 2, position 0`
  - `global 88410345 → bundle 8842, position 345`
- Shorthand: Small numbers `< 10000` are treated as `bundle 1` positions.
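
The two formulas above translate directly into a pair of conversion helpers. The function names and the `BUNDLE_SIZE` constant are illustrative choices for this sketch, not identifiers taken from the library.

```rust
use std::convert::TryFrom;

/// Number of operations per bundle, as used in the formulas above.
const BUNDLE_SIZE: u64 = 10_000;

/// Convert a 0-indexed global position into (bundle number, position).
/// Bundles are numbered from 1. Returns None if the bundle number would
/// not fit in a u32.
fn global_to_bundle_position(global: u64) -> Option<(u32, usize)> {
    let bundle = u32::try_from(global / BUNDLE_SIZE + 1).ok()?;
    let position = (global % BUNDLE_SIZE) as usize;
    Some((bundle, position))
}

/// Inverse mapping. Returns None for bundle 0 or an out-of-range position.
fn bundle_position_to_global(bundle: u32, position: usize) -> Option<u64> {
    if bundle == 0 || position as u64 >= BUNDLE_SIZE {
        return None;
    }
    Some((u64::from(bundle) - 1) * BUNDLE_SIZE + position as u64)
}
```

With these definitions, `global_to_bundle_position(88_410_345)` yields `Some((8842, 345))`, matching the last example in the list, and `bundle_position_to_global(8842, 345)` maps back to `Some(88_410_345)`.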

### Handle & DID Resolution

```rust
/// Resolve handle to DID or validate DID format
/// Returns (did, handle_resolve_time_ms)
pub fn resolve_handle_or_did(&self, input: &str) -> Result<(String, u64)>

/// Get handle resolver base URL (if configured)
pub fn get_handle_resolver_base_url(&self) -> Option<String>

/// Get resolver statistics
pub fn get_resolver_stats(&self) -> HashMap<String, serde_json::Value>
```

**Purpose**: Support handle-to-DID resolution and resolver metrics.

**Use Cases:**
- Server: `/:did` endpoints (accepts both DIDs and handles)
- Server: `/status` endpoint (resolver stats)
- Server: `/debug/resolver` endpoint

### DID Index Access

```rust
/// Get DID index manager (for stats and direct access)
pub fn get_did_index(&self) -> Arc<RwLock<did_index::Manager>>
```

**Purpose**: Direct access to DID index for server endpoints.

**Use Cases:**
- Server: `/debug/didindex` endpoint
- Server: `/status` endpoint (DID index stats)

### PLC Origin

```rust
/// Get PLC origin from index
pub fn get_plc_origin(&self) -> String
```

**Purpose**: Get the origin identifier for the repository.

**Use Cases:**
- Server: Root endpoint (display origin)
- Server: `/status` endpoint

---

## 12. Observability

### Manager Statistics

```rust
pub fn get_stats(&self) -> ManagerStats
```

**Purpose**: Get runtime statistics for monitoring.

```rust
pub struct ManagerStats {
    pub cache_hits: u64,
    pub cache_misses: u64,
    pub bundles_loaded: u64,
    pub operations_read: u64,
    pub bytes_read: u64,
}
```

---

## Common Types

### Operation Filter

```rust
pub struct OperationFilter {
    pub did: Option<String>,
    pub operation_type: Option<String>,
    pub time_range: Option<(String, String)>,
    pub include_nullified: bool,
}
```

Used across query, export, and range operations.

### Operation

```rust
pub struct Operation {
    pub did: String,
    pub operation: serde_json::Value,
    pub cid: Option<String>,
    pub nullified: bool,
    pub created_at: String,
    // Optional: raw_json field for zero-copy access
}
```

---

## CLI Usage Examples

All CLI commands use only the high-level API:

```bash
# Uses: get_operation()
plcbundle op get 42 1337

# Uses: query()
plcbundle query "did" --bundles 1-100

# Uses: export_to_writer()
plcbundle export --range 1-100 -o output.jsonl

# Uses: get_bundle_info()
plcbundle info --bundle 42

# Uses: verify_bundle()
plcbundle verify --bundle 42 --checksums

# Uses: get_did_operations()
plcbundle did-lookup did:plc:xyz123
```

---

## FFI Mapping

The high-level API maps cleanly to C/Go:

### C Bindings

```c
// Direct 1:1 mapping
// Note: C bindings use default options (equivalent to passing () in Rust)
CBundleManager* bundle_manager_new(const char* path);
int bundle_manager_load_bundle(CBundleManager* mgr, uint32_t num, CLoadOptions* opts, CLoadResult* result);
int bundle_manager_get_operation(CBundleManager* mgr, uint32_t bundle, size_t pos, COperation* out);
int bundle_manager_export(CBundleManager* mgr, CExportSpec* spec, ExportCallback cb, void* user_data);
```

### Go Bindings

```go
type BundleManager struct { /* ... */ }

func (m *BundleManager) LoadBundle(num uint32, opts LoadOptions) (*LoadResult, error)
func (m *BundleManager) GetOperation(bundle uint32, position int, opts OperationOptions) (*Operation, error)
func (m *BundleManager) Query(spec QuerySpec) (*QueryIterator, error)
func (m *BundleManager) Export(spec ExportSpec, opts ExportOptions) (*ExportStats, error)
```

---

## Implementation Status

### ✅ Implemented
- `load_bundle()`
- `get_bundle_info()`
- `query()`
- `export()` / `export_to_writer()`
- `verify_bundle()`
- `verify_chain()`
- `get_stats()`
- `build_did_index()`
- `verify_did_index()`
- `repair_did_index()`

### 🚧 Needs Refactoring
- `get_operation()` - Currently in CLI, should be in `BundleManager`
- `get_operations_batch()` - Not yet implemented
- `get_operations_range()` - Partially implemented

### 📋 To Implement
- `get_did_operations()` - Uses DID index
- `batch_resolve_dids()` - Batch DID lookup
- `prefetch_bundles()` - Cache warming
- `warm_up()` - Intelligent prefetching

---

## Migration Plan

### Phase 1: Move Operation Access to Manager ✅ PRIORITY

```rust
// Add to BundleManager
impl BundleManager {
    /// Get a single operation efficiently (reads only the needed line)
    pub fn get_operation(&self, bundle_num: u32, position: usize) -> Result<OperationData>

    /// Get operation with raw JSON (zero-copy)
    pub fn get_operation_raw(&self, bundle_num: u32, position: usize) -> Result<String>
}

pub struct OperationData {
    pub raw_json: String,           // Original JSON from file
    pub parsed: Option<Operation>,  // Lazy parsing
}
```

### Phase 2: Update CLI to Use API

```rust
// Before (direct file access):
let file = std::fs::File::open(bundle_path)?;
let decoder = zstd::Decoder::new(file)?;
// ... manual line reading

// After (high-level API):
let op_data = manager.get_operation_raw(bundle_num, position)?;
println!("{}", op_data);
```

### Phase 3: Add to FFI/Go

Once in `BundleManager`, automatically available to FFI:

```c
int bundle_manager_get_operation_raw(
    CBundleManager* mgr,
    uint32_t bundle_num,
    size_t position,
    char** out_json,
    size_t* out_len
);
```

```go
func (m *BundleManager) GetOperationRaw(bundle uint32, position int) (string, error)
```

---

## Design Benefits

1. **Single Source of Truth**: All functionality in `BundleManager`
2. **Testable**: High-level API is easy to unit test
3. **Consistent**: Same patterns across Rust, C, and Go
4. **Maintainable**: Changes in one place affect all consumers
5. **Efficient**: API designed for performance (streaming, lazy parsing)
6. **Safe**: No direct file access by consumers

---

## Next Steps

1. ✅ Add `get_operation()` / `get_operation_raw()` to `BundleManager`
2. ✅ Update CLI `op get` to use new API
3. ✅ Add FFI bindings for operation access
4. ⏭️ Implement `get_operations_batch()`
5. ⏭️ Implement DID lookup operations
6. ⏭️ Add cache warming APIs