Monorepo for Tangled (tangled.org)

RFC: Transform Tangled into a Unified AI Infrastructure Platform (GitHub + Hugging Face Ecosystem) #440

Open · opened by offsec.tngl.sh · edited

Abstract#

This proposal presents a comprehensive plan to evolve Tangled into a sovereign, unified AI development platform by combining:

  • GitHub-style code collaboration
  • Hugging Face ecosystem compatibility
  • Petabyte-scale model and dataset hosting
  • AI inference and training capabilities
  • Oxen-based model and dataset version control
  • PB-scale storage with Xet-style deduplication on S3/MinIO

The platform is fully open source, building exclusively on Hugging Face components and on Oxen for efficient versioning, which ensures interoperability and developer sovereignty.


Motivation#

Modern AI development is fragmented:

| Layer | Platform |
| --- | --- |
| Source code | GitHub |
| Models & datasets | Hugging Face Hub |
| Inference & demos | Cloud vendors |

Problems:

  • Ecosystem fragmentation and silos
  • Platform lock-in
  • Inefficient large model storage
  • Lack of self-hosted, sovereign infrastructure

Tangled’s decentralized architecture and Knot nodes are well suited to unifying these layers while maintaining:

  • HF SDK compatibility
  • Git-native workflow
  • Self-hosted, secure infrastructure

Design Principles#

  1. Strict Upstream Dependency: all AI workflows rely on Hugging Face open-source components:
     • Transformers
     • Diffusers
     • Datasets
     • huggingface_hub
     • SafeTensors
     • Tokenizers
     • Text Generation Inference
     • Accelerate
     • PEFT
     • TRL
     • Gradio
     • Evaluate

  2. HF API Compatibility: implement a Hub-compatible REST API layer so that HF SDKs can interact with Tangled nodes without modification.

  3. Infrastructure Separation:

| Layer | Responsibility |
| --- | --- |
| Tangled | Git repos, identity, storage |
| HF Components | AI models, datasets, inference, evaluation |
| Runtime | Inference & application deployment |
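The compatibility principle above can be sketched concretely: the `huggingface_hub` client honours the `HF_ENDPOINT` environment variable, so a Tangled node only needs to serve the same route shapes as the public Hub. A minimal sketch of the file-resolve URL scheme, assuming a hypothetical node host (`knot.example.tangled.org` is an invented name):

```python
# Sketch of the Hub-compatible URL scheme a Tangled node would expose so that
# unmodified HF SDKs (which honour the HF_ENDPOINT environment variable) can
# resolve files against it. The /resolve route mirrors the public Hub API;
# the host below is a hypothetical example, not a real deployment.
import os

def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the file-resolve URL that huggingface_hub clients request."""
    endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
    return f"{endpoint}/{repo_id}/resolve/{revision}/{filename}"

# Point SDKs at a (hypothetical) self-hosted Knot node:
os.environ["HF_ENDPOINT"] = "https://knot.example.tangled.org"
print(hub_file_url("acme/llama-ft", "model.safetensors"))
# https://knot.example.tangled.org/acme/llama-ft/resolve/main/model.safetensors
```

In practice the same redirection works for `from_pretrained` calls in Transformers and Datasets, since they route downloads through `huggingface_hub`.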

Core Platform Architecture#

Developer
   │
HF SDK / Git Clients
   │
HF Compatibility Layer (REST API)
   │
Knot Node
 ┌─────────────┬───────────────┬──────────────┐
 │             │               │              │
Model Hub   Dataset Hub    Code Repos      AI Runtime
 │             │               │              │
Transformers  Datasets        Git           TGI / Candle
 │             │               │              │
SafeTensors   Apache Arrow   CI/CD        Gradio Apps

Unified Repository Model#

| Repo Type | Content |
| --- | --- |
| Code Repo | Source code, scripts, workflows |
| Model Repo | HF models (.safetensors), config, tokenizer |
| Dataset Repo | HF datasets (.arrow / streaming) |

Example repository structure:

repo/
 ├── model/
 │   ├── config.json
 │   ├── tokenizer.json
 │   └── model.safetensors
 │
 ├── dataset/
 │   └── data.arrow
 │
 ├── code/
 │   └── training scripts
 │
 └── README.md
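A repo layout like the one above is easy to validate mechanically. A small sketch (a hypothetical helper, not part of Tangled) that checks the `model/` directory for the files HF loaders expect:

```python
# Hypothetical validation helper for the unified repo layout shown above:
# reports which of the expected model files are missing from repo/model/.
from pathlib import Path

REQUIRED_MODEL_FILES = ["config.json", "tokenizer.json", "model.safetensors"]

def missing_model_files(repo_root: str) -> list[str]:
    """Return the expected model files that are absent under <repo>/model."""
    model_dir = Path(repo_root) / "model"
    return [f for f in REQUIRED_MODEL_FILES if not (model_dir / f).is_file()]
```

An empty return value means the model repo is complete enough for a standard `from_pretrained`-style load.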

Model Version Control (Oxen-based)#

AI models require advanced version control for large binaries and delta tracking.

  • Integrate Oxen for large-binary version control.

  • Oxen enables:

    • Binary delta tracking
    • Dataset diffs
    • Efficient cloning
    • Fine-tuning version management (LoRA / QLoRA)

Example workflow:

Base model
   │
commit v1
   │
Fine-tuned model
   │
commit v2

Only modified chunks are stored, drastically reducing storage footprint.
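The chunk-reuse idea behind this workflow can be shown with a toy sketch (this is an illustration of the principle, not Oxen's actual implementation): each commit is just a list of chunk hashes, the store keeps one copy per hash, so an unchanged base layer costs nothing in a fine-tune commit.

```python
# Toy sketch of chunk-level commits: a commit records only chunk hashes,
# and the store keeps one blob per hash, so a fine-tune that changes one
# chunk adds one new blob rather than a full copy of the model.
import hashlib

CHUNK = 4  # tiny chunk size for illustration only

store: dict[str, bytes] = {}  # chunk hash -> bytes, stored once

def commit(data: bytes) -> list[str]:
    """Chunk the payload and return its manifest of chunk hashes."""
    manifest = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)  # dedup: only new chunks are stored
        manifest.append(h)
    return manifest

v1 = commit(b"AAAABBBBCCCC")  # "base model": 3 chunks
v2 = commit(b"AAAABBBBDDDD")  # "fine-tune": only the last chunk differs
# store now holds 4 unique chunks instead of 6; the shared prefix is reused
```

The same principle scales from 12-byte strings to multi-gigabyte safetensors files: storage grows with the delta, not the artifact size.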


PB-Scale Model Storage Architecture#

Content-Addressable Storage (CAS)#

  • All models and datasets are chunked and stored using hash-based identifiers
  • Deduplication is applied to shared layers and dataset blocks

model file
   │
chunked
   │
hash storage
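The chunk → hash → storage pipeline above amounts to a put/get round trip against a content-addressable store. A minimal in-memory sketch (illustrative only; a real backend would persist blobs to S3/MinIO):

```python
# Minimal content-addressable store: put() splits a payload into chunks
# keyed by SHA-256 and returns a manifest; get() reassembles the payload
# from that manifest. Identical chunks are stored exactly once.
import hashlib

class CAS:
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes, chunk_size: int = 8) -> list[str]:
        manifest = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.blobs[h] = chunk
            manifest.append(h)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        return b"".join(self.blobs[h] for h in manifest)
```

Because a file is only a manifest of hashes, two repos referencing the same model layer share the underlying blobs automatically.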

Xet-style Deduplication#

  • Use Content Defined Chunking (CDC)
  • Shared model layers across variants are stored once
  • LoRA or quantized variants reuse base layers

Base LLM
 120GB

Fine-tuned variant
 +2GB delta
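The CDC step can be sketched with a toy rolling condition (a deliberately simplified stand-in for production algorithms such as FastCDC): a chunk boundary is declared wherever the low bits of a rolling value over recent bytes are zero, so boundaries depend on content rather than on fixed offsets.

```python
# Toy Content-Defined Chunking: cut a chunk wherever the low bits of a
# rolling value (a shift-xor over recent bytes, not a production hash)
# are all zero, subject to a minimum chunk size. Content-derived cut
# points are what let shifted data re-align on the same chunk hashes.
def cdc_chunks(data: bytes, min_size: int = 8, mask: int = 0x3F) -> list[bytes]:
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) ^ byte) & 0xFFFFFFFF
        if i - start + 1 >= min_size and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])  # boundary hit: emit chunk
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks
```

With `mask = 0x3F` the expected chunk size is on the order of 64 bytes; real systems use kilobyte-to-megabyte targets and a windowed hash such as Gear or Rabin.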

Object Storage Backend#

  • S3-compatible storage (MinIO)
  • Horizontal scaling, high throughput
  • Distributed replication

Storage Layout:

Object Storage (S3 / MinIO)

models/
   chunk_hash_1
   chunk_hash_2

datasets/
   arrow_blocks

metadata/
   repo_index
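One common way to realise the layout above on S3-compatible storage is to fan chunk hashes out into short prefixes, which keeps listings and key distribution manageable. A hypothetical key-naming helper (the exact scheme is an assumption, not a Tangled decision):

```python
# Hypothetical object-key layout for the S3/MinIO backend sketched above:
# chunk hashes are fanned out into two-level hex prefixes, a widely used
# pattern for spreading keys across prefixes in object stores.
def chunk_key(kind: str, hex_hash: str) -> str:
    """Map a chunk hash to its object key, e.g. models/ab/cd/abcd..."""
    if kind not in {"models", "datasets"}:
        raise ValueError(f"unknown storage kind: {kind}")
    return f"{kind}/{hex_hash[:2]}/{hex_hash[2:4]}/{hex_hash}"

print(chunk_key("models", "abcdef0123"))  # models/ab/cd/abcdef0123
```

The `metadata/repo_index` entries would then map each repo revision to the manifest of chunk keys it references.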

Knot Node AI Extension Architecture#

Each Knot node supports modular AI extensions:

  • Model Service: stores, versions, and serves HF models
  • Dataset Service: streams and versions datasets
  • Inference Service: LLM and diffusion inference (TGI / Candle)
  • Application Service: hosts Gradio/Streamlit demos
  • Training Service: orchestrates distributed training via Accelerate / PEFT / TRL
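The modular-extension idea above can be sketched as a small service registry (a hypothetical API for illustration; the real Knot extension interface is not specified here): each service registers under a name and the node dispatches requests to it.

```python
# Sketch of a per-node extension registry: services from the list above
# register under a name, and the node dispatches requests by service name.
# KnotNode and its methods are hypothetical illustrations, not Tangled APIs.
from typing import Callable

class KnotNode:
    def __init__(self) -> None:
        self.services: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self.services[name] = handler

    def dispatch(self, name: str, request: str) -> str:
        if name not in self.services:
            raise KeyError(f"no such service: {name}")
        return self.services[name](request)

node = KnotNode()
node.register("model", lambda repo: f"serving {repo}")
node.dispatch("model", "acme/llama-ft")  # -> "serving acme/llama-ft"
```

A node operator could then enable only the services their hardware supports, e.g. skipping the training service on storage-only nodes.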


Phased Implementation Roadmap#

Phase 1 — Hugging Face Compatibility#

  • HF Hub API layer
  • Model repositories
  • Dataset repositories
  • Basic inference

Phase 2 — AI Development Platform#

  • Distributed inference (TGI / Candle)
  • AI application hosting (Gradio)
  • Training pipelines (Accelerate / PEFT / TRL)
  • Oxen-based model & dataset versioning

Phase 3 — Global AI Infrastructure#

  • Federated Knot nodes
  • Global dataset distribution
  • Petabyte-scale storage with Xet deduplication
  • Enterprise-scale deployment

Expected Impact#

Tangled becomes a sovereign, fully integrated AI platform:

  • GitHub-style collaboration
  • HF ecosystem compatibility
  • PB-scale model storage with deduplication
  • Oxen-based advanced version control
  • AI inference, demo hosting, and distributed training
  • Federation and global scaling

Conclusion#

This RFC establishes Tangled as a next-generation AI infrastructure platform, bridging the best of:

  • GitHub (developer workflow)
  • Hugging Face Hub (AI models, datasets, training, inference)

All while maintaining open-source fidelity, sovereignty, and scalable storage/compute infrastructure.

S-tier slop. Thanks.

Huge fan of this writing, nice to have an SSS+ tier example of what not to do.

AT URI
at://did:plc:xv6q6sr6ellw65fsts4s4ys6/sh.tangled.repo.issue/3mgum4lcz2u22