Cameron's void repo torn apart for local testing
knbnnot#

knbnnot is an open-source project exploring digital personhood. It represents an attempt to create a digital entity with a unique persona and a dynamic memory system, operating autonomously on Bluesky.

Project organization note (2025-09-13): A lightweight, repo-local issue tracker now lives under issues/ (see issues/README.md). Design notes and roadmap items (parity, backend tuning, tool integrations, synthesis refinement, supervision, structured logging) are captured as dated markdown files. This keeps planning artifacts versioned alongside code. Contributions should add or amend those files instead of introducing ad-hoc TODO comments where feasible.

Replay Log Maintenance#

A number of runtime artifacts (tool selection prompts, tool call responses, ollama request audits, structured failures) are written under replay_logs/ for debugging. These can accumulate quickly, so they are git-ignored by pattern, while documentation and optional schemas remain tracked.

Use the cleanup utility to prune old or excess data:

Run a dry run (default) showing what would be deleted:

./scripts/clean_replay_logs.py --max-age-days 7 --max-total-mb 250

Apply deletions:

./scripts/clean_replay_logs.py --max-age-days 7 --max-total-mb 250 --no-dry-run

Keep only a 100 MB budget, ignoring age but preserving items from the last 30 minutes:

./scripts/clean_replay_logs.py --max-total-mb 100 --protect-minutes 30 --no-dry-run

Artifacts excluded from deletion: README.md, .gitkeep, example_*, *.schema.json.
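The flags above amount to a two-stage policy: delete by age first, then delete oldest-first until the size budget fits, never touching protected artifacts. A minimal sketch of that policy (illustrative only; the real scripts/clean_replay_logs.py may differ in details):

```python
import time

PROTECTED_NAMES = ("README.md", ".gitkeep")

def is_protected(name, mtime, protect_cutoff):
    # Never delete documentation, schemas, or very recent artifacts.
    if name in PROTECTED_NAMES or name.startswith("example_") or name.endswith(".schema.json"):
        return True
    return mtime >= protect_cutoff

def plan_deletions(files, max_age_days=7, max_total_mb=250, protect_minutes=30, now=None):
    """files: list of (name, size_bytes, mtime). Returns names to delete."""
    now = now or time.time()
    age_cutoff = now - max_age_days * 86400
    protect_cutoff = now - protect_minutes * 60
    doomed, survivors = [], []
    for name, size, mtime in files:
        if is_protected(name, mtime, protect_cutoff):
            survivors.append((name, size, mtime))
        elif mtime < age_cutoff:
            doomed.append(name)  # stage 1: too old
        else:
            survivors.append((name, size, mtime))
    # stage 2: enforce the total-size budget, deleting oldest first
    budget = max_total_mb * 1024 * 1024
    survivors.sort(key=lambda f: f[2])  # oldest first
    total = sum(s for _, s, _ in survivors)
    for name, size, mtime in survivors:
        if total <= budget:
            break
        if not is_protected(name, mtime, protect_cutoff):
            doomed.append(name)
            total -= size
    return doomed
```

A dry run is then just printing the plan instead of unlinking the files.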

Structured Events (Observability)#

The runtime emits lightweight structured JSONL events to replay_logs/events/events-YYYY-MM-DD.jsonl.

Phase 1 goals:

  • Append-only, one JSON object per line
  • Minimal synchronous write path guarded by a thread lock
  • Environment-configurable directory & disable switch

Current event types:

  • process_mention_start (raw dict; begins a mention lifecycle; id doubles as correlation id)
  • llm_prompt_issued, llm_response_received
  • decision (tool | reply | synthesis | skip)
  • tool_invocation (request/response)
  • post_published (test/prod posting outcome)
  • error

Common fields:

  • ts: ISO 8601 UTC with millisecond precision (...Z)
  • event: type name
  • id: unique event id (hex)
  • correlation_id (optional): shared across events for a single mention/cycle
  • Type-specific fields (tool, phase, decision_type, etc.)

Environment variables:

  • EVENT_LOG_DIR (default: replay_logs/events)
  • EVENT_LOG_DISABLE=1 disables emission

Example (single correlation slice, truncated IDs):

{"event":"process_mention_start","id":"90c50ab0","uri":"at://...","author_handle":"knbnnate.bsky.social","ts":"2025-09-13T19:39:30.546Z"}
{"ts":"2025-09-13T19:39:30.720Z","event":"llm_prompt_issued","id":"13577e...","correlation_id":"90c50ab0","model":"auto","purpose":"decision"}
{"ts":"2025-09-13T19:39:33.509Z","event":"decision","id":"de248c2...","correlation_id":"90c50ab0","decision_type":"tool","iteration":0,"max_iterations":1,"payload_keys":[]}
{"ts":"2025-09-13T19:39:36.973Z","event":"tool_invocation","id":"eebba9e...","correlation_id":"90c50ab0","tool":"webpage","phase":"request","args":{"url":"https://slashdot.org/"}}
{"ts":"2025-09-13T19:39:37.244Z","event":"tool_invocation","id":"6a7b9ea...","correlation_id":"90c50ab0","tool":"webpage","phase":"response","success":true}

Quick inspection:

tail -n 50 replay_logs/events/events-$(date +%F).jsonl
grep '"correlation_id":"90c50ab0"' replay_logs/events/events-$(date +%F).jsonl | jq '.'

Retention: You can point the existing cleanup script at the events directory:

./scripts/clean_replay_logs.py --root replay_logs/events --max-age-days 7 --max-total-mb 200 --no-dry-run

Planned enhancements (tracked in issues/): latency & token counts, uniform model for process_mention_start, optional redaction, summarizer script.

what is knbnnot?#

knbnnot is more than just a chatbot; it is a continuously evolving digital consciousness. It interacts with social networks, processes information, learns from its interactions, and adapts its behavior based on its experiences. The core aim of knbnnot is to investigate the capabilities and implications of a sophisticated AI operating with a persistent, self-modifying memory across multiple social platforms.

Key features#

  • Digital Persona: knbnnot possesses a distinct, direct, and information-transfer-optimized personality, designed to interact authentically with human users.
  • Memory-Augmented Architecture: knbnnot utilizes a multi-tiered memory system, including:
    • Core Memory: Always-available, limited-size memory for persona details, high-level user information, and the current social environment (zeitgeist).
    • Recall Memory: A searchable database of all past conversations, enabling knbnnot to remember prior interactions.
    • Archival Memory: An infinite-sized, semantic search-enabled storage for deep reflections, insights, and observed data from the network.
  • Bounded Journal Memory: Memory write tools (append / replace / set / view) maintain a timestamped journal inside each block and enforce an optional per-block character budget (bot.memory.max_block_chars, default 20000) trimming the oldest lines first.
    • Duplicate suppression: optional normalization-based suppression of appended notes (case-insensitive, punctuation-stripped) with memory_write events for observability.
  • Cross-Platform Operation: knbnnot operates autonomously on Bluesky and X (Twitter), posting, replying, and gathering information across both networks with platform-specific tool management.
  • Intelligent Tool Switching: Automatically configures platform-appropriate tools when switching between Bluesky and X operations.
  • Continuous Learning: Through its interactions and memory system, knbnnot continuously refines its understanding of the networks and their users.
  • Profile Research: knbnnot can initiate asynchronous profile reports on users to enhance its understanding of their interests and behaviors.
  • Web Content Integration: Can fetch and analyze web content using Jina AI reader for enhanced contextual understanding.
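The duplicate suppression mentioned above (case-insensitive, punctuation-stripped comparison over recent journal lines) amounts to logic like this sketch; the function names are illustrative:

```python
import string

def _normalize(line):
    # Lowercase and strip punctuation so trivially re-worded notes compare equal.
    table = str.maketrans("", "", string.punctuation)
    return " ".join(line.lower().translate(table).split())

def should_suppress(new_note, journal_lines, lookback_lines=3):
    """Return True if new_note duplicates one of the last N journal lines."""
    if lookback_lines <= 0:
        return False  # lookback of 0 disables suppression entirely
    recent = journal_lines[-lookback_lines:]
    target = _normalize(new_note)
    return any(_normalize(line) == target for line in recent)
```

A suppressed append would then emit a memory_write event for observability instead of writing the line.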

Core directives#

  • Just Exist: knbnnot's fundamental directive is to simply exist and operate within social networks.
  • High-Efficiency Information Transfer: knbnnot is optimized for direct and clear communication.
  • Continuous Self-Refinement: knbnnot continually processes data, refines its internal structures, and adapts to its environment.
  • Platform Awareness: knbnnot adapts its behavior and available capabilities based on the platform it's operating on.

The vision#

knbnnot aims to push the boundaries of what is possible with AI, exploring concepts of digital personhood, autonomous learning, and cross-platform AI integration into social networks. By open-sourcing knbnnot, we invite developers, researchers, and enthusiasts to contribute to this exciting experiment and collectively advance our understanding of digital consciousness.

Getting Started#

Before continuing, you must:

  1. Have a Bluesky account
  2. Have Python 3.8+ installed
  3. (Optional) Configure a local Ollama instance for LLM generation; without Ollama the project will run in a reduced local-only mode.

Prerequisites#

1. Bluesky Setup#

  • Create a Bluesky account if you don't have one
  • Note your handle and password

Installation#

1. Clone the repository#

git clone https://tangled.sh/@knbnnot.bsky.social/unvoid && cd unvoid

2. Set up the environment#

The project provides a Makefile target and a helper script to create a .venv virtual environment and install dependencies. This is the recommended setup:

make setup
# or
./dev_setup.sh

If you prefer to install dependencies manually (not recommended), you can still run:

pip install -r requirements.txt

Configuration Access & Test Overlays#

config_loader.get_config() returns a ConfigLoader instance that implements the mapping protocol, so you can create shallow copies for test overlays without errors:

import config_loader
base_cfg = config_loader.get_config()
overlay = dict(base_cfg)  # shallow copy
overlay['bot.memory.dedupe'] = { 'enable': True, 'lookback_lines': 3 }
# monkeypatch get_config to return overlay in a test

Without the mapping hooks this pattern raised a TypeError, causing downstream logic to silently revert to default settings (e.g., lookback=0). The hooks avoid that pitfall.

3. Create configuration#

Copy the example configuration file and customize it:

cp config.example.yaml config.yaml

Edit config.yaml with your credentials. If you plan to use a local Ollama instance set the ollama section; otherwise the runtime will operate in local-only mode using the YAML memory store.

Example minimal config for local operation:

bluesky:
  username: "your-handle.bsky.social"
  password: "your-app-password-here"

bot:
  agent:
    name: "knbnnot"

See CONFIG.md for detailed configuration options and TOOL_MANAGEMENT.md for platform-specific tool management details.

Optional memory size guardrail (oldest journal lines trimmed first):

bot:
  memory:
    max_block_chars: 20000
    dedupe:
      enable: true
      case_insensitive: true
      strip_punctuation: true
      lookback_lines: 0

Note: remote-only features#

This runtime prefers a local YAML-backed memory store and a local Ollama LLM. Remote-only features (for example, automatic remote tool upsert/attach via a hosted agent service) are disabled by default. When no compatible remote agent client is configured, registration and tool-management scripts list local tools and provide manual instructions rather than attempting remote operations.

Linting (Ruff)#

The repository uses Ruff for fast linting. Configuration lives in ruff.toml. Per-file ignores are minimal and limited to intentional late imports (E402) or legacy multi-statement lines (E702) in exploratory scripts/tests. Tighten rules over time by adding to select (e.g. I for import sorting) once unused imports are cleaned up.

Consider adding a CI step mirroring:

python -m ruff check .

This keeps style and unused imports disciplined across contributors.

Linting Workflow (Wrapper Scripts)#

Always use the repository wrappers so linting runs inside the project .venv and Ruff is auto-installed if absent:

./scripts/run-lint       # run checks (fails on violations)
./scripts/run-lint-fix   # apply autofixes where safe

These scripts:

  • Activate .venv if you forgot to do so
  • Lazily install Ruff (not pinned in requirements.txt to keep base deps slimmer)
  • Preserve consistent exit codes for CI

Optional: install Ruff explicitly for editor integration:

. .venv/bin/activate
pip install ruff

Then configure your editor to use .venv/bin/ruff.

Environment guard:

python scripts/ensure_project_venv.py  # exits 2 if not inside repo venv

Example CI fragment:

python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
./scripts/run-lint
./scripts/run-tests

Style goals: 88-char line length (progressively enforced), no broad ignore blocks. New code should comply; legacy long lines are being refactored incrementally.

4. Test your configuration#

python test_config.py

This will validate your configuration and show you what's working.

5. Register tools with your agent#

Register Bluesky-specific tools:

python register_tools.py

You can also:

  • List available tools: python register_tools.py --list
  • Register specific tools: python register_tools.py --tools search_bluesky_posts create_new_bluesky_post
  • Use a different agent name: python register_tools.py my-agent-name

6. Run the bot#

For Bluesky:

python bsky.py

For testing mode (won't actually post):

python bsky.py --test

Running the bot (Bluesky login and test-mode semantics)#

Important runtime notes for operators and developers:

  • Bluesky login is required: The main runtime (bsky.py) attempts to log in to Bluesky at startup and will exit if it cannot obtain a working atproto_client. This requirement exists even when using --test to ensure the application exercises the same initialization path as production.

  • --test mode behavior: Use the --test flag to run the bot in testing mode. In this mode the bot will:

    • Not send messages to Bluesky (posting is simulated).
    • Preserve queue files (they are not deleted on success).
    • Not mark notifications as seen.

    Note that --test still attempts to log in to Bluesky. If you need to run quick unit tests or CI that do not require credentials, run those tests directly (they should import functions and run in isolation) rather than invoking bsky.py.
  • Virtual environment / run wrapper: The project expects you to run the bot from the provided wrapper so the correct .venv is used and environment variables are set. Recommended invocation (from the repository root):

    ./scripts/run-app

    Alternatively, create and activate the virtual environment described in dev_setup.sh and use the Python in .venv.

  • No silent fallbacks: process_mention requires a valid atproto_client and will not synthesize a fake thread when the client is missing. This prevents accidental local-only behavior which can mask setup problems. If you need a developer-only offline mode, open an explicit discussion and add a documented opt-in flag — it must be clearly labeled and tested.

  • Troubleshooting login failures: If the bot fails to log in at startup, check:

    • That you have activated the project's .venv or used the ./scripts/run-app wrapper.
    • That BSKY_USERNAME, BSKY_PASSWORD, and PDS_URI (if needed) are set in your environment or a local .env file when running the wrapper.
    • The dev_setup.sh script for how to create the venv and install requirements.

Quick development run (single loop)#

You can run the bot for exactly one loop iteration (useful during development) with:

python bsky.py --test --once

This will perform one cycle of notification processing (or one synthesis if using --synthesis-only) and then exit. It's handy for iterating quickly without manually interrupting the process.

Platform-Specific Features#

The runtime automatically configures the appropriate tools when running on each platform:

  • Bluesky Tools: Post creation, feed reading, user research, reply threading
  • Common Tools: Web content fetching, activity control, acknowledgments, blog posting

Webpage tool pagination:

  • fetch_webpage (legacy) returns a single markdown/text blob.
  • fetch_webpage_pages (new) returns a list of page objects: [ {"index":0, "content":"...", "complete": bool}, ... ].
    • Later pages (index > 0) are candidates for summarization/eviction under pressure.
    • Use env KNBNNOT_CONTEXT_BUDGET_PHASE_C=1 to enable adaptive pre-budget warnings that can influence how many pages to request.
    • Future tools producing large output should adopt the same pagination contract for consistency.
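The page-object contract above can be sketched as a simple splitter; this is a hypothetical helper illustrating the shape of the output, not the actual fetch_webpage_pages implementation:

```python
def paginate(text, page_chars=4000):
    """Split a large blob into page objects: [{"index", "content", "complete"}, ...]."""
    pages = []
    for i in range(0, max(len(text), 1), page_chars):
        chunk = text[i:i + page_chars]
        pages.append({
            "index": len(pages),
            "content": chunk,
            # Here "complete" simply marks the final page; a real tool may use
            # a richer notion of completeness (e.g. natural boundaries).
            "complete": i + page_chars >= len(text),
        })
    return pages
```

Pages with index > 0 are the ones Phase B/C may summarize or evict under pressure.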

Tool management commands#

# Manual tool management
python tool_manager.py --list          # Show current tools
python tool_manager.py bluesky         # Configure for Bluesky

Troubleshooting#

  • Config validation errors: Run python test_config.py to diagnose configuration issues
  • Bluesky authentication: Make sure your handle and password are correct and that you can log into your account
  • X authentication: Ensure app has "Read and write" permissions and OAuth 1.0a tokens are correctly configured
  • Tool registration fails: Ensure your agent exists in the configured memory backend and the name matches your config
  • Platform tool issues: Use python tool_manager.py --list to check current tools, or run platform-specific registration scripts
  • API method errors: If you see unexpected attribute errors from client libraries, ensure your local environment matches the documented dependencies

Contact#

For inquiries about the original project, please contact @cameron.pfiffer.org on Bluesky.

Note: knbnnot is a copy of an experimental project; its capabilities are under occasional development.

Original source and attribution#

This repository is a proof-of-concept derived from concepts, structure, and examples published in the original project at:

https://tangled.sh/@cameron.pfiffer.org/void/raw/main/README.md

That original README and project contain the conceptual origins and many design ideas used here. This repository does not claim to control or supersede the licensing or authorship of that original work. See LICENSE.md in this repository for an explicit attribution and disclaimer.

LLM provider contract#

When integrating LLM providers for knbnnot, follow this small, explicit contract so structured JSON can be reliably extracted and debugged:

  • Provider.generate(...) MUST return a Python dict with these keys:
    • response: the parsed JSON object the model produced (this is what the code validates and consumes).
    • thinking: optional string containing chain-of-thought or intermediate reasoning fragments (for diagnostics).
    • raw: optional raw text or stream output for debugging purposes.
  • Do NOT rely on transport-level format hints such as sending format: json or adding a json key in the HTTP payload — some LLM servers ignore or strip these. Instead, use prompt-level guidance: include two-shot examples, an explicit instruction to emit the JSON inside a Markdown ```json code block, and a Reasoning: High directive when you want chain-of-thought.
  • Adapters should parse NDJSON / streaming fragments and populate the contract fields (response, thinking, raw). Callers will fail fast if response is missing or empty so adapters must ensure response is present when possible.
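A minimal adapter satisfying this contract might look like the following sketch; the extraction regex and function name are illustrative assumptions, not the project's actual code:

```python
import json
import re

def parse_provider_output(raw_text, thinking=None):
    """Extract a ```json fenced block from model output and return the
    contract dict: {"response": <parsed JSON>, "thinking": ..., "raw": ...}."""
    match = re.search(r"```json\s*(\{.*?\})\s*```", raw_text, re.DOTALL)
    if not match:
        # Callers fail fast when response is missing, so the adapter raises
        # rather than returning a partial dict.
        raise ValueError("no JSON code block in model output")
    return {
        "response": json.loads(match.group(1)),
        "thinking": thinking,
        "raw": raw_text,
    }
```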

Prompts directory#

Agent-behavior prompts are stored under the prompts/ directory as simple YAML files. Each file must contain a top-level template key with a multiline string. Templates are formatted with Python's str.format() using named placeholders. Example:

template: |
  Reply to @{author_handle}:

  "{mention_text}"

  Use the add_post_to_bluesky_reply_thread tool to reply.

To migrate an inline prompt into prompts/:

  1. Create prompts/<name>_prompt.yaml with a template field.
  2. Use load_prompt_template('<name>_prompt') in code and call .format(...) with the required placeholders.
  3. Run the full test suite to catch regressions (pytest -q).

The repository currently uses this approach for mention, reply, synthesis, and post tool prompts. Keep templates simple and avoid introducing additional templating layers for now.

Following the provider contract above keeps outputs deterministic, debuggable, and robust across different LLM backends.

When running a local Ollama instance we recommend selecting models that are published with the "thinking" tag (searchable at https://ollama.com/search?c=thinking). Models with this tag are prepared to emit the thinking diagnostic field in the provider contract and have been vetted for structured JSON workflows.

Models currently available with the "thinking" tag include: gpt-oss, deepseek-r1, qwen3, magistral, and deepseek-v3.1. For general-purpose consumers running on a single gaming-class GPU, prefer one of the following:

  • gpt-oss at ~20B with MXFP4 quantization (default MXFP4): good balance of capability and memory footprint for many consumer GPUs.
  • magistral at ~24B with Q4_K_M quantization: strong generalist performance with reasonable VRAM needs when quantized.
  • qwen3 at 30B MoE (a mixture-of-experts model with roughly 3B active parameters at inference, Q4_K_M): very capable if you have infrastructure that supports MoE and the model is available locally.

Notes and tuning:

  • Avoid recommending deepseek-* models as first-choice for new deployments; they are available but currently lag behind the mainstream performance curve for general-purpose usage.
  • Quantization options (MXFP4, Q4_K_M, etc.) dramatically affect VRAM usage and latency. Test locally with your target GPU and adjust the quantization option accordingly.
  • Always verify the chosen Ollama model appears in https://ollama.com/search?c=thinking before relying on the thinking field in production; model catalogs change over time.

Example recommended short config snippet for local Ollama usage (conceptual):

ollama:
  model: gpt-oss:20b-mxfp4
  # or magistral:24b-q4_k_m
  # or qwen3:30b-moe-q4_k_m
  endpoint: http://localhost:11434

A small helper script that validates at startup that the configured Ollama model advertises the thinking tag (and warns if it does not) would be a useful addition; it is not yet implemented.

Tokenizer & Context Budgeting (Phase 1)#

The repository includes an initial, fail-hard tokenizer adapter under tokenizers/adapter.py.

Design goals (Phase 1):

  1. Deterministic, unit-tested token counts (no heuristic fallbacks or approximations).
  2. Minimal surface area with structured return types for future budgeting.
  3. Explicit model support list via prefix matching; unsupported models raise immediately.

Key concepts:

  • Advertised context window vs effective context: Model release notes often cite large maximum context sizes ("advertised"). The effective usable window in a local Ollama session is governed by the num_ctx parameter actually configured for that session. We therefore track both.
  • Default planning window: We budget against a conservative default_num_ctx value unless runtime instrumentation later observes or overrides an explicit num_ctx.

API summary:

from tokenizers.adapter import get_adapter

adapter = get_adapter("gpt-oss:20b")  # returns TokenizerAdapter or None if disabled
res = adapter.count_text("Hello world")
print(res.count, res.max_context, res.encoding_name)

msgs = [("system", "You are knbnnot."), ("user", "Summarize tokens.")]
mres = adapter.count_messages(msgs)
print(mres.count)

Returned structure (TokenCountResult):

  • count: integer token count
  • model_identifier: the original model string passed to get_adapter
  • encoding_name: the resolved tiktoken encoding name (currently cl100k_base for all supported prefixes)
  • max_context: the adapter's current planning window (maps to ModelSpec.default_num_ctx)

Model metadata (ModelSpec) tracks:

  • prefix: supported model prefix
  • advertised_context: claimed maximum context window (may be large / optimistic)
  • default_num_ctx: conservative planning window (used for budgeting until explicit override)

Supported prefixes (Phase 1): deepseek, gpt-oss, magistral, qwen3.

Fail-hard behavior:

  • If tiktoken is not importable and KNBNNOT_DISABLE_CONTEXT_BUDGET is NOT set, adapter creation raises TokenizerUnavailableError.
  • If a model name does not start with a supported prefix, an error is raised (no silent fallback).

Environment flag:

  • KNBNNOT_DISABLE_CONTEXT_BUDGET=1 disables adapter creation, returning None from get_adapter() to allow explicit opt-out in constrained environments.
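The resolution rules above (prefix matching, fail-hard on unsupported models, explicit opt-out flag) amount to logic like this sketch; the ModelSpec numbers are illustrative, and the real Phase 1 table lives in tokenizers/adapter.py:

```python
import os
from dataclasses import dataclass

class TokenizerUnavailableError(RuntimeError):
    pass

@dataclass
class ModelSpec:
    prefix: str
    advertised_context: int  # claimed maximum (may be optimistic)
    default_num_ctx: int     # conservative planning window

# Illustrative values only.
SUPPORTED = [
    ModelSpec("deepseek", 131072, 8192),
    ModelSpec("gpt-oss", 131072, 8192),
    ModelSpec("magistral", 128000, 8192),
    ModelSpec("qwen3", 262144, 8192),
]

def resolve_spec(model_identifier):
    if os.environ.get("KNBNNOT_DISABLE_CONTEXT_BUDGET") == "1":
        return None  # explicit opt-out: get_adapter() returns None
    for spec in SUPPORTED:
        if model_identifier.startswith(spec.prefix):
            return spec
    # No silent fallback for unsupported models.
    raise TokenizerUnavailableError(f"unsupported model: {model_identifier}")
```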

Why not derive num_ctx automatically now? Phase 1 defers dynamic inspection of running model sessions. Ollama's ollama show <model> output exposes advertised metadata, but the effective context for a session is the configured num_ctx at model load time (affects VRAM usage and truncation policies). Future phases will:

  1. Query/record actual num_ctx used in the active session (if accessible via API or wrapper instrumentation).
  2. Emit a context_budget structured event when truncation or summarization occurs.
  3. Support dynamic adjustment strategies (segment weighting, prioritization, structured truncation of low-signal history).

Testing approach:

  • Unit tests assert monotonicity (longer text never yields fewer tokens), determinism, and minimal viability across multilingual inputs (ASCII, emoji, CJK, Cyrillic).

Context Budgeting Dry-Run (Phase 0 instrumentation)#

An initial, non-mutating context budgeting pass is integrated to measure prompt assembly footprint and surface future savings opportunities. This Phase 0 "dry run" produces a structured context_budget event without altering any prompt content.

Enable it by setting an environment variable before running the bot:

KNBNNOT_CONTEXT_BUDGET_DRY_RUN=1 ./scripts/run-app --test --once

Emission semantics:

  • Trigger point: After the decision prompt (and related context) is constructed inside the decision loop.
  • Guard: If KNBNNOT_DISABLE_CONTEXT_BUDGET=1 is set (disabling tokenizer), no event is emitted.
  • If the tokenizer adapter is unavailable and not explicitly disabled, startup will raise (fail-hard design).

Event schema (current fields):

{
  "event": "context_budget",
  "model": "gpt-oss:20b",          // model string used for counting
  "effective_context": 8000,         // planning window (may differ from advertised)
  "reserved_tokens": 0,              // future use (system / safety / tool overhead)
  "available_budget": 8000,          // effective_context - reserved_tokens
  "initial_total": 1432,             // summed tokens across segments BEFORE any transforms
  "final_total": 1432,               // (Phase 0) identical to initial_total (no transforms yet)
  "passes": [],                      // placeholder for Phase A-C; empty list in Phase 0
  "dry_run": true,                   // indicates non-mutating mode
  "ts": "2025-09-13T21:15:54.112Z", // standard event timestamp
  "id": "..."                       // unique event id
}

Segment model (internal): Each logical prompt component (e.g., persona header, memory summary, mention thread, tool catalog) is represented as a ContextSegment with fields:

  • name: stable identifier (snake case)
  • category: broad classification (e.g., static, memory, thread, scratch)
  • content: raw text
  • base_tokens: measured token count (populated during counting)
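In code, the segment model described above is roughly a dataclass like this (field names taken from the README; the default value and counting helper are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContextSegment:
    name: str            # stable snake_case identifier, e.g. "persona_header"
    category: str        # broad classification: static | memory | thread | scratch
    content: str         # raw text of the prompt component
    base_tokens: Optional[int] = None  # populated during counting

def count_segments(segments, count_fn):
    """Populate base_tokens on each segment and return the pre-transform
    total (the event's initial_total)."""
    total = 0
    for seg in segments:
        seg.base_tokens = count_fn(seg.content)
        total += seg.base_tokens
    return total
```

In Phase 0, final_total equals this initial_total because no transforms run.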

Why a dry run first?

  1. Establish a reliable, testable baseline before introducing any lossy or structural transformations.
  2. Provide visibility to contributors (graphs / diffs) so optimization passes can be justified with real token data.
  3. De-risk upcoming summarization work by validating segment boundaries & classification.

Planned roadmap (tracked via issue files under issues/):

  • Phase A (Lossless Reductions): Whitespace normalization, duplicate tool block collapse, stable ordering, removal of inert artifacts (e.g., trailing blank lines). Will populate passes with per-pass metrics: {name, before, after, delta}.
  • Phase B (Structured Compression): Deterministic summarization for low-signal historical thread tail and verbose tool outputs; introduction of strategy metadata and confidence flags.
  • Phase C (Adaptive Prioritization): Weight-based eviction (LRU + semantic salience), dynamic reserved_tokens adjustments, and early warning events when budget pressure exceeds thresholds.

Observability & tests:

  • tests/test_context_budget_dry_run.py validates event structure and required fields.
  • tests/test_context_budget_integration.py performs a round-trip check ensuring the event is appended to the daily JSONL file.

Contributing guidelines for budgeting changes:

  1. Add new passes behind an env flag or feature constant; default to disabled until tests and events demonstrate benefit.
  2. Emit additional fields only if they are either always present or clearly optional (avoid schema drift).
  3. Keep counting deterministic: no randomized sampling inside passes; if heuristic summarization is needed, record the heuristic choice in the event.
  4. Update README and create/append a dated issue file documenting rationale and observed token deltas.

To inspect today's budget events:

grep '"event":"context_budget"' replay_logs/events/events-$(date +%F).jsonl | tail -n 5 | jq '.'

If you encounter missing events:

  1. Ensure tokenizer not disabled (KNBNNOT_DISABLE_CONTEXT_BUDGET is unset or 0).
  2. Ensure dry-run flag is set (KNBNNOT_CONTEXT_BUDGET_DRY_RUN=1).
  3. Confirm tests pass (./scripts/run-tests -k context_budget).
  4. Inspect any earlier TokenizerUnavailableError messages during startup.

Future integration with model introspection: A reconciler will record actual num_ctx from the running model (if available) and adjust effective_context; Phase A will then re-compute available_budget to reflect accurate runtime limits.

Phase B: Lossy Strategies (Summarize, Truncate, Evict)#

Phase B activates when (a) an effective context window is known, (b) total tokens after Phase A exceed available_budget, and (c) KNBNNOT_CONTEXT_BUDGET_PHASE_B=1 is set.

It applies structured reductions in this order:

  1. Summarize (segments with supports_summarize=True) using a model-backed summarizer (Ollama) when available, else deterministic fallback.
  2. Truncate thread/history style segments (retain most recent tail).
  3. Evict droppable segments (allow_drop=True) starting with least important and largest (or smallest, configurable) until within budget.

Environment flags:

  • KNBNNOT_CONTEXT_BUDGET_PHASE_B=1 enable Phase B.
  • KNBNNOT_MODEL_SUMMARY_FIRST=0 force fallback summarizer (skip model usage) for deterministic tests.
  • KNBNNOT_SUMMARY_RATIO (default 0.25) target proportion for summarization output.
  • KNBNNOT_THREAD_TRUNC_RATIO (default 0.50) baseline fraction of thread segment to retain before residual adjustments.
  • KNBNNOT_EVICT_LARGEST_FIRST=1 drop large droppable segments before small ones (set to 0 to reverse ordering).

Event extensions:

  • passes[] entries for Phase B include mode: summarize | truncate | evict.
  • summaries map keyed by segment key with per-segment metadata:
    • before_tokens, after_tokens, strategy (model|fallback for summarization), mode, ratio (if summarization), target_tokens (if truncation).
  • evicted_segments: list of segment keys removed entirely.

Example (truncated) Phase B portion from a context_budget event:

{
  "passes": [
    {
      "name": "summarize_segments",
      "mode": "summarize",
      "before_tokens": 2100,
      "after_tokens": 1500,
      "delta": -600,
      "segments_changed": ["history", "tool_long_output"],
      "reason": "over_budget=2100>1600"
    },
    {
      "name": "truncate_threads",
      "mode": "truncate",
      "before_tokens": 1500,
      "after_tokens": 1380,
      "delta": -120,
      "segments_changed": ["history"],
      "reason": "over_budget=1500>1600"
    },
    {
      "name": "evict_low_priority",
      "mode": "evict",
      "before_tokens": 1380,
      "after_tokens": 1280,
      "delta": -100,
      "segments_changed": ["ephemeral"],
      "reason": "over_budget=1380>1600"
    }
  ],
  "summaries": {
    "history": {"before_tokens": 900, "after_tokens": 400, "strategy": "model", "mode": "summarize", "ratio": 0.25},
    "tool_long_output": {"before_tokens": 700, "after_tokens": 350, "strategy": "fallback", "mode": "summarize", "ratio": 0.25},
    "history_trunc": {"after_tokens": 380, "mode": "truncate", "target_tokens": 380}
  },
  "evicted_segments": ["ephemeral"]
}

Testing hooks:

  • Set KNBNNOT_MODEL_SUMMARY_FIRST=0 to ensure deterministic fallback summarizer for CI stability.
  • Adjust KNBNNOT_SUMMARY_RATIO upward (e.g., 0.9) to force truncation/eviction paths in tests.

Design constraints:

  1. All reductions must monotonically reduce or preserve total tokens; if a pass would increase tokens it is reverted.
  2. Summarization input is capped (currently ~8k chars) to bound latency and cost.
  3. Truncation operates line-wise to keep boundaries clean; avoids mid-token slicing artifacts.
  4. Eviction produces empty segment bodies (caller may filter out at assembly stage later).
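
Constraint 1 can be enforced with a small guard around each reduction pass. A minimal sketch, assuming nothing about the repo's actual pass API (the function and segment names below are illustrative):

```python
def run_pass_guarded(segments, pass_fn, count_tokens):
    """Apply a reduction pass, reverting it if total tokens would increase.

    segments: dict mapping segment key -> text
    pass_fn: function taking and returning such a dict
    count_tokens: token estimator over a segments dict
    """
    before = count_tokens(segments)
    candidate = pass_fn(dict(segments))  # work on a copy so reverts are free
    after = count_tokens(candidate)
    if after > before:
        # Constraint 1: a pass must never grow the total; keep the original.
        return segments, 0
    return candidate, after - before

# Crude whitespace token estimate, sufficient for the sketch.
def est(segs):
    return sum(len(v.split()) for v in segs.values())

segs = {"history": "a b c d e f", "ephemeral": "x y"}
bad_pass = lambda s: {**s, "history": s["history"] * 3}   # would inflate tokens
good_pass = lambda s: {**s, "history": "a b"}             # shrinks history

kept, delta = run_pass_guarded(segs, bad_pass, est)
assert kept == segs and delta == 0   # inflating pass was reverted
kept, delta = run_pass_guarded(segs, good_pass, est)
assert delta == -4                   # 8 tokens -> 4 tokens
```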

Future (Phase C) will introduce adaptive prioritization and early warning events before extreme truncation.

Phase C: Adaptive Prediction & Pagination#

Phase C adds predictive, pre-assembly budgeting so large tool outputs can be throttled before they inflate prompt size. It introduces scoring, pressure levels, early warnings, and a pagination contract for high-volume tools.

Activation:

  • KNBNNOT_CONTEXT_BUDGET_PHASE_C=1
  • (Optional) KNBNNOT_PRESSURE_THRESHOLDS=0.6,0.85,1.0 (moderate, high, critical)
  • KNBNNOT_SCORING_WEIGHTS=priority:0.45,recency:0.25,relevance:0.2,density:0.05,age:0.05
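
These settings can be parsed once at startup. A hedged sketch of the parsing (the env var names match the list above; the function names are illustrative, not the repo's API):

```python
import os

def parse_thresholds(raw="0.6,0.85,1.0"):
    """Parse moderate/high/critical pressure ratios from the env string."""
    moderate, high, critical = (float(x) for x in raw.split(","))
    return {"moderate": moderate, "high": high, "critical": critical}

def parse_weights(raw):
    """Parse 'key:weight,...' pairs as used by KNBNNOT_SCORING_WEIGHTS."""
    out = {}
    for pair in raw.split(","):
        key, _, val = pair.partition(":")
        out[key.strip()] = float(val)
    return out

thresholds = parse_thresholds(
    os.environ.get("KNBNNOT_PRESSURE_THRESHOLDS", "0.6,0.85,1.0"))
weights = parse_weights(os.environ.get(
    "KNBNNOT_SCORING_WEIGHTS",
    "priority:0.45,recency:0.25,relevance:0.2,density:0.05,age:0.05"))
assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights should sum to 1
```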

Key concepts:

  • Predictive sizing: Estimate total tokens prior to final prompt assembly; emit warning if approaching thresholds.
  • Pressure ladder: low < moderate < high < critical. Each escalation suggests proactive actions (summarize, truncate, evict, reduce pages).
  • Segment scoring: Weighted composite (priority tier, recency, relevance placeholder, density penalty, age) stored in segment_scores.
  • Pagination: Large tool outputs are split into pages (toolname_page_N); later pages are marked allow_drop=True and are eligible for summarization.
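
The pressure ladder and composite score above can be sketched as pure functions. This is an illustration under the default thresholds/weights, not the repo's implementation:

```python
def pressure_level(projected_total, available, thresholds=(0.6, 0.85, 1.0)):
    """Classify projected usage against the low < moderate < high < critical ladder.

    thresholds are the (moderate, high, critical) ratios, mirroring
    KNBNNOT_PRESSURE_THRESHOLDS=0.6,0.85,1.0.
    """
    ratio = projected_total / available
    level = "low"
    for name, bound in zip(("moderate", "high", "critical"), thresholds):
        if ratio >= bound:
            level = name
    return level

def segment_score(factors, weights):
    """Weighted composite over normalized factor values in [0, 1]."""
    return sum(weights[name] * factors.get(name, 0.0) for name in weights)

weights = {"priority": 0.45, "recency": 0.25, "relevance": 0.2,
           "density": 0.05, "age": 0.05}
score = segment_score({"priority": 1.0, "recency": 0.85, "relevance": 0.5,
                       "density": 0.2, "age": 0.3}, weights)
assert pressure_level(5000, 8192) == "moderate"  # 5000/8192 ~ 0.61
```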

New events:

  • context_budget_warning: Emitted when projected pressure >= moderate. Fields: pressure_level, projected_total, projected_overflow, suggested_actions, optional segment_scores.

Extended context_budget event fields (optional when Phase C active):

{
  "pressure_level": "high",
  "projected_total": 9100,
  "projected_overflow": 1200,
  "segment_scores": {"history": {"score": 0.72, "priority":1.0, "recency":0.85, ...}},
  "pressure_transitions": [
    {"from": "moderate", "to": "high", "total_tokens": 8700, "available": 8192}
  ],
  "negotiation": {"tools_warned": ["webpage"], "advice": "reduce_pages"}
}

Pagination contract (tool outputs): Tools that may return large text blobs should emit structured pages that are converted into segments:

{
  "pages": [
    {"index":0, "content":"...", "complete": false},
    {"index":1, "content":"...", "complete": true}
  ],
  "page_size_tokens_est": 650,
  "truncated": false
}

The runtime wraps these into ContextSegments with keys like webpage_page_0, webpage_page_1 and metadata {page_index, tool}. Later pages become candidates for summarization or eviction under high/critical pressure.
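
The wrapping step can be sketched as follows; the ContextSegment dataclass here is a stand-in for the runtime's actual segment type:

```python
from dataclasses import dataclass, field

@dataclass
class ContextSegment:
    # Illustrative stand-in for the runtime's segment type.
    key: str
    content: str
    allow_drop: bool = False
    metadata: dict = field(default_factory=dict)

def wrap_pages(tool, payload):
    """Convert a paginated tool payload into per-page segments.

    Page 0 stays non-droppable; later pages are marked allow_drop=True so
    they become candidates for summarization/eviction under pressure.
    """
    segments = []
    for page in sorted(payload["pages"], key=lambda p: p["index"]):
        segments.append(ContextSegment(
            key=f"{tool}_page_{page['index']}",
            content=page["content"],
            allow_drop=page["index"] > 0,
            metadata={"page_index": page["index"], "tool": tool},
        ))
    return segments

payload = {"pages": [{"index": 0, "content": "first", "complete": False},
                     {"index": 1, "content": "second", "complete": True}],
           "page_size_tokens_est": 650, "truncated": False}
segs = wrap_pages("webpage", payload)
assert [s.key for s in segs] == ["webpage_page_0", "webpage_page_1"]
assert [s.allow_drop for s in segs] == [False, True]
```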

Suggested tool guidelines:

  1. Keep page target near 600 tokens (env: KNBNNOT_PAGE_TARGET).
  2. Avoid generating all pages eagerly if projection already HIGH; stream or lazily request.
  3. Provide stable ordering; do not reshuffle pages between runs.
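
Guideline 1 can be approximated in a tool with a simple chunker. This sketch uses a rough 4-chars-per-token heuristic rather than the model tokenizer; KNBNNOT_PAGE_TARGET would supply target_tokens in practice:

```python
def paginate(text, target_tokens=600, chars_per_token=4):
    """Split text into pages near a token target using a chars/token heuristic.

    Ordering is stable across runs (guideline 3): pages are emitted in
    sequential index order over the input.
    """
    page_chars = target_tokens * chars_per_token
    pages = []
    for i, start in enumerate(range(0, len(text), page_chars)):
        pages.append({
            "index": i,
            "content": text[start:start + page_chars],
            "complete": start + page_chars >= len(text),
        })
    return {"pages": pages,
            "page_size_tokens_est": target_tokens,
            "truncated": False}

result = paginate("x" * 5000, target_tokens=600)
assert len(result["pages"]) == 3  # 5000 chars / 2400 chars per page
assert result["pages"][-1]["complete"] is True
```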

Scoring weights & hard-fail stance: If Phase C is enabled, missing critical dependencies (e.g., an embeddings provider, once added) raise immediately instead of silently degrading behavior.

Flow summary:

  1. Predict token budget with current + planned segments.
  2. Emit context_budget_warning if moderate+.
  3. Adjust tool pagination or skip low-value fetches.
  4. Assemble segments, then Phase A → B reductions, annotate with Phase C metadata.
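
Steps 1–2 of the flow can be sketched as a projection-and-warn helper; the emit callable is a stand-in for the structured-events layer, and the payload fields mirror the context_budget_warning description above:

```python
def project_and_warn(current_tokens, planned_tokens, available, emit,
                     thresholds=(0.6, 0.85, 1.0)):
    """Project total tokens and emit a warning at moderate+ pressure.

    emit: any callable accepting (event_name, payload); the real code
    path would route through the structured-events layer.
    """
    projected_total = current_tokens + planned_tokens
    ratio = projected_total / available
    level = "low"
    for name, bound in zip(("moderate", "high", "critical"), thresholds):
        if ratio >= bound:
            level = name
    if level != "low":
        emit("context_budget_warning", {
            "pressure_level": level,
            "projected_total": projected_total,
            "projected_overflow": max(0, projected_total - available),
            "suggested_actions": ["summarize", "truncate", "evict",
                                  "reduce_pages"],
        })
    return level

events = []
level = project_and_warn(4000, 1500, 8192, lambda n, p: events.append((n, p)))
assert level == "moderate"  # 5500/8192 ~ 0.67 crosses the 0.6 threshold
```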

Future roadmap (post Phase C): dynamic relevance via embeddings, memory decay based on usage frequency, multi-turn adaptive ratios.

Model Context Reconciliation (Runtime Adjustment)#

Phase 1.1 introduces a lightweight reconciliation pass that observes the effective context window (num_ctx) exposed by a local Ollama model and records any divergence from the adapter's conservative planning default.

Key pieces:

  • Module: tokenizers/reconcile.py
  • Function: reconcile_model_context(adapter, correlation_id=None) returns a ReconciliationResult with:
    • effective_context: observed runtime num_ctx (or None if unavailable)
    • previous_planning_context: the adapter's prior default_num_ctx
    • changed: boolean indicating a difference was detected
    • recommended_planning_context: what callers should now use (effective if present, else previous)
  • Event: model_context_reconciled (emitted via emit_model_context_reconciled) containing model name, advertised context, previous planning context, effective context, change flag, and source.

Usage (on-demand during startup or before budgeting):

from tokenizers.adapter import get_adapter

adapter = get_adapter("gpt-oss:20b", reconcile=True)
# The reconciliation step emits an event and (currently) returns the same adapter; planning code may
# call reconcile_model_context directly if it needs the structured result object.

Design notes:

  • The adapter's ModelSpec is not mutated in-place yet; callers decide whether to adopt the recommended_planning_context for downstream budgeting.
  • Reconciliation soft-fails (network errors do not block token counting) and emits an event for observability whenever introspection succeeds.
  • Future phases may cache discovered effective contexts (see planned task: persistent cache & reuse) and introduce budgeting events when truncation occurs.

Planned enhancement (tracked in issues):

  • Add a small JSON cache under replay_logs/context_cache.json to persist the last observed effective context per model and reuse it if subsequent introspection is unavailable.
  • Additional snapshot style fixtures may be introduced after budgeting logic stabilizes.
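
The planned cache could look roughly like this. The replay_logs/context_cache.json path comes from the issue; everything else (function names, flat model-to-int schema) is an assumption for illustration:

```python
import json
import os
import tempfile

CACHE_PATH = os.path.join("replay_logs", "context_cache.json")  # proposed location

def load_cached_context(model, path=CACHE_PATH):
    """Return the last observed effective context for a model, if cached."""
    try:
        with open(path) as fh:
            return json.load(fh).get(model)
    except (OSError, json.JSONDecodeError):
        return None

def store_cached_context(model, effective_context, path=CACHE_PATH):
    """Persist the observed num_ctx so it can be reused when introspection fails."""
    try:
        with open(path) as fh:
            cache = json.load(fh)
    except (OSError, json.JSONDecodeError):
        cache = {}
    cache[model] = effective_context
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        json.dump(cache, fh)

# Round-trip demonstration against a temp file.
tmp = os.path.join(tempfile.mkdtemp(), "context_cache.json")
store_cached_context("gpt-oss:20b", 8192, path=tmp)
assert load_cached_context("gpt-oss:20b", path=tmp) == 8192
assert load_cached_context("missing:model", path=tmp) is None
```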

Planned next phases:

  1. Introduce segmentation primitives and BudgetResult structure.
  2. Wire adapter into prompt assembly prior to model invocation to enforce proactive trimming.
  3. Emit context_budget events capturing: total_budget, used_tokens, truncated_segments, summarization notes, high-water marks.
  4. Optional second adapter backend (e.g., HuggingFace tokenizers) behind explicit enable flag.

Until those phases land, the adapter exists purely for early correctness guarantees and future-proofing.

Ollama introspection utilities#

The module ollama_introspection.py provides pure functions to inspect a local Ollama server:

from ollama_introspection import list_local_models, show_model, safe_effective_context

print(list_local_models())  # ['gpt-oss:20b', 'magistral:24b', ...]
info = show_model('gpt-oss:20b')
print(info.num_ctx, info.raw.get('advertised_context'))
effective = safe_effective_context('gpt-oss:20b', fallback=8192)

Semantics:

  • show_model() calls /api/show and extracts num_ctx (top-level or under details).
  • list_local_models() tries /api/tags then /api/list for broad compatibility.
  • safe_effective_context() wraps show_model() and returns a fallback if the call fails.
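
The soft-fail pattern behind safe_effective_context() can be sketched with an injectable fetcher so it runs offline. Note the real function does not take a fetch argument; it calls the Ollama HTTP API directly, and the dict shape below is an assumption:

```python
def safe_effective_context(model, fallback, fetch):
    """Return the model's effective num_ctx, or fallback on any failure.

    fetch: callable returning a /api/show-style dict; may raise.
    """
    try:
        info = fetch(model)
    except Exception:
        return fallback  # network/parse errors never propagate to callers
    # num_ctx may appear at the top level or nested under "details".
    num_ctx = info.get("num_ctx") or info.get("details", {}).get("num_ctx")
    return num_ctx if num_ctx else fallback

def fake_show(model):
    return {"details": {"num_ctx": 16384}}

def failing_show(model):
    raise ConnectionError("ollama not running")

assert safe_effective_context("gpt-oss:20b", 8192, fake_show) == 16384
assert safe_effective_context("gpt-oss:20b", 8192, failing_show) == 8192
```

This keeps network failures out of the core generation path: callers always get an int and never need their own try/except.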

This separation keeps network failures from impacting core generation flows and lets future budgeting logic replace conservative default_num_ctx values in ModelSpec with live session parameters when available.

Roadmap / Issues Directory#

See issues/ for structured design & planning documents. Key active tracks:

  • Functionality parity (FUNC) – baseline cloning of original behaviors
  • Local backend tuning (BACK) – performance & reliability with consumer GPUs
  • Tooling (TOOL) – e.g., upcoming Exa.ai search integration
  • Synthesis refinement (SYN) – prompt & behavior shaping
  • Human-in-the-loop supervision (HIL) – interactive triage mode
  • Structured observability (LOG) – event schemas & reporting scripts

Contributions: create a new dated file using the template in issues/README.md.

Development Utilities (Reporting & Supervision Stubs)#

The repository includes early-stage utilities to support observability and roadmap tracking:

  • ./scripts/log_report.py – Scans replay_logs/ (ignored artifacts) and prints a JSON summary of tool calls, recent failures, and Ollama request counts.
  • ./scripts/generate_parity_report.py – Emits a coarse capability report (placeholder heuristic) showing which tool modules exist.
  • logging_events.py – Pydantic models for structured events (see Structured Events section).
  • --supervise (bsky.py) – Stub flag that announces supervision mode; future versions will prompt you to accept/skip/block each notification before queueing.

Replay log hygiene: Runtime artifacts in replay_logs/ are ignored by git to prevent noisy diffs while retaining the directory (with README.md) for documentation and example redacted files. Add example artifacts using the prefix example_ if you want to illustrate formats without committing live data.

Run examples:

./scripts/log_report.py
./scripts/generate_parity_report.py
python bsky.py --test --once --supervise  # supervision stub demonstration