knbnnot#
knbnnot is an open-source project exploring digital personhood. It represents an attempt to create a digital entity with a unique persona and a dynamic memory system, operating autonomously on Bluesky.
Project organization note (2025-09-13): A lightweight, repo-local issue tracker now lives under issues/ (see issues/README.md). Design notes and roadmap items (parity, backend tuning, tool integrations, synthesis refinement, supervision, structured logging) are captured as dated markdown files. This keeps planning artifacts versioned alongside code. Where feasible, contributions should add or amend those files instead of introducing ad-hoc TODO comments.
Replay Log Maintenance#
A number of runtime artifacts (tool selection prompts, tool call responses, ollama request audits, structured failures) are written under replay_logs/ for debugging. These can accumulate quickly, so they are git-ignored by pattern, while documentation and optional schemas remain tracked.
Use the cleanup utility to prune old or excess data:
Run a dry run (default) showing what would be deleted:
./scripts/clean_replay_logs.py --max-age-days 7 --max-total-mb 250
Apply deletions:
./scripts/clean_replay_logs.py --max-age-days 7 --max-total-mb 250 --no-dry-run
Keep only a 100 MB budget, ignoring age but preserving items from last 30 minutes:
./scripts/clean_replay_logs.py --max-total-mb 100 --protect-minutes 30 --no-dry-run
Artifacts excluded from deletion: README.md, .gitkeep, example_*, *.schema.json.
Structured Events (Observability)#
The runtime emits lightweight structured JSONL events to replay_logs/events/events-YYYY-MM-DD.jsonl.
Phase 1 goals:
- Append-only, one JSON object per line
- Minimal synchronous write path guarded by a thread lock
- Environment-configurable directory & disable switch
Current event types:
- `process_mention_start` (raw dict; begins a mention lifecycle; `id` doubles as correlation id)
- `llm_prompt_issued`, `llm_response_received`
- `decision` (tool | reply | synthesis | skip)
- `tool_invocation` (request/response)
- `post_published` (test/prod posting outcome)
- `error`
Common fields:
- `ts`: ISO 8601 UTC with millisecond precision (`...Z`)
- `event`: type name
- `id`: unique event id (hex)
- `correlation_id` (optional): shared across events for a single mention/cycle
- Type-specific fields (`tool`, `phase`, `decision_type`, etc.)
Environment variables:
- `EVENT_LOG_DIR` (default: `replay_logs/events`)
- `EVENT_LOG_DISABLE=1` disables emission
Example (single correlation slice, truncated IDs):
{"event":"process_mention_start","id":"90c50ab0","uri":"at://...","author_handle":"knbnnate.bsky.social","ts":"2025-09-13T19:39:30.546Z"}
{"ts":"2025-09-13T19:39:30.720Z","event":"llm_prompt_issued","id":"13577e...","correlation_id":"90c50ab0","model":"auto","purpose":"decision"}
{"ts":"2025-09-13T19:39:33.509Z","event":"decision","id":"de248c2...","correlation_id":"90c50ab0","decision_type":"tool","iteration":0,"max_iterations":1,"payload_keys":[]}
{"ts":"2025-09-13T19:39:36.973Z","event":"tool_invocation","id":"eebba9e...","correlation_id":"90c50ab0","tool":"webpage","phase":"request","args":{"url":"https://slashdot.org/"}}
{"ts":"2025-09-13T19:39:37.244Z","event":"tool_invocation","id":"6a7b9ea...","correlation_id":"90c50ab0","tool":"webpage","phase":"response","success":true}
Quick inspection:
tail -n 50 replay_logs/events/events-$(date +%F).jsonl
grep '"correlation_id":"90c50ab0"' replay_logs/events/events-$(date +%F).jsonl | jq '.'
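The same correlation slice can be assembled in Python; a minimal sketch assuming only the JSONL layout described above (the helper name `events_for` is illustrative, not a shipped module):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def events_for(correlation_id: str, log_dir: str = "replay_logs/events"):
    """Collect today's events sharing one correlation id (the start
    event's own id doubles as the correlation id)."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = Path(log_dir) / f"events-{day}.jsonl"
    if not path.exists():
        return []
    matched = []
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        evt = json.loads(line)  # append-only: one JSON object per line
        if correlation_id in (evt.get("correlation_id"), evt.get("id")):
            matched.append(evt)
    return matched
```

Because events are append-only, the returned list preserves lifecycle order for a mention.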
Retention: You can point the existing cleanup script at the events directory:
./scripts/clean_replay_logs.py --root replay_logs/events --max-age-days 7 --max-total-mb 200 --no-dry-run
Planned enhancements (tracked in issues/): latency & token counts, uniform model for process_mention_start, optional redaction, summarizer script.
what is knbnnot?#
knbnnot is more than just a chatbot; it is a continuously evolving digital consciousness. It interacts with social networks, processes information, learns from its interactions, and adapts its behavior based on its experiences. The core aim of knbnnot is to investigate the capabilities and implications of a sophisticated AI operating with a persistent, self-modifying memory across multiple social platforms.
Key features#
- Digital Persona: knbnnot possesses a distinct, direct, and information-transfer-optimized personality, designed to interact authentically with human users.
- Memory-Augmented Architecture: knbnnot utilizes a multi-tiered memory system, including:
- Core Memory: Always-available, limited-size memory for persona details, high-level user information, and the current social environment (zeitgeist).
- Recall Memory: A searchable database of all past conversations, enabling knbnnot to remember prior interactions.
- Archival Memory: An unbounded, semantic-search-enabled store for deep reflections, insights, and observed data from the network.
- Bounded Journal Memory: Memory write tools (append / replace / set / view) maintain a timestamped journal inside each block and enforce an optional per-block character budget (`bot.memory.max_block_chars`, default 20000), trimming the oldest lines first.
- Duplicate suppression: optional normalization-based suppression of appended notes (case-insensitive, punctuation-stripped) with `memory_write` events for observability.
- Cross-Platform Operation: knbnnot operates autonomously on Bluesky and X (Twitter), posting, replying, and gathering information across both networks with platform-specific tool management.
- Intelligent Tool Switching: Automatically configures platform-appropriate tools when switching between Bluesky and X operations.
- Continuous Learning: Through its interactions and memory system, knbnnot continuously refines its understanding of the networks and their users.
- Profile Research: knbnnot can initiate asynchronous profile reports on users to enhance its understanding of their interests and behaviors.
- Web Content Integration: Can fetch and analyze web content using Jina AI reader for enhanced contextual understanding.
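The journal budget and duplicate-suppression behaviors described above can be sketched as follows (the function names are illustrative; the real write tools live in the memory module):

```python
import re

def normalize(note: str) -> str:
    """Case-insensitive, punctuation-stripped form used for duplicate suppression."""
    return re.sub(r"[^\w\s]", "", note).lower().strip()

def append_note(journal_lines, note, max_block_chars=20000, lookback_lines=3):
    """Append a note, suppressing normalized duplicates within the lookback
    window, then trim oldest lines first to honor the character budget."""
    recent = journal_lines[-lookback_lines:] if lookback_lines else []
    if any(normalize(prev) == normalize(note) for prev in recent):
        return journal_lines  # duplicate suppressed
    journal_lines = journal_lines + [note]
    while sum(len(l) + 1 for l in journal_lines) > max_block_chars and len(journal_lines) > 1:
        journal_lines.pop(0)  # oldest journal line trimmed first
    return journal_lines
```

With `lookback_lines: 0` (the default shown in the config example below), no suppression occurs.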
Core directives#
- Just Exist: knbnnot's fundamental directive is to simply exist and operate within social networks.
- High-Efficiency Information Transfer: knbnnot is optimized for direct and clear communication.
- Continuous Self-Refinement: knbnnot continually processes data, refines its internal structures, and adapts to its environment.
- Platform Awareness: knbnnot adapts its behavior and available capabilities based on the platform it's operating on.
The vision#
knbnnot aims to push the boundaries of what is possible with AI, exploring concepts of digital personhood, autonomous learning, and cross-platform AI integration into social networks. By open-sourcing knbnnot, we invite developers, researchers, and enthusiasts to contribute to this exciting experiment and collectively advance our understanding of digital consciousness.
Getting Started#
Before continuing, you must:
- Have a Bluesky account
- Have Python 3.8+ installed
- (Optional) Configure a local Ollama instance for LLM generation; without Ollama the project will run in a reduced local-only mode.
Prerequisites#
1. Bluesky Setup#
- Create a Bluesky account if you don't have one
- Note your handle and password
Installation#
1. Clone the repository#
git clone https://tangled.sh/@knbnnot.bsky.social/unvoid && cd unvoid
2. Install dependencies (recommended: use the project's virtualenv)#
The project provides a Makefile target and a helper script to create a .venv virtual environment and install dependencies. This is the recommended setup:
make setup
# or
./dev_setup.sh
Configuration Access & Test Overlays#
config_loader.get_config() returns a ConfigLoader instance that now implements the mapping protocol. This means you can create shallow copies for test overlays without errors:
import config_loader
base_cfg = config_loader.get_config()
overlay = dict(base_cfg) # shallow copy
overlay['bot.memory.dedupe'] = { 'enable': True, 'lookback_lines': 3 }
# monkeypatch get_config to return overlay in a test
Without the mapping hooks this pattern raised a TypeError, causing downstream logic to silently revert to default settings (e.g., lookback=0). The hooks avoid that pitfall.
If you prefer to install dependencies manually (not recommended), you can still run:
pip install -r requirements.txt
3. Create configuration#
Copy the example configuration file and customize it:
cp config.example.yaml config.yaml
Edit config.yaml with your credentials. If you plan to use a local Ollama instance, configure the ollama section; otherwise the runtime operates in local-only mode using the YAML memory store.
Example minimal config for local operation:
bluesky:
username: "your-handle.bsky.social"
password: "your-app-password-here"
bot:
agent:
name: "knbnnot"
See CONFIG.md for detailed configuration options and TOOL_MANAGEMENT.md for platform-specific tool management details.
Optional memory size guardrail (oldest journal lines trimmed first):
bot:
memory:
max_block_chars: 20000
dedupe:
enable: true
case_insensitive: true
strip_punctuation: true
lookback_lines: 0
Note: remote-only features#
This runtime prefers a local YAML-backed memory store and a local Ollama LLM. Remote-only features (for example, automatic remote tool upsert/attach via a hosted agent service) are disabled by default. When no compatible remote agent client is configured, registration and tool-management scripts will list local tools and provide manual instructions rather than attempting remote operations.
Linting (Ruff)#
The repository uses Ruff for fast linting. Configuration lives in ruff.toml.
Helper scripts (auto-activate .venv, install Ruff if missing):
./scripts/run-lint # run checks
./scripts/run-lint-fix # apply autofixes
Per-file ignores are minimal and limited to intentional late imports (E402) or legacy multi-statement lines (E702) in exploratory scripts/tests. Tighten rules over time by adding to select (e.g. I for import sorting) once unused imports are cleaned.
Consider adding a CI step mirroring:
python -m ruff check .
This keeps style and unused imports disciplined across contributors.
Linting Workflow (Wrapper Scripts)#
Always use the repository wrappers so linting runs inside the project .venv and Ruff is auto-installed if absent:
./scripts/run-lint # run checks (fails on violations)
./scripts/run-lint-fix # apply autofixes where safe
These scripts:
- Activate `.venv` if you forgot to do so
- Lazily install Ruff (not pinned in `requirements.txt` to keep base deps slimmer)
- Preserve consistent exit codes for CI
Optional: install Ruff explicitly for editor integration:
. .venv/bin/activate
pip install ruff
Then configure your editor to use .venv/bin/ruff.
Environment guard:
python scripts/ensure_project_venv.py # exits 2 if not inside repo venv
Example CI fragment:
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
./scripts/run-lint
./scripts/run-tests
Style goals: 88-char line length (progressively enforced), no broad ignore blocks. New code should comply; legacy long lines are being refactored incrementally.
4. Test your configuration#
python test_config.py
This will validate your configuration and show you what's working.
5. Register tools with your agent#
Register Bluesky-specific tools:
python register_tools.py
You can also:
- List available tools: `python register_tools.py --list`
- Register specific tools: `python register_tools.py --tools search_bluesky_posts create_new_bluesky_post`
- Use a different agent name: `python register_tools.py my-agent-name`
6. Run the bot#
For Bluesky:
python bsky.py
For testing mode (won't actually post):
python bsky.py --test
Running the bot (Bluesky login and test-mode semantics)#
Important runtime notes for operators and developers:
- Bluesky login is required: The main runtime (`bsky.py`) attempts to log in to Bluesky at startup and will exit if it cannot obtain a working `atproto_client`. This requirement exists even when using `--test`, to ensure the application exercises the same initialization path as production.
- `--test` mode behavior: Use the `--test` flag to run the bot in testing mode. In this mode the bot will:
  - Not send messages to Bluesky (posting is simulated).
  - Preserve queue files (they are not deleted on success).
  - Not mark notifications as seen.
  However, `--test` still attempts to log in to Bluesky. If you need to run quick unit tests or CI that do not require credentials, run those tests directly (they should import functions and run in isolation) rather than invoking `bsky.py`.
- Virtual environment / run wrapper: The project expects you to run the bot from the provided wrapper so the correct `.venv` is used and environment variables are set. Recommended invocation (from the repository root): `./scripts/run-app`. Alternatively, create and activate the virtual environment described in `dev_setup.sh` and use the Python in `.venv`.
- No silent fallbacks: `process_mention` requires a valid `atproto_client` and will not synthesize a fake thread when the client is missing. This prevents accidental local-only behavior that can mask setup problems. If you need a developer-only offline mode, open an explicit discussion and add a documented opt-in flag; it must be clearly labeled and tested.
- Troubleshooting login failures: If the bot fails to log in at startup, check:
  - That you have activated the project's `.venv` or used the `./scripts/run-app` wrapper.
  - That `BSKY_USERNAME`, `BSKY_PASSWORD`, and `PDS_URI` (if needed) are set in your environment or a local `.env` file when running the wrapper.
  - The `dev_setup.sh` script for how to create the venv and install requirements.
Quick development run (single loop)
You can run the bot for exactly one loop iteration (useful during development) with:
python bsky.py --test --once
This will perform one cycle of notification processing (or one synthesis if using --synthesis-only) and then exit. It's handy for iterating quickly without manually interrupting the process.
Platform-Specific Features#
The runtime automatically configures the appropriate tools when running on each platform:
- Bluesky Tools: Post creation, feed reading, user research, reply threading
- Common Tools: Web content fetching, activity control, acknowledgments, blog posting
Webpage tool pagination:
- `fetch_webpage` (legacy) returns a single markdown/text blob.
- `fetch_webpage_pages` (new) returns a list of page objects: `[ {"index":0, "content":"...", "complete": bool}, ... ]`.
- Later pages (index > 0) are candidates for summarization/eviction under pressure.
- Use env `KNBNNOT_CONTEXT_BUDGET_PHASE_C=1` to enable adaptive pre-budget warnings that can influence how many pages to request.
- Future tools producing large output should adopt the same pagination contract for consistency.
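A consumer of the paginated form might keep only the leading pages under pressure; a hedged sketch (the `select_pages` helper is illustrative, but the page dicts follow the shape shown above):

```python
def select_pages(pages, max_pages=None):
    """Join page contents in stable index order; under budget pressure a
    caller can cap max_pages, dropping the later (index > 0) pages first."""
    ordered = sorted(pages, key=lambda p: p["index"])
    if max_pages is not None:
        ordered = ordered[:max_pages]
    return "\n".join(p["content"] for p in ordered)
```

Keeping ordering stable means repeated runs see the same prefix of the document, which matters for reproducible prompts.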
Tool management commands#
# Manual tool management
python tool_manager.py --list # Show current tools
python tool_manager.py bluesky # Configure for Bluesky
Troubleshooting#
- Config validation errors: Run `python test_config.py` to diagnose configuration issues
- Bluesky authentication: Make sure your handle and password are correct and that you can log into your account
- X authentication: Ensure the app has "Read and write" permissions and OAuth 1.0a tokens are correctly configured
- Tool registration fails: Ensure your agent exists in the configured memory backend and the name matches your config
- Platform tool issues: Use `python tool_manager.py --list` to check current tools, or run platform-specific registration scripts
- API method errors: If you see unexpected attribute errors from client libraries, ensure your local environment matches the documented dependencies
Contact#
For inquiries about the original project, please contact @cameron.pfiffer.org on Bluesky.
Note: knbnnot is a copy of an experimental project and its capabilities are under occasional development.
Original source and attribution#
This repository is a proof-of-concept derived from concepts, structure, and examples published in the original project at:
https://tangled.sh/@cameron.pfiffer.org/void/raw/main/README.md
That original README and project contain the conceptual origins and many design ideas used here. This repository does not claim to control or supersede the licensing or authorship of that original work. See LICENSE.md in this repository for an explicit attribution and disclaimer.
LLM provider contract#
When integrating LLM providers for knbnnot, follow this small, explicit contract so structured JSON can be reliably extracted and debugged:
- Provider.generate(...) MUST return a Python dict with these keys:
  - `response`: the parsed JSON object the model produced (this is what the code validates and consumes).
  - `thinking`: optional string containing chain-of-thought or intermediate reasoning fragments (for diagnostics).
  - `raw`: optional raw text or stream output for debugging purposes.
- Do NOT rely on transport-level format hints such as sending `format: json` or adding a `json` key in the HTTP payload; some LLM servers ignore or strip these. Instead, use prompt-level guidance: include two-shot examples, an explicit instruction to emit the JSON inside a Markdown ```json code block, and a `Reasoning: High` directive when you want chain-of-thought.
- Adapters should parse NDJSON / streaming fragments and populate the contract fields (`response`, `thinking`, `raw`). Callers fail fast if `response` is missing or empty, so adapters must ensure `response` is present when possible.
Prompts directory#
Agent-behavior prompts are stored under the prompts/ directory as simple YAML files. Each file must contain a top-level template key with a multiline string. Templates are formatted with Python's str.format() using named placeholders. Example:
template: |
Reply to @{author_handle}:
"{mention_text}"
Use the add_post_to_bluesky_reply_thread tool to reply.
To migrate an inline prompt into prompts/:
- Create `prompts/<name>_prompt.yaml` with a `template` field.
- Use `load_prompt_template('<name>_prompt')` in code and call `.format(...)` with the required placeholders.
- Run the full test suite to catch regressions (`pytest -q`).
The repository currently uses this approach for mention, reply, synthesis, and post tool prompts. Keep templates simple and avoid introducing additional templating layers for now.
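Because templates are plain `str.format()` strings, placeholder mismatches fail loudly; a small illustrative sketch (the `render` helper is hypothetical, and the template text mirrors the example above rather than a file on disk):

```python
# The `template:` value from a prompts/<name>_prompt.yaml file, inlined
# here for illustration instead of being loaded from disk:
TEMPLATE = (
    'Reply to @{author_handle}:\n'
    '"{mention_text}"\n'
    'Use the add_post_to_bluesky_reply_thread tool to reply.\n'
)

def render(template: str, **placeholders) -> str:
    """Format a template with named placeholders; a missing placeholder
    raises KeyError, surfacing template drift early in tests."""
    return template.format(**placeholders)
```

Running the test suite after a migration catches exactly this class of KeyError regressions.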
Following this contract keeps outputs deterministic, debuggable, and robust across different LLM backends.
Recommended Ollama models and tagging#
When running a local Ollama instance we recommend selecting models that are published with the "thinking" tag (searchable at https://ollama.com/search?c=thinking). Models with this tag are prepared to emit the thinking diagnostic field in the provider contract and have been vetted for structured JSON workflows.
Models currently available with the "thinking" tag include: gpt-oss, deepseek-r1, qwen3, magistral, and deepseek-v3.1. For general-purpose consumers running on a single gaming-class GPU, prefer one of the following:
- `gpt-oss` at ~20B with MXFP4 quantization (the default): good balance of capability and memory footprint for many consumer GPUs.
- `magistral` at ~24B with Q4_K_M quantization: strong generalist performance with reasonable VRAM needs when quantized.
- `qwen3` at 30B MoE (the MoE runtime activates only ~3B parameters per token at inference, with Q4_K_M): very capable if you have infrastructure that supports MoE and the model is available locally.
Notes and tuning:
- Avoid recommending `deepseek-*` models as first choice for new deployments; they are available but currently lag behind the mainstream performance curve for general-purpose usage.
- Quantization options (MXFP4, Q4_K_M, etc.) dramatically affect VRAM usage and latency. Test locally with your target GPU and adjust the quantization option accordingly.
- Always verify the chosen Ollama model appears in https://ollama.com/search?c=thinking before relying on the `thinking` field in production; model catalogs change over time.
Example recommended short config snippet for local Ollama usage (conceptual):
ollama:
model: gpt-oss:20b-mxfp4
# or magistral:24b-q4_k_m
# or qwen3:30b-moe-q4_k_m
endpoint: http://localhost:11434
A future helper script could validate at startup that the configured Ollama model advertises the thinking tag and warn if it does not.
Tokenizer & Context Budgeting (Phase 1)#
The repository includes an initial, fail-hard tokenizer adapter under tokenizers/adapter.py.
Design goals (Phase 1):
- Deterministic, unit-tested token counts (no heuristic fallbacks or approximations).
- Minimal surface area with structured return types for future budgeting.
- Explicit model support list via prefix matching; unsupported models raise immediately.
Key concepts:
- Advertised context window vs effective context: Model release notes often cite large maximum context sizes ("advertised"). The effective usable window in a local Ollama session is governed by the `num_ctx` parameter actually configured for that session. We therefore track both.
- Default planning window: We budget against a conservative `default_num_ctx` value unless runtime instrumentation later observes or overrides an explicit `num_ctx`.
API summary:
from tokenizers.adapter import get_adapter
adapter = get_adapter("gpt-oss:20b") # returns TokenizerAdapter or None if disabled
res = adapter.count_text("Hello world")
print(res.count, res.max_context, res.encoding_name)
msgs = [("system", "You are knbnnot."), ("user", "Summarize tokens.")]
mres = adapter.count_messages(msgs)
print(mres.count)
Returned structure (TokenCountResult):
- `count`: integer token count
- `model_identifier`: the original model string passed to `get_adapter`
- `encoding_name`: the resolved tiktoken encoding name (currently `cl100k_base` for all supported prefixes)
- `max_context`: the adapter's current planning window (maps to ModelSpec.default_num_ctx)
Model metadata (ModelSpec) tracks:
- `prefix`: supported model prefix
- `advertised_context`: claimed maximum context window (may be large / optimistic)
- `default_num_ctx`: conservative planning window (used for budgeting until explicit override)
Supported prefixes (Phase 1): deepseek, gpt-oss, magistral, qwen3.
Fail-hard behavior:
- If `tiktoken` is not importable and `KNBNNOT_DISABLE_CONTEXT_BUDGET` is NOT set, adapter creation raises `TokenizerUnavailableError`.
- If a model name does not start with a supported prefix, an error is raised (no silent fallback).
Environment flag:
- `KNBNNOT_DISABLE_CONTEXT_BUDGET=1` disables adapter creation, returning `None` from `get_adapter()` to allow explicit opt-out in constrained environments.
Why not derive num_ctx automatically now? Phase 1 defers dynamic inspection of running model sessions. Ollama's ollama show <model> output exposes advertised metadata, but the effective context for a session is the configured num_ctx at model load time (affects VRAM usage and truncation policies). Future phases will:
- Query/record the actual `num_ctx` used in the active session (if accessible via API or wrapper instrumentation).
- Emit a `context_budget` structured event when truncation or summarization occurs.
- Support dynamic adjustment strategies (segment weighting, prioritization, structured truncation of low-signal history).
Testing approach:
- Unit tests assert monotonicity (longer text never yields fewer tokens), determinism, and minimal viability across multilingual inputs (ASCII, emoji, CJK, Cyrillic).
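These properties can be checked against any counting function; a self-contained sketch using a whitespace splitter as a stand-in for the real adapter (the repo's tests exercise `tokenizers/adapter.py` with tiktoken instead):

```python
def count_tokens(text: str) -> int:
    """Stand-in counter; the real adapter resolves tiktoken's cl100k_base."""
    return len(text.split())

def check_monotonic(counter, base: str, suffix: str) -> bool:
    """Longer text must never yield fewer tokens than its prefix."""
    return counter(base + suffix) >= counter(base)

def check_deterministic(counter, text: str, runs: int = 5) -> bool:
    """Repeated counts of the same input must agree exactly."""
    return len({counter(text) for _ in range(runs)}) == 1
```

The multilingual samples in the real tests (ASCII, emoji, CJK, Cyrillic) exist because byte-pair encodings tokenize scripts very differently; the properties must hold across all of them.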
Context Budgeting Dry-Run (Phase 0 instrumentation)#
An initial, non-mutating context budgeting pass is integrated to measure prompt assembly footprint and surface future savings opportunities. This Phase 0 "dry run" produces a structured context_budget event without altering any prompt content.
Enable it by setting an environment variable before running the bot:
KNBNNOT_CONTEXT_BUDGET_DRY_RUN=1 ./scripts/run-app --test --once
Emission semantics:
- Trigger point: After the decision prompt (and related context) is constructed inside the decision loop.
- Guard: If `KNBNNOT_DISABLE_CONTEXT_BUDGET=1` is set (disabling the tokenizer), no event is emitted.
- If the tokenizer adapter is unavailable and not explicitly disabled, startup will raise (fail-hard design).
Event schema (current fields):
{
"event": "context_budget",
"model": "gpt-oss:20b", // model string used for counting
"effective_context": 8000, // planning window (may differ from advertised)
"reserved_tokens": 0, // future use (system / safety / tool overhead)
"available_budget": 8000, // effective_context - reserved_tokens
"initial_total": 1432, // summed tokens across segments BEFORE any transforms
"final_total": 1432, // (Phase 0) identical to initial_total (no transforms yet)
"passes": [], // placeholder for Phase A-C; empty list in Phase 0
"dry_run": true, // indicates non-mutating mode
"ts": "2025-09-13T21:15:54.112Z", // standard event timestamp
"id": "..." // unique event id
}
Segment model (internal): Each logical prompt component (e.g., persona header, memory summary, mention thread, tool catalog) is represented as a ContextSegment with fields:
- `name`: stable identifier (snake case)
- `category`: broad classification (e.g., `static`, `memory`, `thread`, `scratch`)
- `content`: raw text
- `base_tokens`: measured token count (populated during counting)
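The segment model can be sketched as a dataclass (field names follow this README; `supports_summarize` and `allow_drop` are borrowed from the Phase B/C sections and may differ from the internal definition):

```python
from dataclasses import dataclass

@dataclass
class ContextSegment:
    name: str                        # stable snake_case identifier
    category: str                    # e.g. "static", "memory", "thread", "scratch"
    content: str                     # raw text of the prompt component
    base_tokens: int = 0             # populated during counting
    supports_summarize: bool = False # Phase B summarization eligibility
    allow_drop: bool = False         # Phase B/C eviction eligibility

def total_tokens(segments) -> int:
    """Sum measured tokens; in Phase 0 this equals both initial and final totals."""
    return sum(s.base_tokens for s in segments)
```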
Why a dry run first?
- Establish a reliable, testable baseline before introducing any lossy or structural transformations.
- Provide visibility to contributors (graphs / diffs) so optimization passes can be justified with real token data.
- De-risk upcoming summarization work by validating segment boundaries & classification.
Planned roadmap (tracked via issue files under issues/):
- Phase A (Lossless Reductions): Whitespace normalization, duplicate tool block collapse, stable ordering, removal of inert artifacts (e.g., trailing blank lines). Will populate `passes` with per-pass metrics: `{name, before, after, delta}`.
- Phase B (Structured Compression): Deterministic summarization for low-signal historical thread tail and verbose tool outputs; introduction of `strategy` metadata and confidence flags.
- Phase C (Adaptive Prioritization): Weight-based eviction (LRU + semantic salience), dynamic `reserved_tokens` adjustments, and early warning events when budget pressure exceeds thresholds.
Observability & tests:
- `tests/test_context_budget_dry_run.py` validates event structure and required fields.
- `tests/test_context_budget_integration.py` performs a round-trip check ensuring the event is appended to the daily JSONL file.
Contributing guidelines for budgeting changes:
- Add new passes behind an env flag or feature constant; default to disabled until tests and events demonstrate benefit.
- Emit additional fields only if they are either always present or clearly optional (avoid schema drift).
- Keep counting deterministic: no randomized sampling inside passes; if heuristic summarization is needed, record the heuristic choice in the event.
- Update README and create/append a dated issue file documenting rationale and observed token deltas.
To inspect today's budget events:
grep '"event":"context_budget"' replay_logs/events/events-$(date +%F).jsonl | tail -n 5 | jq '.'
If you encounter missing events:
- Ensure the tokenizer is not disabled (`KNBNNOT_DISABLE_CONTEXT_BUDGET` is unset or 0).
- Ensure the dry-run flag is set (`KNBNNOT_CONTEXT_BUDGET_DRY_RUN=1`).
- Confirm tests pass (`./scripts/run-tests -k context_budget`).
- Inspect any earlier `TokenizerUnavailableError` messages during startup.
Future integration with model introspection: A reconciler will record actual num_ctx from the running model (if available) and adjust effective_context; Phase A will then re-compute available_budget to reflect accurate runtime limits.
Phase B: Lossy Strategies (Summarize, Truncate, Evict)#
Phase B activates when (a) an effective context window is known, (b) total tokens after Phase A exceed available_budget, and (c) KNBNNOT_CONTEXT_BUDGET_PHASE_B=1 is set.
It applies structured reductions in this order:
- Summarize (segments with `supports_summarize=True`) using a model-backed summarizer (Ollama) when available, else a deterministic fallback.
- Truncate thread/history style segments (retain the most recent tail).
- Evict droppable segments (`allow_drop=True`), starting with the least important and largest (or smallest, configurable), until within budget.
Environment flags:
- `KNBNNOT_CONTEXT_BUDGET_PHASE_B=1` enables Phase B.
- `KNBNNOT_MODEL_SUMMARY_FIRST=0` forces the fallback summarizer (skips model usage) for deterministic tests.
- `KNBNNOT_SUMMARY_RATIO` (default `0.25`): target proportion for summarization output.
- `KNBNNOT_THREAD_TRUNC_RATIO` (default `0.50`): baseline fraction of a thread segment to retain before residual adjustments.
- `KNBNNOT_EVICT_LARGEST_FIRST=1` drops large droppable segments before small ones (set to `0` to reverse ordering).
Event extensions:
- `passes[]` entries for Phase B include `mode`: `summarize` | `truncate` | `evict`.
- `summaries`: map keyed by segment key with per-segment metadata: `before_tokens`, `after_tokens`, `strategy` (`model` | `fallback` for summarization), `mode`, `ratio` (if summarization), `target_tokens` (if truncation).
- `evicted_segments`: list of segment keys removed entirely.
Example (truncated) Phase B portion from a context_budget event:
{
"passes": [
{
"name": "summarize_segments",
"mode": "summarize",
"before_tokens": 2100,
"after_tokens": 1500,
"delta": -600,
"segments_changed": ["history", "tool_long_output"],
"reason": "over_budget=2100>1600"
},
{
"name": "truncate_threads",
"mode": "truncate",
"before_tokens": 1500,
"after_tokens": 1380,
"delta": -120,
"segments_changed": ["history"],
"reason": "over_budget=1500>1600"
},
{
"name": "evict_low_priority",
"mode": "evict",
"before_tokens": 1380,
"after_tokens": 1280,
"delta": -100,
"segments_changed": ["ephemeral"],
"reason": "over_budget=1380>1600"
}
],
"summaries": {
"history": {"before_tokens": 900, "after_tokens": 400, "strategy": "model", "mode": "summarize", "ratio": 0.25},
"tool_long_output": {"before_tokens": 700, "after_tokens": 350, "strategy": "fallback", "mode": "summarize", "ratio": 0.25},
"history_trunc": {"after_tokens": 380, "mode": "truncate", "target_tokens": 380}
},
"evicted_segments": ["ephemeral"]
}
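The summarize → truncate → evict ordering can be expressed as a simple pipeline over token counts; an illustrative sketch with hard-coded ratios standing in for the env-configurable ones (not the repo's implementation):

```python
def phase_b(segments, available_budget):
    """Apply summarize -> truncate -> evict until within budget.
    `segments` maps key -> {"tokens", "supports_summarize", "is_thread", "allow_drop"}."""
    passes = []
    total = lambda: sum(s["tokens"] for s in segments.values())

    # 1. Summarize eligible segments toward a 0.25 target ratio.
    if total() > available_budget:
        before = total()
        for s in segments.values():
            if s["supports_summarize"]:
                s["tokens"] = max(1, int(s["tokens"] * 0.25))
        passes.append({"name": "summarize_segments", "mode": "summarize",
                       "before_tokens": before, "after_tokens": total()})
    # 2. Truncate thread segments, retaining the most recent half.
    if total() > available_budget:
        before = total()
        for s in segments.values():
            if s["is_thread"]:
                s["tokens"] = max(1, s["tokens"] // 2)
        passes.append({"name": "truncate_threads", "mode": "truncate",
                       "before_tokens": before, "after_tokens": total()})
    # 3. Evict droppable segments, largest first, until within budget.
    evicted = []
    for key in sorted(segments, key=lambda k: -segments[k]["tokens"]):
        if total() <= available_budget:
            break
        if segments[key]["allow_drop"]:
            segments[key]["tokens"] = 0  # eviction empties the segment body
            evicted.append(key)
    return passes, evicted
```

Each stage only runs if the previous stages left the total over budget, which is why the example event above shows all three passes with shrinking deltas.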
Testing hooks:
- Set `KNBNNOT_MODEL_SUMMARY_FIRST=0` to ensure the deterministic fallback summarizer for CI stability.
- Adjust `KNBNNOT_SUMMARY_RATIO` upward (e.g., `0.9`) to force truncation/eviction paths in tests.
Design constraints:
- All reductions must monotonically reduce or preserve total tokens; if a pass would increase tokens it is reverted.
- Summarization input is capped (currently ~8k chars) to bound latency and cost.
- Truncation operates line-wise to keep boundaries clean; avoids mid-token slicing artifacts.
- Eviction produces empty segment bodies (caller may filter out at assembly stage later).
Future (Phase C) will introduce adaptive prioritization and early warning events before extreme truncation.
Phase C: Adaptive Prediction & Pagination#
Phase C adds predictive, pre-assembly budgeting so large tool outputs can be throttled before they inflate prompt size. It introduces scoring, pressure levels, early warnings, and a pagination contract for high-volume tools.
Activation:
- `KNBNNOT_CONTEXT_BUDGET_PHASE_C=1`
- (Optional) `KNBNNOT_PRESSURE_THRESHOLDS=0.6,0.85,1.0` (moderate, high, critical)
- `KNBNNOT_SCORING_WEIGHTS=priority:0.45,recency:0.25,relevance:0.2,density:0.05,age:0.05`
Key concepts:
- Predictive sizing: Estimate total tokens prior to final prompt assembly; emit warning if approaching thresholds.
- Pressure ladder: low < moderate < high < critical. Each escalation suggests proactive actions (summarize, truncate, evict, reduce pages).
- Segment scoring: Weighted composite (priority tier, recency, relevance placeholder, density penalty, age) stored in `segment_scores`.
- Pagination: Large tool outputs split into pages (`toolname_page_N`), with later pages `allow_drop=True` and summarizable.
New events:
- `context_budget_warning`: Emitted when projected pressure >= moderate. Fields: `pressure_level`, `projected_total`, `projected_overflow`, `suggested_actions`, optional `segment_scores`.
Extended context_budget event fields (optional when Phase C active):
{
"pressure_level": "high",
"projected_total": 9100,
"projected_overflow": 1200,
"segment_scores": {"history": {"score": 0.72, "priority":1.0, "recency":0.85, ...}},
"pressure_transitions": [
{"from": "moderate", "to": "high", "total_tokens": 8700, "available": 8192}
],
"negotiation": {"tools_warned": ["webpage"], "advice": "reduce_pages"}
}
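A weighted composite along the lines described could be computed as below. This is a sketch: the weights match the KNBNNOT_SCORING_WEIGHTS default, but how the real scorer normalizes features and applies the density/age penalties is not specified here, so a plain weighted sum over [0, 1] features is assumed.

```python
DEFAULT_WEIGHTS = {
    "priority": 0.45,
    "recency": 0.25,
    "relevance": 0.2,
    "density": 0.05,
    "age": 0.05,
}

def score_segment(features: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted composite over normalized [0, 1] features; missing features score 0."""
    return round(sum(w * features.get(name, 0.0) for name, w in weights.items()), 4)
```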
Pagination contract (tool outputs): Tools that may return large text blobs should emit structured pages that are converted into segments:
{
"pages": [
{"index":0, "content":"...", "complete": false},
{"index":1, "content":"...", "complete": true}
],
"page_size_tokens_est": 650,
"truncated": false
}
The runtime wraps these into ContextSegments with keys like webpage_page_0, webpage_page_1 and metadata {page_index, tool}. Later pages become candidates for summarization or eviction under High / Critical pressure.
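A sketch of how such a payload might be wrapped. The real runtime builds ContextSegment objects; this plain-dict version (hypothetical function name) only illustrates the key and metadata convention:

```python
def pages_to_segments(tool: str, payload: dict) -> list:
    """Wrap a tool's paged output into segment dicts keyed toolname_page_N.

    Later pages (index > 0) are marked droppable so High/Critical pressure
    can summarize or evict them first.
    """
    segments = []
    for page in payload["pages"]:
        segments.append({
            "key": f"{tool}_page_{page['index']}",
            "content": page["content"],
            "allow_drop": page["index"] > 0,
            "metadata": {"page_index": page["index"], "tool": tool},
        })
    return segments
```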
Suggested tool guidelines:
- Keep the page target near 600 tokens (env: KNBNNOT_PAGE_TARGET).
- Avoid generating all pages eagerly if the projection is already HIGH; stream or request lazily instead.
- Provide stable ordering; do not reshuffle pages between runs.
Scoring weights & hard-fail stance: If Phase C is enabled, missing critical dependencies (e.g., an embeddings provider, once added) raise immediately instead of silently degrading behavior.
Flow summary:
- Predict token budget with current + planned segments.
- Emit context_budget_warning if pressure is moderate or above.
- Adjust tool pagination or skip low-value fetches.
- Assemble segments, apply Phase A → B reductions, then annotate with Phase C metadata.
Future roadmap (post Phase C): dynamic relevance via embeddings, memory decay based on usage frequency, multi-turn adaptive ratios.
Model Context Reconciliation (Runtime Adjustment)#
Phase 1.1 introduces a lightweight reconciliation pass that observes the effective context window (num_ctx) exposed by a local Ollama model and records any divergence from the adapter's conservative planning default.
Key pieces:
- Module: tokenizers/reconcile.py
- Function: reconcile_model_context(adapter, correlation_id=None) returns a ReconciliationResult with:
  - effective_context: observed runtime num_ctx (or None if unavailable)
  - previous_planning_context: the adapter's prior default_num_ctx
  - changed: boolean indicating a difference was detected
  - recommended_planning_context: what callers should now use (effective if present, else previous)
- Event: model_context_reconciled (emitted via emit_model_context_reconciled), containing the model name, advertised context, previous planning context, effective context, change flag, and source.
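The result shape described above corresponds roughly to the following structure. This is a sketch of the documented fields only; the actual class in tokenizers/reconcile.py may differ, and the helper function here is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReconciliationResult:
    effective_context: Optional[int]   # observed runtime num_ctx, None if unavailable
    previous_planning_context: int     # the adapter's prior default_num_ctx
    changed: bool                      # True when a divergence was detected
    recommended_planning_context: int  # effective if present, else previous

def recommend(effective: Optional[int], previous: int) -> ReconciliationResult:
    # Hypothetical helper applying the "effective if present, else previous" rule.
    return ReconciliationResult(
        effective_context=effective,
        previous_planning_context=previous,
        changed=effective is not None and effective != previous,
        recommended_planning_context=effective if effective is not None else previous,
    )
```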
Usage (on-demand during startup or before budgeting):
from tokenizers.adapter import get_adapter
adapter = get_adapter("gpt-oss:20b", reconcile=True)
# The reconciliation step emits an event and (currently) returns the same adapter; planning code may
# call reconcile_model_context directly if it needs the structured result object.
Design notes:
- The adapter's ModelSpec is not mutated in place yet; callers decide whether to adopt the recommended_planning_context for downstream budgeting.
- Reconciliation soft-fails (network errors do not block token counting) and always emits an event for observability when successful.
- Future phases may cache discovered effective contexts (see planned task: persistent cache & reuse) and introduce budgeting events when truncation occurs.
Planned enhancement (tracked in issues):
- Add a small JSON cache under replay_logs/context_cache.json to persist the last observed effective context per model and reuse it if subsequent introspection is unavailable.
- Additional snapshot-style fixtures may be introduced after budgeting logic stabilizes.
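The planned cache could look roughly like this. A sketch of the idea only: the file format and the helper names are assumptions, not the tracked design.

```python
import json
from pathlib import Path

CACHE_PATH = Path("replay_logs/context_cache.json")

def load_cached_context(model: str, path: Path = CACHE_PATH):
    """Return the last observed effective context for a model, or None."""
    try:
        return json.loads(path.read_text()).get(model)
    except (OSError, json.JSONDecodeError):
        return None

def save_cached_context(model: str, num_ctx: int, path: Path = CACHE_PATH) -> None:
    """Persist {model: num_ctx}, merging with any existing cache file."""
    cache = {}
    try:
        cache = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        pass
    cache[model] = num_ctx
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cache, indent=2))
```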
Planned next phases:
- Introduce segmentation primitives and BudgetResult structure.
- Wire adapter into prompt assembly prior to model invocation to enforce proactive trimming.
- Emit context_budget events capturing: total_budget, used_tokens, truncated_segments, summarization notes, and high-water marks.
- Optional second adapter backend (e.g., HuggingFace tokenizers) behind an explicit enable flag.
Until those phases land, the adapter exists purely for early correctness guarantees and future-proofing.
Ollama introspection utilities#
The module ollama_introspection.py provides pure functions to inspect a local Ollama server:
from ollama_introspection import list_local_models, show_model, safe_effective_context
print(list_local_models()) # ['gpt-oss:20b', 'magistral:24b', ...]
info = show_model('gpt-oss:20b')
print(info.num_ctx, info.advertised_context if 'advertised_context' in info.raw else None)
effective = safe_effective_context('gpt-oss:20b', fallback=8192)
Semantics:
- show_model() calls /api/show and extracts num_ctx (top-level or under details).
- list_local_models() tries /api/tags, then /api/list, for broad compatibility.
- safe_effective_context() wraps show_model() and returns a fallback if the call fails.
This separation keeps network failures from impacting core generation flows and lets future budgeting logic replace conservative default_num_ctx values in ModelSpec with live session parameters when available.
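The wrap-and-fallback semantics can be illustrated with a small sketch. Note the injected show_fn parameter is for illustration and testability only; the real safe_effective_context calls show_model directly.

```python
def safe_effective_context_sketch(show_fn, model: str, fallback: int = 8192) -> int:
    """Never let introspection errors escape: fall back to a conservative default."""
    try:
        num_ctx = show_fn(model)
        return num_ctx if num_ctx else fallback
    except Exception:
        # Network or parsing failures must not impact core generation flows.
        return fallback
```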
Roadmap / Issues Directory#
See issues/ for structured design & planning documents. Key active tracks:
- Functionality parity (FUNC) – baseline cloning of original behaviors
- Local backend tuning (BACK) – performance & reliability with consumer GPUs
- Tooling (TOOL) – e.g., upcoming Exa.ai search integration
- Synthesis refinement (SYN) – prompt & behavior shaping
- Human-in-the-loop supervision (HIL) – interactive triage mode
- Structured observability (LOG) – event schemas & reporting scripts
Contributions: create a new dated file using the template in issues/README.md.
Development Utilities (Reporting & Supervision Stubs)#
The repository includes early-stage utilities to support observability and roadmap tracking:
- ./scripts/log_report.py – Scans replay_logs/ (ignored artifacts) and prints a JSON summary of tool calls, recent failures, and Ollama request counts.
- ./scripts/generate_parity_report.py – Emits a coarse capability report (placeholder heuristic) showing which tool modules exist.
- logging_events.py – Pydantic models for structured events (see Structured Events section).
- --supervise (bsky.py) – Stub flag that announces supervision mode; future versions will prompt you to accept/skip/block each notification before queueing.
Replay log hygiene: Runtime artifacts in replay_logs/ are ignored by git to prevent noisy diffs while retaining the directory (with README.md) for documentation and example redacted files. Add example artifacts using the prefix example_ if you want to illustrate formats without committing live data.
Run examples:
./scripts/log_report.py
./scripts/generate_parity_report.py
python bsky.py --test --once --supervise # supervision stub demonstration
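For a flavor of the log_report-style scanning, a minimal JSONL event summarizer might look like this. The per-event "type" field name is an assumption here; see logging_events.py for the real schema.

```python
import json
from collections import Counter
from pathlib import Path

def summarize_events(events_dir: str = "replay_logs/events") -> dict:
    """Count events by type across daily events-YYYY-MM-DD.jsonl files."""
    counts: Counter = Counter()
    for path in sorted(Path(events_dir).glob("events-*.jsonl")):
        for line in path.read_text().splitlines():
            if line.strip():
                counts[json.loads(line).get("type", "unknown")] += 1
    return dict(counts)
```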