Releases

Release notes for 8gent Code. Format follows Keep a Changelog; versioning follows Semantic Versioning.

Work in progress for the next release is listed under Unreleased.

Unreleased

Added

  • Telegram multi-step task surface (#1906, #1913) - the @eaborobot bot now runs multi-tool agent plans behind a single live-edited progress message instead of blocking single-shot prompts. New modules under packages/telegram-bot/: task-runner (lifecycle), mobile-formatter (concise tool summaries + chunking), file-sender (sendDocument/sendPhoto for screenshots and code files), keyboards (Cancel / Retry / Continue / Approve), session-store (per-chat persistence at ~/.8gent/telegram-sessions.json), daemon-client (WebSocket wrapper with reconnect), and bridge-adapter (event glue). packages/daemon/telegram-bridge.ts opts into the new path by default; set EIGHT_TG_LEGACY=1 to fall back. New /cancel command stops the in-flight task. 38 unit tests covering the 5-step end-to-end flow with a fake socket. See docs/specs/TELEGRAM-MULTISTEP.md.
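
The chunking step that mobile-formatter performs could look roughly like the sketch below. It is an illustration only, not the module's code: `chunkMessage` and its prefer-a-newline split rule are hypothetical, and the 4096-character ceiling is Telegram's limit for a single text message.

```typescript
// Hypothetical helper, not the actual mobile-formatter API.
const TELEGRAM_MAX_CHARS = 4096; // Telegram's per-message text limit

function chunkMessage(text: string, limit: number = TELEGRAM_MAX_CHARS): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    // Prefer to break at the last newline inside the window; otherwise hard-cut.
    const cut = rest.lastIndexOf("\n", limit);
    const end = cut > 0 ? cut : limit;
    chunks.push(rest.slice(0, end));
    // Drop the newline we split on so the next chunk does not start with it.
    rest = rest.slice(cut > 0 ? end + 1 : end);
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each returned chunk fits in one message, so a long tool summary can be sent as several consecutive messages.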

Fixed

  • Final CJK comment cleanup in `packages/tools/option.ts` - leftover the值 in a JSDoc comment for None.flatMap(), last trace of the merge corruption that #1893 cleaned up in identifiers. bun run typecheck and bun test packages/ apps/ (243 + 16 pass, 0 fail) confirm the repo-health work tracked in #1883 is complete.
  • CI test suite green - scoped bun test to packages/ and apps/ only, excluding the benchmarks/ autoresearch-loop validators, which depend on dynamically generated code in benchmarks/autoresearch/work/ (a directory that exists only at benchmark runtime, not in CI). Restored the suite to 241 pass / 10 skip / 0 fail (previously 243 pass / 21 skip / 439 fail / 16 errors). Added a test:benchmarks script for running the validators explicitly when the work dir is populated. CI uses bun run test to honor the package.json scope.
  • Corrupted identifiers in `packages/tools/` - structured-log.ts referenced this.current位 (CJK garbage from a bad merge) instead of this.currentLevel. test-runner.ts returned total意图 instead of totalDuration. Both were causing tsc --noEmit to fail on main, blocking CI on every open PR. First installment of the broader typecheck cleanup tracked in #1816.

Added

  • 8gent Computer Phase 3 cua loop - perceive → recall → decide → act loop at packages/eight/loops/computer-use.ts with channel-aware failover (default vision/tool tier = Qwen 3.6-27B), accessibility-tree-first perception (packages/eight/perception/tree.ts) with screenshot escalation (packages/eight/perception/screenshot.ts), Qwen-tuned vision prompt template (packages/eight/prompts/computer-use-vision.ts) with graceful no-vision fallback for the heavy cloud tier, system prompt at packages/eight/prompts/computer-use-system.ts, AppKit-based AX tree query (packages/daemon/tools/accessibility-tree.ts + Swift CLI helper at apps/8gent-computer/Sources/AccessibilityTreeCLI/main.swift) replacing the Phase 1 stub, and an 8-task headless smoke suite (packages/eight/scripts/computer-use-suite.ts) that runs in CI on every PR. NemoClaw policy gate preserved on the production hands path (#1864, #1865, #1866, #1867, #1882)
  • 8gent Computer Phase 2 scaffold - new apps/8gent-computer/ Swift package. NSApplication accessory shell, Cmd+Opt+Space global hotkey via NSEvent monitors (no Accessibility prompt), glass NSPanel anchored 80px from the bottom, static AudioWaveView placeholder, headless CLI --headless --intent "..." emitting structured JSON. macOS CI job builds the Swift target and runs the headless smoke (#1857, #1858, #1859)
  • 8gent Computer Phase 2.4-2.7 voice round-trip - on-device SFSpeechRecognizer streaming captions with mic + speech permission (SpeechCapture.swift), URLSessionWebSocketTask client connecting to the daemon /computer route with reconnect + exponential backoff (DaemonClient.swift), AVSpeechSynthesizer sentence-buffered TTS with hotkey-press interrupt (SpeechReply.swift), daemon protocol v1 wire types (Models/Event.swift), in-panel approval sheet for NemoClaw prompts, real HeadlessMode.swift that connects to the daemon and emits NDJSON token/tool/done events on stdout, mock daemon (scripts/mock-daemon.ts) used by the macOS CI swift smoke job (#1860, #1861, #1862, #1863)
  • Grove consent ceremony copy draft - 3-screen plain-English consent flow (What you share / What you receive / Confirm). Reading grade 9, no em-dashes, no purple/pink/violet, default-decline, signed local-only acknowledgement schema (#1568)
  • 8gent Computer Phase 4 trace capture - new packages/memory/computer-use-traces.ts API with startTrace, appendStep, closeTrace, getTrace, listRecent, purgeOlderThan, backed by SQLite migration 001-traces.sql (tables computer_use_traces + computer_use_trace_steps, FK cascade, ordered step indexing). Screenshots written to ~/Library/Application Support/8gent/traces/<sessionId>/<step>.png; rows hold the path. Local-only in v0, no sync. Headless smoke at packages/memory/scripts/smoke-trace-capture.ts. Trace viewer CLI at packages/memory/scripts/traces.ts with list, show, replay (protocol-version-1 NDJSON), and purge --older-than (#1868, #1869)
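
To make the shape of the trace API concrete, here is a minimal in-memory sketch. The six function names come from the entry above; every signature, the `TraceStore` class, and the field names are assumptions, and a `Map` stands in for the SQLite tables.

```typescript
// In-memory stand-in for the SQLite-backed store; all signatures are assumptions.
interface TraceStep { index: number; action: string; screenshotPath?: string }
interface Trace { sessionId: string; startedAt: number; closedAt?: number; steps: TraceStep[] }

class TraceStore {
  private traces = new Map<string, Trace>();

  startTrace(sessionId: string, now: number = Date.now()): Trace {
    const trace: Trace = { sessionId, startedAt: now, steps: [] };
    this.traces.set(sessionId, trace);
    return trace;
  }

  appendStep(sessionId: string, action: string, screenshotPath?: string): TraceStep {
    const trace = this.traces.get(sessionId);
    if (!trace) throw new Error(`unknown trace: ${sessionId}`);
    if (trace.closedAt !== undefined) throw new Error(`trace closed: ${sessionId}`);
    // Steps are numbered in insertion order, mirroring the ordered step indexing.
    const step: TraceStep = { index: trace.steps.length, action, screenshotPath };
    trace.steps.push(step);
    return step;
  }

  closeTrace(sessionId: string, now: number = Date.now()): void {
    const trace = this.traces.get(sessionId);
    if (trace) trace.closedAt = now;
  }

  getTrace(sessionId: string): Trace | undefined {
    return this.traces.get(sessionId);
  }

  listRecent(limit: number): Trace[] {
    return Array.from(this.traces.values())
      .sort((a, b) => b.startedAt - a.startedAt)
      .slice(0, limit);
  }

  // Removing a trace drops its steps with it, like the FK cascade.
  purgeOlderThan(cutoff: number): number {
    let removed = 0;
    Array.from(this.traces.values()).forEach((t) => {
      if (t.startedAt < cutoff) {
        this.traces.delete(t.sessionId);
        removed++;
      }
    });
    return removed;
  }
}
```

In this sketch the step row holds only the screenshot path, never the image bytes, which matches the "rows hold the path" note above.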

0.3.0

2026-04-11

Added

Harness Architecture

  • Brain/hands isolation - immutable JSONL audit logging, session file permission restrictions (#1410)
  • Context compression - proactive token threshold compression for long sessions (#1413, #1405)
  • Sub-agent spawn protocol - formal spawn protocol with governance hooks (#1411, #1406)
  • Skill compounding - completed tasks automatically become reusable skills (#1412, #1407)
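
Proactive threshold compression can be pictured with the sketch below. It assumes a rough four-characters-per-token estimate and a caller-supplied `summarize` function; `compressIfNeeded`, `estimateTokens`, and the `keepRecent` parameter are hypothetical names, not the harness API.

```typescript
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

// Rough heuristic (assumption): about four characters per token.
function estimateTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Once the estimate crosses the threshold, fold everything except the most
// recent messages into a single summary message.
function compressIfNeeded(
  messages: ChatMessage[],
  threshold: number,
  keepRecent: number,
  summarize: (old: ChatMessage[]) => string,
): ChatMessage[] {
  if (estimateTokens(messages) <= threshold || messages.length <= keepRecent) {
    return messages;
  }
  const old = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  return [{ role: "system", content: summarize(old) }, ...recent];
}
```

Checking before each turn, and not after the context window overflows, is what makes the compression proactive.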

Voice

  • KittenTTS neural voices - 8 local neural TTS voices (Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo) via KittenML/kitten-tts-nano-0.8. Free, no API key, runs on CPU
  • Voice onboarding - new onboarding step auto-detects Python/KittenTTS, offers one-click install, interactive voice picker. Bruno is the default voice
  • Full-duplex provider - FullDuplexProvider interface with machine-aware backend detector in @8gent/voice (#1252)
  • MoshiMLXProvider - full-duplex voice backend for Apple Silicon via Moshi/Kyutai (#1253)

Orchestration

  • RoleRegistry - role-based runner configs in @8gent/orchestration (#1294)
  • TaskDispatcher - atomic task dispatch with claimed map + state machine (#1279)
  • HyperAgent pipeline - extracted sequential pipeline into @8gent/orchestration (#1251)
  • Terminal tabs - node-pty based terminal tab support (#1276)
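
A claimed map combined with a small state machine might look like the following sketch. The state names, the transition table, and the `Dispatcher` class are assumptions made for illustration; they are not taken from @8gent/orchestration.

```typescript
type TaskState = "pending" | "claimed" | "running" | "done" | "failed";

// Assumed transition table. pending -> claimed happens only through claim().
const TRANSITIONS: Record<TaskState, TaskState[]> = {
  pending: [],
  claimed: ["running", "pending"],
  running: ["done", "failed"],
  done: [],
  failed: ["pending"],
};

class Dispatcher {
  private states = new Map<string, TaskState>();
  private claimed = new Map<string, string>(); // taskId -> runnerId

  add(taskId: string): void {
    this.states.set(taskId, "pending");
  }

  // Check-and-set runs in one synchronous step, so two runners sharing the
  // same event loop cannot both claim the same task.
  claim(taskId: string, runnerId: string): boolean {
    if (this.states.get(taskId) !== "pending" || this.claimed.has(taskId)) return false;
    this.claimed.set(taskId, runnerId);
    this.states.set(taskId, "claimed");
    return true;
  }

  transition(taskId: string, next: TaskState): void {
    const current = this.states.get(taskId);
    if (current === undefined) throw new Error(`unknown task: ${taskId}`);
    if (TRANSITIONS[current].indexOf(next) === -1) {
      throw new Error(`illegal transition ${current} -> ${next}`);
    }
    if (next === "pending") this.claimed.delete(taskId); // released back to the queue
    this.states.set(taskId, next);
  }

  stateOf(taskId: string): TaskState | undefined {
    return this.states.get(taskId);
  }
}
```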

TUI

  • Skills loader - TUI loads skills from bundled packages/skills/*/SKILL.md and .claude/skills/*/SKILL.md. Supports triggers, aliases, and /alias resolution
  • Skill slash commands - unknown /commands resolve to loaded skills automatically. Ghost completions include all skill triggers
  • Crash-resilient sessions - session persistence with resume on crash (#1175)
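
The skill lookup behind unknown /commands can be sketched as an index keyed by name, trigger, and alias. Everything here is hypothetical (the `Skill` shape, `buildSkillIndex`, `resolveSkill`, and the rule that built-ins win over skills); it only illustrates the resolution order described above.

```typescript
interface Skill { name: string; triggers: string[]; aliases: string[] }

function buildSkillIndex(skills: Skill[]): Map<string, Skill> {
  const index = new Map<string, Skill>();
  for (const skill of skills) {
    const keys = [skill.name].concat(skill.triggers, skill.aliases);
    for (const key of keys) {
      // Store keys without the leading slash, case-insensitively.
      const normalized = key.replace(/^\//, "").toLowerCase();
      // First registration wins (assumption).
      if (!index.has(normalized)) index.set(normalized, skill);
    }
  }
  return index;
}

// Returns a skill only for slash commands that are not built-ins.
function resolveSkill(input: string, builtins: string[], index: Map<string, Skill>): Skill | undefined {
  if (input.charAt(0) !== "/") return undefined;
  const command = input.slice(1).split(/\s+/)[0].toLowerCase();
  if (builtins.indexOf(command) !== -1) return undefined;
  return index.get(command);
}
```

The keys of the same index could also feed ghost completions, since they already contain every trigger and alias.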

Infrastructure

  • Cloud vessel deployment - dual inference mode for cloud-hosted vessels (#1161)
  • Adaptive sequential pipeline - Run D multi-inference pipeline (#1159)
  • Entity dedup - UNIQUE(type, name) constraint for entity deduplication (#1369, #1382)
  • Zod to JSON Schema - generation pipeline for schema drift prevention (#1370)

Fixed

  • TUI layout - 5 layout constraint fixes: status verb row, agent mode bar overflow, flexShrink on chrome, ProcessDetailView height, horizontal chrome columns (#1295-#1300)
  • TUI CLI - --provider=, --model=, --yes flag parsing. Implicit tui subcommand when first token is a flag
  • TUI narrow terminals - compact single-line status bar under 92 columns, scaled sidebar width
  • TUI model picker - fixed invisible cursor on Solarized themes, scroll init, empty model filtering
  • Destructive prompt mutations - disabled to prevent unintended prompt changes (#1129)
  • Array.from() for Map iterators in harness module
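
The flag handling from the TUI CLI entry above can be illustrated with a small parser. This is a sketch under assumptions: `parseArgs` and `ParsedArgs` are hypothetical names, and treating an empty argument list as `tui` goes beyond what the entry states.

```typescript
interface ParsedArgs {
  subcommand: string;
  provider?: string;
  model?: string;
  yes: boolean;
  rest: string[];
}

function parseArgs(argv: string[]): ParsedArgs {
  const args = argv.slice();
  // A leading flag means no subcommand was given, so fall back to `tui`.
  const hasSubcommand = args.length > 0 && !args[0].startsWith("-");
  const subcommand = hasSubcommand ? (args.shift() as string) : "tui";
  const parsed: ParsedArgs = { subcommand, yes: false, rest: [] };
  for (const arg of args) {
    if (arg.startsWith("--provider=")) parsed.provider = arg.slice("--provider=".length);
    else if (arg.startsWith("--model=")) parsed.model = arg.slice("--model=".length);
    else if (arg === "--yes") parsed.yes = true;
    else parsed.rest.push(arg);
  }
  return parsed;
}
```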

Security

  • Session file permissions restricted, vault sentinels redacted
  • YAML frontmatter sanitized in skill compounding
  • Command injection, path traversal, prototype pollution fixed in spawn protocol

Changed

  • Repository scope - governance docs moved to 8gi-governance, media assets to 8gent-world. This repo is kernel-only
  • TUI animations - ^A toggle now fully disables/enables all motion
  • TUI status bar - plain telemetry labels, "ready (awaiting input)" instead of "Done"

2.0.1

2026-03-25

Fixed

  • Double shebang bug - Build output had #!/usr/bin/env bun prepended twice, breaking execution on Linux. Now checks if shebang exists before prepending. This fix is critical for all Linux users and Docker-based benchmarks.
  • Cross-platform build script - Replaced macOS-only sed -i '' with cross-platform Node one-liner.
  • CI workflows - Added version sync check, secrets scan, policy engine integrity test to CI. Release workflow now auto-publishes to npm.
  • Harbor adapter - Fixed AgentContext API (pydantic data model, not message store). Added robust Bun + 8gent installation in Docker containers.
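
The double-shebang fix comes down to an idempotent prepend. A minimal sketch, assuming a hypothetical `ensureShebang` helper in place of the real build script:

```typescript
const SHEBANG = "#!/usr/bin/env bun";

// Prepend the shebang only when the output does not already start with one,
// so running the build step twice leaves the file unchanged.
function ensureShebang(source: string): string {
  return source.startsWith("#!") ? source : `${SHEBANG}\n${source}`;
}
```

A second shebang line is not a comment in JavaScript, so the duplicated line is parsed as code and fails.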

Added

  • Terminal-Bench Harbor adapter (benchmarks/harbor_adapter/) - Runs 8gent through Terminal-Bench 2.0 via Harbor framework. Oracle baseline validated at 80%.
  • Ollama timeout increase - 5 minute timeout for 14B model pulls.

2.0.0

2026-03-25

Added

  • Computer Use - Power #10 (packages/computer/) - Desktop automation via usecomputer bridge. Screenshot, click, type, press, scroll, drag, hover, clipboard, window list. Security-gated with point validation, max limits, dangerous key detection.
  • Process Manager (packages/computer/process-manager.ts) - Process listing with memory/CPU, graceful/force quit, safe list, system-critical protection (22 blocked processes), suggest quittable apps.
  • 13 desktop tools in ToolExecutor - desktop_screenshot, desktop_click, desktop_type, desktop_press, desktop_scroll, desktop_drag, desktop_hover, desktop_windows, desktop_clipboard, desktop_processes, desktop_quit_app, desktop_suggest_quit, desktop_safe_list.
  • Running Apps menu in Lil Eight's macOS menu bar - memory stats, per-app quit/force-quit dialogs, Quit All Non-Essential, protected app list.
  • CLUI tray upgrade (apps/clui/src-tauri/src/lib.rs) - Daemon status, session count, Resource Manager submenu, Settings submenu with daemon control, log viewer, config access.
  • AGENTS.md - Universal agent instructions for any AI coding harness (Pi, Hermes, OpenCode, Aider, Goose, Cline, Continue, SWE-Agent).
  • NemoClaw desktop policies - Desktop automation policy rules: read-only ops allowed, mutations require approval, dangerous key combos (cmd+q, alt+f4) hard-blocked.
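
The three-tier desktop policy (allow, require approval, hard-block) could be expressed as below. This is a sketch only: the split of tools into read-only and mutating is an assumption, and `evaluate` and `normalizeCombo` are hypothetical names, not the NemoClaw API.

```typescript
type Decision = "allow" | "require_approval" | "block";

// Assumed classification; the real policy rules may differ.
const READ_ONLY_TOOLS = ["desktop_screenshot", "desktop_windows", "desktop_processes", "desktop_safe_list"];
const BLOCKED_COMBOS = ["cmd+q", "alt+f4"];

// Normalize case, spacing, and key order so "Q + Cmd" matches "cmd+q".
function normalizeCombo(keys: string): string {
  return keys.toLowerCase().split("+").map((k) => k.trim()).sort().join("+");
}

function evaluate(tool: string, keys?: string): Decision {
  if (tool === "desktop_press" && keys !== undefined) {
    const combo = normalizeCombo(keys);
    if (BLOCKED_COMBOS.some((b) => normalizeCombo(b) === combo)) return "block";
  }
  if (READ_ONLY_TOOLS.indexOf(tool) !== -1) return "allow";
  return "require_approval";
}
```

Checking the blocked combos first keeps the hard block from being bypassed by an approval.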

Changed

  • README.md - Added usecomputer and Quitty to inspirations. Fixed Hermes credit to NousResearch (was incorrectly ArcadeAI).

Removed

  • 60+ quarantined tool files - Cleaned up unused utility packages and their quarantine docs. All recoverable from git history.

1.0.0

2026-03-22

Added

  • Daemon Protocol (docs/DAEMON-PROTOCOL.md) - WebSocket protocol specification for external clients (8gent.app, 8gent OS, Telegram). Defines connection handshake, auth, session lifecycle, prompt/response streaming, cron management, and health checks. The contract between the brain and the interfaces.
  • BRAND.md - Canonical brand reference copied from 8gent-world. Typography (Fraunces/Inter/JetBrains Mono), color palette, domain table.
  • Changesets - Added @changesets/cli for monorepo version management across 40+ packages.
  • Daemon gateway expansion (packages/daemon/gateway.ts) - New WebSocket message types: sessions:list, cron:list, cron:add, cron:remove, health. External clients can now manage cron jobs and query daemon state.
  • Session state persistence (packages/daemon/index.ts) - Saves active session metadata to ~/.8gent/daemon-state.json on graceful shutdown. Clients can resume sessions after daemon restart.
  • Idle session cleanup (packages/daemon/agent-pool.ts) - Sessions idle for 30+ minutes are automatically evicted. Cleanup runs every 5 minutes.
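
Idle eviction with the two intervals mentioned above (30 minutes idle, sweep every 5 minutes) might be sketched like this. `evictIdle` and the last-active map are hypothetical; the agent pool's real bookkeeping may differ.

```typescript
const IDLE_LIMIT_MS = 30 * 60 * 1000;    // evict after 30 minutes idle
const SWEEP_INTERVAL_MS = 5 * 60 * 1000; // run the sweep every 5 minutes

// lastActive maps sessionId -> timestamp (ms) of the last activity.
function evictIdle(lastActive: Map<string, number>, now: number, limitMs: number = IDLE_LIMIT_MS): string[] {
  const evicted: string[] = [];
  Array.from(lastActive.entries()).forEach(([id, ts]) => {
    if (now - ts >= limitMs) {
      lastActive.delete(id);
      evicted.push(id);
    }
  });
  return evicted;
}

// In a daemon this would be scheduled periodically, for example:
// setInterval(() => evictIdle(sessions, Date.now()), SWEEP_INTERVAL_MS);
```

With a 5-minute sweep, a session can sit idle for up to about 35 minutes before it is actually removed.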

Fixed

  • Daemon log overwrite bug (packages/daemon/index.ts) - Bun.write() was overwriting the log file on every event instead of appending. Switched to appendFileSync().
  • Ecosystem references - CLAUDE.md and README.md now reference 8gent.dev as canonical domain and link to all ecosystem products (8gentos.com, 8gent.app, 8gentjr.com, 8gent.world, 8gent.games).

Changed

  • CLAUDE.md - Added ecosystem table and reference to BRAND.md. 8gent Code positioned as "the brain" and free on-ramp to 8gent OS.
  • Security (packages/validation/security-scanner.ts, secret-patterns.ts) - Static security scanner: detects leaked secrets (API keys, AWS credentials, DB connection strings, JWTs, private keys) and vulnerability patterns (eval injection, SQL concat, innerHTML XSS). scanFile, scanContent, scanDirectory API. Pre-commit gate via hasCriticalFindings. Credit: 0din-ai/ai-scanner for pattern taxonomy.
  • Ability Scorecards (packages/validation/ability-scorecard.ts) - Measurable metrics per ability: memory recall accuracy, worktree parallelization efficiency, policy violation rate, evolution improvement delta, healing recovery rate, entrepreneurship hit rate, AST blast radius accuracy, browser research relevance. JSONL persistence per session, baseline comparison.
  • Meta-Optimizer (benchmarks/autoresearch/meta-optimizer.ts) - Optimizes beyond system prompt: mutates few-shot examples, model routing priority, grading weights (exec/keyword split), and temperature sweep. Heuristic suggestions based on what worked per category. Inspired by Karpathy's program.md meta-optimization concept.
  • Macro Action Decomposer (packages/orchestration/macro-actions.ts) - Coarse-grained task delegation: topological sort on dependencies, parallel group detection, critical path analysis, speedup estimation. Kahn's algorithm for wave-based parallelization.
  • Actuator Tools (packages/tools/actuators/) - Write actuators for the physical/digital world: deploy (Vercel, Railway, Fly.io), publish (npm, git tag, GitHub release), notify (Telegram, GitHub issues). Dry-run by default. All actions return undo commands where reversible.
  • Token Throughput Tracker (packages/orchestration/throughput-tracker.ts) - Global tokens/sec metric across all parallel agents. Sliding window snapshots, daily reports, per-agent utilization, model and category breakdowns. JSONL persistence with 7-day retention.
  • Curriculum Skills (packages/eight/curriculum.ts) - Teachable curricula with step progression, exercises, and comprehension checks. Built-in: "8gent-architecture" (5 steps) and "writing-benchmarks" (4 steps). CurriculumRunner generates teaching prompts.
  • Persona Mutation (packages/self-autonomy/persona-mutation.ts) - Auto-tune SOUL.md calibration table from accumulated user feedback. Parses current persona parameters, records up/down feedback with evidence, suggests mutations (each feedback = +/-5, clamped 0-100). Never writes to SOUL.md directly (safety constraint).
  • Telegram Unified Portal (packages/telegram-bot/unified-portal.ts) - Single portal to all automation: /status, /agents, /benchmark, /deploy, /throughput, /scorecard, /soul, /help. Auth gating, inline keyboards, auto-split for long messages. Stub handlers document integration points.
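
The wave-based parallelization named in the Macro Action Decomposer entry is a layered variant of Kahn's algorithm: every task whose dependencies are satisfied forms one wave, and waves run in order. A minimal sketch, with `parallelWaves` and the `deps` shape as assumptions:

```typescript
// deps maps each task to the tasks it depends on.
function parallelWaves(deps: Record<string, string[]>): string[][] {
  const tasks = Object.keys(deps);
  const indegree: Record<string, number> = {};
  const dependents: Record<string, string[]> = {};
  for (const t of tasks) {
    indegree[t] = deps[t].length;
    dependents[t] = [];
  }
  for (const t of tasks) {
    for (const d of deps[t]) {
      if (!(d in dependents)) throw new Error(`unknown dependency: ${d}`);
      dependents[d].push(t);
    }
  }
  const waves: string[][] = [];
  let ready = tasks.filter((t) => indegree[t] === 0);
  let done = 0;
  while (ready.length > 0) {
    waves.push(ready);
    done += ready.length;
    const next: string[] = [];
    for (const t of ready) {
      for (const child of dependents[t]) {
        indegree[child] -= 1;
        if (indegree[child] === 0) next.push(child);
      }
    }
    ready = next;
  }
  // Tasks left unscheduled can only be explained by a cycle.
  if (done !== tasks.length) throw new Error("dependency cycle detected");
  return waves;
}
```

The number of waves is the length of the critical path, so comparing it with the total task count gives a rough upper bound on the speedup.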

0.8.0

2026-03-21

Added

Eight Core Abilities (8 new packages)

  • Memory (packages/memory/) - SQLite + FTS5 persistent recall with Ollama embeddings, 30-day decay, frequency-based promotion via PromotionManager, semantic search via SemanticRecall
  • Worktree (packages/orchestration/) - WorktreePool for multi-agent parallel execution via git worktrees, max 4 concurrent agents, filesystem-based inter-agent messaging
  • Policy (packages/permissions/) - YAML-driven policy engine with 11 default rules, approval gates for destructive operations, privacy-aware model routing
  • Evolution (packages/self-autonomy/) - Post-session reflection, Bayesian skill confidence scoring, self-improvement SQLite database, learns from successes and failures
  • Healing (packages/validation/) - Hypothesis Loop pattern: checkpoint-action-verify-revert. Git-stash atomic snapshots, failure log (~/.8gent/healing/failures.jsonl), configurable retry limits
  • Entrepreneurship (packages/proactive/) - GitHub bounty and help-wanted issue scanner, capability matcher, opportunity pipeline with full lifecycle tracking
  • AST (packages/ast-index/) - Blast Radius Engine: import dependency graph, test file mapping, change impact estimation before any edit
  • Browser (packages/tools/browser/) - Lightweight web access via fetch + DuckDuckGo HTML scraping, disk cache, no headless browser dependencies
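
The checkpoint-action-verify-revert pattern from the Healing entry can be sketched generically. The `HypothesisOps` interface and `runHypothesisLoop` are hypothetical; in the real package the checkpoint is described as a git-stash snapshot, which this sketch abstracts behind callbacks.

```typescript
// Hypothetical interface; the real healing package may expose something different.
interface HypothesisOps<S> {
  checkpoint(): S;            // e.g. a git stash reference
  act(attempt: number): void; // apply the candidate change
  verify(): boolean;          // e.g. run the tests
  revert(snapshot: S): void;  // restore the checkpoint
}

function runHypothesisLoop<S>(ops: HypothesisOps<S>, maxRetries: number): { ok: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const snapshot = ops.checkpoint();
    try {
      ops.act(attempt);
      if (ops.verify()) return { ok: true, attempts: attempt };
    } catch (_err) {
      // A throwing action or check counts as a failed hypothesis.
    }
    // Failed attempts never leave partial changes behind.
    ops.revert(snapshot);
  }
  return { ok: false, attempts: maxRetries };
}
```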

Voice Chat Mode

  • `/voice chat` - Half-duplex voice conversation loop: listen -> transcribe -> agent -> speak -> listen
  • Sox-based recording with built-in silence detection (auto-stops when you stop talking)
  • Local transcription via whisper.cpp, OpenAI Whisper cloud fallback
  • macOS TTS via say command with sentence-chunked delivery for natural speech
  • ESC interrupts agent mid-speech, status bar shows VOICE CHAT / SPEAKING / THINKING states
  • VoiceChatLoop class (packages/voice/voice-chat.ts), useVoiceChat React hook

TUI Overhaul

  • Neumorphic folder tabs - Chat, Notes, Ideas, BTW, Questions, Music workspace tabs
  • Activity monitor - Real tool-call feed replacing decorative spinner
  • Responsive chat bubbles - Terminal-width-aware indent (10%, capped at 12 cols)
  • Folder frame - Vertical borders on content area
  • ADHD mode - Bionic text boldening + ACE-Step LoFi music generation

Infrastructure

  • SOUL.md - Agent persona definition: "The Infinite Gentleman" identity, voice calibration, 11 principles, anti-patterns, heartbeat system, daily schedule
  • GitHub integration - Token management (Keychain/encrypted), REST API helpers, /github slash command, gh CLI auto-config
  • Ability showcase benchmark - Single task exercising all 8 abilities end-to-end
  • Long-horizon benchmarks (LH001-LH005) - Review bot, migration, scheduler, API gateway, CLI framework
  • Competition infrastructure - overnight-competition.ts, overnight-orchestrator.sh, sync-results.ts
  • Tenant Convex persistence - tenants table with CRUD mutations, in-memory fallback
  • Session sync - SessionSyncManager batches token/tool-call deltas, flushes every 10s
  • Real Stripe billing - Real SDK calls, webhook verification, Hono+Express handlers, lazy init
  • Knowledge graph - SQLite entity/relationship store with BFS traversal, heuristic extraction
  • Memory v2 - SQLite+FTS5+embeddings replacing JSONL, 5 memory types, hybrid search, version history

Fixed

  • Voice TUI freeze - Replaced blocking while(running) with non-blocking setTimeout scheduler; fixed useInput consuming all keyboard input
  • Recording never stopping - VoiceEngine audio levels were simulated (random numbers); switched to sox native silence detection
  • Duplicate voice messages - Voice hook and agent.chat() both adding messages to state
  • [_EOT_] tokens in display - Strip qwen3.5 end-of-turn markers from transcript, display, and TTS
  • Chat bubbles disappearing - useStdout() called conditionally, violating React hook rules; moved to top of component
  • Voice messages not showing - agent.chat() doesn't update React state; added explicit setMessages() calls
  • Text overlap in message list - Hardcoded marginLeft={20} replaced with responsive terminal-width-based indent
  • Security - Removed all hardcoded Telegram tokens, moved to .env

0.7.0

2026-03-18

Added

  • Smart onboarding - auto-detects git config, Ollama models, GitHub auth; reduces from 8 questions to 3
  • Preferences cloud sync - PreferencesSyncManager pulls/pushes preferences via Convex after auth; updatedAt wins merge strategy
  • Adaptive system prompt - USER_CONTEXT_SEGMENT injects user name, role, communication style into system prompt
  • Session history & resume - /history, /continue, /resume, /compact slash commands; checkpoints every 5 messages
  • Conversations table - Convex schema for cross-device session persistence with checkpoint data
  • Personal LoRA collector - PersonalCollector quality-filters session traces for fine-tuning (score >= 0.7, no corrections)
  • ESC to interrupt - pressing Escape during generation aborts the AI SDK stream immediately
  • User-scoped memory - userId field on MemoryBase and SearchOptions for per-user memory recall
  • HistoryScreen - TUI screen for browsing and resuming past sessions with keyboard navigation
  • Comprehensive personalization docs - docs/PERSONALIZATION.md covering all 5 phases
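
The "updatedAt wins" merge used for preferences sync can be shown in a few lines. This sketch assumes whole-record last-writer-wins with numeric timestamps; `resolveSync`, the `Preferences` shape, and the tie rule are hypothetical and not the PreferencesSyncManager API.

```typescript
interface Preferences { updatedAt: number; values: Record<string, string> }
type SyncAction = "push" | "pull" | "none";

// The copy with the newer updatedAt replaces the other one entirely.
function resolveSync(local: Preferences, remote: Preferences): { merged: Preferences; action: SyncAction } {
  if (remote.updatedAt > local.updatedAt) return { merged: remote, action: "pull" };
  if (local.updatedAt > remote.updatedAt) return { merged: local, action: "push" };
  return { merged: local, action: "none" }; // equal timestamps: nothing to sync (assumption)
}
```

Under this rule, edits made on the device with the older timestamp are discarded, not merged field by field.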

0.6.0

2026-03-17

Added

  • `apps/clui/` - Tauri 2.0 desktop overlay - branded 8gent desktop app with Alt+Space toggle, multi-tab sessions, transparent floating overlay, Rust backend for process management, React 19 frontend with Tailwind CSS 4, real-time NDJSON streaming from agent subprocess, permission server for human-in-the-loop tool approval
  • `packages/auth/` - Clerk authentication - device code flow for CLI login (8gent auth login), macOS Keychain token storage with AES-256-GCM encrypted file fallback, JWT validation via jose, automatic token refresh, non-blocking anonymous mode (auth never blocks local usage)
  • `packages/db/` - Convex database - reactive database with users, sessions, usage, and preferences tables; real-time sync; Clerk auth integration; offline mutation queuing; ConvexClient wrapper for Bun
  • `packages/voice/` - Speech-to-Text via Whisper - local transcription via whisper.cpp CLI (no cloud dependency), sox-based mic recording, model manager with streaming downloads from Hugging Face (tiny/base/small), voice activity detection, OpenAI Whisper API cloud fallback, VoiceEngine with EventEmitter API
  • `packages/control-plane/` - Multi-tenant management - tenant provisioning with subdomain routing (username.8gent.app), usage analytics, billing plan definitions (free/pro/team), Stripe integration stubs, admin dashboard data layer
  • `apps/dashboard/` - Admin dashboard - Next.js 16 admin panel with Clerk auth + RBAC, user management with search/filter, session monitoring, usage charts (recharts), system health, model distribution, plan management
  • CLUI integration components - ThinkingView, EvidencePanel, PlanKanban, AuthGate, SettingsPanel adapted from TUI to React DOM with Framer Motion animations
  • TUI voice components - useVoiceInput hook (Ctrl+Space toggle) and VoiceIndicator component with recording status, audio levels, and download progress
  • CLI auth commands - 8gent auth login, 8gent auth logout, 8gent auth status, 8gent auth whoami
  • 20 BMAD planning documents - project briefs, PRDs, architecture docs, and epics for all 5 phases across docs/bmad/
  • Local vision & OCR model support - vision router now auto-discovers OCR-specialized models (dots.ocr, deepseek-ocr, glm-ocr) alongside general vision models (qwen2.5-vl, minicpm-v, internvl2)
  • `/vision` slash command - configure vision/OCR models from TUI: /vision status, /vision model <name>, /vision ocr <name>, /vision pull for recommendations
  • Vision config in `.8gent/config.json` - user-configurable defaultModel, ocrModel, fallback chains, provider preference (local/cloud), timeout
  • OCR-specific routing - findOCRModel() prefers dedicated OCR models for text extraction, falls back to general vision models with strong OCR
  • OCR prompt in VisionInterpreter - dedicated OCR prompt preserves formatting, tables, code indentation, and LaTeX formulas
  • OpenRouter free vision fallback - vision router now checks OpenRouter free models even without API key as additional fallback

Fixed

  • Installer color violations - replaced forbidden color="gray" and color="white" with dimColor and default text in apps/installer/src/index.tsx
  • write_file path bug - models sometimes pass absolute-looking paths like /8gent-code/server.ts that resolve outside the working directory; these are now auto-stripped to relative paths instead of throwing a path-traversal error; tool description and system prompt updated to instruct models to use relative paths

Added

  • Dynamic free model router - getBestFreeModel() in packages/providers/index.ts queries OpenRouter's /api/v1/models endpoint to find the best available free model (filtered by :free suffix, sorted by context length); results cached for 1 hour; spawn_agent now accepts model: "auto:free" to automatically pick the best free model
  • Evidence collection visible in TUI - real-time evidence badges (pass/fail) appear in the chat stream after write_file, edit_file, run_command, and git_commit; one-line summary shown at end of each response; /evidence command shows full session breakdown with per-type counts
  • Three-layer model architecture - base model (qwen3) + Eight LoRA (centralized training from benchmarks) + Personal LoRA (user's local fine-tune on their patterns); personal module retrains when a new Eight version releases
  • Eight model version manager (version-manager.ts) - manages model promotion lifecycle with naming convention eight-{major.minor.patch}-q{gen}:{params}, Gemini Flash judge validates checkpoints before promotion
  • 8gent as default provider - eight-1.0-q3:14b is now the primary recommended model across all documentation and quick-start guides
  • Auto-open files on macOS - files referenced in agent output are opened automatically in the default editor
  • TUI accepts any model name - /model command now accepts arbitrary model identifiers, not just predefined options
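
The selection rule behind the dynamic free model router (filter by the :free suffix, sort by context length, cache for an hour) can be sketched as pure functions. `pickBestFreeModel` and `cached` are hypothetical helpers, the `id` and `context_length` fields are assumed to match the model list response, and the real getBestFreeModel() fetches over the network, which this sketch leaves out.

```typescript
// Assumed subset of a model entry from the /api/v1/models response.
interface ModelInfo { id: string; context_length: number }

function pickBestFreeModel(models: ModelInfo[]): string | undefined {
  const free = models.filter((m) => m.id.endsWith(":free"));
  free.sort((a, b) => b.context_length - a.context_length);
  return free.length > 0 ? free[0].id : undefined;
}

// Generic time-based cache; the clock is injectable so it can be tested.
function cached<T>(load: () => T, ttlMs: number, now: () => number = Date.now): () => T {
  let value: T | undefined;
  let expires = -Infinity;
  return () => {
    if (now() >= expires) {
      value = load();
      expires = now() + ttlMs;
    }
    return value as T;
  };
}
```

A caller would wrap the model lookup with `cached(load, 60 * 60 * 1000)` so that repeated `auto:free` requests within the hour do not query the endpoint again.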

Fixed

  • Security fixes ported to `packages/eight` - hardened command execution, input sanitization, and permission checks carried over from agent package

Added (prior)

  • `@8gent/kernel` package - full 4-phase RL fine-tuning pipeline via training proxy
      • Phase 1: Proxy manager (proxy.ts) - start/stop training proxy, health checks, latency overhead monitoring with configurable threshold
      • Phase 2: Judge scoring (judge.ts) - PRM wiring via Gemini Flash (free), score distribution tracking, per-model stats, daily trend analysis
      • Phase 3: Training orchestration (training.ts) - GRPO batch collection with score filtering, checkpoint creation, benchmark validation gate, auto-rollback on regression
      • Phase 4: Production loop (loop.ts) - MadMax scheduling (sleep/idle windows), auto-promotion of improved checkpoints into model-router, health monitoring, score trend alerts
      • Kernel manager (manager.ts) - unified entry point, reads .8gent/config.json, safe no-op when disabled
  • RL fine-tuning exploration - architecture doc, proxy config, and integration plan for continuous GRPO fine-tuning of local Ollama models via training proxy
  • Training proxy toggle - TRAINING_PROXY_URL env var and .8gent/config.json training_proxy section to route Ollama calls through the OpenAI-compatible training proxy
  • RL checkpoint validation gate - benchmarks/autoresearch/validate-checkpoint.ts runs benchmark suite against fine-tuned models and compares against baseline scores to prevent regressions
  • Kernel Fine-Tuning section in README - documents proxy architecture, base model recommendations, and how to enable
  • Remotion video demos (apps/demos/) - React-based video generation for product reels and landing page content
      • 3 ready-to-render compositions: HeroIntro, FeatureShowcase, CostComparison
      • 9:16 vertical (reels) and 16:9 landscape variants for each
      • Reusable component library: Logo, TerminalWindow, GlowCard, CodeBlock, Background
      • Animation utilities: fade-in, scale-in, typewriter, glow pulse, counter
      • Branded design tokens matching 8gent visual identity
      • Scripts: studio, render:hero, render:features, render:cost, render:all
      • Media preview page (bun run demos:media) - Vite-powered browser preview with Remotion Player

0.5.0

2026-03-14

Added

  • Universal BMAD planning - system prompt now classifies tasks as Code, Creative, Research, Planning, or Communication with tailored approaches for each
  • Proactive planner wired into agent loop - updates prediction context on every tool call, tracks modified files and errors
  • Evidence collection in agent core - fire-and-forget evidence gathering after file writes, commands, and git commits; session summary on finish
  • AST `indexFolder()` implementation - recursively parses TS/JS files, populates symbol maps and file outlines
  • AST `getSymbolSource()` implementation - reads file and extracts lines for a specific symbol with optional context
  • AST `estimateTokenSavings()` implementation - calculates full-file vs symbol-only token estimates
  • Momentum tracking in ProactivePlanner - tracks steps completed, rate (steps/min), and streak
  • Universal step categories - added creative, research, communication, planning to StepCategory
  • Creative/research prediction methods - predictCreativeSteps() and predictResearchSteps() for non-code tasks
  • REPL commands: /board (kanban view), /predict (confidence-scored predictions), /momentum (velocity stats)
  • bmad-method as devDependency (v6.1.0) with auto-init on postinstall

Fixed

  • EvidenceCollector constructor now accepts optional config with process.cwd() default (was required, crashed without args)
  • PredictionContext.currentPlan type inlined (was referencing undefined ExecutionPlan)
  • indexRepo() now throws descriptive error instead of generic "Not implemented"
  • Removed ...config spread in EvidenceCollector that was overwriting defaults

Changed

  • Version bump to 0.5.0 (new features: BMAD wiring, evidence, AST, momentum)

0.3.1

2026-03-14

Added

  • Agent mode cycling (Ctrl+T): Planning, Researching, Implementing, Testing, Debugging
  • Kanban auto-population from agent PLAN: output - parses numbered steps into cards
  • Kanban auto-advancement: Ready → In Progress on tool start, → Done on tool end
  • Dynamic model fetching per provider (Ollama, OpenRouter, LM Studio)
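
Kanban auto-population and auto-advancement can be pictured with the sketch below. The parsing regex, the `Card` shape, and the rule that tool events move the first matching card are all assumptions; only the PLAN: marker, the numbered steps, and the three column names come from the entries above.

```typescript
interface Card { id: number; title: string; column: "Ready" | "In Progress" | "Done" }

// Turn the numbered lines after a "PLAN:" marker into Ready cards.
function parsePlan(output: string): Card[] {
  const start = output.indexOf("PLAN:");
  if (start === -1) return [];
  const cards: Card[] = [];
  for (const line of output.slice(start + "PLAN:".length).split("\n")) {
    const match = /^\s*(\d+)[.)]\s+(.+?)\s*$/.exec(line);
    if (match) cards.push({ id: Number(match[1]), title: match[2], column: "Ready" });
  }
  return cards;
}

// Tool start moves the first Ready card forward; tool end finishes the
// first In Progress card (assumed selection rule).
function advance(cards: Card[], event: "tool_start" | "tool_end"): void {
  const from = event === "tool_start" ? "Ready" : "In Progress";
  const to = event === "tool_start" ? "In Progress" : "Done";
  const card = cards.filter((c) => c.column === from)[0];
  if (card) card.column = to;
}
```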

Fixed

  • ADHD mode toggle (stale closure - only toggled on, never off)
  • Scroll jumping - removed overflow:hidden, capped visible messages to 50
  • Re-planning loop - agent now plans once then executes immediately
  • Replaced "Demoing" mode with "Debugging"

0.3.0

2026-03-14

Added

  • packages/eight/ - New core agent engine (replaces packages/agent/)
      • Non-blocking agent with always-visible input and message queue
      • Real-time streaming of assistant reasoning into chat
      • Ollama, LM Studio, and OpenRouter client modules
      • Context engineering and prompt system
      • Full REPL with tool loop
  • packages/ai/ - Vercel AI SDK integration
      • ToolLoopAgent with multi-turn conversation support
      • Provider abstraction (Ollama, OpenRouter, LM Studio)
      • Toolshed bridge for dynamic tool loading
  • packages/harness-cli/ - Headless CLI for running and inspecting 8gent sessions
      • harness run / harness inspect / harness doctor / harness sessions
  • packages/specifications/ - Session spec v2 with full AI SDK data model
      • JSON schema, reader, writer for session persistence
  • apps/debugger/ - Next.js session debugger app
      • Session list, viewer, streaming, copy-as-JSON
  • benchmarks/ - Full v2 benchmark suite (39 benchmarks, 7 categories)
      • Autoresearch harness with Ollama + OpenRouter fallback
      • Experience-based model router (learns best model per domain)
      • Execution grader (SWE-bench style, 70% exec + 30% keyword)
      • 15 battle-test benchmarks across professional domains
      • Prompt mutation system with failure analysis
      • Overnight runner for continuous improvement
  • packages/dreams/ - Creative scripts for video generation
  • TUI overhaul - Design-system-first architecture with primitives layer
      • Process sidebar (Ctrl+B) for background tasks
      • useLayout hook for centralized panel/pane state
      • Theme tokens and semantic color system
      • Pinned process sidebar with overflow scroll fix
  • 8 CLI alias (short for 8gent)
  • Background task auto-promotion for long-running commands
  • Spatial awareness and "orient first" rules in system prompt
  • Loop detection and lightweight run log
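
The 70% execution + 30% keyword split used by the execution grader amounts to a weighted sum. A minimal sketch, assuming the execution score is the fraction of passing tests and the keyword score is the fraction of expected keywords found in the output; `gradeRun` is a hypothetical name and the real grader may compute either part differently.

```typescript
const EXEC_WEIGHT = 0.7;
const KEYWORD_WEIGHT = 0.3;

// Returns a score from 0 to 100.
function gradeRun(testsPassed: number, testsTotal: number, output: string, keywords: string[]): number {
  const execScore = testsTotal > 0 ? testsPassed / testsTotal : 0;
  const lower = output.toLowerCase();
  const hits = keywords.filter((k) => lower.indexOf(k.toLowerCase()) !== -1).length;
  const keywordScore = keywords.length > 0 ? hits / keywords.length : 0;
  return Math.round(100 * (EXEC_WEIGHT * execScore + KEYWORD_WEIGHT * keywordScore));
}
```

For example, 4 of 5 tests passing with 2 of 3 keywords present gives 0.7 × 0.8 + 0.3 × 0.667 ≈ 0.76, a score of 76.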

Changed

  • Breaking: packages/agent/ renamed to packages/eight/
  • Agent now uses Vercel AI SDK ToolLoopAgent instead of raw fetch
  • Session spec upgraded to v2 (incompatible with v1 sessions)
  • System prompt refined with scaffolding guidance, dev server warnings
  • All TUI components migrated from raw colors to design system primitives

Fixed

  • .env loading from repo root and ~/.8gent when running from another directory
  • Tool call visibility in message stream
  • Command failures now shown inline
  • list_files no longer hides directories
  • JSON tool format removed from prompt (uses native function calling)

Battle Test Scores (v0.3.0)

Benchmark  Domain              Score
BT001      Auth System         94
BT002      Event Architecture  92
BT003      Data Pipeline       100
BT005      State Machine       92
BT007      SEO Audit           96
BT011      Video Production    100
BT012      Music Theory        81
BT014      AI Consulting       95

0.2.0

2026-03-10

Added

  • OpenRouter provider wired into TUI and agent runtime
  • Benchmark suite v1 (bug-fixing, file-manipulation, feature-implementation)
  • Autoresearch loop (Karpathy methodology)
  • Few-shot examples per benchmark category
  • Temperature sweep (0.3, 0.5, 0.7)
  • Fullstack benchmarks (FS001-FS003, FS-MEGA-001)
  • Agentic benchmarks (TC001, DP001, RE001, SD001, AR001, CB001, MR001)
  • UI design benchmarks (UI001-UI008)
  • Reporting module with token savings calculator

Changed

  • Prompt mutation system with deduplication (exact + 70% word overlap)
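
Deduplication by exact match plus a 70% word-overlap threshold could work as in the sketch below. How overlap is measured is an assumption: here it is the number of shared words divided by the size of the smaller word set. `dedupeMutations`, `overlap`, and `wordSet` are hypothetical names.

```typescript
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Assumed metric: shared words relative to the smaller of the two word sets.
function overlap(a: string, b: string): number {
  const wa = wordSet(a);
  const wb = wordSet(b);
  const smaller = Math.min(wa.size, wb.size);
  if (smaller === 0) return 0;
  let shared = 0;
  wa.forEach((w) => { if (wb.has(w)) shared++; });
  return shared / smaller;
}

// Keep a candidate only if it is neither identical nor too similar to one already kept.
function dedupeMutations(candidates: string[], threshold: number = 0.7): string[] {
  const kept: string[] = [];
  for (const c of candidates) {
    const duplicate = kept.some((k) => k === c || overlap(k, c) >= threshold);
    if (!duplicate) kept.push(c);
  }
  return kept;
}
```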

0.1.0

2026-02-28

Added

  • Initial release
  • Ink v6 TUI with chat interface
  • Ollama integration (local LLM inference)
  • Basic tool system (file read/write, shell commands)
  • System prompt with coding agent persona
  • Demo savings calculator