8gent Code Architecture

Motivation

By default, 8gent routes requests to static model weights, so models never learn from your sessions. The kernel fine-tuning pipeline closes this loop: every coding session becomes training data, and GRPO (Group Relative Policy Optimization) continuously evolves a LoRA adapter on top of your base model. Over time, the model gets better at your workflows.

Architecture

+-------------+      +------------------+      +--------------+
|  8gent TUI  |----->|  Training Proxy  |----->|    Ollama    |
|  (Bun/Ink)  |<-----|  :30000          |<-----|  :11434      |
+-------------+      +--------+---------+      +--------------+
                              |
                     +--------v---------+
                     |  Judge LLM (PRM) |  <-- scores responses
                     |  gemini-2.5-flash|      asynchronously
                     +--------+---------+
                              |
                     +--------v---------+
                     |  GRPO Trainer    |  <-- LoRA fine-tuning
                     |  (MinT backend)  |      during idle/sleep
                     +--------+---------+
                              |
                     +--------v---------+
                     |  Hot-swap LoRA   |  <-- adapter merged
                     |  back to Ollama  |      without restart
                     +------------------+

Three-Layer Model Architecture

8gent models stack three layers at inference time:

| Layer | What | Source | Location |
| --- | --- | --- | --- |
| Layer 1: Base Model | Upstream weights (e.g., qwen3:14b) | Ollama registry | Never modified locally |
| Layer 2: Eight LoRA | Centralized fine-tune from autoresearch benchmarks | Shipped with each Eight release | Validated by Gemini Flash judge |
| Layer 3: Personal LoRA | User's local fine-tune on their coding patterns (Preview - Q2 2026) | Kernel pipeline | ~/.8gent/personal-lora/ |

When a new Eight version releases (Layer 2 update), users are prompted to retrain their Personal LoRA (Layer 3) so it aligns with the updated adapter weights.
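
At inference time the composition is conceptually simple: adapters stack on an unmodified base. A minimal sketch of that resolution, with illustrative names (ModelStack and resolveAdapters are not part of the actual @8gent/kernel API):

// Illustrative only: how the three layers compose at inference time.
interface ModelStack {
  base: string;          // Layer 1: upstream weights, e.g. "qwen3:14b" (never modified)
  eightLora?: string;    // Layer 2: adapter shipped with each Eight release
  personalLora?: string; // Layer 3: user's local adapter under ~/.8gent/personal-lora/
}
// Adapters apply in order on top of the base: Eight LoRA first, then Personal LoRA.
function resolveAdapters(stack: ModelStack): string[] {
  return [stack.eightLora, stack.personalLora].filter((a): a is string => a !== undefined);
}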

Model Versioning

Eight models follow a strict naming convention: eight-{major.minor.patch}-q{gen}:{params}

| Segment | Meaning | Bumps when... |
| --- | --- | --- |
| major | Base model change | Switching upstream weights (e.g., Qwen 3 to Qwen 3.5) |
| minor | Judge-validated improvement | Gemini Flash confirms a score gain on the autoresearch suite |
| patch | Nightly build | Every GRPO training batch produces a new patch |
| q{gen} | Quantization generation | The quantization method changes |
| {params} | Parameter count | The model size changes |
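
To make the convention concrete, here is a hypothetical parser for these tags (not part of the shipped tooling):

// Hypothetical parser for the eight-{major.minor.patch}-q{gen}:{params} convention.
interface EightVersion {
  major: number;
  minor: number;
  patch: number;
  quantGen: number; // quantization generation
  params: string;   // parameter count, e.g. "14b"
}
function parseEightVersion(tag: string): EightVersion {
  const m = tag.match(/^eight-(\d+)\.(\d+)\.(\d+)-q(\d+):(\w+)$/);
  if (!m) throw new Error(`not an Eight model tag: ${tag}`);
  return { major: +m[1], minor: +m[2], patch: +m[3], quantGen: +m[4], params: m[5] };
}
parseEightVersion("eight-1.0.42-q3:14b"); // => { major: 1, minor: 0, patch: 42, quantGen: 3, params: "14b" }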

Promotion Flow

  1. Nightly training produces a new patch (e.g., eight-1.0.42-q3:14b)
  2. Gemini Flash judge scores the checkpoint against the autoresearch benchmark suite
  3. If the checkpoint outperforms the current release, version-manager.ts promotes it to a new minor version (e.g., eight-1.1.0-q3:14b)
  4. If it regresses, the checkpoint is rolled back automatically

The version-manager.ts module in packages/eight/ manages this lifecycle. The Gemini Flash judge (google/gemini-2.5-flash:free via OpenRouter) provides zero-cost semantic evaluation.
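
The gist of that lifecycle, as a sketch; promotionGate and Scorer are illustrative names, not the actual version-manager.ts API:

// Sketch of the promotion gate: the judge score decides promote vs. rollback.
type Scorer = (checkpoint: string) => Promise<number>;
async function promotionGate(
  checkpoint: string,          // e.g. "eight-1.0.42-q3:14b"
  currentReleaseScore: number, // judge score of the current release
  score: Scorer,               // Gemini Flash over the autoresearch suite
): Promise<"promote" | "rollback"> {
  const s = await score(checkpoint);
  // Outperform the current release -> minor bump (step 3); otherwise auto-rollback (step 4).
  return s > currentReleaseScore ? "promote" : "rollback";
}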

The Four Phases

The @8gent/kernel package implements the full pipeline in four phases.

Phase 1: Proxy Management

File: packages/kernel/proxy.ts

Manages the training proxy process that sits between 8gent and Ollama. The proxy intercepts requests to collect conversation traces for training.

  • Start/stop training proxy process
  • Health checks with configurable timeout
  • Latency overhead monitoring (direct vs proxied requests)
  • Configurable latency threshold with alerting

const proxy = new TrainingProxy(config);
await proxy.start();
const acceptable = await proxy.isLatencyAcceptable(); // compare direct vs proxied latency
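
One plausible shape of that latency check, assuming the proxy passes /api/version through to Ollama and an assumed 50 ms overhead threshold (both are illustrative, not the actual proxy.ts implementation):

// Illustrative latency probe: time a direct Ollama request vs. the same request via the proxy.
async function measureMs(url: string): Promise<number> {
  const t0 = performance.now();
  await fetch(url);
  return performance.now() - t0;
}
async function latencyAcceptable(maxOverheadMs = 50): Promise<boolean> {
  const direct = await measureMs("http://localhost:11434/api/version");  // Ollama directly
  const proxied = await measureMs("http://localhost:30000/api/version"); // through the proxy (assumes pass-through)
  return proxied - direct <= maxOverheadMs; // the 50 ms default is an assumed threshold
}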

Phase 2: Judge Scoring

File: packages/kernel/judge.ts

Scores every agent response using a Process Reward Model (PRM): Gemini Flash, called via OpenRouter. The judge evaluates each response against four weighted criteria:

| Criterion | Weight | What it measures |
| --- | --- | --- |
| Execution success | 40% | Did the code work? |
| Code quality | 20% | Clean, readable, idiomatic? |
| Tool efficiency | 20% | Minimal tool calls, no wasted reads? |
| Directness | 20% | Did the agent get to the point? |

const scorer = new JudgeScorer(config);
const score = await scorer.score(sessionId, turn, model, prompt, response);
const trend = scorer.getScoreTrend(7); // 7-day rolling window

Score history is persisted to .8gent/kernel/score-history.json.
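
The composite presumably combines the four criteria by their weights. A sketch of that arithmetic, assuming each criterion is scored on a 0-1 scale (the record shape below is illustrative, not the actual score-history.json schema):

// Weighted composite over the four judge criteria, using the weights from the table above.
interface CriterionScores {
  executionSuccess: number; // did the code work?
  codeQuality: number;      // clean, readable, idiomatic?
  toolEfficiency: number;   // minimal tool calls, no wasted reads?
  directness: number;       // got to the point?
}
function compositeScore(s: CriterionScores): number {
  return 0.4 * s.executionSuccess + 0.2 * s.codeQuality + 0.2 * s.toolEfficiency + 0.2 * s.directness;
}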

Phase 3: Training Orchestration

File: packages/kernel/training.ts

Collects scored responses into GRPO training batches. Trivial responses (perfect scores) and very poor ones are filtered out; the model learns most from challenging-but-achievable tasks.

  • Automatic training trigger when batch is full
  • Checkpoint creation and lifecycle tracking
  • Benchmark validation gate via the autoresearch suite
  • Auto-rollback on regression

const trainer = new TrainingOrchestrator(config);
trainer.addSample(scoreRecord); // buffers the sample; auto-triggers training when the batch is full
const checkpoints = trainer.getCheckpoints(); // list all checkpoints with status

Training state is persisted to .8gent/kernel/training/state.json.
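
The selection rule described above might reduce to a simple band filter on the composite score; the cutoffs below are illustrative assumptions, not the shipped defaults:

// Keep only challenging-but-achievable samples for the GRPO batch.
function isTrainable(composite: number): boolean {
  if (composite >= 0.95) return false; // trivial: nothing left to learn
  if (composite < 0.3) return false;   // very poor: noisy, low-signal
  return true;
}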

Phase 4: Production Loop

File: packages/kernel/loop.ts

Ties everything together. Handles MadMax scheduling (training only during idle/sleep windows), auto-promotion of improved checkpoints into the model router, and health monitoring.

const loop = new ProductionLoop(config);
await loop.processTurn(sessionId, turnIndex, model, prompt, response);
const active = loop.getActiveModel(); // base or fine-tuned
const health = loop.getHealthStatus(); // improving/stable/declining

MadMax scheduling: Weight updates are deferred to idle periods and sleep hours (default: 23:00 to 07:00) so they never interrupt active coding sessions.
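
A sketch of that gate, using the default window from above (the isIdle signal is assumed to come from elsewhere):

// MadMax gate: train only when the machine is idle or inside the sleep window.
function inSleepWindow(now: Date, startHour = 23, endHour = 7): boolean {
  const h = now.getHours();
  // The default window wraps past midnight: 23:00-23:59 or 00:00-06:59.
  return startHour > endHour ? h >= startHour || h < endHour : h >= startHour && h < endHour;
}
function mayTrain(now: Date, isIdle: boolean): boolean {
  return isIdle || inSleepWindow(now);
}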

Unified Entry Point

The KernelManager class (@8gent/kernel) provides start(), processTurn(), getHealth(), getActiveModel(), and stop() methods, and reads its configuration from .8gent/config.json.

The pipeline is off by default. Enable it in .8gent/config.json:

{
  "trainingProxy": {
    "enabled": true,
    "proxyUrl": "http://localhost:30000",
    "autoStart": false
  }
}
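
Once enabled, wiring the kernel into a host process might look like this; the method names come from above, but the no-argument constructor and the example values are assumptions:

import { KernelManager } from "@8gent/kernel";
// Assumes KernelManager reads .8gent/config.json itself, as described above.
const kernel = new KernelManager();
await kernel.start();
// Feed each completed turn into the pipeline (example values).
await kernel.processTurn("session-1", 0, "eight-1.0.42-q3:14b", "prompt", "response");
kernel.getActiveModel(); // base or fine-tuned
kernel.getHealth();      // improving / stable / declining
await kernel.stop();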

How to Enable

# 1. Install the training proxy
pip install -e ".[rl,evolve,scheduler]"

# 2. Point 8gent through the proxy
export TRAINING_PROXY_URL=http://localhost:30000

# 3. Start the training proxy
8gent-proxy start

# 4. Run 8gent normally - sessions now generate training signal
8gent

# 5. Validate a checkpoint against benchmarks
bun run benchmarks/autoresearch/validate-checkpoint.ts

Safety Rails

  1. Checkpoint before every LoRA swap - every swap can be rolled back
  2. Benchmark gate - new weights must match or beat the baseline on the autoresearch suite
  3. MadMax scheduling - training never runs during active sessions
  4. LoRA isolation - base model weights are never modified, only adapter layers
  5. A/B routing - the model router can split traffic between the base and fine-tuned models to measure real impact (see the sketch below)
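
Rail 5's traffic split can be as simple as a weighted coin flip per request; the sketch below is illustrative, and the 10% canary share is an assumption:

// Illustrative A/B split: route a fraction of requests to the fine-tuned model.
function pickModel(
  baseModel: string,    // e.g. "qwen3:14b"
  fineTuned: string,    // e.g. "eight-1.1.0-q3:14b"
  fineTunedShare = 0.1, // assumed 10% canary share
): string {
  return Math.random() < fineTunedShare ? fineTuned : baseModel;
}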

Configuration

The training configuration lives at config/training.yaml. Key settings: MadMax scheduling mode, Gemini Flash judge via OpenRouter, MinT backend (local, no cloud dependency), LoRA rank 32.
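
For orientation, a hedged sketch of what config/training.yaml might contain, built only from the settings named above (the key names are assumptions):

# Illustrative shape only; the actual keys in config/training.yaml may differ.
scheduling:
  mode: madmax                 # train only during idle/sleep windows
  sleep_window: "23:00-07:00"
judge:
  provider: openrouter
  model: google/gemini-2.5-flash:free
trainer:
  backend: mint                # local MinT backend, no cloud dependency
  lora_rank: 32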