Motivation
8gent normally routes requests to models with static weights, so the models never learn from your sessions. The kernel fine-tuning pipeline closes this loop: every coding session becomes training data, and GRPO (Group Relative Policy Optimization) continuously evolves a LoRA adapter on top of your base model. The model gets better at your workflows over time.
Architecture
```
+-------------+      +------------------+      +--------------+
|  8gent TUI  |----->|  Training Proxy  |----->|    Ollama    |
|  (Bun/Ink)  |<-----|      :30000      |<-----|    :11434    |
+-------------+      +--------+---------+      +--------------+
                              |
                     +--------v---------+
                     | Judge LLM (PRM)  | <-- scores responses
                     | gemini-2.5-flash |     asynchronously
                     +--------+---------+
                              |
                     +--------v---------+
                     |   GRPO Trainer   | <-- LoRA fine-tuning
                     |  (MinT backend)  |     during idle/sleep
                     +--------+---------+
                              |
                     +--------v---------+
                     |  Hot-swap LoRA   | <-- adapter merged
                     |  back to Ollama  |     without restart
                     +------------------+
```
Three-Layer Model Architecture
8gent models stack three layers at inference time:
| Layer | What | Source | Notes |
|---|---|---|---|
| Layer 1: Base Model | Upstream weights (e.g., qwen3:14b) | Ollama registry | Never modified locally |
| Layer 2: Eight LoRA | Centralized fine-tune from autoresearch benchmarks | Shipped with each Eight release | Validated by the Gemini Flash judge |
| Layer 3: Personal LoRA | User's local fine-tune on their coding patterns (Preview, Q2 2026) | Kernel pipeline | Stored at ~/.8gent/personal-lora/ |
When a new Eight version releases (Layer 2 update), users are prompted to retrain their Personal LoRA (Layer 3) so it aligns with the updated adapter weights.
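To make the layering concrete, here is a minimal TypeScript sketch of how the stack could be represented at routing time. The ModelStack type and resolveModelStack helper are illustrative assumptions, not the actual @8gent/kernel API:

```typescript
// Illustrative sketch only; these names are assumptions, not the real API.
interface ModelStack {
  base: string;          // Layer 1: upstream weights, never modified
  eightLora: string;     // Layer 2: shipped with each Eight release
  personalLora?: string; // Layer 3: optional local fine-tune
}

function resolveModelStack(eightModel: string): ModelStack {
  return {
    base: "qwen3:14b",
    eightLora: eightModel, // e.g. "eight-1.0.42-q3:14b"
    // Layer 3 is attached only if the user has trained a personal adapter.
    personalLora: process.env.HOME
      ? `${process.env.HOME}/.8gent/personal-lora/adapter`
      : undefined,
  };
}
```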
Model Versioning
Eight models follow a strict naming convention: eight-{major.minor.patch}-q{gen}:{params}
| Segment | Meaning | Bumps when... |
|---|---|---|
| major | Base model change | Switching upstream weights (e.g., Qwen 3 to Qwen 3.5) |
| minor | Judge-validated improvement | Gemini Flash confirms a score gain on the autoresearch suite |
| patch | Nightly build | Every GRPO training batch produces a new patch |
| q{gen} | Quantization generation | The quantization method changes |
| {params} | Parameter count | The model size changes |
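As an illustration, a parser for this convention might look like the following sketch. The parseEightVersion helper is hypothetical and not part of the shipped tooling:

```typescript
// Hypothetical helper: parses names like "eight-1.0.42-q3:14b".
interface EightVersion {
  major: number;
  minor: number;
  patch: number;
  quantGen: number; // q{gen}: quantization generation
  params: string;   // {params}: parameter count, e.g. "14b"
}

function parseEightVersion(name: string): EightVersion {
  const match = name.match(/^eight-(\d+)\.(\d+)\.(\d+)-q(\d+):(\w+)$/);
  if (!match) throw new Error(`Invalid Eight model name: ${name}`);
  const [, major, minor, patch, quantGen, params] = match;
  return {
    major: Number(major),
    minor: Number(minor),
    patch: Number(patch),
    quantGen: Number(quantGen),
    params,
  };
}

parseEightVersion("eight-1.0.42-q3:14b");
// => { major: 1, minor: 0, patch: 42, quantGen: 3, params: "14b" }
```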
Promotion Flow
- Nightly training produces a new patch (e.g., eight-1.0.42-q3:14b)
- The Gemini Flash judge scores the checkpoint against the autoresearch benchmark suite
- If the checkpoint outperforms the current release, version-manager.ts promotes it to a new minor version (e.g., eight-1.1-q3:14b)
- If it regresses, the checkpoint is rolled back automatically
The version-manager.ts module in packages/eight/ manages this lifecycle. The Gemini Flash judge (google/gemini-2.5-flash:free via OpenRouter) provides zero-cost semantic evaluation.
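A sketch of that promotion gate, using assumed names and a plain score comparison that stand in for the real version-manager.ts internals:

```typescript
// Sketch of the promotion decision; names are assumptions,
// not the actual version-manager.ts implementation.
interface CheckpointResult {
  name: string;       // e.g. "eight-1.0.42-q3:14b"
  judgeScore: number; // autoresearch suite score from the Gemini Flash judge
}

function decidePromotion(
  candidate: CheckpointResult,
  currentRelease: CheckpointResult,
): "promote-minor" | "rollback" | "keep-as-patch" {
  if (candidate.judgeScore > currentRelease.judgeScore) {
    return "promote-minor"; // judge-validated improvement bumps minor
  }
  if (candidate.judgeScore < currentRelease.judgeScore) {
    return "rollback"; // regression: checkpoint is rolled back automatically
  }
  return "keep-as-patch"; // no measurable change: stays a nightly patch
}
```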
The Four Phases
The @8gent/kernel package implements the full pipeline in four phases.
Phase 1: Proxy Management
File: packages/kernel/proxy.ts
Manages the training proxy process that sits between 8gent and Ollama. The proxy intercepts requests to collect conversation traces for training.
- Start/stop training proxy process
- Health checks with configurable timeout
- Latency overhead monitoring (direct vs proxied requests)
- Configurable latency threshold with alerting
```typescript
const proxy = new TrainingProxy(config);
await proxy.start();
const acceptable = await proxy.isLatencyAcceptable(); // compare direct vs proxied
```
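For illustration, the direct-vs-proxied comparison could be measured along these lines. This sketch is an assumption about the approach, not the actual proxy.ts implementation; it uses Ollama's /api/version endpoint as a cheap probe:

```typescript
// Illustrative latency probe; not the actual proxy.ts implementation.
async function measureLatencyMs(baseUrl: string): Promise<number> {
  const start = performance.now();
  await fetch(`${baseUrl}/api/version`); // cheap Ollama endpoint
  return performance.now() - start;
}

const direct = await measureLatencyMs("http://localhost:11434");  // Ollama
const proxied = await measureLatencyMs("http://localhost:30000"); // proxy
const overheadMs = proxied - direct; // alert if this exceeds the threshold
console.log(`proxy overhead: ${overheadMs.toFixed(1)} ms`);
```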
Phase 2: Judge Scoring
File: packages/kernel/judge.ts
Scores every agent response using a Process Reward Model (PRM): Gemini Flash, accessed through OpenRouter. The judge evaluates four criteria:
| Criterion | Weight | What it measures |
|---|---|---|
| Execution success | 40% | Did the code work? |
| Code quality | 20% | Clean, readable, idiomatic? |
| Tool efficiency | 20% | Minimal tool calls, no wasted reads? |
| Directness | 20% | Did the agent get to the point? |
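For illustration, the weighted composite could be computed as below. The field names and the 0-to-1 score scale are assumptions, not the actual judge.ts schema; only the weights come from the table above:

```typescript
// Assumed rubric shape; the real judge.ts schema may differ.
interface RubricScores {
  executionSuccess: number; // 0..1: did the code work?
  codeQuality: number;      // 0..1: clean, readable, idiomatic?
  toolEfficiency: number;   // 0..1: minimal tool calls, no wasted reads?
  directness: number;       // 0..1: did the agent get to the point?
}

function compositeScore(s: RubricScores): number {
  return (
    0.4 * s.executionSuccess +
    0.2 * s.codeQuality +
    0.2 * s.toolEfficiency +
    0.2 * s.directness
  );
}
```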
```typescript
const scorer = new JudgeScorer(config);
const score = await scorer.score(sessionId, turn, model, prompt, response);
const trend = scorer.getScoreTrend(7); // 7-day rolling window
```
Score history is persisted to .8gent/kernel/score-history.json.
Phase 3: Training Orchestration
File: packages/kernel/training.ts
Collects scored responses into GRPO training batches. Trivial responses (perfect scores) and very poor responses are filtered out, because the model learns most from challenging-but-achievable tasks.
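A minimal sketch of that filtering band, assuming scores normalized to 0-1 and using illustrative thresholds not taken from training.ts:

```typescript
// Illustrative thresholds; the actual band in training.ts may differ.
const MIN_TRAINABLE_SCORE = 0.2;  // below this: too poor to learn from
const MAX_TRAINABLE_SCORE = 0.95; // above this: trivial, no training signal

function isTrainable(score: number): boolean {
  return score >= MIN_TRAINABLE_SCORE && score <= MAX_TRAINABLE_SCORE;
}
```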
- Automatic training trigger when batch is full
- Checkpoint creation and lifecycle tracking
- Benchmark validation gate via the autoresearch suite
- Auto-rollback on regression
```typescript
const trainer = new TrainingOrchestrator(config);
trainer.addSample(scoreRecord); // buffers, auto-triggers when batch full
const checkpoints = trainer.getCheckpoints(); // list all with status
```
Training state is persisted to .8gent/kernel/training/state.json.
Phase 4: Production Loop
File: packages/kernel/loop.ts
Ties everything together. Handles MadMax scheduling (training only during idle/sleep windows), auto-promotion of improved checkpoints into the model router, and health monitoring.
```typescript
const loop = new ProductionLoop(config);
await loop.processTurn(sessionId, turnIndex, model, prompt, response);
const active = loop.getActiveModel(); // base or fine-tuned
const health = loop.getHealthStatus(); // improving/stable/declining
```
MadMax scheduling: Weight updates are deferred to idle periods and sleep hours (default: 23:00 to 07:00) so they never interrupt active coding sessions.
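A sketch of the sleep-window check, assuming the default 23:00-to-07:00 window. The isInSleepWindow helper is illustrative, not the loop.ts implementation:

```typescript
// Illustrative helper; loop.ts may implement this differently.
function isInSleepWindow(now: Date, startHour = 23, endHour = 7): boolean {
  const hour = now.getHours();
  // The window wraps past midnight: 23:00 -> 07:00.
  return startHour > endHour
    ? hour >= startHour || hour < endHour
    : hour >= startHour && hour < endHour;
}

if (isInSleepWindow(new Date())) {
  // safe to apply deferred weight updates
}
```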
Unified Entry Point
The KernelManager class (@8gent/kernel) provides start(), processTurn(), getHealth(), getActiveModel(), and stop() methods. It reads from .8gent/config.json.
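A usage sketch of that surface; the constructor arguments and the processTurn signature (mirrored from ProductionLoop above) are assumptions:

```typescript
import { KernelManager } from "@8gent/kernel";

// Constructor arguments and the processTurn signature are assumptions,
// mirrored from ProductionLoop.processTurn above.
const kernel = new KernelManager();
await kernel.start(); // reads .8gent/config.json
await kernel.processTurn("session-1", 0, "eight-1.0.42-q3:14b", "prompt", "response");
console.log(kernel.getActiveModel()); // base or fine-tuned
console.log(kernel.getHealth());      // improving / stable / declining
await kernel.stop();
```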
The pipeline is off by default. Enable it in .8gent/config.json:
```json
{
  "trainingProxy": {
    "enabled": true,
    "proxyUrl": "http://localhost:30000",
    "autoStart": false
  }
}
```
How to Enable
```bash
# 1. Install the training proxy
pip install -e ".[rl,evolve,scheduler]"

# 2. Point 8gent through the proxy
export TRAINING_PROXY_URL=http://localhost:30000

# 3. Start the training proxy
8gent-proxy start

# 4. Run 8gent normally - sessions now generate training signal
8gent

# 5. Validate a checkpoint against benchmarks
bun run benchmarks/autoresearch/validate-checkpoint.ts
```
Safety Rails
- Checkpoint before every LoRA swap - always rollback-able
- Benchmark gate - new weights must match or beat baseline on the autoresearch suite
- MadMax scheduling - training never happens during active sessions
- LoRA isolation - base model weights are never modified, only adapter layers
- A/B routing - the model router can split traffic between the base and fine-tuned models to measure real impact (sketched below)
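A minimal sketch of such a traffic split, using a hypothetical splitRatio parameter rather than the router's real configuration:

```typescript
// Hypothetical A/B split; the real model router's API may differ.
function pickModel(
  baseModel: string,
  fineTuned: string,
  splitRatio = 0.5, // fraction of traffic sent to the fine-tuned model
): string {
  return Math.random() < splitRatio ? fineTuned : baseModel;
}

// Route 20% of traffic to the fine-tuned model while measuring impact.
const model = pickModel("qwen3:14b", "eight-1.0.42-q3:14b", 0.2);
```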
Configuration
The training configuration lives at config/training.yaml. Key settings: MadMax scheduling mode, Gemini Flash judge via OpenRouter, MinT backend (local, no cloud dependency), LoRA rank 32.
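A hedged example of what that file could look like. The key names below are invented for illustration; only the settings themselves (MadMax mode, the Gemini Flash judge via OpenRouter, the MinT backend, LoRA rank 32) come from the text above:

```yaml
# Illustrative layout only; the actual key names in config/training.yaml
# may differ.
scheduling:
  mode: madmax              # train only during idle/sleep windows
  sleep_window: "23:00-07:00"
judge:
  provider: openrouter
  model: google/gemini-2.5-flash:free
backend:
  name: mint                # local MinT backend, no cloud dependency
lora:
  rank: 32
```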