octa² - the infrastructure around the model

octa². The infrastructure around octa.

The model is one component. The infrastructure is what compounds. octa² combines six algorithms running in parallel - online experimentation, ensemble signal ranking, multi-model rotation, sequence policy learning, model distillation, and the macro analysis loop. Each loop validates inputs for the others. Validated learnings flow back into the corpus that retrains octa. Outcomes improve with every campaign, even between model releases.

Model vs. process

The model gets retrained on a release cadence. The process gets sharper every hour.

Most "AI for sales" pitches hand-wave at "the model keeps learning." It doesn't, not in real time. Model weights change when training runs land. What changes continuously is the orchestration around the model: which variant ships, which signal gets ranked, which contact gets routed, which exemplar gets retrieved, which experiment gets greenlit. That is octa². The model is one component. The process is the moat.

One model component (octa)

octa retrains on a release cadence: continued pre-training plus long-horizon RL on the corpus. Weights change weekly to monthly, not per campaign.

Six algorithms running in parallel

Online experimentation, ensemble signal ranking, multi-model rotation, sequence policy learning, model distillation, and the macro analysis loop. Each one runs its own loop at its own cadence.

Every campaign generates data

Every send, reply, meeting, landing-page click, and deal-stage transition is captured. The infrastructure turns that raw stream into validated learnings per segment.

Back into the training data

Validated learnings flow back into the next octa continued pre-training and long-horizon RL pass. The process feeds the model. The model feeds the process. It compounds.

The lineage

Great compounding systems were never one algorithm.

The systems that defined search, research, and game-playing were not single models or single algorithms. They were infrastructures combining many, with outcomes feeding back into the inputs. octa² follows the same shape, applied to GTM.

Modern web search

Hundreds of weak signals, ranked continuously

No single signal decides. An ensemble combines many, the ranking re-tunes from click-through outcomes, and the index is re-scored continuously. The ranking infrastructure - not any single signal - is the moat.

Self-play game agents

Search + self-play + a learned model

Three algorithms feeding each other: tree search proposes moves, self-play generates new training positions, the network learns from the outcomes. Remove any one and the system collapses. The combination is what learns, not the network alone.

Online experimentation platforms

Continuous controlled tests, policy updates from outcomes

Variants compete for traffic. Outcomes are measured. Winners get more allocation per segment. The platform itself is the learner; any model inside is just a tool the platform uses to draft the variants.

octa²

Six GTM algorithms running around the octa model

Online experimentation, ensemble ranking, multi-model rotation, sequence policy learning, model distillation, and the macro analysis loop. Each loop validates inputs for the others. Validated learnings flow back into the corpus that retrains octa.

The six algorithms

Six loops running in parallel. Each one sharpens the others.

None of these are new ideas in isolation. The combination is the point. Online experimentation tunes the variables. Ensemble ranking decides priority. Multi-model rotation picks the output. Sequence policy decides the next step. Distillation drops the cost. The macro analysis loop watches the whole thing and adjusts. The infrastructure is the moat.

01 Online experimentation

Controlled experiments per segment. Variable locks. Hypothesis loops.

graph8 runs continuous controlled experiments on every campaign variable that matters - subject lines, send times, cadence depth, channel mix, landing-page layout, voice openers. Each experiment runs on a fast, medium, or slow track depending on volume. Each validated outcome records the variable, the segment, the effect size, the direction, the confidence, and the sample size. Winners get locked per segment. The knowledge base accumulates. Future experiments are proposed against prior learnings, not from scratch.
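A minimal sketch of how one validated outcome could be recorded and locked, assuming reply rate as the metric and a pooled two-proportion z-test as the significance check. The `Arm` and `Learning` types, the `locks` table, and the thresholds (`z_crit`, `min_sends`) are hypothetical illustrations, not graph8's implementation; the fast/medium/slow track assignment is not modeled.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Arm:
    """One side of an experiment: a variable value and its observed outcomes."""
    value: str
    sends: int
    replies: int

    @property
    def rate(self) -> float:
        return self.replies / self.sends if self.sends else 0.0


@dataclass
class Learning:
    """A validated outcome, carrying the fields the text lists."""
    variable: str
    segment: str
    winner: str
    effect_size: float   # absolute difference in reply rate
    direction: str       # "up" if the variant beat control, else "down"
    z: float             # test statistic, used here as the confidence measure
    sample_size: int


def two_proportion_z(control: Arm, variant: Arm) -> float:
    """Pooled two-proportion z-statistic for variant vs. control (sends must be > 0)."""
    pooled = (control.replies + variant.replies) / (control.sends + variant.sends)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.sends + 1 / variant.sends))
    return (variant.rate - control.rate) / se if se > 0 else 0.0


# Hypothetical lock table: (segment, variable) -> locked winning value.
locks: Dict[Tuple[str, str], str] = {}


def evaluate_and_lock(variable: str, segment: str, control: Arm, variant: Arm,
                      z_crit: float = 1.96, min_sends: int = 200) -> Optional[Learning]:
    """Record a learning and lock the winner once the difference is significant."""
    if min(control.sends, variant.sends) < min_sends:
        return None  # not enough volume yet; keep the experiment running
    z = two_proportion_z(control, variant)
    if abs(z) < z_crit:
        return None  # no significant difference yet
    lift = variant.rate - control.rate
    winner = variant if lift > 0 else control
    locks[(segment, variable)] = winner.value
    return Learning(variable, segment, winner.value, abs(lift),
                    "up" if lift > 0 else "down", z, control.sends + variant.sends)
```

With 30 replies from 1,000 sends on a control subject line and 55 from 1,000 on a variant, the z-statistic comes out at roughly 2.8, so the variant is locked for that segment. Below `min_sends`, the function declines to decide rather than locking on noise.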

02 Ensemble signal ranking

Many weak signals, one ranked output, outcomes re-weight the signals.

Accounts to prospect, contacts inside accounts, intent signals worth firing on, sequence variants worth shipping - each one is scored by combining many weak signals into a single ranking. No single signal decides. Outcomes - opens, clicks, replies, meetings, closed-won - flow back and re-weight the signal contributions. The ranking sharpens continuously without requiring any model retraining.
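One plausible way for outcomes to re-weight signal contributions is online logistic regression: each observed outcome takes one gradient step on the weights. The sketch below assumes binary outcomes (1 for a reply or meeting, 0 for nothing) and numeric signal values; the signal names are hypothetical, and the source does not describe graph8's actual ranking model.

```python
import math
from typing import Dict, Iterable, List, Tuple


class SignalEnsemble:
    """Combines weak signals into one score; outcomes re-weight the signals.

    Sketch only: online logistic regression, one SGD step on log loss per outcome.
    """

    def __init__(self, signal_names: Iterable[str], lr: float = 0.1):
        self.weights: Dict[str, float] = {name: 0.0 for name in signal_names}
        self.bias = 0.0
        self.lr = lr

    def score(self, signals: Dict[str, float]) -> float:
        z = self.bias + sum(w * signals.get(n, 0.0) for n, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))

    def rank(self, candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
        """Return (candidate id, score) pairs, best first."""
        scored = [(cid, self.score(sig)) for cid, sig in candidates.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    def update(self, signals: Dict[str, float], outcome: int) -> None:
        # Log-loss gradient for each weight is (outcome - prediction) * signal value.
        error = outcome - self.score(signals)
        self.bias += self.lr * error
        for name in self.weights:
            self.weights[name] += self.lr * error * signals.get(name, 0.0)
```

If accounts showing a hiring spike keep converting and accounts that only visited the pricing page do not, the first weight rises, the second falls, and the ranking reorders accordingly. A signal that never fires keeps a weight of zero. Only the ensemble's own weights move; the underlying model is untouched.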

03 Multi-model rotation

Generator. Critic. Editor. Roles rotate across models per task.

For every output that matters - a cold email, a reply draft, a landing page, a call script - multiple models compete. Roles rotate: the Generator proposes, the Critic attacks, the Editor polishes. Open-source models, frontier models, octa-mini, octa, and octa-reasoning all take part. Deterministic verification decides the winner where possible; structured scoring decides where it cannot. The winning output ships. The competition log feeds back into training.
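A sketch of the rotation and winner-selection step, assuming the drafts have already been produced by whichever models held each role. Rotation here is a simple cyclic shift of roles across the model list. The deterministic checks (a word limit, a required `{first_name}` merge field, a banned word) and the scoring function are hypothetical examples; model names other than octa-mini, octa, and octa-reasoning are placeholders.

```python
from typing import Callable, Dict, List, Optional

ROLES = ("generator", "critic", "editor")


def assign_roles(models: List[str], round_idx: int) -> Dict[str, str]:
    """Rotate Generator / Critic / Editor across the model list, one step per round."""
    if len(models) < len(ROLES):
        raise ValueError("need at least one distinct model per role")
    return {role: models[(round_idx + i) % len(models)] for i, role in enumerate(ROLES)}


def passes_checks(text: str, max_words: int = 120,
                  required: tuple = ("{first_name}",),
                  banned: tuple = ("guarantee",)) -> bool:
    """Deterministic verification: hypothetical hard rules a shippable draft must meet."""
    lowered = text.lower()
    return (len(text.split()) <= max_words
            and all(token in text for token in required)
            and not any(word in lowered for word in banned))


def pick_winner(candidates: Dict[str, str],
                score: Callable[[str], float]) -> Optional[str]:
    """Return the id of the draft that ships, or None if nothing passes.

    Verification decides outright when exactly one draft survives; when several
    survive, the structured score breaks the tie.
    """
    survivors = [cid for cid, text in candidates.items() if passes_checks(text)]
    if not survivors:
        return None
    if len(survivors) == 1:
        return survivors[0]
    return max(survivors, key=lambda cid: score(candidates[cid]))
```

Treating verification as a hard filter means a draft that breaks a rule can never ship on a high score alone; scoring only ranks drafts that are already valid.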

04 Sequence policy learning

Each touch is a state. Next action is a learned policy.

A campaign is a sequence of states - cold contact, opened email, positive reply, meeting set, qualified, closed. From each state there are many possible next actions. octa² models the state-to-action policy per segment and updates it from real outcomes. The model picks the next-best action; the orchestration enforces guardrails; the realized outcome updates the policy. The next campaign starts from a sharper routing table.
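A minimal sketch of a per-segment state-to-action policy, assuming each touch yields a scalar reward (for example, 1 if the contact advanced to a later state, else 0). It treats every (segment, state) pair as an epsilon-greedy bandit with a running-mean value per action, a deliberate simplification that ignores multi-step credit assignment. State names, action names, and the `blocked` guardrail set are hypothetical.

```python
import random
from collections import defaultdict
from typing import AbstractSet, DefaultDict, List, Tuple

Key = Tuple[str, str, str]  # (segment, state, action)


class SequencePolicy:
    """Per-segment state -> next-action policy, updated from realized outcomes."""

    def __init__(self, actions: List[str], epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value: DefaultDict[Key, float] = defaultdict(float)  # mean reward
        self.count: DefaultDict[Key, int] = defaultdict(int)

    def next_action(self, segment: str, state: str,
                    blocked: AbstractSet[str] = frozenset()) -> str:
        # Guardrails come from the orchestration layer: blocked actions are never chosen.
        allowed = [a for a in self.actions if a not in blocked]
        if not allowed:
            raise ValueError("guardrails block every action")
        if self.rng.random() < self.epsilon:
            return self.rng.choice(allowed)  # explore
        return max(allowed, key=lambda a: self.value[(segment, state, a)])  # exploit

    def update(self, segment: str, state: str, action: str, reward: float) -> None:
        # Incremental running mean of the rewards observed for this action.
        key = (segment, state, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]
```

Keeping the guardrails outside the learned values mirrors the split in the text: the policy proposes, the orchestration filters, and the realized outcome updates the estimate.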

05 Model distillation

Frontier teacher generates exemplars. Open-source students retrieve.

Frontier models generate canonical exemplars for hard GTM tasks. A nearest-neighbor retrieval pool serves those exemplars in-context to cheaper open-source students. Quality holds. Cost drops materially. The exemplar library grows over time, so even with static student weights, the output gets sharper as the pool deepens.
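The retrieval step can be sketched as a nearest-neighbor lookup over stored teacher outputs, with the closest exemplars placed in the student's prompt as few-shot examples. The student's weights never change; only its context does, which is why output can improve as the pool grows. The bag-of-words cosine similarity is a stand-in for a real embedding model, and the `ExemplarPool` class and prompt format are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a production system would use a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExemplarPool:
    """Teacher-generated exemplars, retrieved by similarity to the new task."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, str, Counter]] = []  # (task, teacher output, vector)

    def add(self, task: str, teacher_output: str) -> None:
        self.items.append((task, teacher_output, embed(task)))

    def nearest(self, task: str, k: int = 2) -> List[Tuple[str, str]]:
        query = embed(task)
        ranked = sorted(self.items, key=lambda item: cosine(query, item[2]), reverse=True)
        return [(t, out) for t, out, _ in ranked[:k]]


def build_student_prompt(pool: ExemplarPool, task: str, k: int = 2) -> str:
    """Few-shot prompt: the k nearest teacher exemplars, then the new task."""
    shots = [f"Task: {t}\nAnswer: {out}" for t, out in pool.nearest(task, k)]
    shots.append(f"Task: {task}\nAnswer:")
    return "\n\n".join(shots)
```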

06 Macro analysis loop

Weekly and monthly reports keep humans in the loop.

A macro analysis service generates weekly and monthly reports across the whole platform - capacity, customer outcomes, algorithm performance, segment shifts. Humans review, the system course-corrects, the algorithms get re-weighted, the corpus gets retagged. The macro loop sits above everything else and keeps the whole system honest. The other five loops run hourly; this one runs on a weekly and monthly cadence. Both cadences are needed.

The compounding

Each algorithm's output is another algorithm's input.

Six loops, but the loops are connected. Validated learnings from online experimentation become new ranking signals. Winning outputs from multi-model rotation become teacher exemplars in distillation. Sparse or low-confidence states from the sequence policy become hypothesis candidates for the next experiment. The macro analysis loop re-tunes which algorithm runs for which task. Compounding is structural, not a metaphor.

From -> To
Online experimentation -> Ensemble ranking
Ensemble ranking -> Sequence policy
Multi-model rotation -> Model distillation
Sequence policy -> Online experimentation
Model distillation -> Multi-model rotation
Macro analysis loop -> All five above
Back into the model

The infrastructure feeds the next training run.

The day-to-day loop is the infrastructure. But the infrastructure does not stay sealed off from the model. Validated learnings, winning exemplars, surviving sequence policies, re-tuned rankings - all of it flows back into the corpus that retrains octa. Every model release sits on top of a richer, sharper, more finely segmented training set than the one before it.

The model release cadence is weekly to monthly. The infrastructure cadence is hourly. The two cadences feed each other. That is octa².

The feedback pipeline

Capture. Label. Validate. Train. Ship.

The five steps that turn yesterday's campaigns into next week's model.

01 Capture

Every campaign outcome, every winning variant, every state-to-action transition, every distilled exemplar lands in the corpus.

02 Label

The infrastructure tags structure (segment, variable, channel, intent) without humans in the hot path. Humans review aggregates.

03 Validate

Each candidate is replayed against held-out historical campaigns on octa Bench. If the new exemplar would have won those campaigns, it survives.
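A minimal sketch of the survival gate, assuming "would have won" means beating the current incumbent exemplar on more than half of the held-out campaigns. The `scorer` callable is a hypothetical stand-in for whatever octa Bench replay actually computes; no real octa Bench API is implied.

```python
from typing import Callable, Dict, List

# Hypothetical replay scorer: estimated outcome of an exemplar on one historical campaign.
ReplayScorer = Callable[[str, Dict], float]


def survives_replay(candidate: str, incumbent: str, held_out: List[Dict],
                    scorer: ReplayScorer, min_win_rate: float = 0.5) -> bool:
    """Keep a candidate only if it beats the incumbent on most held-out campaigns."""
    if not held_out:
        return False  # no evidence, no survival
    wins = sum(1 for campaign in held_out
               if scorer(candidate, campaign) > scorer(incumbent, campaign))
    return wins / len(held_out) > min_win_rate
```

Raising `min_win_rate` makes the gate stricter, trading fewer training additions for more confidence in each one.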

04 Train

Surviving exemplars enter the next octa continued pre-training and long-horizon RL pass.

05 Ship

New octa weights deploy. The six algorithms now run with a sharper component. The next loop starts.

In production

What octa² is doing right now.

Live across every graph8 customer org, with every loop running on its own cadence. The combination is what compounds.

6 algorithms

Running in production today across every graph8 customer org.

Hourly

Cadence at which experiment results, rankings, and routing decisions update.

Weekly

Cadence at which macro analysis re-tunes the infrastructure.

See octa² at work

Bring your motion. Watch the infrastructure spin.

Talk to our team. We will show you the six algorithms running on one of your past campaigns - which variables locked, which model won which task, which signal got ranked up, which sequence policy fired - and what the next loop would have shipped.


octa² is the technical infrastructure around octa, the GTM foundation model. The two pages are a pair: this one tells the process story; the heritage page tells the model story. The model gets retrained on a release cadence. The infrastructure compounds every hour.