octa². The infrastructure around octa.
The model is one component. The infrastructure is what compounds. octa² combines six algorithms running in parallel - online experimentation, ensemble signal ranking, multi-model rotation, sequence policy learning, model distillation, and the macro analysis loop. Each loop validates inputs for the others. Validated learnings flow back into the corpus that retrains octa. Outcomes improve every campaign even between model releases.
The model gets retrained on a release cadence. The process gets sharper every hour.
Most "AI for sales" pitches hand-wave at "the model keeps learning." It doesn't, not in real time. Model weights change when training runs land. What changes continuously is the orchestration around the model: which variant ships, which signal gets ranked, which contact gets routed, which exemplar gets retrieved, which experiment gets greenlit. That is octa². The model is one component. The process is the moat.
octa retrains on a release cadence: continued pre-training plus long-horizon RL on the corpus. Weights change weekly to monthly, not per campaign.
Six algorithms: online experimentation, ensemble signal ranking, multi-model rotation, sequence policy learning, model distillation, and the macro analysis loop. Each one runs its own loop at its own cadence.
The inputs are every send, reply, meeting, landing-page click, and deal stage transition. The infrastructure turns the raw stream into validated learnings per segment.
Validated learnings flow back into the next octa continued pre-training and long-horizon RL pass. The process feeds the model. The model feeds the process. Compound.
Great compounding systems were never one algorithm.
The systems that defined search, research, and game-playing were not single models or single algorithms. They were infrastructures combining many, with outcomes feeding back into the inputs. octa² follows the same shape, applied to GTM.
Hundreds of weak signals, ranked continuously
No single signal decides. An ensemble combines many, the ranking re-tunes from click-through outcomes, and the index is re-scored continuously. The ranking infrastructure - not any single signal - is the moat.
Search + self-play + a learned model
Three algorithms feeding each other: tree search proposes moves, self-play generates new training positions, the network learns from the outcomes. Remove any one and the system collapses. The combination is what learns, not the network alone.
Continuous controlled tests, policy updates from outcomes
Variants compete for traffic. Outcomes are measured. Winners get more allocation per segment. The platform itself is the learner; any model inside is just a tool the platform uses to draft the variants.
Six GTM algorithms running around the octa model
Online experimentation, ensemble ranking, multi-model rotation, sequence policy learning, model distillation, and the macro analysis loop. Each loop validates inputs for the others. Validated learnings flow back into the corpus that retrains octa.
Six loops running in parallel. Each one sharpens the others.
None of these are new ideas in isolation. The combination is the point. Online experimentation tunes the variables. Ensemble ranking decides priority. Multi-model rotation picks the output. Sequence policy decides the next step. Distillation drops the cost. The macro analysis loop watches the whole thing and adjusts. The infrastructure is the moat.
Controlled experiments per segment. Variable locks. Hypothesis loops.
graph8 runs continuous controlled experiments on every campaign variable that matters - subject lines, send-times, cadence depth, channel mix, landing page layout, voice openers. Each experiment runs on a track (fast / medium / slow) based on volume. Each validated outcome records the variable, the segment, the effect size, the direction, the confidence, the sample size. Winners get locked per segment. The knowledge base accumulates. Future experiments are proposed against the prior learnings, not from scratch.
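A minimal sketch of how a validated outcome and a per-segment lock might be represented. The field names follow the list above; the track thresholds, the confidence cutoff, and the `KnowledgeBase` class are illustrative assumptions, not the production schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedOutcome:
    """One validated learning, with the fields named in the text."""
    variable: str       # e.g. "subject_line"
    segment: str        # e.g. "smb-saas"
    effect_size: float
    direction: str      # "up" or "down"
    confidence: float
    sample_size: int


def assign_track(weekly_volume: int) -> str:
    """Pick an experiment track from volume. Thresholds are hypothetical."""
    if weekly_volume >= 10_000:
        return "fast"
    if weekly_volume >= 1_000:
        return "medium"
    return "slow"


class KnowledgeBase:
    """Stores locked winners per (segment, variable). Hypothetical helper."""

    def __init__(self, min_confidence: float = 0.95):
        self.min_confidence = min_confidence  # assumed cutoff
        self.locked = {}

    def record(self, outcome: ValidatedOutcome) -> bool:
        """Lock the outcome if it clears the confidence bar."""
        if outcome.confidence < self.min_confidence:
            return False
        self.locked[(outcome.segment, outcome.variable)] = outcome
        return True

    def prior(self, segment: str, variable: str):
        """Return the prior learning a new experiment would be proposed against."""
        return self.locked.get((segment, variable))
```

A new experiment would call `prior(segment, variable)` first, so it starts from what is already locked rather than from scratch.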
Many weak signals, one ranked output, outcomes re-weight the signals.
Accounts to prospect, contacts inside accounts, intent signals worth firing on, sequence variants worth shipping - each one is scored by combining many weak signals into a single ranking. No single signal decides. Outcomes - opens, clicks, replies, meetings, closed-won - flow back and re-weight the signal contributions. The ranking sharpens continuously without any model retraining required.
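One way to picture outcome-driven re-weighting is an online logistic model over the signals. This is a sketch under that assumption: the text does not say which ensemble method octa² uses, and the class name, signal names, and learning rate are hypothetical.

```python
import math


class EnsembleRanker:
    """Combines weak signals into one score; outcomes re-weight the signals."""

    def __init__(self, signal_names, learning_rate=0.1):
        self.weights = {name: 0.0 for name in signal_names}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, signals):
        """Weighted sum of signal values squashed to a 0..1 score."""
        z = self.bias + sum(
            w * signals.get(name, 0.0) for name, w in self.weights.items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, signals, outcome):
        """One gradient step on log-loss; outcome is 1 (e.g. reply) or 0."""
        error = outcome - self.score(signals)
        self.bias += self.learning_rate * error
        for name in self.weights:
            self.weights[name] += self.learning_rate * error * signals.get(name, 0.0)

    def rank(self, candidates):
        """candidates maps an id to its signal dict; returns ids, best first."""
        return sorted(candidates, key=lambda c: self.score(candidates[c]), reverse=True)
```

Only the signal weights move here. Nothing in the sketch touches model weights, which is the sense in which a ranking can sharpen without retraining.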
Generator. Critic. Editor. Roles rotate across models per task.
For every output that matters - a cold email, a reply draft, a landing page, a call script - multiple models compete. Roles rotate: Generator proposes, Critic attacks, Editor polishes. Open-source models, frontier models, octa-mini, octa, octa-reasoning all play. Deterministic verification decides the winner where possible; structured scoring decides where not. The winning output ships. The competition log feeds back into training.
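The rotation can be sketched as a loop over role assignments. Everything here is illustrative: `call_model`, `score`, and `verify` are hypothetical callables supplied by the caller, and trying every permutation of three models is an assumption about how roles rotate.

```python
from itertools import permutations

ROLES = ("generator", "critic", "editor")


def rotations(models):
    """Yield every assignment of three distinct models to the three roles."""
    for trio in permutations(models, 3):
        yield dict(zip(ROLES, trio))


def run_competition(task, models, call_model, score, verify=None):
    """Run each rotation, then pick a winner.

    call_model(model, role, text) -> str is a caller-supplied stub.
    verify(output) -> bool is the deterministic check, when one exists.
    score(output) -> float is the structured score used otherwise.
    """
    if len(models) < 3:
        raise ValueError("need at least three models to fill three roles")
    log = []
    for assignment in rotations(models):
        draft = call_model(assignment["generator"], "generator", task)
        critique = call_model(assignment["critic"], "critic", draft)
        final = call_model(assignment["editor"], "editor", draft + "\n" + critique)
        passed = True if verify is None else bool(verify(final))
        log.append(
            {"assignment": assignment, "output": final,
             "verified": passed, "score": score(final)}
        )
    # Verified outputs beat unverified ones; the score breaks ties.
    winner = max(log, key=lambda e: (e["verified"], e["score"]))
    return winner, log
```

The returned `log` is the competition record; the winning entry is what would ship.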
Each touch is a state. Next action is a learned policy.
A campaign is a sequence of states - cold contact, opened email, positive reply, meeting set, qualified, closed. From each state there are many possible next actions. octa² models the state-to-action policy per segment and updates it from real outcomes. The model picks the next-best action; the orchestration enforces guardrails; the realized outcome updates the policy. The next campaign starts from a sharper routing table.
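A simplified sketch of a per-segment state-to-action policy. It uses a tabular running average with epsilon-greedy exploration, which is a stand-in chosen for brevity and not necessarily the method octa² uses. The state names, action names, and reward values are hypothetical.

```python
import random


class SequencePolicy:
    """Tabular next-best-action policy keyed by (segment, state, action)."""

    def __init__(self, epsilon=0.1, seed=None):
        self.value = {}   # running mean reward per (segment, state, action)
        self.count = {}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def next_action(self, segment, state, allowed_actions):
        """Choose among actions the guardrails allow; explore with prob. epsilon."""
        actions = list(allowed_actions)
        if not actions:
            raise ValueError("guardrails left no allowed action")
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.value.get((segment, state, a), 0.0))

    def update(self, segment, state, action, reward):
        """Fold one realized outcome into the running mean for that action."""
        key = (segment, state, action)
        self.count[key] = self.count.get(key, 0) + 1
        old = self.value.get(key, 0.0)
        self.value[key] = old + (reward - old) / self.count[key]
```

Guardrails enter through `allowed_actions`: the orchestration decides what the policy is permitted to choose, and the policy ranks only within that set.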
Frontier teacher generates exemplars. Open-source students retrieve.
Frontier models generate canonical exemplars for hard GTM tasks. A nearest-neighbor retrieval pool serves those exemplars in-context to cheaper open-source students. Quality holds. Cost drops materially. The exemplar library grows over time, so even with static student weights, the output gets sharper as the pool deepens.
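The retrieval step can be sketched with a toy nearest-neighbor pool. The bag-of-words `embed` function is a deliberate simplification standing in for a real embedding model, and the class and method names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ExemplarPool:
    """Stores teacher exemplars and serves the nearest ones in-context."""

    def __init__(self):
        self.items = []  # (embedding, task, teacher_output)

    def add(self, task, teacher_output):
        self.items.append((embed(task), task, teacher_output))

    def retrieve(self, task, k=2):
        """Return the k exemplars whose tasks are most similar to this one."""
        query = embed(task)
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [(t, out) for _, t, out in ranked[:k]]

    def build_prompt(self, task, k=2):
        """Few-shot prompt for the student: retrieved exemplars, then the task."""
        shots = "\n\n".join(
            f"Task: {t}\nOutput: {out}" for t, out in self.retrieve(task, k)
        )
        return f"{shots}\n\nTask: {task}\nOutput:"
```

The student model never changes in this sketch. Adding exemplars changes what `retrieve` returns, which is how a static student can still improve as the pool deepens.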
Weekly + monthly reports close the human-in-the-loop.
A macro analysis service generates weekly and monthly reports across the whole platform - capacity, customer outcomes, algorithm performance, segment shifts. Humans review, the system course-corrects, the algorithms get re-weighted, the corpus gets retagged. The macro loop sits above everything else and keeps the whole system honest. The other five loops run hourly. This one runs weekly and monthly. Both cadences are needed.
Each algorithm's output is another algorithm's input.
Six loops, but the loops are connected. Validated learnings from online experimentation become new ranking signals. Winning outputs from multi-model rotation become teacher exemplars in distillation. Sparse or low-confidence states from the sequence policy become hypothesis candidates for the next experiment. The macro analysis loop re-tunes which algorithm runs for which task. Compounding is structural, not a metaphor.
The infrastructure feeds the next training run.
The day-to-day loop is the infrastructure. But the infrastructure does not stay sealed off from the model. Validated learnings, winning exemplars, surviving sequence policies, re-tuned rankings - all of it flows back into the corpus that retrains octa. Every model release sits on top of a richer, sharper, more-segmented training set than the one before it.
The model release cadence is weekly to monthly. The infrastructure cadence is hourly. The two cadences feed each other. That is octa².
Capture. Label. Validate. Train. Ship.
The five steps that turn yesterday's campaigns into next week's model.
Capture
Every campaign outcome, every winning variant, every state-to-action transition, every distilled exemplar lands in the corpus.
Label
The infrastructure tags structure (segment, variable, channel, intent) without humans in the hot path. Humans review aggregates.
Validate
Held-out replay on octa Bench. If the new exemplar would have won historical campaigns, it survives.
Train
Surviving exemplars enter the next octa continued pre-training and long-horizon RL pass.
Ship
New octa weights deploy. The six algorithms now run with a sharper component. The next loop starts.
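The Validate step above can be sketched as a replay gate. This is an illustration only: the interface of octa Bench is not described here, so the campaign record shape, the `score` callable, and the `min_win_rate` threshold are all assumptions.

```python
def replay_win_rate(candidate, held_out, score):
    """Fraction of held-out campaigns where the candidate beats the old winner.

    held_out is a list of dicts with a "winning_score" key (assumed shape).
    score(candidate, campaign) -> float is a caller-supplied replay scorer.
    """
    if not held_out:
        return 0.0
    wins = sum(
        1 for campaign in held_out
        if score(candidate, campaign) > campaign["winning_score"]
    )
    return wins / len(held_out)


def validate(candidates, held_out, score, min_win_rate=0.5):
    """Keep only candidates that would have won often enough on replay."""
    return [
        c for c in candidates
        if replay_win_rate(c, held_out, score) >= min_win_rate
    ]
```

Whatever `validate` returns is what would move on to the Train step; everything else stays out of the next pass.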
What octa² is doing right now.
Live across every graph8 customer org. Every loop running on its own cadence. The combination is what compounds.
Six algorithms running in production today across every graph8 customer org.
Hourly: the cadence at which experiment results, rankings, and routing decisions update.
Weekly: the cadence at which macro analysis re-tunes the infrastructure.
Bring your motion. Watch the infrastructure spin.
Talk to our team. We will show you the six algorithms running on one of your past campaigns - which variables locked, which model won which task, which signal got ranked up, which sequence policy fired - and what the next loop would have shipped.
octa² is the technical infrastructure around octa, the GTM foundation model. The two pages are a pair: this one tells the process story; the heritage page tells the model story. The model gets retrained on a release cadence. The infrastructure compounds every hour.