Chasing Mythos-level Fusion in the open

A live engineering note on the first frontier Fusion attempt: what ran, what failed, and why we are not claiming a benchmark win yet.

2026-06-14

Source context: Open Fusion methodology.

We tried to push TrustedRouter Fusion toward Mythos- and Fable-class DRACO performance. The current target panel is GPT-5.5, Claude Opus 4.8, Kimi K2.7 Code, GLM 5.1, MiniMax M3, Gemini 3 Flash, and Gemini 3.1 Pro, with Opus 4.8 synthesizing the final answer and Gemini 3.1 Pro judging against DRACO criteria.

That exact run is not publishable yet. The main blocker is GPT-5.5 long-reasoning behavior on DRACO prompts: it can spend the entire completion budget on reasoning and return no usable answer. GLM 5.2 is not yet enabled for the current Z.AI account, so the reproducible run uses GLM 5.1 until a direct GLM 5.2 smoke test passes.
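The failure mode described above can be detected mechanically. The following is a minimal sketch, not the harness's actual code: it assumes the gateway returns responses in the OpenAI Chat Completions shape (a `choices` list with `finish_reason` and `message.content`), and the function name `classify_completion` is hypothetical.

```python
def classify_completion(response: dict) -> str:
    """Classify a chat-completions-style response dict.

    Returns "ok" when there is visible answer text, "budget_exhausted" when
    the model stopped on the token limit without producing any answer text
    (the long-reasoning failure mode), and "empty" for any other blank reply.
    """
    choice = response["choices"][0]
    message = choice.get("message") or {}
    # content may be missing, None, or whitespace when reasoning used the budget
    content = (message.get("content") or "").strip()
    if content:
        return "ok"
    if choice.get("finish_reason") == "length":
        return "budget_exhausted"
    return "empty"
```

A runner could use the "budget_exhausted" label to decide whether to retry with a larger completion budget or to mark the panel member as missing for that task, rather than passing an empty answer to synthesis.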

What actually ran

Run | Task slice | Result | Status
Current 7-model target | Non-financial DRACO pilot | No score | Waiting on GPT-5.5 long-reasoning handling
Available 6-model fallback | First completed non-financial DRACO task | 19.85 | Completed, far below target

The first fallback panel used Opus 4.8, Kimi K2.7 Code, GLM 5.1, MiniMax M3, Gemini 3 Flash, and Gemini 3.1 Pro. It completed one task before we stopped the pilot because of speed and reliability problems. A score of 19.85 is not close to the target, and we are not presenting it as a win.

What changed in the harness

  • GPT-5.5 eval calls now omit temperature and use max_completion_tokens.
  • Panel and final synthesis calls stream so long answers do not wait for full completion before parsing.
  • Analysis and judge calls stay non-streaming because they require structured JSON reliability.
  • The live runner now has explicit six-model and seven-model frontier Fusion configs behind a hard budget.
  • The recommended DRACO slice for this experiment is --task-filter non-financial.
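The first three harness changes amount to per-call request shaping. Here is a rough sketch of how that could look, assuming an OpenAI-compatible Chat Completions payload. The helper `build_request`, the role names, the model id strings, and the use of `max_tokens` plus a default temperature for other models are all illustrative assumptions, not the harness's real configuration.

```python
from typing import Any

# Hypothetical role names: streamed calls vs. calls that return structured JSON.
STREAMING_ROLES = {"panel", "synthesis"}
NON_STREAMING_ROLES = {"analysis", "judge"}

# Hypothetical model id for the model whose eval calls omit temperature.
NO_TEMPERATURE_MODELS = {"gpt-5.5"}


def build_request(model: str, role: str, messages: list[dict[str, str]],
                  token_budget: int, temperature: float = 0.2) -> dict[str, Any]:
    """Build a chat-completions-style payload for one harness call."""
    if role not in STREAMING_ROLES | NON_STREAMING_ROLES:
        raise ValueError(f"unknown role: {role}")
    request: dict[str, Any] = {
        "model": model,
        "messages": messages,
        # Panel and synthesis stream; analysis and judge wait for the full body.
        "stream": role in STREAMING_ROLES,
    }
    if model in NO_TEMPERATURE_MODELS:
        # Omit temperature entirely and cap output with max_completion_tokens.
        request["max_completion_tokens"] = token_budget
    else:
        # Assumption: other panel members accept the older parameter pair.
        request["max_tokens"] = token_budget
        request["temperature"] = temperature
    return request
```

Keeping the shaping in one function means a six-model and a seven-model config can share the same call path and differ only in the list of model ids.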

Next gates

The next clean run needs two fixes before any headline claim: make GPT-5.5 long-reasoning responses produce useful content through the attested gateway, and finish a 10-task non-financial DRACO pilot without task-level hangs. GLM 5.2 can replace GLM 5.1 later when Z.AI enables it for the account.
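One way to keep a single stuck task from stalling a pilot is a per-task wall-clock limit. The sketch below is only an illustration under stated assumptions: `run_pilot` and its `run_task` callback are hypothetical names, and it uses threads from the Python standard library, which can be abandoned but not killed.

```python
import concurrent.futures as cf
from typing import Any, Callable, Iterable


def run_pilot(task_ids: Iterable[str], run_task: Callable[[str], Any],
              timeout_s: float) -> dict[str, tuple[str, Any]]:
    """Run tasks one at a time, recording a timeout instead of hanging."""
    results: dict[str, tuple[str, Any]] = {}
    for task_id in task_ids:
        # A fresh single-worker pool per task, so a hung task cannot block the next one.
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(run_task, task_id)
        try:
            results[task_id] = ("ok", future.result(timeout=timeout_s))
        except cf.TimeoutError:
            results[task_id] = ("timeout", None)
        except Exception as exc:  # record task-level failures without stopping the pilot
            results[task_id] = ("error", repr(exc))
        finally:
            # Do not wait for a hung worker; the thread is abandoned, not killed.
            pool.shutdown(wait=False, cancel_futures=True)
    return results
```

Because an abandoned thread keeps running in the background, a production runner would more likely isolate each task in a subprocess that can be terminated. The sketch only shows the bookkeeping: every task ends with an explicit status, so a 10-task pilot either finishes or reports which tasks timed out.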

This is the point of doing the work in the open. If TrustedRouter clears a Mythos/Fable-class target, the result should be reproducible from code, model ids, task filters, budget limits, and artifacts. Until then, the honest result is: not there yet.
