OpenAI compatible API. Attested gateway. Public status.

Nebius Token Factory

Nebius Token Factory models on TrustedRouter with prices, routes, policy notes, and source links.

Verify gateway

1 URLbase_url migration

100smodels and routes

0prompt logs by default

`nebius`

No logs

All providers

Provider	Nebius Token Factory
Models	20 public models
Prepaid routes	18
BYOK routes	20
Zero data retention	yes
Confidential compute	not claimed
Provider E2EE	not claimed
Policy note	Marked ZDR via TrustedRouter's arrangement — Nebius RETAINS inputs/outputs by default (for speculative decoding); zero retention is an opt-in control, which the deployed Nebius account has enabled. Nebius does not train on customer data. Policy source

Measured performance

320 samples

Continuously sampled across Nebius Token Factory's routed models — p50 TTFT, throughput, and success rate. Unsupported route and probe-configuration rows are separated from provider downtime. No prompt or output content stored.

p50 TTFT	—
Throughput	—
Uptime	0.00%

Model	p50 TTFT	p50 TTFB	Throughput	Uptime	Config excluded	Samples
Qwen/Qwen3-30B-A3B-Instruct-2507	—	—	—	0.00%	—	17
google/gemma-3-27b-it	—	—	—	0.00%	—	16
Qwen/Qwen3.5-397B-A17B	—	—	—	0.00%	—	17
Qwen/Qwen3-Next-80B-A3B-Thinking	—	—	—	0.00%	—	19
MiniMaxAI/MiniMax-M2.5	—	—	—	0.00%	—	18
NousResearch/Hermes-4-405B	—	—	—	0.00%	—	18
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B	—	—	—	0.00%	—	26
nvidia/nemotron-3-super-120b-a12b	—	—	—	0.00%	—	12
Qwen/Qwen3-32B	—	—	—	0.00%	—	17
NousResearch/Hermes-4-70B	—	—	—	0.00%	—	22
deepseek-ai/DeepSeek-V4-Pro	—	—	—	0.00%	—	21
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1	—	—	—	0.00%	—	16
openai/gpt-oss-120b	—	—	—	0.00%	—	20
nvidia/Nemotron-3-Nano-Omni	—	—	—	0.00%	—	23
Qwen/Qwen2.5-VL-72B-Instruct	—	—	—	0.00%	—	19
Qwen/Qwen3-235B-A22B-Instruct-2507	—	—	—	0.00%	—	13
zai-org/GLM-5.1	—	—	—	0.00%	—	10
meta-llama/Llama-3.3-70B-Instruct	—	—	—	0.00%	—	16

Full provider & model leaderboard.

Provider models

Models served by Nebius Token Factory.

Each row links to pricing, provider, benchmark, and API pages for the model.

Model	Context	Endpoints	Prompt	Completion	Routes
`MiniMaxAI/MiniMax-M2.5` MiniMax M2.5 benchmarks performance api	204,800	2	$0.33/1M	$1.32/1M	prepaid BYOK
`NousResearch/Hermes-4-405B` Hermes 4 405B benchmarks performance api	131,072	2	$1.1/1M	$3.3/1M	prepaid BYOK
`NousResearch/Hermes-4-70B` Hermes 4 70B benchmarks performance api	131,072	2	$0.143/1M	$0.44/1M	prepaid BYOK
`Qwen/Qwen2.5-VL-72B-Instruct` Qwen2.5 VL 72B Instruct benchmarks performance api	32,768	2	$0.22/1M	$0.77/1M	prepaid BYOK
`Qwen/Qwen3-235B-A22B-Instruct-2507` Qwen3 235B A22B Instruct 2507 benchmarks performance api	131,072	2	$0.22/1M	$0.66/1M	prepaid BYOK
`Qwen/Qwen3-30B-A3B-Instruct-2507` Qwen3 30B A3B Instruct 2507 benchmarks performance api	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`Qwen/Qwen3-32B` Qwen3 32B benchmarks performance api	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`Qwen/Qwen3-Next-80B-A3B-Thinking` Qwen3 Next 80B A3B Thinking benchmarks performance api	131,072	2	$0.165/1M	$1.65/1M	prepaid BYOK
`Qwen/Qwen3.5-397B-A17B` Qwen3.5 397B A17B benchmarks performance api	262,144	2	$0.66/1M	$3.96/1M	prepaid BYOK
`deepseek-ai/DeepSeek-V4-Pro` DeepSeek V4 Pro benchmarks performance api	1,048,576	2	$1.859/1M	$3.718/1M	prepaid BYOK
`google/gemma-2-2b-it` gemma 2 2b it benchmarks performance api	8,192	1	$0.022/1M	$0.066/1M	BYOK
`google/gemma-3-27b-it` Google: Gemma 3 27B benchmarks performance api	131,072	2	$0.1309/1M	$0.22/1M	prepaid BYOK
`meta-llama/Llama-3.3-70B-Instruct` Llama 3.3 70B Instruct benchmarks performance api	131,072	2	$0.143/1M	$0.44/1M	prepaid BYOK
`meta-llama/Meta-Llama-3.1-8B-Instruct` Meta Llama 3.1 8B Instruct benchmarks performance api	128,000	1	$0.022/1M	$0.066/1M	BYOK
`nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` Llama 3_1 Nemotron Ultra 253B v1 benchmarks performance api	128,000	2	$0.66/1M	$1.98/1M	prepaid BYOK
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B` NVIDIA Nemotron 3 Nano 30B A3B benchmarks performance api	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`nvidia/Nemotron-3-Nano-Omni` Nemotron 3 Nano Omni benchmarks performance api	131,072	2	$0.165/1M	$0.495/1M	prepaid BYOK
`nvidia/nemotron-3-super-120b-a12b` nemotron 3 super 120b a12b benchmarks performance api	131,072	2	$0.66/1M	$1.98/1M	prepaid BYOK
`openai/gpt-oss-120b` OpenAI: gpt-oss-120b benchmarks performance api	131,072	2	$0.165/1M	$0.66/1M	prepaid BYOK
`zai-org/GLM-5.1` GLM 5.1 benchmarks performance api	204,800	2	$1.54/1M	$4.84/1M	prepaid BYOK