Frontier · Provocative / Expressive Maximalist / Authority / Resonant
Spec
Typefaces: Inter Tight, Inter, JetBrains Mono
Color tokens: 9
Sections: 6
Body words: ~550
Voice: manifesto-adjacent, declarative about progress, careful about safety, present-tense

Frontier · Model release · v3.0

Atlas-3

A new generation of capable, considered models.

Atlas-3 is a frontier foundation model with capability gains across reasoning, coding, tool use, and long context. The technical report and the safety statement publish today, alongside the API.

Capabilities

Six places where Atlas-3 takes a step the field has not.

Each capability is paired with one number from the technical report. The full evaluation suite, methodology, and known failure modes are in the report.

Benchmark snapshot

A small selection. The full table is in the technical report.

Numbers are pass@1 unless otherwise noted. We publish the evaluation harness, the prompts, and the seeds. Reproduce, do not recite.
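For readers reproducing the numbers, here is a minimal sketch of how a pass@1 score is commonly computed from seeded samples. It uses the standard unbiased pass@k estimator with k = 1; the function names and the toy outcomes are illustrative assumptions, not taken from the Atlas-3 harness, which may aggregate differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Mean pass@1 over problems; results maps a problem id to per-sample outcomes."""
    scores = [pass_at_k(len(r), sum(r), 1) for r in results.values()]
    return sum(scores) / len(scores)


# Hypothetical outcomes (not Atlas-3 data): three problems, four seeded samples each.
toy = {
    "p1": [True, True, True, True],
    "p2": [True, False, False, True],
    "p3": [False, False, False, False],
}
print(round(benchmark_pass_at_1(toy), 3))  # prints 0.5
```

With k = 1 the estimator reduces to the fraction of correct samples per problem, averaged over problems, which is why fixed seeds and prompts are enough to reproduce a score.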

Benchmark                 Atlas-3   Atlas-2   Delta
MMLU-Redux                0.892     0.864     +2.8 pts
AGI-Eval-Bench            0.842     0.781     +6.1 pts
SWE-Bench Verified        0.712     0.604     +10.8 pts
Tau-Bench (retail)        0.611     0.523     +8.8 pts
RULER NIAH-Multi @256K    0.961     0.907     +5.4 pts
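The Delta column is the difference between the two scores expressed in percentage points. A short sketch that rechecks each row from the table above; the helper name `delta_points` is an assumption made for illustration, and the scores are copied from the table rather than recomputed from any harness.

```python
# Scores copied from the table above: (Atlas-3, Atlas-2, reported delta in points).
ROWS = {
    "MMLU-Redux": (0.892, 0.864, 2.8),
    "AGI-Eval-Bench": (0.842, 0.781, 6.1),
    "SWE-Bench Verified": (0.712, 0.604, 10.8),
    "Tau-Bench (retail)": (0.611, 0.523, 8.8),
    "RULER NIAH-Multi @256K": (0.961, 0.907, 5.4),
}


def delta_points(new: float, old: float) -> float:
    """Difference between two 0-1 scores in percentage points, rounded to one decimal."""
    return round((new - old) * 100, 1)


for name, (new, old, reported) in ROWS.items():
    computed = delta_points(new, old)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{name:<24} {computed:+.1f} pts  {status}")
```

Every row reports "ok": the published deltas match the published scores.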

Read the full technical report for ablations, failure analyses, and the evaluation cards we wrote for each benchmark.

Safety statement

We publish what the model will not do, and why.

The safety statement is not a compliance afterthought. It is the second document we wrote, after the technical report. The third document was the API rate limits.

01 · Refusal coverage

Atlas-3 declines categories of harm we have decided not to serve. The list is published, with examples, in the safety statement. We will not change that list quietly.

02 · Pre-deployment red team

External red teams ran adversarial evaluation against Atlas-3 before the launch. Findings, mitigations, and residual risk are in the safety statement.

03 · Misuse monitoring

API traffic is monitored for misuse patterns under our published policy. Persistent abuse triggers escalation, not a silent rate limit. We document the actions we take.

04 · What we still do not know

The safety statement names open questions on long-horizon agentic behavior, on jailbreak generalization, and on tool misuse. We do not pretend they are solved.

From the research team

A note on what shipped and what did not.

Atlas-3 ships with the capabilities our internal evaluations say it ships with, no more. Earlier in development we held back two demos that polled well in user studies but failed under adversarial probing. They are not in the launch. We will revisit them when the safety case is stronger.

We are publishing the technical report, the evaluation harness, and the safety statement on the same day as the API because we want the same audience to see all three. The model is a step. Steps need to be measurable.

· the research leads

API access

Join the waitlist. We onboard new partners every week.

The first wave is research labs, evaluation teams, and builders working on agentic systems. We open broader access in cohorts, with onboarding calls, after the first wave is steady.

We read every request. Tell us what you would build.