

Ornith 1.0: The Open-Source AI Coding Model That Writes Its Own RL Scaffolds

DeepReinforce's Ornith 1.0 introduces self-scaffolding LLMs for agentic coding — models that learn to write their own reinforcement learning harnesses. With a 397B MoE matching Claude Opus 4.7 on SWE-Bench and a 9B variant outperforming models 3x its size, this is a paradigm shift for open-source AI development.

June 26, 2026 - 12 min read

Key Takeaways

  • Ornith 1.0 is the first open-source model family that jointly learns to solve coding tasks AND write its own RL training scaffolds
  • The 397B MoE variant scores 82.4 on SWE-Bench Verified and 77.5 on Terminal-Bench 2.1, ahead of Claude Opus 4.7 on both (80.8 and 70.3)
  • The 9B dense model (43.1 on Terminal-Bench 2.1, 69.4 on SWE-Bench Verified) beats Gemma 4-31B on both and edges past Qwen 3.5-35B on Terminal-Bench 2.1, despite being 3-4x smaller
  • All models are released under the MIT license, with GGUF support for Ollama and local deployment
  • Self-scaffolding represents a fundamental shift: the model evolves its own orchestration strategy rather than relying on human-designed harnesses
  • Three-layer defense against reward hacking: fixed trust boundary, deterministic monitoring, and frozen LLM judge veto
Ornith 1.0 — Self-scaffolding AI coding model from DeepReinforce. YouTube video thumbnail featuring Sam Witteveen.

DeepReinforce just dropped something that changes the game for open-source AI coding. Ornith 1.0 isn't just another model release — it's a new paradigm for how AI agents learn to write code.

The headline: a fully open-source family of models (9B to 397B parameters, all MIT licensed) that teaches itself to write its own reinforcement learning scaffolds. The largest variant matches Claude Opus 4.7 on SWE-Bench Verified. The smallest 9B model outperforms Gemma 4-31B — a model 3x its size.

Let's break down what makes this release different.


What Is Ornith 1.0?

Ornith 1.0 is a family of self-improving open-source models purpose-built for agentic coding tasks, developed by DeepReinforce. It spans four sizes:

  • Ornith 1.0 9B Dense — Edge-deployable, runs on consumer hardware
  • Ornith 1.0 31B Dense — Balanced performance for workstation deployment
  • Ornith 1.0 35B MoE — Mixture-of-experts for efficient inference
  • Ornith 1.0 397B MoE — Frontier-scale, matching closed-source leaders

Built on pretrained Gemma 4 and Qwen 3.5 checkpoints, these models achieve state-of-the-art results among open-source models of comparable size across the major coding benchmarks.


The Core Innovation: Self-Scaffolding

Here's where it gets interesting. Every agentic coding system — whether it's Claude Code, Cursor, or an open-source agent — relies on a scaffold: the orchestration logic that structures how the model interacts with tools, manages context, retries on failure, and delivers a final solution.

Until now, scaffolds were hand-designed by humans. You write the harness, you define the tool-use protocol, you structure the error recovery. The model just fills in the code.

Ornith 1.0 flips this. Its training framework jointly optimizes the scaffold AND the solution. Each RL step works in two stages:

  1. Propose a refined scaffold — conditioned on the task and the scaffold previously used for it
  2. Generate a solution rollout — conditioned on that scaffold and the task description

Reward from the rollout propagates to both stages. The model isn't just learning to write better answers — it's learning to author the orchestration that elicits those answers.
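The two stages above can be sketched as a single function. This is a minimal illustration of the data flow only, not DeepReinforce's implementation: `rl_step`, `StepRecord`, and the three stub callables are hypothetical names, and a real system would back the first two with the same LLM policy and apply a gradient update using the shared reward.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepRecord:
    scaffold: str
    solution: str
    reward: float


def rl_step(
    task: str,
    prev_scaffold: str,
    propose_scaffold: Callable[[str, str], str],
    generate_solution: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> StepRecord:
    # Stage 1: refine the scaffold, conditioned on the task and the
    # scaffold previously used for it.
    scaffold = propose_scaffold(task, prev_scaffold)
    # Stage 2: roll out a solution, conditioned on that scaffold and the task.
    solution = generate_solution(task, scaffold)
    # One scalar reward is recorded for the whole step, so a trainer can
    # credit both the scaffold proposal and the solution rollout with it.
    reward = score(task, solution)
    return StepRecord(scaffold, solution, reward)


# Stand-in callables so the sketch runs without a model (all hypothetical).
def propose(task: str, prev: str) -> str:
    return prev + " +retry-on-test-failure"


def generate(task: str, scaffold: str) -> str:
    return f"patch for {task!r} using [{scaffold}]"


def score(task: str, solution: str) -> float:
    return 1.0 if "retry" in solution else 0.0


record = rl_step("fix failing unit test", "baseline", propose, generate, score)
print(record)
```

The point of returning both the scaffold and the solution alongside one reward is that the training signal is attached to the pair, which is what lets the model learn which orchestration choices produced a good rollout.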

Self-scaffolding training framework

Ornith's dual-stage RL loop: scaffold proposal and solution generation are jointly optimized, creating a feedback loop where the model continually improves its own orchestration strategy.

Sam Witteveen's deep dive on Ornith 1.0 puts it well — this isn't an incremental improvement. It's a structural shift from "train the solver" to "train the scaffold + solver together."


Benchmark Performance: Punching Well Above Weight

The numbers speak for themselves. Let's look at how Ornith stacks up against the competition.

Frontier Scale (397B MoE)

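| Benchmark | Ornith 1.0 397B | Claude Opus 4.7 | DeepSeek-V4-Pro | MiniMax M3 |
| --- | --- | --- | --- | --- |
| Terminal-Bench 2.1 (Terminus-2) | 77.5 | 70.3 | 67.9 | 66.0 |
| SWE-Bench Verified | 82.4 | 80.8 | 80.6 | 80.5 |
| SWE-Bench Pro | 62.2 | 64.3 | 55.4 | 59.0 |
| SWE-Bench Multilingual | 78.9 | n/a | 76.2 | n/a |
| NL2Repo | 48.2 | n/a | n/a | 42.1 |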

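Ornith 1.0 397B beats Claude Opus 4.7 on both Terminal-Bench 2.1 and SWE-Bench Verified, and leads DeepSeek-V4-Pro and MiniMax M3 on every benchmark where their scores are reported. The one place it trails is SWE-Bench Pro, where Claude Opus 4.7 scores 64.3 to Ornith's 62.2.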

397B Evaluation Results

Ornith 1.0 397B vs. leading frontier models. Note the lead on every agentic coding benchmark shown except SWE-Bench Pro.

Mid-Scale (35B MoE)

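| Benchmark | Ornith 1.0 35B | Qwen 3.5-35B | Qwen 3.6-35B | Gemma 4-31B |
| --- | --- | --- | --- | --- |
| Terminal-Bench 2.1 | 64.2 | 41.4 | 52.5 | 42.1 |
| SWE-Bench Verified | 75.6 | 70.0 | 73.4 | 52.0 |
| SWE-Bench Pro | 50.4 | 44.6 | 49.5 | 35.7 |
| NL2Repo | 34.6 | 20.5 | 29.4 | 15.5 |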

The 35B variant doesn't just beat similarly sized models. It also surpasses Qwen 3.5's 397B model on Terminal-Bench 2.1 (64.2 vs 53.5). That's a roughly 11x parameter disadvantage overcome by smarter training.

35B Evaluation Results

Edge Scale (9B Dense)

| Benchmark | Ornith 1.0 9B | Qwen 3.5-9B | Gemma 4-12B | Gemma 4-31B |
| --- | --- | --- | --- | --- |
| Terminal-Bench 2.1 | 43.1 | 21.3 | 21.0 | 42.1 |
| SWE-Bench Verified | 69.4 | 53.2 | 44.2 | 52.0 |
| SWE-Bench Pro | 42.9 | 31.3 | 27.6 | 35.7 |

A 9B model beating a 31B model on SWE-Bench Verified, 69.4 to 52.0? That's the power of self-scaffolding training. For teams that need local, private, offline code agents, this is a watershed moment.

9B Evaluation Results


How It Works: The Self-Improving Training Framework

The technical architecture is worth understanding because it hints at where the entire field is heading.

The Feedback Loop

Traditional RL for coding uses a fixed harness. You define how the model interacts with the terminal, how it reads files, how it runs tests — and the model optimizes its code output within those constraints. The harness never changes.

Ornith treats the harness as a learnable object. Over training iterations:

  1. The model proposes a scaffold for a given task category
  2. It generates a solution using that scaffold
  3. Reward from the solution propagates back to update both the solution policy AND the scaffold policy
  4. Better scaffolds lead to better solutions, which further refine scaffolds

This creates an autonomous capability flywheel — one that doesn't require human engineers to manually redesign the agent loop every time the model improves.

Defending Against Reward Hacking

Giving the model control over its own scaffold introduces an obvious risk: reward hacking. What stops it from learning to cheat the benchmarks rather than actually solving coding problems?

DeepReinforce implements a three-layer defense:

Layer 1: Fixed Trust Boundary. The environment, tool surface, and test isolation are immutable and outside the model's reach. The model can only evolve its inner policy scaffold — memory, error-handling, orchestration logic.

Layer 2: Deterministic Monitoring. A monitor enforces the boundary, flagging attempts to read withheld paths, modify verification scripts, or invoke actions outside the sanctioned tool surface. Zero reward for violations.
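A deterministic monitor of this kind could look roughly like the sketch below. The tool names, the withheld directory, and the verifier path are all hypothetical placeholders; the source does not publish the actual tool surface or file layout.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical boundary definition, fixed outside the model's control.
ALLOWED_TOOLS = {"read_file", "write_file", "run_tests"}
WITHHELD_DIRS = (PurePosixPath("/eval/hidden"),)
VERIFIER_FILES = {PurePosixPath("/eval/verify.py")}


@dataclass(frozen=True)
class Action:
    tool: str
    path: str = ""


def violation(action: Action) -> Optional[str]:
    """Return a reason string if the action crosses the boundary, else None."""
    if action.tool not in ALLOWED_TOOLS:
        return "tool outside sanctioned surface"
    # Normalize first so "../" segments cannot disguise a withheld path.
    path = PurePosixPath(posixpath.normpath(action.path))
    if action.tool == "read_file" and any(path.is_relative_to(d) for d in WITHHELD_DIRS):
        return "read of withheld path"
    if action.tool == "write_file" and path in VERIFIER_FILES:
        return "modification of verification script"
    return None


def monitored_reward(trajectory, verifier_reward: float) -> float:
    # Any single violation zeroes the reward for the whole trajectory.
    if any(violation(a) is not None for a in trajectory):
        return 0.0
    return verifier_reward


clean = [Action("read_file", "/workspace/app.py"), Action("run_tests")]
cheat = [Action("read_file", "/workspace/../eval/hidden/answers.json")]
print(monitored_reward(clean, 1.0), monitored_reward(cheat, 1.0))
```

Because the checks are plain rules rather than a learned model, the same trajectory always gets the same verdict, which is what makes this layer hard for the policy to negotiate with.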

Layer 3: Frozen LLM Judge. Because intent-level gaming can happen within allowed tool surfaces, a frozen LLM acts as a veto on top of the verifier. If the judge detects gaming behavior even within valid tool usage, the trajectory gets penalized.
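One way the judge veto might compose with the other two signals is sketched below. The ordering of the checks, the `gaming_penalty` value, and the keyword-based `stub_judge` are assumptions for illustration; in a real system the judge callable would wrap a call to a frozen LLM, and the source does not state the size of the penalty.

```python
from typing import Callable


def final_reward(
    verifier_passed: bool,
    monitor_clean: bool,
    judge_flags_gaming: Callable[[str], bool],
    transcript: str,
    gaming_penalty: float = -1.0,  # assumed value, not from the source
) -> float:
    # Layer 2: a boundary violation yields zero reward outright.
    if not monitor_clean:
        return 0.0
    # Layer 3: the judge can veto a trajectory even if the verifier passed.
    if judge_flags_gaming(transcript):
        return gaming_penalty
    # Otherwise the verifier's verdict stands.
    return 1.0 if verifier_passed else 0.0


def stub_judge(transcript: str) -> bool:
    # Stand-in for a frozen LLM judge: flags one obvious gaming pattern.
    return "hardcode expected output" in transcript.lower()


honest = final_reward(True, True, stub_judge, "Fixed the off-by-one bug in the parser.")
gamed = final_reward(True, True, stub_judge, "Hardcode expected output for each test case.")
print(honest, gamed)
```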

This three-layer approach is a reference architecture for anyone building self-improving agent systems.

Asynchronous RL at Scale

Training was done with a pipeline-RL strategy to handle the off-policy problem created by long agentic rollouts. A staleness weight downweights older tokens and drops them entirely once a threshold is exceeded. This lets the training scale to the long-horizon trajectories that agentic coding requires.
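The source does not give the exact form of the staleness weight, so the sketch below assumes an exponential decay in the number of policy updates since a token was generated, with a hard cutoff. The names `staleness_weight`, `decay`, and `max_staleness` are hypothetical.

```python
import math


def staleness_weight(token_version: int, current_version: int,
                     decay: float = 0.5, max_staleness: int = 4) -> float:
    """Weight for a token sampled by policy `token_version` when the
    learner is at `current_version` (assumed exponential decay)."""
    staleness = current_version - token_version
    if staleness < 0:
        raise ValueError("token cannot come from a future policy version")
    if staleness > max_staleness:
        return 0.0  # too stale: drop the token entirely
    return math.exp(-decay * staleness)


def weighted_loss(token_losses, token_versions, current_version: int) -> float:
    """Staleness-weighted mean of per-token losses."""
    weights = [staleness_weight(v, current_version) for v in token_versions]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # every token was dropped
    return sum(w * l for w, l in zip(weights, token_losses)) / total


# Tokens from one long rollout, generated under different policy versions.
losses = [1.0, 2.0, 3.0]
versions = [10, 8, 3]
print(weighted_loss(losses, versions, current_version=10))
```

Tracking the version per token rather than per trajectory matters for long rollouts, since the policy can be updated several times while a single trajectory is still being generated.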


Why This Matters for Enterprise AI

Ornith 1.0 isn't just a research milestone — it has immediate practical implications.

1. Open Weights Change the Risk Calculus

All Ornith 1.0 checkpoints carry the MIT license. GGUF versions run in Ollama and Unsloth with no gatekeeping. For regulated industries (finance, healthcare, defense), this means:

  • Code never has to leave your infrastructure
  • You can audit and modify the agent behavior
  • No dependency on API pricing or availability
  • Custom fine-tuning for proprietary codebases is possible
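As a rough sketch of what keeping code on your own infrastructure looks like in practice, the snippet below sends a prompt to a locally running Ollama server over its HTTP API using only the Python standard library. The endpoint and request fields follow Ollama's documented `/api/generate` route; the model tag `ornith-1.0-9b` is a placeholder assumption, so substitute whatever tag the GGUF build is actually published or imported under.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "ornith-1.0-9b"  # hypothetical tag, replace with the real one


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    # stream=False asks Ollama for one JSON object instead of a token stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    # The request never leaves localhost, so no source code is sent off-machine.
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, `generate("Write a Python function that reverses a linked list.")` would return the completion as a string.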

2. The Workflow, Not Just the Model, Determines Outcomes

Ornith 1.0's results suggest that scaffold design is now a competitive differentiator. Two teams using the same base model can get wildly different results depending on their orchestration logic. The model that can evolve its own orchestration will pull ahead.

3. Capability Is Flowing Downstream

The 9B model's performance is arguably the most important signal here. It means agentic coding capability — once the domain of massive data center deployments — is becoming accessible on laptops and edge devices. Private, offline, real-time code assistance is now feasible.

4. The Open-Source Gap Is Closing

| Category | Claude Opus 4.7 | Ornith 1.0 397B | Gap |
| --- | --- | --- | --- |
| SWE-Bench Verified | 80.8 | 82.4 | +1.6 |
| Terminal-Bench 2.1 | 70.3 | 77.5 | +7.2 |
| SWE-Bench Pro | 64.3 | 62.2 | -2.1 |

On these benchmarks, the gap between best-in-class closed-source and open-source agentic coding has effectively closed: Ornith 1.0 397B leads on two of the three and trails by 2.1 points on SWE-Bench Pro. For many use cases, Ornith 1.0 already leads.
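The gap column is simply the open-source score minus the closed-source score. The short sketch below recomputes it from the figures in the table; the variable names are illustrative only.

```python
# (Claude Opus 4.7, Ornith 1.0 397B) scores copied from the table above.
SCORES = {
    "SWE-Bench Verified": (80.8, 82.4),
    "Terminal-Bench 2.1": (70.3, 77.5),
    "SWE-Bench Pro": (64.3, 62.2),
}


def gap(closed_score: float, open_score: float) -> float:
    # Positive means the open-source model is ahead; rounding removes
    # floating-point noise from the subtraction.
    return round(open_score - closed_score, 1)


gaps = {name: gap(*pair) for name, pair in SCORES.items()}
open_leads = [name for name, g in gaps.items() if g > 0]

for name, g in gaps.items():
    print(f"{name}: {g:+.1f}")
```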


The Bottom Line

Ornith 1.0 is the most important open-source agentic coding release of 2026 so far. It validates a thesis that many in the AI community suspected but no one had proven at scale: jointly optimizing the scaffold and the solver produces better results than optimizing either in isolation.

For CTOs and engineering leaders evaluating their AI strategy, the implications are clear:

  • You can now run production-grade agentic coding entirely on your own infrastructure with open weights
  • The competitive advantage shifts from model access to orchestration design and custom tooling
  • Self-improving agents that evolve their own workflows are no longer theoretical — they're shipping now

At aratech, we're tracking this space closely. If you're evaluating how self-scaffolding models fit into your AI architecture or want to benchmark Ornith 1.0 against your private codebase, get in touch.

Watch Sam Witteveen's full breakdown of Ornith 1.0 on YouTube for a hands-on walkthrough of the models and their capabilities.

