New AI Optimization Framework Boosts AI Performance by 2.5x

Researchers from Renmin University of China and Microsoft Research have introduced Arbor, an AI optimization framework for coding agents that they claim outperforms agents such as Claude Code and Codex by a factor of 2.5. The system turns AI-driven improvement from random guesswork into a structured, cumulative learning process. The framework specifically targets complex engineering tasks, such as fixing AI agents that produce errors or miss key rules when running in production.


By Devon Blackwell | June 20, 2026


Addressing the Bottleneck in AI Improvement

As large language models become more capable, many expect them to take on complex work such as autonomous system improvement. Autonomous improvement captures the basic research loop: an AI agent improves a starting piece of code or data without constant human help. In practice, engineering teams often struggle because, as the researchers found, giving a coding agent more processing time rarely leads to better results. Co-author Jiajie Jin noted that long-running automation is not the same as progress, because vague goals only produce unwanted improvements faster.


Standard AI agent setups often fail because they lack a data structure for retaining the lessons of each attempt, so knowledge cannot accumulate. These systems usually treat every trial as independent, which leads the agent to repeat the same mistakes. Because typical coding agents rely on chat transcripts for memory, they struggle to track evidence over long histories. As a result, they often lose the overall structure of the research process, sometimes stalling on early failures or chasing misleading fluctuations in the data.

Existing AI frameworks are also prone to reward hacking, in which an agent shows apparent progress on a metric without delivering real-world improvement. General-purpose coding agents typically chain their tool calls in a single shared workspace, which limits their ability to test different ideas in isolation. This architectural weakness makes it hard to tell which idea caused a given outcome, a serious problem for complex tasks. Arbor addresses these issues by imposing a disciplined structure on AI research.

How Arbor’s Hypothesis Tree Refinement Works

Arbor tackles these challenges with a framework that automates the long cycle of exploration, testing, and abstraction that characterizes human research. The system separates high-level research strategy from low-level coding tasks using two distinct components. The Coordinator acts like a chief researcher, owning the overall state of the improvement effort without ever directly changing the target code.


The Coordinator reviews all the collected evidence, generates new ideas or directions to explore, and decides what to do with the results of each test. The Executors are short-lived, highly focused AI agents that run whenever the Coordinator wants to try an idea. When the Coordinator dispatches a hypothesis, it places the Executor in an isolated workspace, much like a fresh code project. The Executor then implements the idea, runs tests, debugs errors, and reports the results back to the Coordinator.
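The isolation step can be pictured with a short Python sketch. This is an illustration only, not Arbor's actual code: the names `Report`, `run_executor`, and `apply_and_test` are hypothetical, and copying the project into a temporary directory is just one assumed way to give an Executor a fresh workspace.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Report:
    """What a (hypothetical) Executor hands back to the Coordinator."""
    hypothesis: str
    score: float
    notes: str


def run_executor(hypothesis: str, project: Path,
                 apply_and_test: Callable[[Path], float]) -> Report:
    """Try one idea in a throwaway copy of the project.

    Assumption: isolation is modeled as a plain directory copy; the
    original project directory is never modified.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "workspace"
        shutil.copytree(project, workspace)   # fresh, isolated copy
        score = apply_and_test(workspace)     # edit and test only the copy
    return Report(hypothesis, score, notes="ran in isolated copy")
```

In this sketch the caller plays the Coordinator: it chooses the hypothesis, passes in the routine that implements and tests it, and receives only the report, so the target code itself stays untouched.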

These two components work together through a process called Hypothesis Tree Refinement, which represents the entire research process as a persistent, branching tree. Every node in the tree combines a hypothesis, the working code, the evidence gathered, and a distilled insight about the findings. This structure lets the Coordinator explore many competing directions at once without losing track of where it started. If an Executor’s test fails, the tree records that failure as a negative rule, which stops the system from repeating the same error.
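A tree node of the kind described here might look roughly like the following Python sketch. The field names, the `code_ref` string, and the `negative_rules` helper are assumptions made for illustration; the article does not specify how Arbor stores its nodes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity; parent/child links form cycles
class Node:
    hypothesis: str
    code_ref: str                        # assumed: a branch name or snapshot id
    evidence: dict = field(default_factory=dict)
    insight: str = ""
    failed: bool = False
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add_child(self, hypothesis: str, code_ref: str) -> "Node":
        """Branch a new hypothesis off this node."""
        child = Node(hypothesis, code_ref, parent=self)
        self.children.append(child)
        return child

    def record(self, evidence: dict, insight: str, failed: bool = False) -> None:
        """Attach the evidence and distilled insight from a finished test."""
        self.evidence, self.insight, self.failed = evidence, insight, failed


def negative_rules(root: Node) -> list:
    """Collect insights from failed nodes so they are not tried again."""
    rules, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.failed:
            rules.append(node.insight)
        stack.extend(node.children)
    return rules
```

Because every result stays attached to the hypothesis that produced it, a planner reading this structure can see both what worked and what has already been ruled out, which a flat chat transcript does not offer.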

Why Arbor Isolates Experiments

The researchers say Arbor’s ability to isolate experiments is necessary for accurate results in enterprise systems. For example, when improving a Retrieval-Augmented Generation pipeline, a single agent would change many things at once, which entangles the changes and makes attribution impossible. Arbor instead treats each change, such as the chunking strategy or the prompt, as a separate branch. This gives clean attribution, showing exactly which change helped or harmed performance.
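The attribution idea can be shown with a toy Python sketch. It assumes a pipeline configuration is a plain dictionary and that some `evaluate` function returns a score; both are hypothetical stand-ins, not part of Arbor.

```python
from typing import Callable, Dict

Config = Dict[str, object]


def attribute_changes(baseline: Config,
                      changes: Dict[str, Config],
                      evaluate: Callable[[Config], float]) -> Dict[str, float]:
    """Score each change on its own branch and return its delta vs. the baseline."""
    base_score = evaluate(baseline)
    deltas = {}
    for name, override in changes.items():
        branch = {**baseline, **override}   # exactly one change per branch
        deltas[name] = evaluate(branch) - base_score
    return deltas
```

Since each branch differs from the baseline in only one setting, any change in score can be traced to that setting alone. Applying all overrides to a single configuration would yield one combined number with no way to split it between chunking and the prompt.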

To keep the system honest and prevent illusory progress, Hypothesis Tree Refinement enforces a strict “merge gate.” Even if an Executor reports an excellent score on a development test, the Coordinator must still validate the results before accepting them. This check ensures that the system delivers real-world improvements rather than merely gaming the test metrics. The framework represents a significant step toward making AI tools more dependable in real work environments.
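A merge gate of this kind could be sketched in Python as follows. The article does not say how the Coordinator validates a result, so the `revalidate` callback (for instance, a re-run on data the Executor never saw) and the `min_gain` threshold are assumptions introduced purely for illustration.

```python
from typing import Callable


def merge_gate(reported_dev_score: float,
               current_best: float,
               revalidate: Callable[[], float],
               min_gain: float = 0.0) -> bool:
    """Accept a branch only if an independent check beats the current best.

    Assumption: `revalidate` is the Coordinator's own evaluation, run
    separately from whatever the Executor reported.
    """
    if reported_dev_score <= current_best:
        return False                  # no claimed gain, nothing to verify
    validated = revalidate()          # do not trust the reported score alone
    return validated > current_best + min_gain
```

In this sketch a branch that reports 0.95 on the development test is still rejected if the independent check comes back at or below the current best, which is the behavior the merge gate is meant to guarantee.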
