AI agents now perform business tasks with increasing independence, and many companies are excited about them. Technology leaders, however, remain wary of granting these automated systems access to sensitive enterprise systems. This caution arises because current industry standards do not fully measure AI reliability. Bryan Silverthorn, who directs Amazon’s AGI Autonomy research lab, suggests that existing metrics, such as evaluation (eval) scores, give only a static view of performance. These scores fail to capture consistency across different inputs, environments, and prompts. Building trustworthy AI agents therefore requires moving past single performance numbers toward understanding how a system behaves, so that agents operate safely and reliably within strict boundaries.
Amazon’s research lab is shifting its focus away from raw performance benchmarks. Instead, it is building a structured framework centered on four core concepts: consistency, robustness, reliability, and safety. The framework is the subject of Silverthorn’s session at VB Transform 2026. The approach acknowledges that a model alone cannot guarantee safety; the system around it must also be carefully designed. The framework is meant to guide companies toward trustworthy AI agents that function within secure limits.

Why Current AI Metrics Fall Short
Industry standards often use static scores to measure an AI’s performance on one specific task. Silverthorn suggests these metrics do not capture reliability across the range of situations an agent meets in practice. A system might score highly on a test set yet behave inconsistently when real-world data shifts. Companies need an AI system that behaves steadily across varied conditions. For IT leaders, consistent behavior ties directly to financial risk and operational stability.
Worries about unchecked AI access are widespread among senior technology leaders. VentureBeat’s Q2 Pulse Research survey polled more than 100 senior leaders and buyers. Only 4% of respondents felt safe relying solely on model safety limits, a figure that shows a major gap between AI capability and organizational trust. Asked about their primary concerns, 40% cited unauthorized data or tool access, and another 27% cited prompt manipulation or injection attacks.
These results suggest that relying only on a model’s internal safety features is insufficient for high-stakes work. Companies require external, checkable controls to manage the risks of powerful autonomous agents. Pursuing trustworthy AI agents demands that organizations change how they view risk management.
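To make the gap between a static score and consistency concrete, here is a minimal sketch that asks the same question in several phrasings and reports how often the answers agree. The `consistency_rate` function and the `toy_agent` stand-in are hypothetical illustrations, not part of Amazon’s framework or any published metric.

```python
from collections import Counter
from typing import Callable, Iterable


def consistency_rate(agent: Callable[[str], str], variants: Iterable[str]) -> float:
    """Fraction of prompt variants whose answer matches the most common answer."""
    answers = [agent(v).strip().lower() for v in variants]
    if not answers:
        raise ValueError("need at least one prompt variant")
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


def toy_agent(prompt: str) -> str:
    # Hypothetical stand-in for a real agent: it gives a different answer
    # whenever the prompt is phrased as a question.
    return "unsure" if prompt.endswith("?") else "42"


variants = [
    "Compute 6 * 7.",
    "Multiply six by seven.",
    "What is 6 times 7?",
    "Give the product of 6 and 7.",
]
rate = consistency_rate(toy_agent, variants)
print(f"consistency: {rate:.2f}")
```

An agent could pass a benchmark that contains only the first phrasing and still disagree with itself on the third, which is the kind of behavior a single test-set score would not reveal.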
Engineering Trust in Trustworthy AI Agents
Amazon’s approach rejects the idea that a model becomes safe through its own design alone. Instead, the organization emphasizes decoupled systems for agent work: the AI agent runs in a sandboxed environment and proposes changes to the target system rather than making them directly. Crucially, a human must review and approve each proposed change before it takes effect. This strategy directly addresses the risk of unauthorized access raised in the industry survey.
This method prioritizes checkable interactions, which is especially vital in areas like finance, where an AI agent could cause immense damage if it malfunctions or misuses its functions. By requiring human oversight at the point of action, Amazon’s framework adds a strong layer of assurance. It shifts the focus from merely what the AI does to how the AI proposes and performs actions. This sandboxed architecture gives businesses automating critical processes several controls:
Human review gates govern all proposed system changes.
The agent stays isolated within a controlled digital space.
Clear logging tracks all agent actions and proposed changes.
Structured feedback loops allow for immediate safety checks.

These controls help close the trust gap between AI potential and corporate adoption.
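The propose-then-approve pattern described in this section can be sketched in a few lines. This is an illustrative sketch only, assuming a simple key-value system state; `Proposal`, `ReviewGate`, and the `reviewer` policy function (a stand-in for a human reviewer) are hypothetical names, not Amazon’s implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    agent_id: str
    key: str
    new_value: int


@dataclass
class ReviewGate:
    live_state: dict
    audit_log: list = field(default_factory=list)

    def sandbox_view(self) -> dict:
        # The agent only ever sees a copy, so it cannot mutate live state.
        return copy.deepcopy(self.live_state)

    def submit(self, proposal: Proposal, approve: Callable[[Proposal], bool]) -> bool:
        # Every proposal is logged before any decision is made.
        self.audit_log.append(("proposed", proposal))
        if not approve(proposal):
            self.audit_log.append(("rejected", proposal))
            return False
        # Only an approved proposal reaches the live system.
        self.live_state[proposal.key] = proposal.new_value
        self.audit_log.append(("applied", proposal))
        return True


def reviewer(proposal: Proposal) -> bool:
    # Stand-in for a human decision: approve only modest credit limits.
    return proposal.key == "credit_limit" and proposal.new_value <= 2000


gate = ReviewGate(live_state={"credit_limit": 1000})

# The agent experiments freely inside its sandbox copy.
sandbox = gate.sandbox_view()
sandbox["credit_limit"] = 5000

first = gate.submit(Proposal("agent-1", "credit_limit", 5000), reviewer)
second = gate.submit(Proposal("agent-1", "credit_limit", 1200), reviewer)
print(first, second, gate.live_state)
```

The design choice worth noting is that the agent never holds a reference to the live state: the only path to a real change runs through `submit`, which logs the proposal whether or not it is approved.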
How Companies Move Toward Multi-Tool Architectures
At VB Transform 2026, Silverthorn will share details on Amazon’s method for creating trustworthy agentic AI. He will explain how companies move from simple single-agent wrappers to complex multi-tool architectures. These advanced systems allow agents to correct their own mistakes during operation. This ability to fix errors autonomously, while staying within human-defined safety limits, marks a major technical shift.
Moving to a multi-tool architecture means the agent does not depend on a single large model for every task. Instead, it uses various specialized tools, each designed for a specific function, such as data retrieval, computation, or external API calls. This component-based approach increases the system’s overall reliability and consistency. If one tool fails or returns poor data, the agent can switch to another tool or flag the issue for human review. Component-based design also reduces exposure to prompt injection and unauthorized data access. If an agent only has permission to use a highly specific, sandboxed tool, the chance of it reaching unauthorized enterprise data shrinks considerably. This design discipline is what makes trustworthy AI agents possible.
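A minimal sketch of that idea follows: an agent limited to an allowlist of tools, with fallback to a second tool and a flag for human review when a tool fails. The `ToolRunner` class and the three tool functions are hypothetical and exist only to illustrate the pattern; they are not drawn from any real agent framework.

```python
from typing import Callable, Optional


class ToolNotPermitted(Exception):
    """Raised when the agent asks for a tool outside its allowlist."""


class ToolRunner:
    def __init__(self, tools: dict[str, Callable[[str], str]], allowed: set[str]):
        self.tools = tools
        self.allowed = allowed
        self.flagged: list[str] = []  # issues queued for human review

    def call(self, name: str, arg: str) -> str:
        # Permission is checked before the tool is even looked up.
        if name not in self.allowed:
            raise ToolNotPermitted(name)
        return self.tools[name](arg)

    def call_with_fallback(self, names: list[str], arg: str) -> Optional[str]:
        for name in names:
            try:
                return self.call(name, arg)
            except ToolNotPermitted:
                raise  # never fall back past a permission failure
            except Exception as exc:
                self.flagged.append(f"{name}: {exc}")
        self.flagged.append(f"all tools failed for {arg!r}; needs human review")
        return None


def primary_lookup(query: str) -> str:
    raise TimeoutError("primary index timed out")


def backup_lookup(query: str) -> str:
    return f"cached result for {query}"


def hr_records(query: str) -> str:
    return "confidential"


runner = ToolRunner(
    tools={"primary_lookup": primary_lookup, "backup_lookup": backup_lookup, "hr_records": hr_records},
    allowed={"primary_lookup", "backup_lookup"},
)

result = runner.call_with_fallback(["primary_lookup", "backup_lookup"], "Q2 revenue")

try:
    runner.call("hr_records", "salaries")
    blocked = False
except ToolNotPermitted:
    blocked = True

print(result, runner.flagged, blocked)
```

Here the failed primary lookup is recorded for review while the backup answers the query, and the request for `hr_records` is refused outright because that tool is registered but not on the agent’s allowlist.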
