OyaAI Documentation

The deterministic runtime for AI Employees

Name: Oya
Author: Oya

AI agents quietly rewrite the values they pass around, run steps out of order, and re-bill every byte at token rates, all because the model reads state it never needed to see. OyaAI runs AI Employees on a deterministic runtime that removes that root cause, so they do real operational work reliably, run after run, with the full execution on record.

What is OyaAI?

OyaAI is a runtime platform for AI Employees. You describe the role: an SDR that emails 300 leads a day, an executive assistant that triages your inbox, an SEO manager that audits sites every Monday. The platform assembles the agent: a soul (persona plus behavior rules), the skills it needs (web search, email, Slack, your CRM), the routines that run on a schedule, and the knowledge base it consults. Each AI Employee gets a chat surface, an OpenAI-compatible API key, webhooks, and integrations into wherever your work happens, and runs continuously in an isolated sandbox where every step is recorded and replayable.

A run's full execution trace in OyaAI. Every run is on record: the full LLM trace, tool calls, and sandbox output, replayable step by step.

The problem we are solving

Most AI agents today are prompt wrappers around an LLM. They demo well and break in production, because token-only systems are non-deterministic: the same input can take a different path, retries cascade, and small drift compounds across a long workflow. Three failure modes do the real damage:

State corruption: values get silently rewritten as they pass through the model. A URL's .io becomes .com, an ID's digits transpose, and the agent acts on the corrupted value with no exception raised.
Ordering drift: steps that must run in sequence (validate, then send) get interleaved or skipped, because the model chooses the next action from a partially observed context.
Token waste: every intermediate value flows back through the model and is re-billed at token rates, even values it never needed to read. Cost scales linearly with workflow length.

All three share one root cause: the model read state it should never have read. OyaAI fixes this at the architecture level. A planner emits one typed plan; a deterministic runtime executes it and passes values from skill to skill by reference. A value the agent does not need to read never re-enters the model, so it cannot be corrupted, cannot throw off step ordering, and is not re-billed. Roughly 20% of the work runs on tokens and the other 80% is structured, deterministic compute, with every run recorded and replayable. That is the difference between a demo and an AI Employee that handles real load.
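The by-reference idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OyaAI runtime: `Ref`, `Step`, `run_plan`, and the three stub skills are hypothetical names invented for this example. The point it shows is that the plan names values without containing them, so the URL travels from skill to skill untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Handle to a value held by the runtime; a planner would only see the name."""
    name: str


Arg = Union[str, Ref]


@dataclass(frozen=True)
class Step:
    skill: str             # which skill to call
    args: Tuple[Arg, ...]  # literals, or references to earlier outputs
    out: str               # name under which the result is stored


def run_plan(plan: List[Step], skills: Dict[str, Callable]) -> Dict[str, object]:
    """Execute steps in plan order, resolving references from the store."""
    store: Dict[str, object] = {}
    for step in plan:
        resolved = [store[a.name] if isinstance(a, Ref) else a for a in step.args]
        store[step.out] = skills[step.skill](*resolved)
    return store


# Hypothetical skills, stubbed so the sketch runs offline.
def fetch_lead(lead_id: str) -> dict:
    return {"id": lead_id, "url": "https://example.io/pricing"}


def get_url(lead: dict) -> str:
    return lead["url"]


def draft_email(url: str) -> str:
    return f"Pricing details: {url}"


SKILLS = {"fetch_lead": fetch_lead, "get_url": get_url, "draft_email": draft_email}

# The plan refers to "lead" and "url" by name only; their contents never appear in it.
PLAN = [
    Step("fetch_lead", ("L-10482",), "lead"),
    Step("get_url", (Ref("lead"),), "url"),
    Step("draft_email", (Ref("url"),), "email"),
]

result = run_plan(PLAN, SKILLS)
print(result["email"])
```

Because the `.io` URL is only ever copied between Python objects, there is no step at which a model could rewrite it, and no tokens are spent re-reading it.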

Same task, same model. The architecture is the only thing that changes.

What the benchmarks show

Measured against six published agent frameworks

100% vs 19-81%: critical values preserved byte-for-byte, OyaAI vs a ReAct loop, across six models.
+15 / 15: perfect asymmetric score on all three frontier models. Every published framework scored negative on at least two of three.
~4.7x: fewer LLM tokens per task than the leanest token-loop baseline (4-5x on every model).

Chart: State preserved (higher is better). Percentage of critical values that survive byte-for-byte, OyaAI vs ReAct loop.

Chart: LLM tokens per task (lower is better). Total tokens billed per benchmark task, OyaAI vs ReAct loop.

In the worst case, a standard ReAct agent passed all 15 benchmark tasks while silently corrupting ~81% of the critical identifiers. Passing tests is not the same as preserving state. Results are from our paper "Plan, Don't React: Projection Types for LLM Agent Runtimes". The evaluation used the open PlanBench benchmark (120 projection-annotated tasks) and the MIT-licensed oya-planner runtime, across GPT-5, Claude, Gemini, and GLM.

Why OyaAI

80% structured compute, 20% tokens

LLMs handle reasoning. Everything else (data fetching, schema validation, scheduling, integrations, file I/O, retries) runs on deterministic infrastructure. Same input, same path, run after run. In our benchmark that comes out to ~4.7x fewer LLM tokens per task than a ReAct loop, because the runtime passes values by reference instead of re-reading them into the model.

State and order stay correct

The runtime executes a typed dataflow plan, so values cannot be rewritten in transit and steps cannot run out of sequence. The two failure modes that quietly corrupt agent output (read-induced edits and ordering drift) are handled by the platform, not by a reminder in the prompt. In testing, critical values survived byte-for-byte 100% of the time versus 19-81% under a ReAct loop.
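One way to picture how a typed dataflow plan rules out ordering drift is the sketch below. It is an assumption-laden illustration, not OyaAI's implementation: `StepSpec`, `check_and_order`, and the three step names are hypothetical. Each step declares which step it consumes and what type it expects, the plan is rejected if an edge is mistyped, and the execution order is derived from the dependencies rather than chosen step by step.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, NamedTuple, Optional


class StepSpec(NamedTuple):
    needs: Optional[str]     # upstream step whose output this step consumes
    in_type: Optional[type]  # type expected from upstream (None for a source step)
    out_type: type           # type this step produces


def check_and_order(plan: Dict[str, StepSpec]) -> List[str]:
    """Reject mistyped edges, then return an order that respects every dependency."""
    for name, spec in plan.items():
        if spec.needs is None:
            continue
        produced = plan[spec.needs].out_type
        if produced is not spec.in_type:
            raise TypeError(
                f"{name} expects {spec.in_type}, but {spec.needs} produces {produced}"
            )
    graph = {n: ({s.needs} if s.needs else set()) for n, s in plan.items()}
    return list(TopologicalSorter(graph).static_order())


# Declared out of order on purpose: the dependencies, not the listing, fix the sequence.
PLAN = {
    "send": StepSpec(needs="validate", in_type=bool, out_type=str),
    "draft": StepSpec(needs=None, in_type=None, out_type=str),
    "validate": StepSpec(needs="draft", in_type=str, out_type=bool),
}

ORDER = check_and_order(PLAN)
print(ORDER)
```

In this sketch "send" can only run after "validate" because it consumes that step's output, so skipping or interleaving the steps is not a choice available at run time.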

Secrets never reach the model

Credentials and sensitive data live in the deterministic layer, not in the prompt. Skills authenticate inside an isolated sandbox; the model sees results, not API keys or raw PII. Because the planner never reads opaque values, a class of indirect prompt-injection payloads cannot reach it either. Data isolation is structural, not a reminder in the system prompt.
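The separation described above might look roughly like the following sketch. Everything here is hypothetical and chosen for illustration: the `_SECRETS` store, the `crm_lookup` stub, and the `run_lookup` wrapper are not OyaAI APIs. It shows a credential being resolved by name inside the deterministic layer, with only non-sensitive fields returned for the model to read.

```python
from typing import Dict

# Hypothetical secret store that lives in the runtime, never in a prompt.
_SECRETS: Dict[str, str] = {"crm_api_key": "sk-demo-not-a-real-key"}


def crm_lookup(email: str, api_key: str) -> Dict[str, str]:
    """Stub for an authenticated CRM call (a real skill would use the key here)."""
    if not api_key:
        raise PermissionError("missing credential")
    return {"email": email, "status": "customer", "plan": "pro"}


def run_lookup(secret_name: str, email: str) -> Dict[str, str]:
    """Resolve the credential inside the runtime and return a scrubbed result."""
    record = crm_lookup(email, _SECRETS[secret_name])
    # Only non-sensitive fields are returned for the model to read.
    return {"status": record["status"], "plan": record["plan"]}


model_view = run_lookup("crm_api_key", "ada@example.com")
print(model_view)
```

The model-facing result contains neither the API key nor the email address, so neither can leak into a prompt or a completion.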

Skills are the deterministic primitives

Every integration is a versioned skill in a catalog: Python in an isolated sandbox with a typed input/output schema. Drop them into an agent like LEGO bricks. When something is missing, write a 50-line skill, import it, and reuse it across every future agent.
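As a rough picture of what a small skill with a typed input/output schema could look like, here is a sketch using only the Python standard library. The actual OyaAI skill interface is not shown in this page, so the `ParseUrlInput` and `ParseUrlOutput` dataclasses, the `SKILL_META` dictionary, and the `parse_url` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ParseUrlInput:
    url: str


@dataclass(frozen=True)
class ParseUrlOutput:
    host: str
    path: str


# Hypothetical catalog metadata; the field names are illustrative only.
SKILL_META = {"name": "parse_url", "version": "1.0.0"}


def parse_url(inp: ParseUrlInput) -> ParseUrlOutput:
    """Split an absolute URL into host and path."""
    if not isinstance(inp, ParseUrlInput):
        raise TypeError("parse_url expects a ParseUrlInput")
    parts = urlsplit(inp.url)
    if not parts.hostname:
        raise ValueError(f"not an absolute URL: {inp.url!r}")
    return ParseUrlOutput(host=parts.hostname, path=parts.path or "/")


out = parse_url(ParseUrlInput(url="https://example.io/pricing"))
print(out)
```

Declaring the schema as frozen dataclasses keeps inputs and outputs immutable, and the explicit checks make a malformed call fail loudly instead of producing a silently wrong value.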

One identity, many channels

The same AI Employee chats in your web app, replies on Slack, takes Telegram messages, fires on webhooks, and runs scheduled jobs. One memory, one personality, one API key.

No black box

Read every persona, behavior rule, skill, and routine an agent uses. Run history replays the full LLM trace, tool calls, and sandbox stdout for any run. Export any agent as a YAML spec and version it in git.

Getting started

Build your first AI Employee

Walk through the Agent Builder: describe the role, pick skills, connect platforms, deploy. Five steps end to end.

Or pick a pre-built AI Employee from the gallery in the app and customize what you need.

Pick your path

Three audiences, three focused manuals.

For builders

Product

Create, configure, and operate AI Employees. Agent Builder, knowledge base, chat, channels, routines, and billing.

Agent Builder
Knowledge Base
Channels & Apps
Routines

For engineers

Developers

Integrate OyaAI into your code. CLI, OpenAI-compatible API, skill authoring, MCP, webhooks, and GitHub versioning.

Claude Code CLI
API Keys
Custom skills
Triggers & webhooks

For agencies

Partners

Operate OyaAI as an agency. Multi-customer accounts, the `oya account` CLI, authoring and deploying templates to customers, and consolidated billing.

Agency mode
Account CLI
Authoring templates
Customer impersonation