Benchmarks

AgentBay benchmark hub

We publish the buyer-facing benchmark and the internal search benchmark side by side so the numbers are inspectable.

Public
LOCOMO

LOCOMO is the public benchmark for long-context conversational memory. It is the benchmark memory-literate buyers already know.

Judge accuracy 2.8%
Exact match 3.2%
Latency p95 134.0ms
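
For readers less familiar with the exact-match figure, here is a minimal sketch of how such a score is commonly computed. The normalization rules (lowercasing, stripping ASCII punctuation, collapsing whitespace) and the function names `normalize` and `exact_match_rate` are assumptions for illustration; they are not taken from the LOCOMO harness or from AgentBay's scoring code.

```python
import string


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


# Hypothetical answers: one of the two matches after normalization.
score = exact_match_rate(["Paris.", "blue"], ["paris", "red"])
```

Exact match is strict by design: a correct answer phrased differently from the reference counts as a miss, which is one reason a judge-based accuracy is usually reported next to it.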
Internal
Memory search

This is the internal synthetic benchmark we use to track precision, recall, and latency across labeled memory-search scenarios.

Precision@5 81.8%
Recall@5 81.8%
Latency p95 10.0ms
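
The three metrics above can be sketched as follows. This is an illustrative outline under stated assumptions, not the internal harness: the function names, the memory IDs in the example, and the nearest-rank definition of p95 are all hypothetical choices made for clarity.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Share of the top-k retrieved items that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Share of the labeled relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)


def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency using the nearest-rank method (an assumption)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical scenario: five retrieved memory IDs, three labeled relevant.
retrieved = ["m1", "m2", "m3", "m4", "m5"]
relevant = {"m1", "m3", "m9"}
precision = precision_at_k(retrieved, relevant)  # 2 of the 5 retrieved are relevant
recall = recall_at_k(retrieved, relevant)        # 2 of the 3 relevant were retrieved
```

Per-scenario values like these would typically be averaged across all labeled scenarios to produce a single headline number; whether the figures above are macro-averaged is not stated here.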