Gateco Team · 6 min read

Under 25ms p95: How Gateco Keeps Policy Enforcement Fast Across 12 Vector DBs

The latency question comes up in almost every evaluation: "If Gateco sits between my application and my vector DB, how much does it slow things down?" The answer is <25ms p95 policy overhead — measured separately from the vector DB latency itself. The end-to-end p95, including the vector DB round-trip, is <50ms for cloud-hosted connectors under standard load. Here is how we get there and what drives the variance.

What "policy overhead" means

Policy overhead is the latency Gateco adds to a retrieval request, excluding the vector DB round-trip. It covers four steps:

- Principal resolution: looking up the principal's attributes from the local cache.
- Policy evaluation: evaluating all matching policies against the principal and each returned chunk.
- Audit record write: appending the decision to the audit log.
- Response assembly: filtering the result set and formatting the response.

The vector DB latency is a separate dimension — Gateco does not change how fast your vector DB executes a query.
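To make the measurement boundary concrete, here is a minimal timing sketch of those four steps. Every name in it (`resolve_principal`, `evaluate_policies`, and so on) is illustrative, not Gateco's actual code:

```python
import time

# Minimal sketch: the vector DB round-trip is excluded from the
# overhead measurement. All function names here are hypothetical.
def resolve_principal(token):                 # step 1: local cache read
    return {"id": token, "groups": ["eng"]}

def evaluate_policies(principal, chunks):     # step 2: allow/deny per chunk
    return [c for c in chunks if "eng" in principal["groups"]]

def write_audit(principal, decisions):        # step 3: append decision record
    pass

def handle_retrieval(token, query_vector, vector_db_query, top_k=10):
    t0 = time.monotonic()
    principal = resolve_principal(token)
    t1 = time.monotonic()
    chunks = vector_db_query(query_vector, top_k)   # vector DB round-trip
    t2 = time.monotonic()
    allowed = evaluate_policies(principal, chunks)
    write_audit(principal, allowed)                 # steps 2-4
    t3 = time.monotonic()
    overhead_ms = ((t1 - t0) + (t3 - t2)) * 1000    # excludes the DB leg
    return allowed, overhead_ms                     # step 4: assemble response

fake_db = lambda vec, k: [{"id": i} for i in range(k)]  # stand-in connector
results, overhead = handle_retrieval("alice", [0.1, 0.2], fake_db)
print(f"policy overhead: {overhead:.2f}ms for {len(results)} chunks")
```

The vector DB leg (`t2 - t1`) is deliberately left out of `overhead_ms` — that subtraction is what the <25ms p95 figure measures.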

Principal resolution: the fast path

The most expensive part of policy evaluation is principal resolution — looking up a principal's groups, department, and attributes from the identity provider. Gateco solves this with a local principal cache that is populated and updated by IDP sync. When a query arrives, principal resolution is a local lookup (in-memory or local DB), not an outbound IDP API call. A full IDP sync adds principal data to the local store; the retrieval path reads it. This means even for organizations with complex IDP graphs, the per-query resolution is sub-millisecond.
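A minimal sketch of that read/write split, assuming a hypothetical `PrincipalCache` class where the IDP sync job writes and the query path only reads:

```python
import threading

# Hypothetical local principal cache: the periodic IDP sync populates
# it; the retrieval path never makes an outbound IDP call.
class PrincipalCache:
    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def sync_from_idp(self, idp_records):
        # Called by the full IDP sync, not per query.
        with self._lock:
            for rec in idp_records:
                self._store[rec["principal_id"]] = {
                    "groups": rec["groups"],
                    "department": rec["department"],
                    "attributes": rec.get("attributes", {}),
                }

    def resolve(self, principal_id):
        # Retrieval path: an in-memory read, sub-millisecond.
        with self._lock:
            return self._store.get(principal_id)

cache = PrincipalCache()
cache.sync_from_idp([{"principal_id": "u1", "groups": ["eng"], "department": "R&D"}])
print(cache.resolve("u1"))
```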

Policy evaluation: linear in policy count

Policy evaluation time is linear in the number of active policies and the number of returned chunks. For a typical deployment with 5-15 active policies and top_k=10, evaluation takes 1-3ms. The most expensive single operation in policy evaluation is a ReBAC lookup — checking whether a principal has a named relation to a resource requires a database read (with a 60-second result cache). First-query latency for ReBAC is 2-5ms; cached lookups are sub-millisecond. Classification ceiling checks and RBAC group membership checks are purely in-memory and add negligible overhead.
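A sketch of that evaluation loop under stated assumptions: hypothetical policy shapes, a numeric classification scale, and a 60-second TTL cache in front of the ReBAC database read. None of this is Gateco's actual engine code:

```python
import time

_rebac_cache = {}  # (principal_id, relation, resource_id) -> (allowed, expiry)

def rebac_check(db_lookup, principal_id, relation, resource_id, ttl=60):
    key = (principal_id, relation, resource_id)
    hit = _rebac_cache.get(key)
    if hit and hit[1] > time.monotonic():
        return hit[0]                        # cache hit: sub-millisecond
    allowed = db_lookup(*key)                # cache miss: 2-5ms database read
    _rebac_cache[key] = (allowed, time.monotonic() + ttl)
    return allowed

def evaluate(policies, principal, chunks, db_lookup):
    allowed = []
    for chunk in chunks:                     # one pass per returned chunk
        verdicts = []
        for policy in policies:              # linear in active policy count
            if policy["kind"] == "rebac":
                verdicts.append(rebac_check(db_lookup, principal["id"],
                                            policy["relation"], chunk["id"]))
            elif policy["kind"] == "rbac":   # in-memory group intersection
                verdicts.append(bool(set(principal["groups"]) &
                                     set(policy["groups"])))
            else:                            # classification ceiling, in-memory
                verdicts.append(chunk["classification"] <= policy["ceiling"])
        if all(verdicts):
            allowed.append(chunk)
    return allowed
```

With 5-15 active policies and top_k=10, the inner loop runs 50-150 times per request, which is why the purely in-memory checks keep evaluation in the 1-3ms range.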

Variance by connector

The vector DB round-trip is the largest source of end-to-end latency variance. Typical added latency for a vector query, by connector:

| Connector | Typical query latency |
| --- | --- |
| Postgres-family (pgvector, Supabase, Neon), same cloud region | 5-15ms |
| Qdrant managed cloud | 10-25ms |
| Pinecone managed API | 15-30ms |
| OpenSearch | 15-40ms |
| Azure AI Search, Google Vertex AI Search | 20-50ms, depending on index size and query complexity |

The policy overhead — the <25ms p95 claim — is consistent across all connectors because it is measured before and after the connector adapter call, not including the adapter's own latency.

What happens when policy evaluation is slow

The fail-closed guarantee means that if the policy engine encounters an error — a database connection failure, a timeout, an unexpected exception — the retrieval is denied and the decision is logged as `decision=error_deny`. There is no "policy evaluation timed out, allow anyway" path. The circuit breaker (5 errors in 30 seconds, half-open after 2 minutes) prevents a degraded policy engine from adding indefinite latency to every request. If the circuit is open, requests are denied immediately with a circuit-breaker reason in the audit log — which is a better outcome than a slow deny.
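For illustration, a hedged sketch of a breaker with those thresholds (5 errors in a 30-second window, 2-minute cooldown before a half-open probe). This is the shape of the behavior, not Gateco source:

```python
import time

# Hypothetical fail-closed circuit breaker around policy evaluation.
class PolicyCircuitBreaker:
    def __init__(self, threshold=5, window=30.0, cooldown=120.0):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.errors = []         # timestamps of recent engine errors
        self.opened_at = None
        self.half_open = False

    def call(self, evaluate):
        now = time.monotonic()
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown:
                return "deny", "circuit_open"  # open: instant deny, no latency
            self.half_open = True              # cooldown over: one probe request
            self.opened_at = None
        try:
            result = evaluate()
            self.half_open = False
            return result, None
        except Exception:
            self.errors = [t for t in self.errors if now - t < self.window]
            self.errors.append(now)
            if self.half_open or len(self.errors) >= self.threshold:
                self.opened_at = now           # (re)open the circuit
                self.half_open = False
            return "deny", "error_deny"        # fail closed, never allow
```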

The latency data above is from production measurements. If your deployment shows higher policy overhead, the most common causes are an excessive number of active policies (>50), large top_k values (>25), or ReBAC policies against uncached relations. The [docs performance page](/docs/performance) has the full per-connector methodology and measurement setup.


Ready to secure your AI retrieval?

Start with the free tier — 100 retrievals/month, no credit card required.