03 · OwnIQ · Sovereign model gateway

Every AI call. Routed, metered, redacted.

OwnIQ is the third layer of the stack: one gateway between everything that thinks and every model that answers. Apps call one OpenAI-compatible endpoint; OwnIQ decides the provider, enforces the guardrails, meters the spend, and writes the audit trail — per app, per tenant, per call.

See it live →How it’s licensed

11Providers behind one endpoint

8Routing aliases · 8 guardrails · 3 caches

100%Calls metered, attributed, and audited

0Prompts leaving your perimeter (local mode)

Inside OwnIQ · actual interface

The routing decision chain

Six layers decide every request.

OwnIQ resolves the upstream provider and model through a layered chain. Each layer can short-circuit — and each decision stamps a human-readable reason onto the usage row, so “why did this go to that model” is never a mystery.

01 Virtual alias resolution

owniq/smart, owniq/reasoning, … resolve to a tier or capability

02 Advanced router rules

Conditional · load-balance · canary · cheapest — JSON rules per app

03 Complexity router

Deterministic prompt scorer picks cheap / standard / premium — no extra LLM call

04 Allowed-models whitelist

Per-app allowlist enforced after routing, not before

05 Data-residency gate

Region pins survive every routing rewrite

06 Vision-capability gate

Image inputs require a model that declares vision

Plus cost-aware routing: strategy “cheapest” picks the lowest blended-price model that meets your capability and context bar.

Virtual aliases

Apps ask for intent. OwnIQ picks the model.

Application code never hardcodes a model. It asks for an alias, and the gateway resolves it against the live catalog with health-aware fallbacks — a provider behind an open circuit breaker simply steps aside.

Alias	Resolves to	Built for
owniq/smart	Premium tier	For the hardest reasoning — best available model first
owniq/standard	Standard tier	The everyday workhorse for app AI features
owniq/fast	Cheap tier, latency-first	Sub-second UX paths — autocomplete, hints, triage
owniq/cheap	Cheap tier, cost-first	Bulk jobs where unit cost wins
owniq/embed	Embeddings	Vector search, semantic cache, RAG pipelines
owniq/reasoning	Reasoning-capable models	Chain-of-thought workloads, agent planning
owniq/code	Pinned code models	Code generation and review tasks
owniq/vision	Vision-capable models	Screenshot reading, document and image inputs

The provider catalog

Eleven providers. One integration.

Every provider is wired through the same plumbing — same key management, same catalog shape, same routing layers. Adding one is a config entry, not a project.

Provider	Region	Notes
OpenAI	us	GPT-4o family, o1, embeddings
Anthropic	us	Claude Opus / Sonnet / Haiku
Google Gemini	us	Gemini Pro / Flash
Perplexity	us	Sonar — search-grounded
Mistral	eu	Mistral Large, Codestral, embeddings
Cohere	us	Command-R family, embeddings
Groq	us	Llama / Mixtral / Gemma — fastest inference
DeepSeek	apac	Reasoning + code models
xAI	us	Grok family
OpenRouter	global	Long-tail model access
Ollama	local	Self-hosted — data never leaves your perimeter

Data residency, enforced: pin an app to us / eu / apac / global / localand the gate runs after every routing rewrite, so the pin survives the whole decision chain. Self-hosted providers are always permitted — that data never leaves your perimeter. Unknown providers fail safe, not open.

Guardrails

Eight gates in front of every model.

Guardrail	What it does	On failure
Moderation	Pre-call content moderation on user text	Blocks + audit row
Prompt firewall	Scores instruction-override, persona injection, and smuggling attempts	Blocks + audit row
PII redaction	Scrubs outbound message text before it reaches any provider	Rewrites — never blocks
Content deny-list	Per-app regex blocking of sensitive phrases	Blocks + audit row
Allowed endpoints	Restrict an app to a subset of the API — e.g. embeddings-only	403
Allowed models	Whitelist of resolved model ids, enforced after routing	403
IP allowlist	Source-IP CIDR allowlist per app	403
Vision gate	Image inputs require a vision-capable model	400 with diagnostics

Every block writes an audit row with the request id, app, rule, and the truncated input. Moderation fails open if its upstream is unreachable — inference keeps running when a sidecar is down.

Cost control

Three caches. Three rate limits. Hard spend caps.

Caching

Idempotency replay for 24h, exact-match response cache for deterministic calls, and a per-app semantic cache that serves near-duplicate prompts from vectors — evaluated in that order.

Rate limits

Requests-per-minute and tokens-per-minute per app, plus a per-end-user limit so one noisy tenant inside a shared app can’t starve its peers.

Spend caps

Daily and monthly USD caps per app and per end-user, with webhooks at 50 / 80 / 100% so finance hears about it before the cap does the talking. Usage is reconciled against actual provider counts.

BYOK:any app can register its own provider keys — encrypted at rest — and the gateway forwards them upstream for pass-through billing. Every response is flagged so cost attribution can split the bill cleanly. Metering flows into OwnUsage as first-class usage events.

Observability & the Labs

See every decision it makes.

Prometheus metrics per provider and model, distributed traces with W3C propagation spanning auth → rate-limit → route → upstream → cache → guardrail, and an append-only audit log of operator actions and policy decisions. The built-in Labs make the invisible visible:

PlaygroundRouter TraceCost CompareFallback ChainPrompt CacheModerationStreaming RaceStructured OutputTool UseVision

Integration

Two environment variables. Any SDK.

The gateway speaks the OpenAI-compatible surface, so existing OpenAI and Anthropic SDK code points at OwnIQ without rewrites. First-party Python and TypeScript SDKs ship with the platform.

OWNIQ_GATEWAY_URL=https://app.own360.ai/iq
OWNIQ_APP_KEY=owniq_sk_...        # rotatable, hashed at rest

Where it sits

Between every app and every model.

OwnApps call OwnIQ for their in-context AI features (level 2 of the three access levels). OwnAgentsreason through it for every autonomous action (level 3) — permissions scoped by OwnCentral, every call in the audit log. Change the model behind the gateway and the whole platform changes with it; nothing else needs to know.

Get sandbox access →The four layers

Every AI call. Routed, metered, redacted.

Six layers decide every request.

01

Virtual alias resolution

02

Advanced router rules

03

Complexity router

04

Allowed-models whitelist

05

Data-residency gate

06

Vision-capability gate