Every AI call. Routed, metered, redacted.

Six layers decide every request.
OwnIQ resolves the upstream provider and model through a layered chain. Each layer can short-circuit — and each decision stamps a human-readable reason onto the usage row, so “why did this go to that model” is never a mystery.
01
Virtual alias resolution
owniq/smart, owniq/reasoning, … resolve to a tier or capability
02
Advanced router rules
Conditional · load-balance · canary · cheapest — JSON rules per app
03
Complexity router
Deterministic prompt scorer picks cheap / standard / premium — no extra LLM call
04
Allowed-models whitelist
Per-app allowlist enforced after routing, not before
05
Data-residency gate
Region pins survive every routing rewrite
06
Vision-capability gate
Image inputs require a model that declares vision
Plus cost-aware routing: strategy “cheapest” picks the lowest blended-price model that meets your capability and context bar.
Apps ask for intent. OwnIQ picks the model.
Application code never hardcodes a model. It asks for an alias, and the gateway resolves it against the live catalog with health-aware fallbacks — a provider behind an open circuit breaker simply steps aside.
| Alias | Resolves to | Built for |
|---|---|---|
| owniq/smart | Premium tier | For the hardest reasoning — best available model first |
| owniq/standard | Standard tier | The everyday workhorse for app AI features |
| owniq/fast | Cheap tier, latency-first | Sub-second UX paths — autocomplete, hints, triage |
| owniq/cheap | Cheap tier, cost-first | Bulk jobs where unit cost wins |
| owniq/embed | Embeddings | Vector search, semantic cache, RAG pipelines |
| owniq/reasoning | Reasoning-capable models | Chain-of-thought workloads, agent planning |
| owniq/code | Pinned code models | Code generation and review tasks |
| owniq/vision | Vision-capable models | Screenshot reading, document and image inputs |
Eleven providers. One integration.
Every provider is wired through the same plumbing — same key management, same catalog shape, same routing layers. Adding one is a config entry, not a project.
| Provider | Region | Notes |
|---|---|---|
| OpenAI | us | GPT-4o family, o1, embeddings |
| Anthropic | us | Claude Opus / Sonnet / Haiku |
| Google Gemini | us | Gemini Pro / Flash |
| Perplexity | us | Sonar — search-grounded |
| Mistral | eu | Mistral Large, Codestral, embeddings |
| Cohere | us | Command-R family, embeddings |
| Groq | us | Llama / Mixtral / Gemma — fastest inference |
| DeepSeek | apac | Reasoning + code models |
| xAI | us | Grok family |
| OpenRouter | global | Long-tail model access |
| Ollama | local | Self-hosted — data never leaves your perimeter |
Data residency, enforced: pin an app to us / eu / apac / global / localand the gate runs after every routing rewrite, so the pin survives the whole decision chain. Self-hosted providers are always permitted — that data never leaves your perimeter. Unknown providers fail safe, not open.
Eight gates in front of every model.
| Guardrail | What it does | On failure |
|---|---|---|
| Moderation | Pre-call content moderation on user text | Blocks + audit row |
| Prompt firewall | Scores instruction-override, persona injection, and smuggling attempts | Blocks + audit row |
| PII redaction | Scrubs outbound message text before it reaches any provider | Rewrites — never blocks |
| Content deny-list | Per-app regex blocking of sensitive phrases | Blocks + audit row |
| Allowed endpoints | Restrict an app to a subset of the API — e.g. embeddings-only | 403 |
| Allowed models | Whitelist of resolved model ids, enforced after routing | 403 |
| IP allowlist | Source-IP CIDR allowlist per app | 403 |
| Vision gate | Image inputs require a vision-capable model | 400 with diagnostics |
Every block writes an audit row with the request id, app, rule, and the truncated input. Moderation fails open if its upstream is unreachable — inference keeps running when a sidecar is down.
Three caches. Three rate limits. Hard spend caps.
Caching
Idempotency replay for 24h, exact-match response cache for deterministic calls, and a per-app semantic cache that serves near-duplicate prompts from vectors — evaluated in that order.
Rate limits
Requests-per-minute and tokens-per-minute per app, plus a per-end-user limit so one noisy tenant inside a shared app can’t starve its peers.
Spend caps
Daily and monthly USD caps per app and per end-user, with webhooks at 50 / 80 / 100% so finance hears about it before the cap does the talking. Usage is reconciled against actual provider counts.
BYOK:any app can register its own provider keys — encrypted at rest — and the gateway forwards them upstream for pass-through billing. Every response is flagged so cost attribution can split the bill cleanly. Metering flows into OwnUsage as first-class usage events.
See every decision it makes.
Prometheus metrics per provider and model, distributed traces with W3C propagation spanning auth → rate-limit → route → upstream → cache → guardrail, and an append-only audit log of operator actions and policy decisions. The built-in Labs make the invisible visible:
Two environment variables. Any SDK.
The gateway speaks the OpenAI-compatible surface, so existing OpenAI and Anthropic SDK code points at OwnIQ without rewrites. First-party Python and TypeScript SDKs ship with the platform.
OWNIQ_GATEWAY_URL=https://app.own360.ai/iq OWNIQ_APP_KEY=owniq_sk_... # rotatable, hashed at rest
Between every app and every model.
OwnApps call OwnIQ for their in-context AI features (level 2 of the three access levels). OwnAgentsreason through it for every autonomous action (level 3) — permissions scoped by OwnCentral, every call in the audit log. Change the model behind the gateway and the whole platform changes with it; nothing else needs to know.