For Ops & Platform

One collector. One pager. Zero drama.

Most observability tools feel like a second production system. Oculis doesn't. 50MB of RAM, runs as a single container or binary, and stays out of the way — so your AI observability doesn't become your next on-call nightmare.

“Great, one more thing to keep running. Make it small and don't page me.”
What You'll See

The three views ops teams actually use.

Built for the people who have to make AI run reliably in production.

01

Collector fleet status

See every Oculis collector running across your infrastructure — version, uptime, memory footprint, and last-ping. If a collector goes dark, you know in under 60 seconds.

  • Self-reporting health: collectors tell on themselves
  • 50MB RAM per collector, monitored continuously
  • Stale-collector alerts with escalation policies
Collector fleet 12 running · 1 stale
prod-k8s-1 v1.24.3 42 MB 14d
prod-k8s-2 v1.24.3 38 MB 14d
dev-ci-runner v1.24.3 28 MB 7d
laptop-sam v1.23.1 2h stale
+ 9 more · all healthy
02

Resource footprint over time

Track CPU, memory, and network usage of the collector fleet. Know before you complain: it uses less than your log shipper.

  • CPU: < 0.5% average per host
  • Memory: 50MB peak, 30MB average
  • Network: 100KB/min outbound, batched
Fleet resource usage · 24h
CPU 0.4% avg
Memory 34 MB avg · 48 peak
Network 94 KB/min batched outbound
03

Alerts wired to your pager

Route alerts to the tools your on-call team already uses. Slack, Teams, PagerDuty, and custom webhooks — with cooldowns, escalation, and deduplication built in.

  • Webhook with full run context — link from PagerDuty straight to Oculis
  • Cooldown and burst-protection prevent alert fatigue
  • Runbook URLs attach per alert type
Alert routing healthy
S
Slack · #alerts-ai
cost-spike, anomaly, failures
Active
P
PagerDuty · ai-oncall
CRITICAL burst + stale-collector
Active
W
Webhook · ops-automation
auto-restart on failure cluster
Active
What You Can Do

Observability without the operational cost.

Four outcomes Oculis delivers for operations and platform teams.

Deploy in one line

Docker container, systemd binary, or Helm chart. No agents to instrument, no SDKs to install.

Keep the footprint tiny

50MB RAM cap per collector. Less than your log shipper, less than your metrics agent, less than your IDE.

Recover automatically

Self-reporting collectors auto-restart on crash. Alert only fires if the restart loop fails — not on every hiccup.

Route to your existing tools

Slack, PagerDuty, Teams, and arbitrary webhooks. Don't learn another on-call system — use yours.

Ready?

Ship it. Forget it.

30-minute demo. We'll install the collector on one of your hosts and show you the fleet view before we even finish the call.