Performance Analysis (Theoretical): 1-Minute Check-ins on 4 vCPU / 8 GB / SSD
These projections are based on benchmarked per-request costs from the codebase, not measured end-to-end under production load. Actual performance will vary with hardware, network conditions, policy complexity, and fleet composition.
This document analyzes Vigo server performance under 1-minute check-in intervals on modest hardware. All numbers reference benchmarked values from the codebase or are derived from measured operations.
See also: 30-second intervals | 15-second intervals
Hardware Assumptions
- 4 vCPU (modern x86_64, e.g., Hetzner CPX21, DigitalOcean c-4)
- 8 GB RAM
- SSD storage (NVMe or SATA SSD, ~50k random IOPS)
- 1 Gbps network
Architecture Recap
The check-in hot path is designed for zero synchronous database operations:
- FleetIndex: In-memory envoy index with O(1) lookups. Rehydrated from SQLite on startup, dirty state flushed every 10 seconds in 500-row batches.
- Policy cache: One cached bundle per match pattern (not per envoy). Cache hit: 17 ns. Miss triggers rebuild at ~3 us.
- No-change short-circuit: When policy hasn't changed and the envoy isn't force-pushed, the server returns a 100-byte "nothing changed" response in ~55 us total.
- Async writes: Last-seen timestamps, traits, and flag changes are batched and flushed to SQLite every 10 seconds, outside any hot-path lock.
Check-in Cost Breakdown
Per-Request Costs (Benchmarked)
| Operation | Cost | Notes |
|---|---|---|
| ED25519 signature verification | 52.5 us | Interceptor, non-negotiable |
| FleetIndex GetEnvoyLite | 0.36 us | RLock, shallow copy |
| FleetIndex GetPubKey | 0.15 us | Pre-parsed key, O(1) map |
| Policy version compare + flags | ~0.5 us | In-memory comparisons |
| TraitsBatcher hash-dedup | ~1 us | SHA-256 of traits JSON |
| MarkSeen (dirty flag) | 0.46 us | Queued for async flush |
| Total (no-change path) | ~55 us | 95%+ of check-ins |
Additional Per-Request Costs (Full-Bundle Path)
| Operation | Cost | Notes |
|---|---|---|
| Config pattern match (cached) | <1 us | Glob cache hit after first lookup |
| Policy cache hit + stub customize | 0.3 us | RWMutex read lock |
| Policy cache miss (full rebuild) | ~3 us | Protobuf serialization |
| Bundle signing (ED25519) | ~100 us | Only on policy delivery |
| Total (full bundle path) | ~155 us | After config change |
Costs NOT on the Hot Path
| Operation | Frequency | Cost |
|---|---|---|
| TLS handshake (mTLS, TLS 1.3) | Once per connection (~every 15 min) | ~1-2 ms |
| SQLite batch UPDATE (last_seen) | Every 10 seconds | ~5-20 ms per 500-row chunk |
| Traits INSERT (changed only) | Every 10 seconds, hash-deduped | ~2-10 ms per 200-row chunk |
| FleetIndex flusher lock hold | Every 10 seconds | <5 ms (snapshot + release) |
| Go GC pause | ~every 2-5 seconds under load | <1 ms (Go 1.25 concurrent GC) |
Scaling Projections
Request Rate at 1-Minute Intervals
With 10% jitter (default), agents spread their check-ins across a 54-66 second window (60s +/- 10%). This prevents a thundering herd:
| Fleet Size | Avg RPS | Peak RPS (burst) | Notes |
|---|---|---|---|
| 100 | 1.7 | ~3 | Trivial |
| 1,000 | 17 | ~25 | Trivial |
| 5,000 | 83 | ~120 | Comfortable |
| 10,000 | 167 | ~250 | Comfortable |
| 25,000 | 417 | ~600 | Tuning recommended |
| 50,000 | 833 | ~1,200 | Streaming or spanner recommended |
CPU Usage
ED25519 verification dominates at 52.5 us/op. With full handler overhead (~55 us no-change, ~155 us full-bundle), the realistic per-check-in CPU cost including gRPC framing, protobuf deserialization, goroutine scheduling, and TLS record decryption is approximately 200-300 us for no-change and 500-800 us for full bundle delivery.
| Fleet Size | Avg RPS | CPU (no-change) | CPU (burst after config change) |
|---|---|---|---|
| 1,000 | 17 | <1% of 1 core | ~1% of 1 core |
| 5,000 | 83 | ~2% of 1 core | ~5% of 1 core |
| 10,000 | 167 | ~5% of 1 core | ~12% of 1 core |
| 25,000 | 417 | ~12% of 1 core | ~30% of 1 core |
| 50,000 | 833 | ~25% of 1 core | ~60% of 1 core |
Go's goroutine scheduler distributes work across all 4 cores. A single core saturates at roughly 3,000-5,000 no-change check-ins/sec or 1,200-2,000 full-bundle check-ins/sec. With 4 cores, theoretical throughput is 12,000-20,000 RPS before CPU saturation.
Config change storm: When config is published, all envoys receive a full bundle on their next check-in. For a 10,000-envoy fleet at 1-minute intervals, this means ~167 full-bundle responses per second sustained over ~60 seconds. CPU impact: ~12% of one core. Not a concern.
Memory Usage
| Component | 1,000 envoys | 10,000 envoys | 50,000 envoys |
|---|---|---|---|
| Go runtime + gRPC server | ~150 MiB | ~200 MiB | ~300 MiB |
| FleetIndex (envoy state) | 1.4 MiB | 14 MiB | 70 MiB |
| FleetIndex (inverted indexes) | 0.2 MiB | 1.6 MiB | 8 MiB |
| Policy cache | <1 MiB | <1 MiB | <1 MiB |
| gRPC buffers (32 KiB r+w per conn) | 64 MiB | 640 MiB | 3.2 GiB |
| Goroutine stacks (active handlers) | <1 MiB | ~1.5 MiB | ~8 MiB |
| SQLite page cache | 100-500 MiB | 100-500 MiB | 100-500 MiB |
| Total estimate | ~350 MiB | ~1.4 GiB | ~4.1 GiB |
The gRPC connection buffer is the largest memory consumer, not the FleetIndex. Each persistent mTLS connection allocates 64 KiB (32 KiB read + 32 KiB write) by default. At 10,000 connections this is 640 MiB.
Mitigation options for 25,000+ envoys:
- Reduce `tuning.grpc_read_buffer` and `tuning.grpc_write_buffer` to 16 KiB (halves connection memory)
- Set `tuning.max_connection_age` to force periodic reconnects, reducing idle connection count
- Use adaptive stream promotion (default) so only active envoys hold streams
- Set `tuning.memory_limit: "6GiB"` to cap Go heap and trigger earlier GC
Network Bandwidth
| Fleet Size | No-change traffic | Post-config-change burst |
|---|---|---|
| 1,000 | ~5 KB/s | ~500 KB/s (30s burst) |
| 5,000 | ~25 KB/s | ~2.5 MB/s (30s burst) |
| 10,000 | ~50 KB/s | ~5 MB/s (30s burst) |
| 50,000 | ~250 KB/s | ~25 MB/s (30s burst) |
Assumes 100-byte no-change response, 30 KiB average full bundle (with stub optimization reducing unchanged modules to ~100 bytes each). Even at 50,000 envoys, network is not the bottleneck on a 1 Gbps link.
SQLite Write Load
The flusher batches all writes into 10-second windows. Three flushers are staggered at 0s, 3.3s, and 6.6s offsets to avoid lock contention.
| Fleet Size | Dirty envoys per flush | Batch transactions per flush | SQLite write time |
|---|---|---|---|
| 1,000 | ~1,000 | 2 (500/chunk) | ~5 ms |
| 10,000 | ~10,000 | 20 | ~50 ms |
| 50,000 | ~50,000 | 100 | ~250 ms |
At 1-minute intervals, every envoy checks in within each 10-second flush window, so the full fleet is dirty each cycle. WAL mode allows concurrent reads during writes, so check-in handlers are never blocked by the flusher.
Traits writes are hash-deduplicated: if an envoy's traits JSON hasn't changed (SHA-256 match), no write occurs. For a stable fleet, 95%+ of check-ins skip the traits write entirely. Only environment changes (new package installed, IP change, etc.) trigger a traits INSERT.
Capacity Recommendations
Sweet Spot: Up to 10,000 Envoys
No tuning required. Default settings handle 10,000 envoys at 1-minute check-ins with:
- ~5% CPU utilization (sustained)
- ~1.4 GiB memory
- ~50 KB/s network
- 50 ms SQLite flush every 10 seconds
Comfortable: 10,000-25,000 Envoys
Add these tuning options to `server.yaml`:
```yaml
tuning:
  gogc: 200
  memory_limit: "6GiB"
  grpc_read_buffer: 16384
  grpc_write_buffer: 16384
```
This reduces gRPC buffer memory by half and gives Go more heap room before GC kicks in.
Scaling Beyond: 25,000+ Envoys
Two options:
- Upgrade to 8 vCPU / 16 GB and apply tuning above. Handles ~50,000 envoys.
- Use spanner (hub-spoke) to shard the fleet across multiple spoke servers. Each spoke handles its own subset of envoys with independent FleetIndex and SQLite database. The hub aggregates compliance, routes enrollment, and fans out queries.
When to Use Streaming Instead of Polling
Adaptive stream promotion is already the default: agents poll via unary CheckIn() and only open persistent streams when the server has dispatched work (tasks, queries, workflows). For pure state enforcement (no orchestration), polling at 1-minute intervals is optimal — it avoids the 64 KiB per-connection memory overhead of idle streams.
If the fleet runs frequent ad-hoc tasks or live queries, streaming reduces latency from "up to 1 minute" (next poll) to instant dispatch.
Bottleneck Hierarchy
From most to least likely to saturate first on 4 vCPU / 8 GB:
- Memory (gRPC connection buffers) — 640 MiB at 10k connections. First constraint hit at ~25k envoys on 8 GB.
- CPU (ED25519 verification) — 52.5 us per check-in. Saturates one core at ~19k RPS. With 4 cores, theoretical limit ~76k RPS.
- SQLite flusher — 250 ms per flush at 50k envoys. Still well within the 10-second window.
- Network — 25 MB/s burst at 50k envoys after config change. Under 3% of 1 Gbps.
- Config pattern matching — Cached after first lookup. Negligible.
Key Design Properties
- Zero synchronous DB ops on check-in: FleetIndex serves all reads; writes are async.
- Policy cache scales by pattern count, not fleet size: 100 match patterns = 100 cache entries, regardless of whether 10 or 10,000 envoys match each one.
- No-change short-circuit: Dominates idle fleet behavior (95%+ of cycles), returning in ~55 us with a 100-byte response.
- Jitter prevents thundering herd: Default 10% jitter spreads 1-minute check-ins across a 12-second window. No synchronization between agents.
- Flusher lock is non-blocking: Snapshots dirty state in <5 ms, then performs DB writes outside the lock. Check-in handlers never wait on SQLite.
Confidential -- Alexander4, LLC. Not for redistribution. See documentation-license.