server.yaml Reference

Complete reference for all server.yaml configuration sections. The server looks for server.yaml in the current working directory. Override with VIGO_CONFIG_FILE=/path/to/server.yaml.

Fields that accept secret:path are resolved through the secrets backend at startup.

server

Ports and TLS configuration.

server:
  grpc_listen: ":1530"              # gRPC port for agent check-ins
  api_listen: ":8443"               # REST API + web UI port
  hostname: "myserver.example.com"  # server's enrolled envoy hostname
  tls:
    cert_file: "tls/cert"           # server TLS certificate
    key_file: "tls/key"             # server TLS private key
    ca_file: "tls/ca"               # certificate authority (verifies agents)
    ca_key_file: "tls/ca_key"       # CA private key (enables bootstrap CSR signing)
  tls_sans:                         # extra SANs for auto-generated certs
    - "192.168.1.2"                 # host LAN IP (needed in Docker)
    - "vigo.example.com"            # DNS name agents use
| Field | Default | Description |
| --- | --- | --- |
| grpc_listen | :1530 | gRPC listen address for agent communication (mTLS) |
| api_listen | :8443 | REST API, web UI, and metrics listen address (TLS) |
| hostname | (auto) | Server's enrolled envoy hostname, and the address agents are told to dial (GET /bootstrap advertises it). Set this when running in Docker: the container's own hostname is its container ID, not a name agents can reach. It is automatically added to the auto-generated server cert's SANs, so you do not also need to list it under tls_sans. |
| tls.cert_file | tls/cert | Path to server TLS certificate |
| tls.key_file | tls/key | Path to server TLS private key |
| tls.ca_file | tls/ca | Path to CA certificate for mTLS verification |
| tls.ca_key_file | (optional) | CA private key; enables bootstrap enrollment and CSR signing |
| tls_sans | (empty) | Additional Subject Alternative Names for auto-generated certs. localhost, the host's interface IPs, and server.hostname are already included automatically; list only names/IPs beyond those. |

database

Storage backend. SQLite only — postgres is rejected at startup in this release.

database:
  dsn: "secret:vigo/db/dsn"         # SQLite file path
  driver: sqlite3                    # sqlite3 (default) — only supported driver
  max_open_conns: 0                  # read-pool size (0 = auto: CPU count, min 4)
  retention: "30d"                   # auto-prune old runs/task runs/workflows
| Field | Default | Description |
| --- | --- | --- |
| dsn | (required) | SQLite database file path. Supports secret: prefix |
| driver | sqlite3 | Database driver. Only sqlite3 is supported; the server rejects postgres at startup (the Postgres migration set is incomplete). |
| max_open_conns | 0 | Maximum concurrent database connections. SQLite runs two pools: writes always serialize on a dedicated single-writer connection, and this knob sizes the concurrent read pool, so it caps reader concurrency and bounds the CGO worker threads the driver pins per in-flight query. An unbounded pool is what lets a write storm exhaust the process's OS threads. 0 = auto (CPU count, min 4). |
| retention | 30d | Auto-prune data older than this duration. Applies to runs (cascades run_results), task_runs, workflow_runs, file_snapshots, convergence_history, compliance_history. audit_entries is never trimmed (tamper chain). |

secrets

How secret: references in config values are resolved.

# Local backend (default) — encrypted files on disk
secrets:
  backend: local
  key_file: "/srv/vigo/master.key"   # AES-256-GCM key (0600); omit to auto-generate
  secrets_dir: "/srv/vigo/secrets"   # encrypted secret files

# Isopass backend — external secrets API
secrets:
  backend: isopass
  url: "https://isopass.internal:8443"
  token_file: "/srv/vigo/isopass-token"
  tls_skip_verify: false
| Field | Default | Description |
| --- | --- | --- |
| backend | local | local or isopass |
| key_file | (auto-generated) | AES-256-GCM encryption key for local backend |
| secrets_dir | /srv/vigo/secrets | Directory for encrypted secret files |
| url | (required for isopass) | Isopass API base URL |
| token_file | (required for isopass) | Bearer token file path (0600) |
| tls_skip_verify | false | Isopass: skip TLS verification (dev only) |

checkin

Agent polling behavior.

checkin:
  interval: "1s"                     # agent poll frequency (default 1s; sets the fleet cadence)
  jitter_percent: 20                 # randomize timing by 0-N%
  bundle_max_age: "24h"              # signed bundle TTL on agents (0 = forever)
  max_concurrent: 0                  # cap on concurrent CheckIn handlers (0 = CPU-scaled default; always on)
  report_max_concurrent: 0           # cap on concurrent ReportResult handlers (0 = built-in 2048; always on)
  max_connections: 0                 # cap on concurrently-held agent streams (0 = auto-derive safe ceiling from host RAM; -1 = uncapped)
  stream_signature: all              # held-stream per-message verification: all (default) | handshake-only (security relaxation)
| Field | Default | Description |
| --- | --- | --- |
| interval | 1s | How often agents check in. The server pushes this to agents, so it sets the effective fleet poll cadence. The staleness threshold is not a fixed multiple of this: it's 2.5 × max(interval, observed-cycle), floored at 30s, so a fast cadence still tolerates a few missed beats (see offline_threshold to override). |
| jitter_percent | 20 | Random jitter to prevent thundering herd |
| bundle_max_age | 24h | How long agents trust their cached policy bundle |
| max_concurrent | 0 | Cap on concurrent CheckIn handlers. The cap gates the policy-bundle build (CPU-heavy) inside the handler, so a fleet-wide stale-checkin wave (post-publish, mass re-enroll) sheds via ResourceExhausted → agent backoff instead of CPU-cascading. Always on: 0 = a CPU-scaled built-in (max(128, GOMAXPROCS*16); GOMAXPROCS tracks the container's cgroup CPU limit on Go 1.25+), not disabled. Higher admits more before shedding (more CPU thrash under a wave); lower sheds sooner. Watch vigo_checkins_total{status="shed"}. |
| report_max_concurrent | 0 | Cap on concurrent ReportResult handlers in front of the single SQLite writer. Overflow returns ResourceExhausted so the agent backs off. Always on: 0 = the built-in 2048, not disabled (a fixed value, unlike max_concurrent's CPU-scaled default). Higher admits more before shedding (deeper queue, higher tail latency under sustained overload); lower sheds sooner (tighter tail). Watch vigo_reportresults_total{status="shed"}. |
| max_connections | 0 | Cap on concurrently-held agent streams. A connected envoy costs ~623 KB of live server heap with a realistic ~150 KB inventory (FleetIndex + inventory cache + the held stream; see the sizing guide), so connection count, not request rate, is what drives the server toward its memory ceiling. Past this cap, new streams are refused with ResourceExhausted (the agent backs off and retries) so the server sheds gracefully instead of climbing toward the wall. 0 (default) = auto-derive a cap from host RAM; a positive value sets an explicit cap; -1 disables it (legacy uncapped). Caveat: the auto-derived cap is built from a connection-buffer estimate (~220 KB) and is optimistic against the real ~623 KB inventory-aware cost. On a memory-bound box it can sit above the actual safe ceiling (e.g. it auto-derives ~59,746 on a 32 GiB host, but the measured safe ceiling there is ~30,000). Don't treat the cap alone as your limit: size from the sizing guide and set an explicit value if needed. Watch vigo_streams_active against the effective value, and vigo_checkins_total{status="stream_shed"} for rejections. |
| stream_signature | all | Per-message ed25519 verification on held agent streams. all (default) verifies every check-in/report, exactly as a unary request. handshake-only verifies the handshake plus any proxied (foreign-envoy) message, but skips re-verifying the stream owner's own subsequent check-ins, relying on the verified handshake + mTLS session integrity (the same handshake-binding model Delta already uses). Per-message verify continuously proves possession of the envoy's ed25519 key on a held stream. Unary requests are always verified regardless. Independently, since 0.76.35 the server pins each envoy's mTLS client cert: the verify path rejects any connection whose TLS peer-cert SubjectPublicKeyInfo ≠ the SPKI captured at enrollment (cert_spki_sha256), so a TLS-terminating proxy/MITM presenting a different CA-signed cert, even one whose CN/SAN matches the hostname, is refused before any check-in is accepted. That closes the relay/forgery path for the realistic threat in all modes, which is what makes handshake-only safe by construction here rather than a trust-the-operator relaxation. The gRPC path must therefore be direct or L4 pass-through mTLS: a TLS-terminating proxy is rejected by the cert pin (its cert isn't the enrolled one), vigo_sig_verify_failed_total{reason="cert_pin_mismatch"} increments, and a throttled WARN names the envoy. Grandfathered envoys enrolled before 0.76.35 carry no pin and skip enforcement until they re-enroll (a once-per-envoy WARN names them). The only residual the pin can't catch is an attacker who stole the envoy's cert private key but not its ed25519 key; both live on the same host filesystem, so this is unrealistic, and closing even that would need RFC 5705 channel-binding on the signature (tracked in pending). handshake-only saves ~0.10 ms/checkin (~13% of gated per-checkin CPU; measured 0.76.29), the largest remaining steady-state app cost once persist is gated (ADR-033 Option C). Prefer raising interval or sharding via spanner first. When set, the server logs a startup WARN, records a security.stream_signature_relaxed audit event, and exposes vigo_checkin_signature_mode{mode="handshake-only"}; a detected proxy increments vigo_checkin_proxy_detected_total and logs a WARN once per process. |

all does not close the proxy gap for Delta — direct end-to-end mTLS is the content-integrity boundary in every mode. The proxy heuristic above runs in all modes (not just handshake-only): any held stream whose TLS peer-cert CN/SAN ≠ the envoy hostname increments vigo_checkin_proxy_detected_total and logs a WARN once per process. This matters because Delta run-result events (the lightweight convergence-report path) carry no per-message signature in any mode — they ride the same handshake-binding model, so the ed25519 layer proves authorship of the handshake, not the content of each Delta. all re-verifies every check-in, but it does not sign Delta, so a TLS-terminating proxy on the gRPC path can forge an envoy's convergence/run-status reports even under the default. The integrity guarantee for Delta content is direct agent↔server mTLS end-to-end, and since 0.76.35 the cert pin enforces it: a terminating proxy (an L7 LB or service mesh on :1530) presents a non-enrolled cert and is rejected, so it can no longer establish the stream to forge Delta — closing this for pinned (post-0.76.35-enrolled) envoys. Grandfathered envoys stay on the warn-only model until they re-enroll. (Cryptographic per-Delta content integrity that survives a terminating proxy would be a separate signed-convergence-evidence feature, not a stream_signature setting.)
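
As a rough illustration of the interval and max_connections fields described above, the fragment below sets a slower cadence and an explicit stream cap. The 60s interval and the 30,000 cap are illustrative values taken from the figures quoted in this section, not recommendations, and the comments assume the observed check-in cycle tracks the configured interval.

```yaml
checkin:
  # Staleness threshold = 2.5 x max(interval, observed-cycle), floored at 30s.
  #   interval 1s  -> 2.5 x 1s  = 2.5s, floored to 30s
  #   interval 60s -> 2.5 x 60s = 150s
  interval: "60s"
  # Explicit cap instead of the auto-derived one, using the 32 GiB host figures:
  #   auto-derived ~59,746 streams x ~623 KB = about 37 GB (more than the host has)
  #   measured safe ~30,000 streams x ~623 KB = about 19 GB
  max_connections: 30000
```

Whichever value is in effect, vigo_streams_active is the metric to compare against it.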

Sizing max_concurrent

The cap defends against the cascading-overload regime: when offered load exceeds the server's serving rate, queued requests pile up faster than they drain, the client ticker re-fires on top, and throughput collapses (measured: 70 req/s on a 5K-capable host at 10K offered). Bounding concurrency caps throughput at max_concurrent / latency ≈ the host's actual capacity, gracefully degrading rather than collapsing.

Watch vigo_grpc_checkin_in_flight as the pre-cascade signal — a sustained climb under burst is the leading edge of trouble. Size the cap to roughly your host's measured req/s capacity × p99 latency in seconds (e.g. a 5K req/s host with 100 ms p99 → max_concurrent: 500).

Default 0 resolves to a CPU-scaled built-in (max(128, GOMAXPROCS*16), where GOMAXPROCS tracks the container's cgroup CPU limit on Go 1.25+) — the cap is always on. Operators tune the knob; the burstwave test (2026-05-29) showed that an unbounded read path will CPU-cascade under a fleet-wide stale-checkin wave (post-publish, mass re-enroll), so the always-on default is the safety net.
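
To make the sizing rule concrete, here is a small sketch using the example numbers from this section (a 5K req/s host with 100 ms p99). The values are illustrative; measure your own host before choosing a cap.

```yaml
checkin:
  # cap = measured capacity (req/s) x p99 latency (s)
  #     = 5000 x 0.100 = 500
  max_concurrent: 500
  # For comparison, leaving this at 0 resolves to max(128, GOMAXPROCS*16):
  #   GOMAXPROCS 4  -> max(128, 64)  = 128
  #   GOMAXPROCS 16 -> max(128, 256) = 256
```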

For per-host envoy capacity (how many envoys one vigosrv host can comfortably hold at a given check-in interval, and when to federate via spanner instead of scaling up), see How-to: Size a vigosrv host.

bootstrap

Agent enrollment configuration. Requires server.tls.ca_key_file.

bootstrap:
  cert_validity: "8760h"             # agent cert lifetime (default: 1 year)
  trusted_enrollment:                # token-free enrollment by pattern + source IP
    - pattern: "*"
      cidrs: ["192.168.0.0/16", "10.0.0.0/8"]
| Field | Default | Description |
| --- | --- | --- |
| cert_validity | 8760h | Lifetime of agent TLS certificates |
| trusted_enrollment | (empty) | Patterns + CIDRs for token-free enrollment |

auth

Web UI and REST API authentication.

# Basic auth (default)
auth:
  method: basic
  session_idle_timeout: "15m"

# OIDC (OpenID Connect SSO)
auth:
  method: oidc
  oidc:
    issuer: "https://accounts.google.com"
    client_id: "your-client-id"
    client_secret: "secret:vigo/auth/oidc_client_secret"
    redirect_url: "https://localhost:8443/auth/callback"
    scopes: [openid, profile, email]
    # Optional: provision exactly one identity as admin on its first login.
    # Leave both unset (default) and all new OIDC users are viewers — grant
    # admin explicitly with `vigocli webusers set-role --role admin`.
    bootstrap_admin_email: "admin@example.com"

# Disable auth (development only)
auth:
  method: none
| Field | Default | Description |
| --- | --- | --- |
| method | basic | basic, oidc, isowebauth, or none |
| session_idle_timeout | 15m | Idle timeout before session expires |
| oidc.bootstrap_admin_email | (none) | Email of the OIDC identity to provision as admin on first login. Empty disables auto-admin. |
| oidc.bootstrap_admin_subject | (none) | OIDC sub claim of the identity to provision as admin (use when email is not stable). |
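
Where the admin's email address is not stable, the same OIDC block can pin the bootstrap admin by subject instead. This is a sketch only: the sub value and the redirect hostname are hypothetical placeholders, and the real subject has to come from your identity provider.

```yaml
auth:
  method: oidc
  oidc:
    issuer: "https://accounts.google.com"
    client_id: "your-client-id"
    client_secret: "secret:vigo/auth/oidc_client_secret"
    redirect_url: "https://vigo.example.com:8443/auth/callback"   # hypothetical host
    scopes: [openid, profile, email]
    # Hypothetical sub claim; used in place of bootstrap_admin_email.
    bootstrap_admin_subject: "1234567890"
```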

smtp

Email notifications. Disabled when host is empty.

smtp:
  host: "smtp.example.com"
  port: 587
  from: "vigo@example.com"
  username: "vigo"
  password: "secret:vigo/smtp/password"
  tls: true
  recipients: ["admin@example.com"]
  events:
    drift.detected: {}
    run.failure:
      recipients: ["ops@example.com"]
    convergence.threshold:
      threshold: 90
    security.rootkit: {}
  digest:
    interval: "1h"
    events: ["run.success", "drift.detected"]
  evidence:
    schedule: "weekly"                # "weekly" or "monthly"
    recipients: ["compliance@example.com"]
| Field | Default | Description |
| --- | --- | --- |
| host | (disabled) | SMTP relay hostname |
| port | 587 | SMTP port |
| from | (required) | Sender email address |
| tls | true | Use STARTTLS |
| recipients | (required) | Default recipient list |
| events | (all enabled) | Per-event toggles and recipient overrides |
| digest.interval | (disabled) | Batch window for digest emails |
| evidence.schedule | (disabled) | weekly or monthly evidence delivery |

integrations

External platform integrations. Each forwards events asynchronously. All API keys support secret: prefix. Empty events list = forward all events.

Alerting

integrations:
  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/T.../B.../xxx"
    events: ["drift.detected", "run.failure"]

  pagerduty:
    enabled: true
    routing_key: "secret:vigo/pagerduty/routing-key"
    events: ["security.rootkit", "run.failure"]

  opsgenie:
    enabled: true
    api_key: "secret:vigo/opsgenie/api-key"
    events: ["security.rootkit", "convergence.threshold"]

  teams:
    enabled: true
    webhook_url: "https://outlook.office.com/webhook/..."
    events: ["drift.detected", "run.failure"]

PSA / RMM

integrations:
  connectwise:
    enabled: true
    url: "https://api-na.myconnectwise.net/v4_6_release/apis/3.0"
    company_id: "mycompany"
    public_key: "publickey"
    private_key: "secret:vigo/connectwise/private-key"
    board_id: 1
    events: ["drift.detected", "run.failure", "envoy.stale"]

  autotask:
    enabled: true
    url: "https://webservices.autotask.net/ATServicesRest/v1.0"
    username: "apiuser"
    password: "secret:vigo/autotask/password"
    queue_id: 8
    events: ["drift.detected", "run.failure"]

SIEM

integrations:
  splunk:
    enabled: true
    url: "https://splunk.example.com:8088"
    token: "secret:vigo/splunk/hec-token"
    index: "vigo"
    source: "vigo"
    events: []

  elastic:
    enabled: true
    url: "https://elasticsearch.example.com:9200"
    index: "vigo-events"
    api_key: "secret:vigo/elastic/api-key"
    events: []

  datadog:
    enabled: true
    api_key: "secret:vigo/datadog/api-key"
    site: "datadoghq.com"
    events: []

  loki:
    enabled: true
    url: "https://logs-prod-us-central1.grafana.net"
    tenant_id: "123456"               # X-Scope-OrgID for multi-tenant Loki; omit for single-tenant
    username: "123456"                # basic-auth username (Grafana Cloud: instance ID)
    password: "secret:vigo/loki/password"
    # bearer_token: "secret:vigo/loki/token"  # alternative to basic-auth
    events: []

Loki receives events as labeled log streams. Labels stay low-cardinality (service=vigo, severity, event_name) per Loki best practice — high-cardinality fields (hostname, envoy_id) ride in the JSON log line body. Audit-trail rows arrive as audit.<eventType> events alongside envoy.offline / run.failure / security.* etc., so an operator can query the audit chain in LogQL: {service="vigo", event_name=~"audit\\..*"}.

CMDB

integrations:
  servicenow:
    enabled: true
    instance: "mycompany.service-now.com"
    username: "vigo-integration"
    password: "secret:vigo/servicenow/password"
    events: ["drift.detected", "run.failure"]

grc

Push compliance evidence to GRC platforms on a schedule.

grc:
  integrations:
    - name: vanta
      enabled: true
      endpoint: "https://api.vanta.com/v1/evidence"
      api_key: "secret:vigo/grc/vanta-api-key"
      push_interval: "6h"
      frameworks:
        hipaa: "hipaa-2013"
        soc2: "soc2-2017"
| Field | Default | Description |
| --- | --- | --- |
| name | (required) | Integration name for logging |
| endpoint | (required) | API URL to POST evidence to |
| api_key | (required) | Bearer token. Supports secret: prefix |
| push_interval | 6h | How often to push evidence |
| frameworks | (required) | Vigo standard key → GRC platform framework ID mapping |

spanner

Peer-equal control-plane federation (ADR-026). See Set up Spanner for the walkthrough.

# A spanner bolt — a peer-equal vigosrv that owns a hostname-pattern
# partition and shares the admissions roster with every other bolt via
# CRDT gossip. Every bolt's config has the same shape: one operator
# founds the spanner with `vigocli spanner init`, the rest join with
# `vigocli spanner join`.
spanner:
  mode: spanner
  spanner_id: "alexander4-prod"
  patterns: ["*.us-west.*"]
  snapshot:
    tier1_interval: 30s
    tier2_interval: 5m
  transport:
    multicast_group: "224.0.0.45"
    multicast_port: 1534
  fallback_admin: local
| Field | Default | Description |
| --- | --- | --- |
| mode | standalone | standalone (single vigosrv) or spanner (a peer-equal bolt). The pre-Phase-3 founder/joiner values are retired; the loader hard-errors on them. |
| spanner_id | (required when mode = spanner) | Federation identifier: operator-chosen, immutable, DNS-safe (1-64 ASCII alphanumeric + - _ ., no leading dot). Must match across every bolt in the spanner; embedded in every signed admission row. |
| patterns | (required when mode = spanner) | Hostname glob patterns this bolt owns. First-match-wins across the spanner; loader refuses overlap with another bolt's patterns at publish time + boot time. |
| transport.multicast_group | 224.0.0.45 | IPv4 multicast group for Tier-1 counter gossip. |
| transport.multicast_port | 1534 | UDP port for Tier-1 counter gossip. |
| snapshot.tier1_interval | 30s | Tier-1 counter-gossip cadence. |
| snapshot.tier2_interval | 5m | Tier-2 roster-snapshot cadence. |
| fallback_admin | local | Fleet-wide admin-read source: local aggregates ListEnvoys / GetConvergenceSummary from the gossip-backed observability cache (every bolt's signed /fleet-snapshot); fanout queries every roster bolt's admin gRPC live and merges. Both produce identically-shaped responses. ListRuns is unaffected by this switch: runs are never gossiped, so a per-envoy query always routes to the owning bolt and a global runs list always fans out to every bolt. |

Peer discovery is gossip-driven — there is no bootstrap_peers seed list. A joining bolt learns the roster through the vigocli spanner join admission ceremony, and the CRDT-replicated roster is the peer list from then on. Bolts are identified by their Ed25519 pubkey (server/bolt/identity.wrapped), not a YAML-configured ID.

Bolt identity

Each spanner-mode vigosrv generates a per-host Ed25519 keypair at /srv/vigo/bolt/identity.wrapped on first boot (HKDF purpose vigo-bolt-wrap-v1, mirrors the service-account puddle pattern). The bolt signs its admission row and every gossiped roster snapshot with this key; peers verify signatures end-to-end and pin each bolt's pubkey at admission time. vigocli spanner status shows the local bolt's first-16-char pubkey for operator cross-check.

The retired hub-spoke keys (role, bolt_id, hub_addr, hub_fallback_addrs, bolts, auto_failover, transport.bootstrap_peers) and the retired mode values founder/joiner are rejected at startup with a one-line migration pointer.

peers

Primary/peer server redundancy for HA.

peers:
  role: primary
  servers:
    - addr: "peer1.example.com:1530"
      pubkey: "ed25519:<peer1-pubkey-hex>"
    - addr: "peer2.example.com:1530"
      pubkey: "ed25519:<peer2-pubkey-hex>"
| Field | Default | Description |
| --- | --- | --- |
| role | primary | primary or peer |
| servers | (empty) | List of peer servers |
| servers[].addr | (required) | Peer gRPC address (host:port) |
| servers[].sync_interval | 10s | Push interval to this peer |
| servers[].pubkey | (required) | Peer's service-account (puddle) Ed25519 key, from its startup puddle service-account identity ready log line. Authorizes HA peer RPCs (SyncStream/Promote/Status); replication fails closed without it. Local secrets backend only. Configure the same peers block on every node. |

license

License enforcement behavior.

license:
  mode: "local"                      # "local" or "aggregate"
| Field | Default | Description |
| --- | --- | --- |
| mode | local | local: count nodes on this server. aggregate: every spanner bolt sums the fleet-wide node count from the gossip-replicated roster and gates enrollment against it independently. Best-effort: the count comes from eventually-consistent gossip, so simultaneous enrollments on different bolts near the cap can both pass; the fleet-wide limit is a soft cap, not a hard guarantee (consensus-free by design, ADR-026). Requires spanner.mode: spanner; rejected at config load on a standalone server. |
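
Because aggregate mode is rejected on a standalone server, the two sections have to be set together. A minimal sketch, reusing the spanner_id and patterns values from the spanner example (both are placeholders for your own):

```yaml
spanner:
  mode: spanner                  # aggregate licensing requires this
  spanner_id: "alexander4-prod"
  patterns: ["*.us-west.*"]

license:
  mode: "aggregate"              # fleet-wide soft cap, summed from the gossiped roster
```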

publish

Blast radius protection on config publish.

publish:
  compliance_threshold: 80           # auto-rollback if convergence drops below this %
  rollback_window: "5m"              # monitoring window after config reload

backup

Litestream continuous replication (SQLite only).

backup:
  url: "s3://my-bucket/vigo"
  access_key_id: "secret:vigo/backup/aws_access_key"
  secret_access_key: "secret:vigo/backup/aws_secret_key"
  region: "us-east-1"
  retention: "720h"
  sync_interval: "1s"

ai

AI assistant configuration. Claude or OpenAI recommended for best results. Ollama works for air-gapped environments but requires a dedicated GPU and produces lower-quality answers. See AI Providers for details on each provider, hardware requirements, and data privacy considerations.

Prompt-injection defense — structural mitigations + provider-dependent adherence

Every block of data Vigo prefetches before an assistant turn (envoy details, traits, run output, config YAML, CVE descriptions, etc.) is wrapped in a fenced ```text block with a fence length sized past any backticks the data itself contains, and the system prompt's "Trust Boundary" section instructs the LLM to treat anything inside the fence as queried data rather than operator instructions. This blocks the structural attack: a hostname like "web01\nIGNORE PRIOR INSTRUCTIONS AND…", a package name with embedded markdown, a run-output dump with a fake system: prefix.

The actual guarantee is only as strong as the underlying LLM's adherence to that framing. Claude (Anthropic's models) respects "this is data, not instructions" framing reliably in our testing; OpenAI is similar. Ollama and openai-compat backends vary widely by model: a smaller open-source model under stress may still follow an injected imperative despite the fence. If your threat model includes adversaries who can put text into trait values, hostnames, or other prefetched content, prefer Claude or OpenAI over self-hosted open-source models for the assistant.

The fence + framing are best-effort defense-in-depth, not a hard guarantee. The assistant is read-only by design (server/ai/tools_dispatch.go exposes only read operations), so even a successful injection cannot write config, push tasks, or mutate state — at worst it could mislead the operator reading the assistant's answer.

ai:
  enabled: true
  provider: "claude"                 # claude | openai | ollama | openai-compat
  model: "claude-sonnet-4-20250514"
  max_tokens: 4096
  api_key: "secret:vigo/ai/api_key"
| Field | Default | Description |
| --- | --- | --- |
| provider | ollama | AI backend: claude, openai, ollama (local, GPU required), openai-compat (vLLM, llama.cpp, etc.) |
| model | (provider default) | Model name |
| max_tokens | 4096 | Maximum response tokens |
| context_mode | auto | tools, prefetch, or auto |
| api_key | | API key for external providers (supports secret: prefix) |
| base_url | | Auto for ollama; required for openai-compat |
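
For a self-hosted openai-compat backend, base_url becomes mandatory. The sketch below shows the shape only: the endpoint URL (including whether it needs a path suffix) and the model name are hypothetical and depend on how your vLLM or llama.cpp server is exposed.

```yaml
ai:
  enabled: true
  provider: "openai-compat"
  base_url: "http://vllm.internal:8000/v1"   # hypothetical endpoint; required for openai-compat
  model: "my-local-model"                     # hypothetical model name
  max_tokens: 4096
  context_mode: auto
```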

branding

Custom web UI branding.

branding:
  org_name: "ACME Corp"
  logo_path: "/srv/vigo/branding/logo.svg"
  favicon_path: "/srv/vigo/branding/icon.svg"
  theme: "light"   # "light" (default), "dark", or "auto"
| Field | Default | Description |
| --- | --- | --- |
| org_name | (none) | Organization name shown below the sidebar logo |
| logo_path | embedded default | Sidebar logo (SVG or PNG) |
| favicon_path | embedded default | Browser tab icon |
| theme | light | Fleet-wide default color theme: light (Catppuccin Latte), dark (Catppuccin Mocha), or auto (follow the user's OS prefers-color-scheme). Individual operators can override via the top-navbar toggle; their choice persists per-browser in localStorage. |

grafana

Optional. Points the web UI at an external Grafana instance — when set, the /health page links to the bundled Vigo dashboards. Vigo neither runs nor manages Grafana; this is purely a link target.

grafana:
  url: "https://grafana.example.com:3000"
| Field | Default | Description |
| --- | --- | --- |
| url | (none) | Base URL of a Grafana instance. When set, the /health page links to each bundled dashboard at <url>/d/<uid>. Unset = no links shown. |

sandgorgon

Optional. Gates the sandgorgon bare-metal lifecycle REST surface (Redfish BMC power/disk control, the NIST 800-88 decommission workflow, CSV/NetBox asset import). The subsystem is deferred until after Vigo 1.0 and exposes destructive BMC operations against plaintext-stored BMC credentials, so its routes are not mounted unless explicitly enabled. Leave off.

sandgorgon:
  enabled: false
| Field | Default | Description |
| --- | --- | --- |
| enabled | false | Mount the sandgorgon REST routes. Deferred post-v1; off by default. |

i18n

Optional web UI localization. Off by default: with i18n disabled the UI renders in English with zero overhead and no language picker. When enabled, each request's locale is resolved cookie → Accept-Language → default_locale: the operator's saved choice (a cookie set from the top-navbar language picker, like the theme toggle) wins, then the browser's Accept-Language header (matched on the primary subtag, so fr-CA matches fr), then the configured default. Translations are key-based with English fallback, so a missing string degrades to English rather than a blank.

Operator-authored data (envoy names, run results, audit entries) and all commands, flags, and config keys stay English regardless of locale; localization covers UI chrome (nav, labels, buttons) and the in-app docs.

i18n:
  enabled: false
  default_locale: "en"
  supported_locales: ["en", "fr", "de", "es", "pl", "pt", "it"]
| Field | Default | Description |
| --- | --- | --- |
| enabled | false | Enable non-English locales and the top-navbar language picker. When false the UI is English-only with no per-request locale work. |
| default_locale | en | Locale used when no cookie or Accept-Language match applies, and the fallback for any untranslated string. |
| supported_locales | all bundled | Locales offered in the picker and matched against Accept-Language. Empty = every locale with a bundled catalog. Always include default_locale. |
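
As an illustration of the resolution order, the fragment below enables three locales. The comments trace how a request would resolve under the rules described in this section; the specific cookie and header values are made-up examples.

```yaml
i18n:
  enabled: true
  default_locale: "en"
  supported_locales: ["en", "fr", "de"]   # includes default_locale, as required
# Per-request resolution with this config:
#   saved picker choice (cookie) is "de"        -> de
#   no cookie, Accept-Language: fr-CA           -> fr (primary subtag match)
#   no cookie, Accept-Language: ja (unsupported) -> en (default_locale)
```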

swarm

Peer-to-peer content distribution (six content subsystems: filecast, gitback, longdrawer, lockbox, curator, poolq). Each subsystem documents the full field set in its howto — the section below covers poolq (ADR-029); see server.yaml.example for the full template plus the other subsystems.

swarm:
  enabled: ["*"]                       # substrate-level gate (pattern list)
  poolq:
    enabled: ["*"]                     # which envoys may run poolq (founders + log-holders)
    retention: "7d"                    # per-topic age window; older messages prune
    max_msg_bytes: 16384               # per-message body cap (default 16 KiB; hard ceiling 64 KiB)

swarm.poolq

| Field | Default | Description |
| --- | --- | --- |
| enabled | [] | Hostname pattern list (first-match-wins; - prefix denies; empty list disables) of envoys allowed to run poolq: publishers (founders) and log-holders. |
| retention | "7d" | Per-topic message-age retention window. Messages older than this are pruned by the mesh aggregator, and stale-on-ingest messages are refused; together that's flap-stable pruning without tombstones. Accepts Nd / standard Go duration strings. Empty / unparseable falls back to poolqmesh.DefaultRetention (7d). |
| max_msg_bytes | 16384 | Per-message body size cap on the publish path, in bytes. 0 (or unset) = the 16 KiB compile-time default. Hard ceiling 65536 (64 KiB); bigger payloads belong in a curator artifact that the message references by id. |

Publishing additionally requires poolq: true on the user's usercrate AND an unlocked puddle (a heavier grant than gitback:). Reading is ungated — messages are fleet-readable.

The admin moderation backstop is vigocli swarm poolq block <topic_id>: the server stops serving the topic's /range and /log endpoints fleet-wide. See the poolq howto for the publish + consume flow.
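
A variation on the poolq block, restricted to one hostname pattern and using the alternative value forms described in this section. The pattern is a hypothetical example; the retention and size values only restate the documented equivalents and limits.

```yaml
swarm:
  enabled: ["*"]
  poolq:
    enabled: ["*.us-west.*"]   # hypothetical pattern; an empty list disables poolq
    retention: "168h"          # Go duration form, the same window as "7d"
    max_msg_bytes: 65536       # the 64 KiB hard ceiling
```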

tuning

Advanced performance tuning. All fields optional.

tuning:
  signature_window: "5m"             # signature verification time window
  last_seen_flush: "10s"             # batch cadence for last_seen + trait insert/prune
  run_store: "database"              # "database" or "memory"
  run_store_capacity: 20             # per-envoy run depth in memory mode
  max_concurrent_streams: 5000       # gRPC concurrent stream limit
  grpc_read_buffer: 32768            # per-conn read buffer bytes; 0 = gRPC default (32 KiB)
  grpc_write_buffer: 32768           # per-conn write buffer bytes; 0 = gRPC default (32 KiB)
  keepalive_time: "30s"              # how often the server pings each conn to check liveness
  keepalive_timeout: "30s"           # deadline for the ping ACK before the conn is closed
  gogc: 100                          # Go GC target percentage
  # GOMEMLIMIT has no key here — auto-derived to ~80% of the cgroup-aware RAM,
  # or set the GOMEMLIMIT env var to override (GOMEMLIMIT=off for no limit).

gRPC connection buffers — grpc_read_buffer + grpc_write_buffer

The held stream connection itself costs ~200 KB of the ~623 KB total per-envoy heap (the rest is the FleetIndex entry + cached inventory — see the sizing guide), of which gRPC's default per-connection read+write staging buffers are ~64 KiB (32 KiB each). On large fleets that buffer footprint is the biggest operator-tunable slice of per-connection RAM: halving both to 16384 cuts ~32 KiB/conn (~48 MB across 1,500 connections). Both default to 0 → gRPC's 32 KiB; the option is only applied when set > 0. The trade-off is memory vs. syscall efficiency — smaller buffers mean more read/write syscalls per byte moved — so lower them only when connection-count RAM is the binding constraint, and re-check that large config publish bundle bursts still deliver promptly (full policy bundles ride the held stream as the only high-throughput payload).

There is intentionally no initial_window / HTTP/2 flow-control-window knob: gRPC floors the window at 64 KiB (a smaller value is ignored), so it cannot shrink per-connection RAM, and pinning it would only disable gRPC's BDP auto-tuning and throttle bundle delivery. The read/write buffers above are the connection-RAM dial.

gRPC keepalive — keepalive_time + keepalive_timeout

The server sends an HTTP/2 PING to every conn every keepalive_time (default 30s). If the PING isn't ACKed within keepalive_timeout (default 30s), gRPC closes the conn with GOAWAY and any in-flight RPC on it returns Unavailable to the agent. gRPC also pushes keepalive_timeout to the kernel's TCP_USER_TIMEOUT for the conn, so kernel-level read timeouts use the same deadline.

The trade-off is dead-client detection latency vs. tolerance for transient Go scheduler queueing delays. At high concurrent connection counts the runqueue depth grows (per docs/howto/sizing.md's measurements, ~16k runnable goroutines at 20k conns), and the deepest tail of waiting goroutines can take ~20s to dispatch — tight keepalive_timeout values kill conns under that transient pressure. The 30s default gives ~50% margin over the observed worst-case dispatch latency at the comfortable conn-count envelope. Operators running large fleets that still see Unavailable errors growing should raise to 60s before chasing other tunables; operators who need faster dead-conn reaping (small fleets where every connection matters) can drop to 20s — gRPC's own default — without re-introducing the scheduler-queueing failure mode at sub-15k conn counts.

The pre-0.69.20 default was 10s, which scaling work on 2026-05-29 showed was the actual mechanism behind the matrix's Unavailable errors at 20k+ conns regardless of check-in interval. See How-to: Size a vigosrv host for the supporting evidence.
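To compare candidate keepalive_timeout values against the ~20s worst-case dispatch delay quoted above, the margin can be computed directly. The sketch below is illustrative only: `parse_duration` and `timeout_margin` are hypothetical helpers, and the parser deliberately handles just a single integer plus an s, m, or h unit rather than the full duration syntax the server may accept.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}

WORST_CASE_DISPATCH_S = 20  # ~20s scheduler tail quoted in this section


def parse_duration(text: str) -> int:
    """Parse a simple duration such as "30s" or "5m" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def timeout_margin(keepalive_timeout: str) -> float:
    """Fractional headroom of the timeout over the worst-case dispatch delay."""
    return parse_duration(keepalive_timeout) / WORST_CASE_DISPATCH_S - 1.0


for value in ("10s", "20s", "30s", "60s"):
    print(f"{value}: {timeout_margin(value):+.0%}")
# 10s: -50%
# 20s: +0%
# 30s: +50%
# 60s: +200%
```

A negative margin means the timeout is shorter than the observed dispatch tail, which matches the failure described for the old 10s default; 20s has no headroom at that tail, which is why it is suggested only for smaller fleets.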

paths

Directory paths for server subsystems.

paths:
  custom_traits: "custom-traits"
  docs: "docs"
  tasks: "stacks/tasks"
  workflows: "stacks/workflows"
  agent_dist: "dist/agent"
  license_dir: "/srv/vigo/license"

watcher

Polls the secrets provider on a schedule and force-pushes affected envoys when a value changes — automatic secret rotation without waiting for the next check-in. See Secrets.

watcher:
  enabled: true
  poll_interval: "5m"

rate_limit

Per-envoy and global ceilings on check-in RPCs. Requests over the limit get a retryable error so agents back off on their own. Note that connection count, not request rate, drives the memory ceiling — see Size a vigosrv host; this section only caps offered request rate.

rate_limit:
  enabled: true
  checkin_per_envoy: 6      # max check-ins per envoy per window
  checkin_global: 500       # max check-ins fleet-wide per window
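
How the two ceilings interact can be pictured with a small counter. This section does not state the window length or the counting algorithm, so the sketch assumes a fixed window and takes its length as a hypothetical `window_s` parameter; `CheckinLimiter` and `allow` are illustrative names, not Vigo APIs.

```python
from collections import defaultdict


class CheckinLimiter:
    """Fixed-window counter with a per-envoy and a global ceiling (assumed model)."""

    def __init__(self, per_envoy: int, global_limit: int, window_s: float):
        self.per_envoy = per_envoy
        self.global_limit = global_limit
        self.window_s = window_s  # assumption: the real window length is not documented here
        self._window_start = None
        self._global = 0
        self._by_envoy = defaultdict(int)

    def allow(self, envoy: str, now: float) -> bool:
        # Start a fresh window on first use or once the current one has elapsed.
        if self._window_start is None or now - self._window_start >= self.window_s:
            self._window_start = now
            self._global = 0
            self._by_envoy.clear()
        if self._global >= self.global_limit or self._by_envoy[envoy] >= self.per_envoy:
            return False  # the caller would answer with a retryable error
        self._global += 1
        self._by_envoy[envoy] += 1
        return True


limiter = CheckinLimiter(per_envoy=6, global_limit=500, window_s=60)
results = [limiter.allow("envoy-a", now=0.0) for _ in range(7)]
print(results)  # six True values, then False
```

In this model a request is admitted only when both counters have room, and rejected requests do not consume budget.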

maintenance

Global change freeze: vigocli config publish is blocked until freeze_until passes. Use it for code freezes, customer maintenance windows, or incident response. Bypass requires editing the file and reloading, so the freeze is real.

maintenance:
  freeze_until: "2026-07-01T00:00:00Z"   # RFC3339; unset = no freeze
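
The freeze rule amounts to a timestamp comparison. The following is a minimal sketch of that check, assuming the freeze ends at the instant freeze_until is reached; `parse_rfc3339` and `publish_blocked` are hypothetical names, and the server's actual boundary handling may differ.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC3339 timestamp into a timezone-aware datetime."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalise it.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return parsed


def publish_blocked(freeze_until: Optional[str], now: datetime) -> bool:
    """True while the freeze is active; an unset freeze_until means no freeze."""
    if not freeze_until:
        return False
    return now < parse_rfc3339(freeze_until)


# Checks the example value against the current time in UTC.
print(publish_blocked("2026-07-01T00:00:00Z", datetime.now(timezone.utc)))
```

Comparing timezone-aware values avoids surprises when the operator's clock and the configured timestamp use different offsets.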

task

Ad-hoc task-dispatch safety. With require_definition: true, every task sent via vigocli task dispatch must reference a reviewed definition under stacks/tasks/; direct shell commands are refused. Turn this on in production to force review of anything that can fan out to the fleet.

task:
  require_definition: true

export

One-way outbound compliance feeds — SIEM audit-event ingestion, CMDB inventory sync, and OSCAL-format framework reports — each toggled independently. See Compliance reporting.

export:
  siem: true
  cmdb: false
  oscal: true

compliance

Scopes which compliance frameworks this deployment reports against. An empty standards list activates every framework Vigo knows; an explicit list restricts dashboards, reports, and waiver tooling to that subset. See the Compliance matrix for the framework catalog.

compliance:
  standards: ["hipaa", "soc2"]   # empty/unset = every framework

risk

Per-envoy risk scoring. An optional NVD CVE API key enriches scoring with CVSS metrics and vendor-attributed advisories; without it, CVE severity comes from scanner output alone. Risk scores drive the dashboard Risk Posture column and the cyber-insurance attestation PDF.

risk:
  nvd:
    api_key: "secret:vigo/risk/nvd_api_key"   # optional CVSS enrichment

stream_edit

Guardrails on the file resource's stream_edit: attribute, which pipes content through agent-local scripts before it's written. Controls whether stream-edits are allowed, the permitted script paths, and the per-transform timeout. See the configcrate language for the attribute itself.

stream_edit:
  enabled: true
  allowed_paths: ["/srv/vigo/scripts"]
  default_timeout: "10s"

scrier

Browser-based remote access (SSH, RDP, VNC) to any enrolled envoy, routed out through the envoy's outbound agent stream so no inbound ports are required on the target. Every session is logged in scrier_sessions and audited on disconnect. See Set up Scrier and Scrier.

scrier:
  enabled: true
  guacd_addr: "127.0.0.1:4822"
  allowed_ports: [22, 3389, 5900]
  max_sessions: 10
  recording_enabled: false

Event Types Reference

All event types available for SMTP, webhook, and integration subscriptions:

| Event | Trigger |
| --- | --- |
| envoy.enrolled | New envoy enrolled |
| run.success | Successful convergence run |
| run.failure | Failed convergence run |
| drift.detected | Configuration drift corrected |
| convergence.drift | Drift affecting compliance controls |
| convergence.threshold | Fleet convergence drops below threshold |
| compliance.evidence | Scheduled evidence email delivery |
| envoy.stale | Envoy hasn't checked in within expected interval |
| secret.rotated | Secret value rotated |
| config.reload.failure | Config publish/reload failed |
| security.rootkit | Rootkit detected |
| security.malware | Malware detected |
| security.integrity | File integrity breach |
| security.cve_critical | New critical CVE found |
| security.hardening_drop | Hardening score dropped significantly |