Disk Hygiene
Prevent disk and inode exhaustion with a role that bundles five focused configcrates. Each configcrate enforces one cleanup policy — assign the whole role or pick individual configcrates.
The Role
# stacks/roles.vgo
roles:
  - name: disk-hygiene
    configcrates: [logrotate, journal-cap, tmp-aging, package-cache, docker-prune]
Assign it to a node:
match:
  - pattern: "*.web.example.com"
    roles: [bastion, disk-hygiene]
Configcrates
logrotate (already exists)
Installs logrotate and enforces rotation for /var/log/*.log and /var/log/*/*.log. Many service configcrates (nginx, postgresql, haproxy, etc.) ship their own logrotate resources — this configcrate covers the system-wide catch-all.
journal-cap
Drops a systemd-journald config into /etc/systemd/journald.conf.d/ that caps total journal size and retention:
vars:
  journal_max_use: "500M"          # max disk space for journal files
  journal_max_retention: "30day"   # delete entries older than this
Notifies systemd-journald to reload after changes.
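As an illustration of what such a drop-in could contain, the Python sketch below renders one from the two vars. The function name and the mapping of journal_max_use to SystemMaxUse and journal_max_retention to MaxRetentionSec are assumptions: those are the journald.conf options that match the comments above, but the configcrate's actual template is not shown here.

```python
def render_journald_dropin(journal_max_use: str = "500M",
                           journal_max_retention: str = "30day") -> str:
    """Return the text of a journald drop-in that caps size and retention.

    Assumed mapping (not confirmed by the configcrate itself):
      journal_max_use       -> SystemMaxUse
      journal_max_retention -> MaxRetentionSec
    """
    return (
        "[Journal]\n"
        f"SystemMaxUse={journal_max_use}\n"
        f"MaxRetentionSec={journal_max_retention}\n"
    )


if __name__ == "__main__":
    # Print the drop-in that the default vars would produce.
    print(render_journald_dropin(), end="")
```

To check the values on a live host, journalctl --disk-usage reports the current journal size, which should settle at or below the configured cap.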
tmp-aging
Writes a tmpfiles.d config that removes old files from /tmp and /var/tmp:
vars:
  tmp_max_age: "7d"        # /tmp cleanup threshold
  var_tmp_max_age: "30d"   # /var/tmp cleanup threshold
Enforced by systemd-tmpfiles-clean.timer (runs daily on most distros, no extra service needed).
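For reference, a tmpfiles.d line has the fields Type, Path, Mode, User, Group, and Age. The sketch below renders two such lines from the vars above. The function name, the d line type, and the 1777 root root ownership are assumptions chosen to match the usual layout of these directories; the configcrate's real output may differ.

```python
def render_tmpfiles_conf(tmp_max_age: str = "7d",
                         var_tmp_max_age: str = "30d") -> str:
    """Return tmpfiles.d lines that age out files in /tmp and /var/tmp.

    Fields per line: Type Path Mode User Group Age.
    Type "d" and mode 1777 root:root are assumptions for this sketch.
    """
    rows = [
        ("d", "/tmp", "1777", "root", "root", tmp_max_age),
        ("d", "/var/tmp", "1777", "root", "root", var_tmp_max_age),
    ]
    return "".join(" ".join(row) + "\n" for row in rows)


if __name__ == "__main__":
    # Print the config that the default vars would produce.
    print(render_tmpfiles_conf(), end="")
```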
package-cache
OS-family conditional — prevents stale package downloads from accumulating:
- Debian/Ubuntu: Sets APT::Periodic::AutocleanInterval so apt removes superseded .deb files every N days (default 7).
- RHEL/Fedora: Sets keepcache=0 in dnf.conf and runs dnf clean all if the cache exceeds 500 MB.
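The two branches can be sketched as follows. This is a minimal illustration, not the configcrate's implementation: the function names are hypothetical, the cache directory is passed in by the caller, and "500 MB" is read here as 500 * 1024 * 1024 bytes, which is an assumption.

```python
import os


def render_apt_autoclean(interval_days: int = 7) -> str:
    """Return the apt.conf line that enables periodic autoclean."""
    return f'APT::Periodic::AutocleanInterval "{interval_days}";\n'


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def dnf_cache_needs_clean(cache_dir: str,
                          limit_bytes: int = 500 * 1024 * 1024) -> bool:
    """True when the cache directory is larger than the limit (assumed 500 MiB)."""
    return dir_size_bytes(cache_dir) > limit_bytes


if __name__ == "__main__":
    print(render_apt_autoclean(), end="")
    # /var/cache/dnf is the DNF 4 default cache location; adjust as needed.
    print(dnf_cache_needs_clean("/var/cache/dnf"))
```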
docker-prune
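Creates a systemd timer that runs docker system prune -af on a schedule. It removes stopped containers, unused networks, unused images (with -a, every image not used by a container, not only dangling ones), and build cache. prune_age sets the age cutoff, which Docker expresses as --filter "until=<age>", so only objects older than that are removed: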
vars:
  prune_age: "168h"          # 7 days
  prune_schedule: "weekly"   # systemd OnCalendar value
All resources are gated with when: "has_command('docker')" — the configcrate is a no-op on machines without Docker.
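A timer-driven prune needs a oneshot service and a matching timer unit. The sketch below renders a plausible pair from the two vars. The unit descriptions, the /usr/bin/docker path, the Persistent=true setting, and the use of --filter until= to apply prune_age are assumptions for illustration; the units the configcrate actually writes are not shown in this document.

```python
def render_prune_units(prune_age: str = "168h",
                       prune_schedule: str = "weekly") -> tuple[str, str]:
    """Return (service_text, timer_text) for a scheduled Docker prune.

    Assumptions: docker lives at /usr/bin/docker, and prune_age is applied
    through docker's "until" filter.
    """
    service = (
        "[Unit]\n"
        "Description=Prune unused Docker data\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart=/usr/bin/docker system prune -af --filter until={prune_age}\n"
    )
    timer = (
        "[Unit]\n"
        "Description=Run the Docker prune on a schedule\n"
        "\n"
        "[Timer]\n"
        f"OnCalendar={prune_schedule}\n"
        "Persistent=true\n"  # catch up after downtime (assumed choice)
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )
    return service, timer


if __name__ == "__main__":
    service_text, timer_text = render_prune_units()
    print(service_text)
    print(timer_text, end="")
```

Note that docker system prune leaves volumes alone unless --volumes is passed, so named volumes are not affected by a command of this form.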
Per-node overrides
Override vars from the match block or from common.vgo:
match:
  - pattern: "build-*.ci.example.com"
    roles: [disk-hygiene]
    vars:
      journal_max_use: "200M"
      tmp_max_age: "1d"
      prune_schedule: "daily"
      prune_age: "24h"
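To make the effect of an override concrete, the sketch below models it as a glob match on the hostname followed by a dictionary merge, so vars not named in the match block keep their defaults. This is a hypothetical model, not the tool's resolver: the function name, the DEFAULTS table (copied from the values shown on this page), and the rule that match-block vars win over defaults are all assumptions.

```python
from fnmatch import fnmatchcase

# Default vars as documented for each configcrate above.
DEFAULTS = {
    "journal_max_use": "500M",
    "journal_max_retention": "30day",
    "tmp_max_age": "7d",
    "var_tmp_max_age": "30d",
    "prune_age": "168h",
    "prune_schedule": "weekly",
}

# The override block from the example above, as plain data.
MATCH_BLOCKS = [
    {
        "pattern": "build-*.ci.example.com",
        "roles": ["disk-hygiene"],
        "vars": {
            "journal_max_use": "200M",
            "tmp_max_age": "1d",
            "prune_schedule": "daily",
            "prune_age": "24h",
        },
    },
]


def effective_vars(hostname: str, match_blocks=MATCH_BLOCKS,
                   defaults=DEFAULTS) -> dict:
    """Merge vars from every matching block over the defaults (assumed rule)."""
    merged = dict(defaults)
    for block in match_blocks:
        if fnmatchcase(hostname, block["pattern"]):
            merged.update(block.get("vars", {}))
    return merged


if __name__ == "__main__":
    for host in ("build-01.ci.example.com", "app1.web.example.com"):
        print(host, effective_vars(host))
```

Under this model a build node gets the tighter limits, while var_tmp_max_age and journal_max_retention, which the block does not mention, stay at their defaults.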
What this doesn't do
This role enforces preventive policies — it configures the system so disk usage stays under control. It does not:
- Scan for and delete arbitrary large files. Use vigocli query to find large files across the fleet when an alert fires.
- Alert on disk/inode thresholds. That's the job of the disk trait collector + server-side alerting (webhooks/SMTP).
- Replace service-specific logrotate configs. Service configcrates (nginx, postgres, etc.) manage their own log rotation. This role handles system-wide defaults.