NixOS Deployment
This tutorial guides you through deploying NixOS servers to cloud providers. You’ll understand infrastructure provisioning with terranix, deployment orchestration with clan, and how secrets management works on NixOS with clan vars.
What you will learn
By the end of this tutorial, you will understand:
- How terranix translates Nix expressions into Terraform configurations
- The multi-cloud strategy for Hetzner and GCP deployments
- How clan inventory coordinates service assignment across machines
- The complete deployment flow from infrastructure provisioning to system activation
- Secrets management with clan vars on NixOS
Prerequisites
Before starting, you should have:
- Completed the Bootstrap to Activation Tutorial
- Completed the Secrets Setup Tutorial
- Cloud provider credentials (Hetzner API token and/or GCP service account)
- SSH access configured for remote deployment
- The repository cloned with direnv activated
Estimated time
90-120 minutes for your first deployment, including infrastructure provisioning. Subsequent updates take 15-30 minutes.
Understanding NixOS in this infrastructure
Before provisioning anything, let’s understand what makes NixOS deployment different from darwin.
NixOS vs darwin deployment models
Darwin machines run darwin-rebuild locally. You sit at the machine, run a command, and watch it activate.
NixOS servers in this infrastructure use a push model through clan. You run commands from your workstation, and clan builds and deploys to remote machines over SSH.
This model enables:
- Central management: Deploy multiple servers from one workstation
- Consistent state: All configuration lives in git, not scattered across machines
- Coordination: Services that span machines can be configured together
The NixOS fleet
The infrastructure includes four NixOS machines across two cloud providers:
Hetzner Cloud:
- cinnabar - Zerotier controller, always-on coordinator (enabled by default)
- electrum - Zerotier peer, secondary test VM (disabled by default in terranix)
Google Cloud Platform:
- galena - CPU-only compute node (e2-standard-8, toggle-controlled)
- scheelite - GPU compute node (n1-standard-8 with Tesla T4, toggle-controlled)
Each cloud serves different purposes: Hetzner for cost-effective always-on infrastructure, GCP for burst compute and GPU workloads.
Toggle patterns for cost control
Cloud VMs cost money when running. The infrastructure uses toggle patterns to enable/disable machines:
```nix
# In modules/terranix/hetzner.nix or gcp.nix
cinnabar.enabled = true;    # Always on - zerotier controller
electrum.enabled = false;   # Disabled by default - enable when needed
galena.enabled = false;     # Enable when needed
scheelite.enabled = false;  # Enable when GPU needed
```

When enabled = false, terraform destroys the resource but preserves the configuration. Enable it again, run terraform, and the machine recreates with the same configuration.
Step 1: Understand terranix
Terranix bridges Nix and Terraform, letting you write infrastructure as Nix expressions.
What terranix does
Traditional Terraform uses HCL (HashiCorp Configuration Language). Terranix lets you write the same infrastructure definitions in Nix:
```nix
# Nix expression (terranix)
resource.hcloud_server.cinnabar = {
  name = "cinnabar";
  server_type = "cx22";
  image = "ubuntu-24.04";
  location = "fsn1";
};
```

This compiles to:

```hcl
# Generated Terraform HCL
resource "hcloud_server" "cinnabar" {
  name        = "cinnabar"
  server_type = "cx22"
  image       = "ubuntu-24.04"
  location    = "fsn1"
}
```

Why use terranix?
The benefits compound with complexity:
- Type checking: Nix catches configuration errors before terraform runs
- Code reuse: Define machine patterns once, instantiate them with parameters
- Integration: Infrastructure config lives alongside NixOS config in the same repo
- Consistency: Same language for infrastructure and system configuration
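To make the code-reuse point concrete, here is a hedged sketch (not code from this repository) of a helper function that stamps out hcloud_server definitions. The mkServer helper and its option names are hypothetical:

```nix
let
  # Hypothetical helper: define the Hetzner server shape once
  mkServer = name: opts: {
    inherit name;
    server_type = opts.type or "cx22";
    image = "ubuntu-24.04";
    location = opts.location or "fsn1";
  };
in
{
  # Instantiate the pattern per machine
  resource.hcloud_server = {
    cinnabar = mkServer "cinnabar" { };
    electrum = mkServer "electrum" { type = "cx32"; };
  };
}
```

Because this is plain Nix, a typo in an attribute name or a missing argument fails at evaluation time, before terraform ever runs.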
Module structure
```shell
ls modules/terranix/
```

You’ll see:

- hetzner.nix - Hetzner Cloud provider configuration
- gcp.nix - Google Cloud Platform configuration
- Related helper modules for networking, firewalls, etc.
Step 2: Provision infrastructure
Let’s provision a machine. We’ll use an existing configuration to understand the flow.
Check current state
First, see what infrastructure exists:

```shell
nix run .#terraform -- state list
```

This shows the resources terraform currently manages. If you haven’t run terraform before, this may be empty or show only existing resources.
Review the configuration
Look at a machine definition. For cinnabar (Hetzner):

```shell
# View the terranix configuration
cat modules/terranix/hetzner.nix
```

Key elements:
- Provider configuration: Credentials, default settings
- Server resources: VM definitions with type, image, location
- Network resources: Firewall rules, SSH keys
- Output values: IP addresses for later use
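The output values are what carry the IPs into the clan deployment step. A minimal terranix sketch, assuming the hcloud provider; the output name is illustrative, so check hetzner.nix for the real ones:

```nix
{
  # Expose the server's public IPv4 as a terraform output.
  # The escaped "\${...}" renders to a literal Terraform
  # interpolation in the generated HCL.
  output.cinnabar_ipv4.value = "\${hcloud_server.cinnabar.ipv4_address}";
}
```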
Plan the changes
Before applying, always plan:

```shell
nix run .#terraform -- plan
```

Terraform shows what it will create, modify, or destroy. Review this carefully, especially for destroy operations.
Apply infrastructure
If the plan looks correct:

```shell
nix run .#terraform -- apply
```

Terraform prompts for confirmation, then provisions resources. This takes a few minutes for new VMs.
Note the outputs
After apply, terraform shows outputs including IP addresses:

```shell
nix run .#terraform -- output
```

You’ll need these IPs for the clan deployment step.
Step 3: Understand clan machine management
Clan orchestrates NixOS deployment across multiple machines.
What clan manages
Clan provides:
- Machine registry: Defines which machines exist and how to reach them
- Inventory system: Assigns roles and services to machines
- Secrets (vars): Generates and deploys machine-specific secrets
- Deployment tooling: Commands for install, update, and status
Inventory structure
The clan inventory defines service instances:

```nix
{
  services = {
    zerotier = {
      controller.roles.default.machines = [ "cinnabar" ];
      peers.roles.default.machines = [ "electrum" "galena" "scheelite" ];
    };
    users = {
      cameron.roles.default.machines = [ "cinnabar" "electrum" "galena" "scheelite" ];
    };
  };
}
```

This declares:
- cinnabar runs the zerotier controller
- Other machines are zerotier peers
- The cameron user exists on all NixOS machines
Machine configuration
Each NixOS machine has a configuration in modules/machines/nixos/:

```shell
ls modules/machines/nixos/
```

These configurations use the same deferred module composition pattern as darwin, but include NixOS-specific elements like systemd services and disko disk layouts.
Step 4: Generate secrets
Before deploying, generate machine-specific secrets.
What gets generated
Clan vars creates:
- SSH host keys (ed25519)
- Zerotier identity secrets
- Any other service-specific credentials
These secrets are managed automatically by clan.
Generate vars
Section titled “Generate vars”# Generate secrets for a specific machineclan vars generate cinnabar
# Or generate for all machinesclan vars generateVerify generation
Section titled “Verify generation”# View generated secretsls vars/cinnabar/
# Check a specific secretclan vars get cinnabar ssh.id_ed25519.pubThe public keys are safe to view. Private keys are encrypted and only decrypted during deployment.
Step 5: Deploy to NixOS
Now deploy the configuration to your provisioned infrastructure.
First-time installation
For a fresh VM with only the base image, use clan machines install:

```shell
# Install NixOS on a new machine
clan machines install cinnabar --target-host root@<IP_ADDRESS>
```

Replace <IP_ADDRESS> with the terraform output.
This command:
- Connects via SSH to the target
- Partitions disks using disko configuration
- Installs NixOS with your configuration
- Reboots into the new system
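The partitioning step is driven by a disko layout inside the machine’s configuration. A minimal single-disk sketch, assuming GPT with an ESP and an ext4 root; the actual layouts live in modules/machines/nixos/ and may differ:

```nix
{
  disko.devices.disk.main = {
    device = "/dev/sda";  # Assumed device name; varies by provider image
    type = "disk";
    content = {
      type = "gpt";
      partitions = {
        ESP = {
          size = "512M";
          type = "EF00";
          content = {
            type = "filesystem";
            format = "vfat";
            mountpoint = "/boot";
          };
        };
        root = {
          size = "100%";
          content = {
            type = "filesystem";
            format = "ext4";
            mountpoint = "/";
          };
        };
      };
    };
  };
}
```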
Subsequent updates
For machines already running NixOS, use clan machines update:

```shell
# Update an existing machine
clan machines update cinnabar
```

This:
- Builds the new configuration locally
- Copies closures to the remote machine
- Activates the new configuration
- Runs any activation scripts
Deployment options
Common options:

```shell
# Update multiple machines
clan machines update cinnabar electrum

# Dry run (build but don't deploy)
clan machines update cinnabar --dry-run

# Update all machines in inventory
clan machines update
```

Step 6: Verify zerotier mesh
After deployment, verify mesh connectivity.
Check zerotier on the deployed machine
SSH into the deployed machine:

```shell
ssh cinnabar.zt  # If zerotier DNS works
# Or
ssh root@<IP_ADDRESS>
```

Then verify zerotier:

```shell
sudo zerotier-cli status
# Should show "ONLINE"

sudo zerotier-cli listnetworks
# Should show network db4344343b14b903 with status "OK"
```

Verify mesh connectivity
From the deployed machine, ping other nodes:

```shell
ping electrum.zt
ping stibnite.zt
```

If zerotier DNS isn’t configured, use IP addresses:

```shell
# Get peer IPs
sudo zerotier-cli peers
```

Controller vs peer
If deploying cinnabar (the controller), other machines authorize through it. If deploying a peer, it needs authorization from cinnabar.
Check controller status:
```shell
# On cinnabar
sudo zerotier-cli listnetworks
# Look for "CONTROLLER" in the output
```

Step 7: Configure user secrets (legacy sops-nix)
NixOS machines can also use legacy sops-nix for user-specific credentials during the migration.
Understanding secrets on NixOS
Clan vars (target) provides all secrets:
- Generated automatically
- Machine-scoped
- Deployed to /run/secrets/
sops-nix (legacy) provides user secrets during migration:
- Created manually
- User-scoped
- Deployed to ~/.config/sops-nix/secrets/
Darwin machines use legacy sops-nix patterns. NixOS is migrating toward clan vars for all secrets.
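On the NixOS side, the legacy path is typically wired up through the sops-nix home-manager module. A hedged sketch; the secrets file path and the github-token secret name are illustrative, not necessarily what this repo defines:

```nix
{ config, ... }:
{
  # Decrypt user-scoped secrets during home-manager activation
  sops.defaultSopsFile = ./secrets.yaml;
  sops.age.keyFile = "${config.home.homeDirectory}/.config/sops/age/keys.txt";

  # Illustrative secret; decrypted files land under
  # ~/.config/sops-nix/secrets/
  sops.secrets."github-token" = { };
}
```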
Setting up legacy sops-nix on NixOS
The process mirrors darwin. On the NixOS machine:

```shell
# Create the sops directory
mkdir -p ~/.config/sops/age

# Either copy your age key from another machine
scp yourworkstation:~/.config/sops/age/keys.txt ~/.config/sops/age/

# Or derive a new one from your SSH key
ssh-to-age -private-key -i ~/.ssh/id_ed25519 > ~/.config/sops/age/keys.txt
chmod 600 ~/.config/sops/age/keys.txt
```

After setting up the key, your user’s sops secrets will decrypt during home-manager activation.
Verify secrets
Section titled “Verify secrets”# Clan vars: System secretsls /run/secrets/
# Legacy sops-nix: User secrets (after home-manager activation)ls ~/.config/sops-nix/secrets/Step 8: Multi-cloud patterns
Let’s understand the multi-cloud strategy.
Hetzner patterns
Hetzner provides cost-effective European hosting:

```nix
resource.hcloud_server.cinnabar = {
  server_type = "cx22";   # 2 vCPU, 4GB RAM, ~$5/month
  location = "fsn1";      # Falkenstein, Germany
  image = "ubuntu-24.04";
};
```

Use Hetzner for:
- Always-on infrastructure (zerotier controller)
- General-purpose workloads
- European data residency requirements
GCP patterns
GCP provides access to GPUs and burst compute:

```nix
resource.google_compute_instance.galena = {
  machine_type = "e2-standard-8";  # 8 vCPU, 32GB RAM
  zone = "us-west1-b";

  # Toggle pattern
  count = var.galena_enabled ? 1 : 0;
};
```

Use GCP for:
- GPU workloads (scheelite with Tesla T4)
- Burst compute capacity
- US-based infrastructure
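The GPU attachment for scheelite follows the google_compute_instance schema. A sketch of the relevant fields, with the zone borrowed from galena’s example and the exact terranix expression unverified against this repo:

```nix
resource.google_compute_instance.scheelite = {
  machine_type = "n1-standard-8";
  zone = "us-west1-b";  # Illustrative; T4 availability varies by zone

  # One Tesla T4; GPU instances cannot live-migrate,
  # so host maintenance must terminate the VM
  guest_accelerator = [{ type = "nvidia-tesla-t4"; count = 1; }];
  scheduling.on_host_maintenance = "TERMINATE";
};
```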
Cost management
The toggle pattern prevents surprise bills:

```shell
# Disable expensive resources when not needed
# In gcp.nix, set: scheelite.enabled = false

# Then apply
nix run .#terraform -- apply

# Resources are destroyed but configuration preserved
```

To re-enable:
```shell
# Set: scheelite.enabled = true
nix run .#terraform -- apply
# Resources recreated with same configuration

# Then deploy NixOS config
clan machines update scheelite
```

What you’ve learned
You’ve now deployed NixOS to cloud infrastructure. Along the way, you learned:
- Terranix translates Nix to Terraform for infrastructure provisioning
- Clan orchestrates NixOS deployment with inventory-based service assignment
- Secrets on NixOS: clan vars for system secrets, legacy sops-nix for user secrets during migration
- Multi-cloud patterns for Hetzner (cost-effective) and GCP (burst/GPU)
- Toggle patterns manage costs by enabling/disabling resources
Next steps
Now that you’ve deployed NixOS:

- Add more machines by creating configurations in modules/machines/nixos/ and terranix definitions
- Explore service patterns by examining how zerotier and users are assigned via inventory
- Review operational procedures in the guides:
  - Host Onboarding Guide for detailed procedures
  - Secrets Management Guide for secret operations
- Understand the architecture more deeply:
  - Clan Integration for the full coordination model
  - Deferred Module Composition for module organization
Troubleshooting
Terraform fails to authenticate
Check your cloud credentials:

```shell
# Hetzner
echo $HCLOUD_TOKEN

# GCP
echo $GOOGLE_APPLICATION_CREDENTIALS
ls -la $GOOGLE_APPLICATION_CREDENTIALS
```

Credentials should be set via environment variables or secrets.
clan machines install fails with SSH error
Verify SSH connectivity:

```shell
# Test basic SSH
ssh root@<IP_ADDRESS> echo "connection works"

# Check SSH key is offered
ssh -v root@<IP_ADDRESS> 2>&1 | grep "Offering"
```

Ensure:
- The terraform-provisioned SSH key matches what you have locally
- Root login is enabled on the fresh VM image
- Firewall allows SSH (port 22)
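The first point usually traces back to the terranix side. A hedged sketch of how an SSH key is commonly provisioned with the hcloud provider; the resource name and key path are illustrative:

```nix
{
  # Upload the workstation's public key as a Hetzner SSH key
  resource.hcloud_ssh_key.admin = {
    name = "admin";
    public_key = builtins.readFile ./keys/admin.pub;
  };

  # Reference it on the server so the fresh image accepts your key
  resource.hcloud_server.cinnabar.ssh_keys = [ "\${hcloud_ssh_key.admin.id}" ];
}
```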
Deployment times out
Large deployments may need more time:

```shell
# Increase timeout
clan machines update cinnabar --timeout 3600
```

Or check if the machine is actually reachable:

```shell
ping <IP_ADDRESS>
ssh root@<IP_ADDRESS>
```

Zerotier not joining network
Check zerotier service status:

```shell
sudo systemctl status zerotier-one
sudo journalctl -u zerotier-one -n 50
```

Common issues:
- Network ID wrong in configuration
- Firewall blocking UDP 9993
- Controller not authorizing the peer
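If the firewall turns out to be the issue, the NixOS fix is a single option. A sketch, assuming the standard NixOS firewall module (a clan zerotier service module may already set this for you):

```nix
{
  # Zerotier's transport runs over UDP 9993
  networking.firewall.allowedUDPPorts = [ 9993 ];
}
```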
Clan vars secrets not appearing
Ensure vars were generated:
```shell
# Check vars directory
ls vars/cinnabar/

# Regenerate if needed
clan vars generate cinnabar

# Redeploy
clan machines update cinnabar
```

For comprehensive troubleshooting, see the Host Onboarding Guide.