NixOS Deployment

This tutorial guides you through deploying NixOS servers to cloud providers. You’ll understand infrastructure provisioning with terranix, deployment orchestration with clan, and how secrets management works on NixOS with clan vars.

By the end of this tutorial, you will understand:

  • How terranix translates Nix expressions into Terraform configurations
  • The multi-cloud strategy for Hetzner and GCP deployments
  • How clan inventory coordinates service assignment across machines
  • The complete deployment flow from infrastructure provisioning to system activation
  • Secrets management with clan vars on NixOS

Before starting, you should have:

  • 90-120 minutes for your first deployment, including infrastructure provisioning. Subsequent updates take 15-30 minutes.

Understanding NixOS in this infrastructure

Before provisioning anything, let’s understand what makes NixOS deployment different from darwin.

Darwin machines run darwin-rebuild locally. You sit at the machine, run a command, and watch it activate.

NixOS servers in this infrastructure use a push model through clan. You run commands from your workstation, and clan builds and deploys to remote machines over SSH.

This model enables:

  • Central management: Deploy multiple servers from one workstation
  • Consistent state: All configuration lives in git, not scattered across machines
  • Coordination: Services that span machines can be configured together

The infrastructure includes four NixOS machines across two cloud providers:

Hetzner Cloud:

  • cinnabar - Zerotier controller, always-on coordinator (enabled by default)
  • electrum - Zerotier peer, secondary test VM (disabled by default in terranix)

Google Cloud Platform:

  • galena - CPU-only compute node (e2-standard-8, toggle-controlled)
  • scheelite - GPU compute node (n1-standard-8 with Tesla T4, toggle-controlled)

Each cloud serves different purposes: Hetzner for cost-effective always-on infrastructure, GCP for burst compute and GPU workloads.

Cloud VMs cost money when running. The infrastructure uses toggle patterns to enable/disable machines:

# In modules/terranix/hetzner.nix or gcp.nix
cinnabar.enabled = true; # Always on - zerotier controller
electrum.enabled = false; # Disabled by default - enable when needed
galena.enabled = false; # Enable when needed
scheelite.enabled = false; # Enable when GPU needed

When enabled = false, terraform destroys the resource but preserves the configuration. Enable it again, run terraform, and the machine recreates with the same configuration.
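
Inside the terranix module, a toggle like this usually comes down to a conditional around the resource definition. A minimal sketch of the idea, assuming lib is in scope and using illustrative names rather than the exact contents of hetzner.nix:

# Hypothetical sketch: only emit the Terraform resource when the toggle is on
{ lib, ... }:
let
  electrumEnabled = false; # flip to true to recreate the VM
in
{
  resource.hcloud_server = lib.optionalAttrs electrumEnabled {
    electrum = {
      name = "electrum";
      server_type = "cx22";
      image = "ubuntu-24.04";
      location = "fsn1";
    };
  };
}

When the flag is false the attribute set is empty, so the generated Terraform contains no electrum resource and the next apply destroys any existing one.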

Terranix bridges Nix and Terraform, letting you write infrastructure as Nix expressions.

Traditional Terraform uses HCL (HashiCorp Configuration Language). Terranix lets you write the same infrastructure definitions in Nix:

# Nix expression (terranix)
resource.hcloud_server.cinnabar = {
  name = "cinnabar";
  server_type = "cx22";
  image = "ubuntu-24.04";
  location = "fsn1";
};

This compiles to:

# Generated Terraform HCL
resource "hcloud_server" "cinnabar" {
  name        = "cinnabar"
  server_type = "cx22"
  image       = "ubuntu-24.04"
  location    = "fsn1"
}

The benefits compound with complexity:

  • Type checking: Nix catches configuration errors before terraform runs
  • Code reuse: Define machine patterns once, instantiate them with parameters
  • Integration: Infrastructure config lives alongside NixOS config in the same repo
  • Consistency: Same language for infrastructure and system configuration
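
To make the code-reuse benefit concrete, here is a minimal sketch of defining the server shape once and instantiating it per machine. The helper name, sizes, and overrides are illustrative, not the actual contents of modules/terranix/hetzner.nix:

# Hypothetical helper: define the Hetzner server shape once, instantiate it per machine
{ ... }:
let
  mkServer = name: extra: {
    inherit name;
    image = "ubuntu-24.04";
    location = "fsn1";
    server_type = "cx22";
  } // extra;
in
{
  resource.hcloud_server = {
    cinnabar = mkServer "cinnabar" { };
    electrum = mkServer "electrum" { server_type = "cx32"; };
  };
}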

Take a look at how the real terranix modules are organized:

ls modules/terranix/

You’ll see:

  • hetzner.nix - Hetzner Cloud provider configuration
  • gcp.nix - Google Cloud Platform configuration
  • Related helper modules for networking, firewalls, etc.

Let’s provision a machine. We’ll use an existing configuration to understand the flow.

First, see what infrastructure exists:

nix run .#terraform -- state list

This shows resources terraform currently manages. If you haven’t run terraform before, this may be empty or show only existing resources.

Look at a machine definition. For cinnabar (Hetzner):

# View the terranix configuration
cat modules/terranix/hetzner.nix

Key elements:

  • Provider configuration: Credentials, default settings
  • Server resources: VM definitions with type, image, location
  • Network resources: Firewall rules, SSH keys
  • Output values: IP addresses for later use
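
A stripped-down sketch of what the non-server pieces can look like in terranix. The attribute names follow the hcloud Terraform provider; the values are placeholders, not the repository's actual configuration:

# Hypothetical sketch: provider, SSH key, and firewall in terranix
{
  # Provider configuration: the token comes in as a Terraform variable
  variable.hcloud_token = { type = "string"; sensitive = true; };
  provider.hcloud.token = "\${var.hcloud_token}";

  # SSH key uploaded so freshly provisioned VMs are reachable for installation
  resource.hcloud_ssh_key.admin = {
    name = "admin";
    public_key = "ssh-ed25519 AAAA... admin";
  };

  # Firewall: allow SSH and zerotier traffic
  resource.hcloud_firewall.base = {
    name = "base";
    rule = [
      { direction = "in"; protocol = "tcp"; port = "22"; source_ips = [ "0.0.0.0/0" "::/0" ]; }
      { direction = "in"; protocol = "udp"; port = "9993"; source_ips = [ "0.0.0.0/0" "::/0" ]; }
    ];
  };
}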

Before applying, always plan:

nix run .#terraform -- plan

Terraform shows what it will create, modify, or destroy. Review this carefully, especially for destroy operations.

If the plan looks correct:

nix run .#terraform -- apply

Terraform prompts for confirmation, then provisions resources. This takes a few minutes for new VMs.

After apply, terraform shows outputs including IP addresses:

nix run .#terraform -- output

You’ll need these IPs for the clan deployment step.
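
Those outputs are ordinary terranix output blocks that reference attributes exported by the provider. A minimal sketch, assuming the cinnabar resource shown earlier:

# Hypothetical sketch: export the server's public address for the deployment step
{
  output.cinnabar_ipv4.value = "\${hcloud_server.cinnabar.ipv4_address}";
}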

Step 3: Understand clan machine management

Clan orchestrates NixOS deployment across multiple machines.

Clan provides:

  • Machine registry: Defines which machines exist and how to reach them
  • Inventory system: Assigns roles and services to machines
  • Secrets (vars): Generates and deploys machine-specific secrets
  • Deployment tooling: Commands for install, update, and status

The clan inventory defines service instances:

modules/clan/inventory.nix
{
  services = {
    zerotier = {
      controller.roles.default.machines = [ "cinnabar" ];
      peers.roles.default.machines = [ "electrum" "galena" "scheelite" ];
    };
    users = {
      cameron.roles.default.machines = [ "cinnabar" "electrum" "galena" "scheelite" ];
    };
  };
}

This declares:

  • cinnabar runs the zerotier controller
  • Other machines are zerotier peers
  • The cameron user exists on all NixOS machines

Each NixOS machine has a configuration in modules/machines/nixos/:

ls modules/machines/nixos/

These configurations use the same deferred module composition pattern as the darwin machines, but add NixOS-specific elements such as systemd services and disko disk layouts.
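
The exact layout varies per machine, but a configuration in that directory generally has a shape like the following. The file names and options here are illustrative, not a copy of an actual machine module:

# Hypothetical shape of modules/machines/nixos/cinnabar/default.nix
{ ... }:
{
  imports = [
    ./disko.nix    # disk layout consumed by clan machines install
    ./hardware.nix # kernel modules and filesystems for the cloud image
  ];

  networking.hostName = "cinnabar";

  # NixOS-specific: systemd-managed services, e.g. SSH for clan deployments
  services.openssh.enable = true;

  system.stateVersion = "24.05";
}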

Before deploying, generate machine-specific secrets.

Clan vars creates:

  • SSH host keys (ed25519)
  • Zerotier identity secrets
  • Any other service-specific credentials

These secrets are managed automatically by clan.
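
Under the hood, each of these is declared as a vars generator in a NixOS module. The option paths below follow the clan vars interface as documented upstream, but treat them as an assumption and check the clan documentation for the current names:

# Hypothetical sketch of a vars generator for a service credential
{ pkgs, ... }:
{
  clan.core.vars.generators.my-service = {
    files.token = { }; # stored encrypted, decrypted onto the machine at deploy time
    runtimeInputs = [ pkgs.openssl ];
    script = ''
      openssl rand -hex 32 > "$out"/token
    '';
  };
}

Other modules can then reference the decrypted file's path instead of hard-coding a secret.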

# Generate secrets for a specific machine
clan vars generate cinnabar
# Or generate for all machines
clan vars generate
# View generated secrets
ls vars/cinnabar/
# Check a specific secret
clan vars get cinnabar ssh.id_ed25519.pub

The public keys are safe to view. Private keys are encrypted and only decrypted during deployment.

Now deploy the configuration to your provisioned infrastructure.

For a fresh VM with only the base image, use clan machines install:

# Install NixOS on a new machine
clan machines install cinnabar --target-host root@<IP_ADDRESS>

Replace <IP_ADDRESS> with the terraform output.

This command:

  1. Connects via SSH to the target
  2. Partitions disks using disko configuration
  3. Installs NixOS with your configuration
  4. Reboots into the new system
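
Step 2 relies on a disko layout shipped with the machine configuration. A minimal sketch of what such a layout can look like for a single-disk cloud VM (illustrative; the real layouts live next to each machine's configuration):

# Hypothetical disko layout for a single-disk cloud VM booting via GRUB
{
  disko.devices.disk.main = {
    device = "/dev/sda";
    type = "disk";
    content = {
      type = "gpt";
      partitions = {
        boot = {
          size = "1M";
          type = "EF02"; # BIOS boot partition for GRUB on a GPT disk
        };
        root = {
          size = "100%";
          content = {
            type = "filesystem";
            format = "ext4";
            mountpoint = "/";
          };
        };
      };
    };
  };
}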

For machines already running NixOS, use clan machines update:

# Update an existing machine
clan machines update cinnabar

This:

  1. Builds the new configuration locally
  2. Copies closures to the remote machine
  3. Activates the new configuration
  4. Runs any activation scripts

Common options:

# Update multiple machines
clan machines update cinnabar electrum
# Dry run (build but don't deploy)
clan machines update cinnabar --dry-run
# Update all machines in inventory
clan machines update

After deployment, verify mesh connectivity.

SSH into the deployed machine:

ssh cinnabar.zt # If zerotier DNS works
# Or
ssh root@<IP_ADDRESS>

Then verify zerotier:

sudo zerotier-cli status
# Should show "ONLINE"
sudo zerotier-cli listnetworks
# Should show network db4344343b14b903 with status "OK"

From the deployed machine, ping other nodes:

ping electrum.zt
ping stibnite.zt

If zerotier DNS isn’t configured, use IP addresses:

# Get peer IPs
sudo zerotier-cli peers

If deploying cinnabar (the controller), other machines authorize through it. If deploying a peer, it needs authorization from cinnabar.

Check controller status:

# On cinnabar
sudo zerotier-cli listnetworks
# Look for "CONTROLLER" in the output

Step 7: Configure user secrets (legacy sops-nix)

NixOS machines can also use legacy sops-nix for user-specific credentials during the migration.

Clan vars (the target approach) provides all secrets:

  • Generated automatically
  • Machine-scoped
  • Deployed to /run/secrets/

sops-nix (legacy) provides user secrets during migration:

  • Created manually
  • User-scoped
  • Deployed to ~/.config/sops-nix/secrets/

Darwin machines use legacy sops-nix patterns. NixOS is migrating toward clan vars for all secrets.

The process mirrors darwin. On the NixOS machine:

# Create the sops directory
mkdir -p ~/.config/sops/age
# Either copy your age key from another machine
scp yourworkstation:~/.config/sops/age/keys.txt ~/.config/sops/age/
# Or derive a new one from your SSH key
ssh-to-age -private-key -i ~/.ssh/id_ed25519 > ~/.config/sops/age/keys.txt
chmod 600 ~/.config/sops/age/keys.txt

After setting up the key, your user’s sops secrets will decrypt during home-manager activation.
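
On the home-manager side, a user secret declaration looks roughly like this. The secret name and sops file path are illustrative; the option names come from the sops-nix home-manager module:

# Hypothetical sketch of a user secret managed by legacy sops-nix
{ config, ... }:
{
  sops = {
    age.keyFile = "${config.home.homeDirectory}/.config/sops/age/keys.txt";
    defaultSopsFile = ./secrets.yaml; # illustrative path
    secrets.github-token = { };       # decrypted under ~/.config/sops-nix/secrets/
  };
}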

# Clan vars: System secrets
ls /run/secrets/
# Legacy sops-nix: User secrets (after home-manager activation)
ls ~/.config/sops-nix/secrets/

Let’s understand the multi-cloud strategy.

Hetzner provides cost-effective European hosting:

modules/terranix/hetzner.nix
resource.hcloud_server.cinnabar = {
  server_type = "cx22"; # 2 vCPU, 4GB RAM, ~$5/month
  location = "fsn1"; # Falkenstein, Germany
  image = "ubuntu-24.04";
};

Use Hetzner for:

  • Always-on infrastructure (zerotier controller)
  • General-purpose workloads
  • European data residency requirements

GCP provides access to GPUs and burst compute:

modules/terranix/gcp.nix
resource.google_compute_instance.galena = {
  machine_type = "e2-standard-8"; # 8 vCPU, 32GB RAM
  zone = "us-west1-b";
  # Toggle pattern: the Terraform expression is escaped so Nix passes it through
  count = "\${var.galena_enabled ? 1 : 0}";
};

Use GCP for:

  • GPU workloads (scheelite with Tesla T4)
  • Burst compute capacity
  • US-based infrastructure
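
The GPU attachment is the main thing that distinguishes scheelite's definition. A hedged sketch of the relevant attributes (other required fields such as the boot disk and network interface are omitted, and the zone is illustrative):

# Hypothetical sketch: attaching a Tesla T4 to the scheelite instance
{
  resource.google_compute_instance.scheelite = {
    machine_type = "n1-standard-8";
    zone = "us-west1-b";
    guest_accelerator = [
      { type = "nvidia-tesla-t4"; count = 1; }
    ];
    # GPU instances cannot live-migrate, so maintenance must terminate them
    scheduling.on_host_maintenance = "TERMINATE";
  };
}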

The toggle pattern prevents surprise bills:

# Disable expensive resources when not needed
# In gcp.nix, set: scheelite.enabled = false
# Then apply
nix run .#terraform -- apply
# Resources are destroyed but configuration preserved

To re-enable:

# Set: scheelite.enabled = true
nix run .#terraform -- apply
# Resources recreated with same configuration
# Then deploy NixOS config
clan machines update scheelite

You’ve now deployed NixOS to cloud infrastructure. Along the way, you learned:

  • Terranix translates Nix to Terraform for infrastructure provisioning
  • Clan orchestrates NixOS deployment with inventory-based service assignment
  • Secrets on NixOS: clan vars for system secrets, legacy sops-nix for user secrets during migration
  • Multi-cloud patterns for Hetzner (cost-effective) and GCP (burst/GPU)
  • Toggle patterns manage costs by enabling/disabling resources

Now that you’ve deployed NixOS:

  1. Add more machines by creating configurations in modules/machines/nixos/ and terranix definitions

  2. Explore service patterns by examining how zerotier and users are assigned via inventory

  3. Review operational procedures in the guides

  4. Understand the architecture more deeply

Check your cloud credentials:

# Hetzner
echo $HCLOUD_TOKEN
# GCP
echo $GOOGLE_APPLICATION_CREDENTIALS
ls -la $GOOGLE_APPLICATION_CREDENTIALS

Credentials should be set via environment variables or secrets.

clan machines install fails with SSH error

Verify SSH connectivity:

# Test basic SSH
ssh root@<IP_ADDRESS> echo "connection works"
# Check SSH key is offered
ssh -v root@<IP_ADDRESS> 2>&1 | grep "Offering"

Ensure:

  • The terraform-provisioned SSH key matches what you have locally
  • Root login is enabled on the fresh VM image
  • Firewall allows SSH (port 22)

Large deployments may need more time:

# Increase timeout
clan machines update cinnabar --timeout 3600

Or check if the machine is actually reachable:

ping <IP_ADDRESS>
ssh root@<IP_ADDRESS>

Check zerotier service status:

sudo systemctl status zerotier-one
sudo journalctl -u zerotier-one -n 50

Common issues:

  • Network ID wrong in configuration
  • Firewall blocking UDP 9993
  • Controller not authorizing the peer
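
If the firewall is the culprit, the NixOS-side fix is a single option (a sketch; add it wherever the machine's networking configuration lives, unless the zerotier module already opens the port):

# Allow zerotier's UDP port through the NixOS firewall
networking.firewall.allowedUDPPorts = [ 9993 ];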

Ensure vars were generated:

# Check vars directory
ls vars/cinnabar/
# Regenerate if needed
clan vars generate cinnabar
# Redeploy
clan machines update cinnabar

For comprehensive troubleshooting, see the Host Onboarding Guide.