Compare commits


36 Commits

Author SHA1 Message Date
Bas Nijholt
27f17a2451 Remove unused PortMapping.protocol field 2025-12-14 00:52:47 -08:00
Bas Nijholt
98c2492d21 docs: Add cf alias and check command to README 2025-12-14 00:41:26 -08:00
Bas Nijholt
04339cbb9a Group CLI commands into Lifecycle, Monitoring, Configuration 2025-12-14 00:37:18 -08:00
Bas Nijholt
cdb3b1d257 Show friendly error when config file not found
Instead of a Python traceback, display a clean error message with
the red ✗ symbol when the config file cannot be found.
2025-12-14 00:31:36 -08:00
Bas Nijholt
0913769729 Fix check command to validate all services with check_all flag 2025-12-14 00:23:23 -08:00
Bas Nijholt
3a1d5b77b5 Add traefik port validation to check command 2025-12-14 00:19:17 -08:00
Bas Nijholt
e12002ce86 Add test for network_mode: service:X port lookup 2025-12-14 00:03:11 -08:00
Bas Nijholt
676a6fe72d Support network_mode: service:X for port lookup in traefik config 2025-12-14 00:02:07 -08:00
Bas Nijholt
f29f8938fe Add -h as alias for --help 2025-12-13 23:56:33 -08:00
Bas Nijholt
4c0e147786 Escape log output to prevent Rich markup errors 2025-12-13 23:55:44 -08:00
Bas Nijholt
cba61118de Add cf alias for compose-farm command 2025-12-13 23:54:00 -08:00
Bas Nijholt
32dc6b3665 Skip empty lines in streaming output 2025-12-13 23:50:35 -08:00
Bas Nijholt
7d98e664e9 Auto-detect local IPs to skip SSH when on target host 2025-12-13 23:48:28 -08:00
Bas Nijholt
6763403700 Fix duplicate prefix before traefik config message 2025-12-13 23:46:41 -08:00
Bas Nijholt
feb0e13bfd Add check command to find missing services 2025-12-13 23:43:47 -08:00
Bas Nijholt
b86f6d190f Add Rich styling to CLI output
- Service names in cyan, host names in magenta
- Success checkmarks, warning/error symbols
- Colored sync diff indicators (+/-/~)
- Unicode arrows for migrations
2025-12-13 23:40:07 -08:00
Bas Nijholt
5ed15b5445 docs: Add Docker Swarm overlay network notes 2025-12-13 23:16:09 -08:00
Bas Nijholt
761b6dd2d1 Rename state file to compose-farm-state.yaml (not hidden) 2025-12-13 23:01:40 -08:00
Bas Nijholt
e86c2b6d47 docs: Simplify Traefik port requirement note 2025-12-13 22:59:50 -08:00
basnijholt
9353b74c35 chore(docs): update TOC 2025-12-14 06:58:15 +00:00
Bas Nijholt
b7e8e0f3a9 docs: Add limitations and best practices section
Documents cross-host networking limitations:
- Docker DNS doesn't work across hosts
- Dependent services should stay in same compose file
- Ports must be published for cross-host communication
2025-12-13 22:58:01 -08:00
Bas Nijholt
b6c02587bc Rename traefik_host to traefik_service
Instead of specifying the host directly, specify the service name
that runs Traefik. The host is then looked up from the services
mapping, avoiding redundancy.
2025-12-13 22:43:33 -08:00
Bas Nijholt
d412c42ca4 Store state file alongside config file
State is now stored at .compose-farm-state.yaml in the same
directory as the config file. This allows multiple compose-farm
setups with independent state.

State functions now require a Config parameter to locate the
state file via config.get_state_path().
2025-12-13 22:38:11 -08:00
Bas Nijholt
13e0adbbb9 Add traefik_host config to skip local services
When traefik_host is set, services on that host are skipped in
file-provider generation since Traefik's docker provider handles
them directly. This allows running compose-farm from any host
while still generating correct file-provider config.
2025-12-13 22:34:20 -08:00
Bas Nijholt
68c41eb37c Improve missing ports warning message
Replace technical "L3 reachability" phrasing with actionable
guidance: "Add a ports: mapping for cross-host routing."
2025-12-13 22:29:20 -08:00
Bas Nijholt
8af088bb5d Add traefik_file config for auto-regeneration
When traefik_file is set in config, compose-farm automatically
regenerates the Traefik file-provider config after up, down,
restart, and update commands. Eliminates the need to manually
run traefik-file after service changes.
2025-12-13 22:24:29 -08:00
Bas Nijholt
1308eeca12 fix: Skip local services in traefik-file generation
Local services (localhost, local, 127.0.0.1) are handled by Traefik's
docker provider directly. Generating file-provider entries for them
creates conflicting routes with broken localhost URLs (since Traefik
runs in a container where localhost is isolated).

Now traefik-file only generates config for remote services.
2025-12-13 19:51:57 -08:00
Bas Nijholt
a66a68f395 docs: Clarify no merge commits to main rule 2025-12-13 19:44:30 -08:00
Bas Nijholt
6ea25c862e docs: Add traefik directory merging instructions 2025-12-13 19:41:16 -08:00
Bas Nijholt
280524b546 docs: Use GitHub admonition for TL;DR 2025-12-13 19:34:31 -08:00
Bas Nijholt
db9360771b docs: Shorten TL;DR 2025-12-13 19:33:28 -08:00
Bas Nijholt
c7590ed0b7 docs: Move TOC below TL;DR 2025-12-13 19:32:41 -08:00
Bas Nijholt
bb563b9d4b docs: Add TL;DR to README 2025-12-13 19:31:05 -08:00
Bas Nijholt
fe160ee116 fix: Move traefik import to top-level 2025-12-13 17:07:29 -08:00
Bas Nijholt
4c7f49414f docs: Update README for sync command and auto-migration
- Replace snapshot with sync command
- Add auto-migration documentation
- Update compose file naming convention
2025-12-13 16:55:07 -08:00
Bas Nijholt
bebe5b34ba Merge snapshot into sync command
The sync command now performs both operations:
- Discovers running services and updates state.yaml
- Captures image digests and updates dockerfarm-log.toml

Removes the standalone snapshot command to keep the API simple.
2025-12-13 16:53:49 -08:00
14 changed files with 608 additions and 192 deletions


@@ -31,7 +31,7 @@ compose_farm/
## Git Safety
- Never amend commits.
- Never merge into a branch; prefer fast-forward or rebase as directed.
- **NEVER merge anything into main.** Always commit directly or use fast-forward/rebase.
- Never force push.
## Commands Quick Reference

README.md

@@ -1,23 +1,30 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Compose Farm](#compose-farm)
- [Why Compose Farm?](#why-compose-farm)
- [Key Assumption: Shared Storage](#key-assumption-shared-storage)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Traefik Multihost Ingress (File Provider)](#traefik-multihost-ingress-file-provider)
- [Requirements](#requirements)
- [How It Works](#how-it-works)
- [License](#license)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Compose Farm
A minimal CLI tool to run Docker Compose commands across multiple hosts via SSH.
> [!NOTE]
> Run `docker compose` commands across multiple hosts via SSH. One YAML maps services to hosts. Change the mapping, run `up`, and it auto-migrates. No Kubernetes, no Swarm, no magic.
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Why Compose Farm?](#why-compose-farm)
- [Key Assumption: Shared Storage](#key-assumption-shared-storage)
- [Limitations & Best Practices](#limitations--best-practices)
- [What breaks when you move a service](#what-breaks-when-you-move-a-service)
- [Best practices](#best-practices)
- [What Compose Farm doesn't do](#what-compose-farm-doesnt-do)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Auto-Migration](#auto-migration)
- [Traefik Multihost Ingress (File Provider)](#traefik-multihost-ingress-file-provider)
- [Requirements](#requirements)
- [How It Works](#how-it-works)
- [License](#license)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Why Compose Farm?
I run 100+ Docker Compose stacks on an LXC container that frequently runs out of memory. I needed a way to distribute services across multiple machines without the complexity of:
@@ -44,6 +51,37 @@ nas:/volume1/compose → /opt/compose (on nas03)
Compose Farm simply runs `docker compose -f /opt/compose/{service}/docker-compose.yml` on the appropriate host—it doesn't copy or sync files.
## Limitations & Best Practices
Compose Farm moves containers between hosts but **does not provide cross-host networking**. Docker's internal DNS and networks don't span hosts.
### What breaks when you move a service
- **Docker DNS** - `http://redis:6379` won't resolve from another host
- **Docker networks** - Containers can't reach each other via network names
- **Environment variables** - `DATABASE_URL=postgres://db:5432` stops working
### Best practices
1. **Keep dependent services together** - If an app needs a database, redis, or worker, keep them in the same compose file on the same host
2. **Only migrate standalone services** - Services that don't talk to other containers (or only talk to external APIs) are safe to move
3. **Expose ports for cross-host communication** - If services must communicate across hosts, publish ports and use IP addresses instead of container names:
```yaml
# Instead of: DATABASE_URL=postgres://db:5432
# Use: DATABASE_URL=postgres://192.168.1.66:5432
```
This includes Traefik routing—containers need published ports for the file-provider to reach them.
### What Compose Farm doesn't do
- No overlay networking (use Docker Swarm or Kubernetes for that)
- No service discovery across hosts
- No automatic dependency tracking between compose files
If you need containers on different hosts to communicate seamlessly, you need Docker Swarm, Kubernetes, or a service mesh—which adds the complexity Compose Farm is designed to avoid.
## Installation
```bash
@@ -75,37 +113,59 @@ services:
radarr: local # Runs on the machine where you invoke compose-farm
```
Compose files are expected at `{compose_dir}/{service}/docker-compose.yml`.
Compose files are expected at `{compose_dir}/{service}/compose.yaml` (also supports `compose.yml`, `docker-compose.yml`, `docker-compose.yaml`).
## Usage
The CLI is available as both `compose-farm` and the shorter `cf` alias.
```bash
# Start services
compose-farm up plex jellyfin
compose-farm up --all
# Start services (auto-migrates if host changed in config)
cf up plex jellyfin
cf up --all
# Stop services
compose-farm down plex
cf down plex
# Pull latest images
compose-farm pull --all
cf pull --all
# Restart (down + up)
compose-farm restart plex
cf restart plex
# Update (pull + down + up) - the end-to-end update command
compose-farm update --all
cf update --all
# Capture image digests to a TOML log (per service or all)
compose-farm snapshot plex
compose-farm snapshot --all # writes ~/.config/compose-farm/dockerfarm-log.toml
# Sync state with reality (discovers running services + captures image digests)
cf sync # updates state.yaml and dockerfarm-log.toml
cf sync --dry-run # preview without writing
# Check config vs disk (find missing services, validate traefik labels)
cf check
# View logs
compose-farm logs plex
compose-farm logs -f plex # follow
cf logs plex
cf logs -f plex # follow
# Show status
compose-farm ps
cf ps
```
### Auto-Migration
When you change a service's host assignment in config and run `up`, Compose Farm automatically:
1. Runs `down` on the old host
2. Runs `up -d` on the new host
3. Updates state tracking
```yaml
# Before: plex runs on nas01
services:
plex: nas01
# After: change to nas02, then run `cf up plex`
services:
plex: nas02 # Compose Farm will migrate automatically
```
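The three migration steps can be sketched as plain Python (hypothetical names; the real `_up_with_migration` shown in the CLI diff is async and streams output):

```python
def up_with_migration(service, target_host, state, run_compose):
    """Mirror the documented flow: down on the old host, up on the new, update state.

    `run_compose(host, args)` stands in for SSH/local execution and returns
    True on success; `state` maps service name -> current host.
    """
    current = state.get(service)
    if current and current != target_host:
        # Step 1: tear the service down where it currently runs.
        if not run_compose(current, "down"):
            return False
    # Step 2: bring it up on the newly assigned host.
    if not run_compose(target_host, "up -d"):
        return False
    # Step 3: record the new location so the next run sees reality.
    state[service] = target_host
    return True
```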
## Traefik Multihost Ingress (File Provider)
@@ -156,12 +216,58 @@ providers:
**Generate the fragment**
```bash
compose-farm traefik-file --output /mnt/data/traefik/dynamic.d/compose-farm.generated.yml
cf traefik-file --all --output /mnt/data/traefik/dynamic.d/compose-farm.yml
```
Rerun this after changing Traefik labels, moving a service to another host, or changing
published ports.
**Auto-regeneration**
To automatically regenerate the Traefik config after `up`, `down`, `restart`, or `update`,
add `traefik_file` to your config:
```yaml
compose_dir: /opt/compose
traefik_file: /opt/traefik/dynamic.d/compose-farm.yml # auto-regenerate on up/down/restart/update
traefik_service: traefik # skip services on same host (docker provider handles them)
hosts:
# ...
services:
traefik: nas01 # Traefik runs here
plex: nas02 # Services on other hosts get file-provider entries
# ...
```
The `traefik_service` option specifies which service runs Traefik. Services on the same host
are skipped in the file-provider config since Traefik's docker provider handles them directly.
Now `cf up plex` will update the Traefik config automatically—no separate
`traefik-file` command needed.
**Combining with existing config**
If you already have a `dynamic.yml` with manual routes, middlewares, etc., move it into the
directory and Traefik will merge all files:
```bash
mkdir -p /opt/traefik/dynamic.d
mv /opt/traefik/dynamic.yml /opt/traefik/dynamic.d/manual.yml
cf traefik-file --all -o /opt/traefik/dynamic.d/compose-farm.yml
```
Update your Traefik config to use directory watching instead of a single file:
```yaml
# Before
- --providers.file.filename=/dynamic.yml
# After
- --providers.file.directory=/dynamic.d
- --providers.file.watch=true
```
## Requirements
- Python 3.11+
@@ -171,7 +277,7 @@ published ports.
## How It Works
1. You run `compose-farm up plex`
1. You run `cf up plex`
2. Compose Farm looks up which host runs `plex` (e.g., `nas01`)
3. It SSHs to `nas01` (or runs locally if `localhost`)
4. It executes `docker compose -f /opt/compose/plex/docker-compose.yml up -d`

docker-swarm-notes.md

@@ -0,0 +1,90 @@
# Docker Swarm Overlay Networks with Compose Farm
Notes from testing Docker Swarm's attachable overlay networks as a way to get cross-host container networking while still using `docker compose`.
## The Idea
Docker Swarm overlay networks can be made "attachable", allowing regular `docker compose` containers (not just swarm services) to join them. This would give us:
- Cross-host Docker DNS (containers find each other by name)
- No need to publish ports for inter-container communication
- Keep using `docker compose up` instead of `docker stack deploy`
## Setup Steps
```bash
# On manager node
docker swarm init --advertise-addr <manager-ip>
# On worker nodes (use token from init output)
docker swarm join --token <token> <manager-ip>:2377
# Create attachable overlay network (on manager)
docker network create --driver overlay --attachable my-network
# In compose files, add the network
networks:
my-network:
external: true
```
## Required Ports
Docker Swarm requires these ports open **bidirectionally** between all nodes:
| Port | Protocol | Purpose |
|------|----------|---------|
| 2377 | TCP | Cluster management |
| 7946 | TCP + UDP | Node communication |
| 4789 | UDP | Overlay network traffic (VXLAN) |
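The two TCP ports in the table can be probed from another node with a short script (a diagnostic sketch assuming a hypothetical `check_tcp_ports` helper; UDP 7946/4789 can't be verified this way without a listener on the far side):

```python
import socket

# Swarm TCP ports from the table above: 2377 (management), 7946 (node comms).
def check_tcp_ports(host: str, ports: tuple[int, ...], timeout: float = 2.0) -> dict[int, bool]:
    """Return {port: reachable} by attempting a TCP connection to each port."""
    results: dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, or unreachable -- the port is not open to us.
            results[port] = False
    return results
```

Run it from each node against every other node; any `False` for 2377 or 7946 explains a `context deadline exceeded` attach failure like the one below.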
## Test Results (2025-12-13)
- docker-debian (192.168.1.66) as manager
- dev-lxc (192.168.1.167) as worker
### What worked
- Swarm init and join
- Overlay network creation
- Nodes showed as Ready
### What failed
- Container on dev-lxc couldn't attach to overlay network
- Error: `attaching to network failed... context deadline exceeded`
- Cause: Port 7946 blocked from docker-debian → dev-lxc
### Root cause
Firewall on dev-lxc wasn't configured to allow swarm ports. Opening these ports requires sudo access on each node.
## Conclusion
Docker Swarm overlay networks are **not plug-and-play**. Requirements:
1. Swarm init/join on all nodes
2. Firewall rules on all nodes (needs sudo/root)
3. All nodes must have bidirectional connectivity on 3 ports
For a simpler alternative, consider:
- **Tailscale**: VPN mesh, containers use host's Tailscale IP
- **Host networking + published ports**: What compose-farm does today
- **Keep dependent services together**: Avoid cross-host networking entirely
## Future Work
If we decide to support overlay networks:
1. Add a `compose-farm network create` command that:
- Initializes swarm if needed
- Creates attachable overlay network
- Documents required firewall rules
2. Add network config to compose-farm.yaml:
```yaml
overlay_network: compose-farm-net
```
3. Auto-inject network into compose files (or document manual setup)


@@ -12,10 +12,12 @@ dependencies = [
"pydantic>=2.0.0",
"asyncssh>=2.14.0",
"pyyaml>=6.0",
"rich>=13.0.0",
]
[project.scripts]
compose-farm = "compose_farm.cli:app"
cf = "compose_farm.cli:app"
[build-system]
requires = ["hatchling", "hatch-vcs"]


@@ -8,6 +8,7 @@ from typing import TYPE_CHECKING, Annotated, TypeVar
import typer
import yaml
from rich.console import Console
from . import __version__
from .config import Config, load_config
@@ -21,12 +22,42 @@ from .ssh import (
run_sequential_on_services,
)
from .state import get_service_host, load_state, remove_service, save_state, set_service_host
from .traefik import generate_traefik_config
if TYPE_CHECKING:
from collections.abc import Coroutine
T = TypeVar("T")
console = Console(highlight=False)
err_console = Console(stderr=True, highlight=False)
def _load_config_or_exit(config_path: Path | None) -> Config:
"""Load config or exit with a friendly error message."""
try:
return load_config(config_path)
except FileNotFoundError as e:
err_console.print(f"[red]✗[/] {e}")
raise typer.Exit(1) from e
def _maybe_regenerate_traefik(cfg: Config) -> None:
"""Regenerate traefik config if traefik_file is configured."""
if cfg.traefik_file is None:
return
try:
dynamic, warnings = generate_traefik_config(cfg, list(cfg.services.keys()))
cfg.traefik_file.parent.mkdir(parents=True, exist_ok=True)
cfg.traefik_file.write_text(yaml.safe_dump(dynamic, sort_keys=False))
console.print() # Ensure we're on a new line after streaming output
console.print(f"[green]✓[/] Traefik config updated: {cfg.traefik_file}")
for warning in warnings:
err_console.print(f"[yellow]![/] {warning}")
except (FileNotFoundError, ValueError) as exc:
err_console.print(f"[yellow]![/] Failed to update traefik config: {exc}")
def _version_callback(value: bool) -> None:
"""Print version and exit."""
@@ -39,6 +70,7 @@ app = typer.Typer(
name="compose-farm",
help="Compose Farm - run docker compose commands across multiple hosts",
no_args_is_help=True,
context_settings={"help_option_names": ["-h", "--help"]},
)
@@ -64,12 +96,12 @@ def _get_services(
config_path: Path | None,
) -> tuple[list[str], Config]:
"""Resolve service list and load config."""
config = load_config(config_path)
config = _load_config_or_exit(config_path)
if all_services:
return list(config.services.keys()), config
if not services:
typer.echo("Error: Specify services or use --all", err=True)
err_console.print("[red]✗[/] Specify services or use --all")
raise typer.Exit(1)
return list(services), config
@@ -84,7 +116,9 @@ def _report_results(results: list[CommandResult]) -> None:
failed = [r for r in results if not r.success]
if failed:
for r in failed:
typer.echo(f"[{r.service}] Failed with exit code {r.exit_code}", err=True)
err_console.print(
f"[cyan]\\[{r.service}][/] [red]Failed with exit code {r.exit_code}[/]"
)
raise typer.Exit(1)
@@ -115,20 +149,23 @@ async def _up_with_migration(
for service in services:
target_host = cfg.services[service]
current_host = get_service_host(service)
current_host = get_service_host(cfg, service)
# If service is deployed elsewhere, migrate it
if current_host and current_host != target_host:
if current_host in cfg.hosts:
typer.echo(f"[{service}] Migrating from {current_host} to {target_host}...")
console.print(
f"[cyan]\\[{service}][/] Migrating from "
f"[magenta]{current_host}[/] → [magenta]{target_host}[/]..."
)
down_result = await run_compose_on_host(cfg, service, current_host, "down")
if not down_result.success:
results.append(down_result)
continue
else:
typer.echo(
f"[{service}] Warning: was on {current_host} (not in config), skipping down",
err=True,
err_console.print(
f"[cyan]\\[{service}][/] [yellow]![/] was on "
f"[magenta]{current_host}[/] (not in config), skipping down"
)
# Start on target host
@@ -137,12 +174,12 @@ async def _up_with_migration(
# Update state on success
if up_result.success:
set_service_host(service, target_host)
set_service_host(cfg, service, target_host)
return results
@app.command()
@app.command(rich_help_panel="Lifecycle")
def up(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -151,10 +188,11 @@ def up(
"""Start services (docker compose up -d). Auto-migrates if host changed."""
svc_list, cfg = _get_services(services or [], all_services, config)
results = _run_async(_up_with_migration(cfg, svc_list))
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def down(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -167,12 +205,13 @@ def down(
# Remove from state on success
for result in results:
if result.success:
remove_service(result.service)
remove_service(cfg, result.service)
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def pull(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -184,7 +223,7 @@ def pull(
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def restart(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -193,10 +232,11 @@ def restart(
"""Restart services (down + up)."""
svc_list, cfg = _get_services(services or [], all_services, config)
results = _run_async(run_sequential_on_services(cfg, svc_list, ["down", "up -d"]))
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def update(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -205,10 +245,11 @@ def update(
"""Update services (pull + down + up)."""
svc_list, cfg = _get_services(services or [], all_services, config)
results = _run_async(run_sequential_on_services(cfg, svc_list, ["pull", "down", "up -d"]))
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Monitoring")
def logs(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -225,35 +266,17 @@ def logs(
_report_results(results)
@app.command()
@app.command(rich_help_panel="Monitoring")
def ps(
config: ConfigOption = None,
) -> None:
"""Show status of all services."""
cfg = load_config(config)
cfg = _load_config_or_exit(config)
results = _run_async(run_on_services(cfg, list(cfg.services.keys()), "ps"))
_report_results(results)
@app.command()
def snapshot(
services: ServicesArg = None,
all_services: AllOption = False,
log_path: LogPathOption = None,
config: ConfigOption = None,
) -> None:
"""Record current image digests into the Dockerfarm TOML log."""
svc_list, cfg = _get_services(services or [], all_services, config)
try:
path = _run_async(snapshot_services(cfg, svc_list, log_path=log_path))
except RuntimeError as exc: # pragma: no cover - error path
typer.echo(str(exc), err=True)
raise typer.Exit(1) from exc
typer.echo(f"Snapshot written to {path}")
@app.command("traefik-file")
@app.command("traefik-file", rich_help_panel="Configuration")
def traefik_file(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -268,13 +291,11 @@ def traefik_file(
config: ConfigOption = None,
) -> None:
"""Generate a Traefik file-provider fragment from compose Traefik labels."""
from .traefik import generate_traefik_config
svc_list, cfg = _get_services(services or [], all_services, config)
try:
dynamic, warnings = generate_traefik_config(cfg, svc_list)
except (FileNotFoundError, ValueError) as exc:
typer.echo(str(exc), err=True)
err_console.print(f"[red]✗[/] {exc}")
raise typer.Exit(1) from exc
rendered = yaml.safe_dump(dynamic, sort_keys=False)
@@ -282,12 +303,12 @@ def traefik_file(
if output:
output.parent.mkdir(parents=True, exist_ok=True)
output.write_text(rendered)
typer.echo(f"Traefik config written to {output}")
console.print(f"[green]✓[/] Traefik config written to {output}")
else:
typer.echo(rendered)
console.print(rendered)
for warning in warnings:
typer.echo(warning, err=True)
err_console.print(f"[yellow]![/] {warning}")
async def _discover_running_services(cfg: Config) -> dict[str, str]:
@@ -323,39 +344,45 @@ def _report_sync_changes(
) -> None:
"""Report sync changes to the user."""
if added:
typer.echo(f"\nNew services found ({len(added)}):")
console.print(f"\nNew services found ({len(added)}):")
for service in sorted(added):
typer.echo(f" + {service} on {discovered[service]}")
console.print(f" [green]+[/] [cyan]{service}[/] on [magenta]{discovered[service]}[/]")
if changed:
typer.echo(f"\nServices on different hosts ({len(changed)}):")
console.print(f"\nServices on different hosts ({len(changed)}):")
for service, old_host, new_host in sorted(changed):
typer.echo(f" ~ {service}: {old_host} -> {new_host}")
console.print(
f" [yellow]~[/] [cyan]{service}[/]: "
f"[magenta]{old_host}[/] → [magenta]{new_host}[/]"
)
if removed:
typer.echo(f"\nServices no longer running ({len(removed)}):")
console.print(f"\nServices no longer running ({len(removed)}):")
for service in sorted(removed):
typer.echo(f" - {service} (was on {current_state[service]})")
console.print(
f" [red]-[/] [cyan]{service}[/] (was on [magenta]{current_state[service]}[/])"
)
@app.command()
@app.command(rich_help_panel="Configuration")
def sync(
config: ConfigOption = None,
log_path: LogPathOption = None,
dry_run: Annotated[
bool,
typer.Option("--dry-run", "-n", help="Show what would be synced without writing"),
] = False,
) -> None:
"""Discover running services and update state file.
"""Sync local state with running services.
Queries all hosts to find where services are actually running and
updates the state file to match reality. Useful after manual changes
or when first setting up compose-farm on an existing deployment.
Discovers which services are running on which hosts, updates the state
file, and captures image digests. Combines service discovery with
image snapshot into a single command.
"""
cfg = load_config(config)
current_state = load_state()
cfg = _load_config_or_exit(config)
current_state = load_state(cfg)
typer.echo("Discovering running services...")
console.print("Discovering running services...")
discovered = _run_async(_discover_running_services(cfg))
# Calculate changes
@@ -367,20 +394,74 @@ def sync(
if s in current_state and current_state[s] != discovered[s]
]
# Report findings
if not (added or removed or changed):
typer.echo("State is already in sync.")
return
_report_sync_changes(added, removed, changed, discovered, current_state)
# Report state changes
state_changed = bool(added or removed or changed)
if state_changed:
_report_sync_changes(added, removed, changed, discovered, current_state)
else:
console.print("[green]✓[/] State is already in sync.")
if dry_run:
typer.echo("\n(dry-run: no changes made)")
console.print("\n[dim](dry-run: no changes made)[/]")
return
# Apply changes
save_state(discovered)
typer.echo(f"\nState updated: {len(discovered)} services tracked.")
# Update state file
if state_changed:
save_state(cfg, discovered)
console.print(f"\n[green]✓[/] State updated: {len(discovered)} services tracked.")
# Capture image digests for running services
if discovered:
console.print("\nCapturing image digests...")
try:
path = _run_async(snapshot_services(cfg, list(discovered.keys()), log_path=log_path))
console.print(f"[green]✓[/] Digests written to {path}")
except RuntimeError as exc:
err_console.print(f"[yellow]![/] {exc}")
@app.command(rich_help_panel="Configuration")
def check(
config: ConfigOption = None,
) -> None:
"""Check for compose directories not in config (and vice versa)."""
cfg = _load_config_or_exit(config)
configured = set(cfg.services.keys())
on_disk = cfg.discover_compose_dirs()
missing_from_config = sorted(on_disk - configured)
missing_from_disk = sorted(configured - on_disk)
if missing_from_config:
console.print(f"\n[yellow]Not in config[/] ({len(missing_from_config)}):")
for name in missing_from_config:
console.print(f" [yellow]+[/] [cyan]{name}[/]")
if missing_from_disk:
console.print(f"\n[red]No compose file found[/] ({len(missing_from_disk)}):")
for name in missing_from_disk:
console.print(f" [red]-[/] [cyan]{name}[/]")
if not missing_from_config and not missing_from_disk:
console.print("[green]✓[/] All compose directories are in config.")
elif missing_from_config:
console.print(f"\n[dim]To add missing services, append to {cfg.config_path}:[/]")
for name in missing_from_config:
console.print(f"[dim] {name}: docker-debian[/]")
# Check traefik labels have matching ports
try:
_, traefik_warnings = generate_traefik_config(
cfg, list(cfg.services.keys()), check_all=True
)
if traefik_warnings:
console.print(f"\n[yellow]Traefik issues[/] ({len(traefik_warnings)}):")
for warning in traefik_warnings:
console.print(f" [yellow]![/] {warning}")
elif not missing_from_config and not missing_from_disk:
console.print("[green]✓[/] All traefik services have published ports.")
except (FileNotFoundError, ValueError):
pass # Skip traefik check if config can't be loaded
if __name__ == "__main__":


@@ -23,6 +23,13 @@ class Config(BaseModel):
compose_dir: Path = Path("/opt/compose")
hosts: dict[str, Host]
services: dict[str, str] # service_name -> host_name
traefik_file: Path | None = None # Auto-regenerate traefik config after up/down
traefik_service: str | None = None # Service name for Traefik (skip its host in file-provider)
config_path: Path = Path() # Set by load_config()
def get_state_path(self) -> Path:
"""Get the state file path (stored alongside config)."""
return self.config_path.parent / "compose-farm-state.yaml"
@model_validator(mode="after")
def validate_service_hosts(self) -> Config:
@@ -58,6 +65,25 @@ class Config(BaseModel):
# Default to compose.yaml if none exist (will error later)
return service_dir / "compose.yaml"
def discover_compose_dirs(self) -> set[str]:
"""Find all directories in compose_dir that contain a compose file."""
compose_filenames = {
"compose.yaml",
"compose.yml",
"docker-compose.yml",
"docker-compose.yaml",
}
found: set[str] = set()
if not self.compose_dir.exists():
return found
for subdir in self.compose_dir.iterdir():
if subdir.is_dir():
for filename in compose_filenames:
if (subdir / filename).exists():
found.add(subdir.name)
break
return found
def _parse_hosts(raw_hosts: dict[str, str | dict[str, str | int]]) -> dict[str, Host]:
"""Parse hosts from config, handling both simple and full forms."""
@@ -103,5 +129,6 @@ def load_config(path: Path | None = None) -> Config:
# Parse hosts with flexible format support
raw["hosts"] = _parse_hosts(raw.get("hosts", {}))
raw["config_path"] = config_path.resolve()
return Config(**raw)


@@ -3,18 +3,42 @@
from __future__ import annotations
import asyncio
import sys
import socket
from dataclasses import dataclass
from functools import lru_cache
from typing import TYPE_CHECKING, Any
import asyncssh
from rich.console import Console
from rich.markup import escape
if TYPE_CHECKING:
from .config import Config, Host
_console = Console(highlight=False)
_err_console = Console(stderr=True, highlight=False)
LOCAL_ADDRESSES = frozenset({"local", "localhost", "127.0.0.1", "::1"})
@lru_cache(maxsize=1)
def _get_local_ips() -> frozenset[str]:
"""Get all IP addresses of the current machine."""
ips: set[str] = set()
try:
hostname = socket.gethostname()
# Get all addresses for hostname
for info in socket.getaddrinfo(hostname, None):
ips.add(info[4][0])
# Also try getting the default outbound IP
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
s.connect(("8.8.8.8", 80))
ips.add(s.getsockname()[0])
except OSError:
pass
return frozenset(ips)
@dataclass
class CommandResult:
"""Result of a command execution."""
@@ -28,7 +52,11 @@ class CommandResult:
def _is_local(host: Host) -> bool:
"""Check if host should run locally (no SSH)."""
return host.address.lower() in LOCAL_ADDRESSES
addr = host.address.lower()
if addr in LOCAL_ADDRESSES:
return True
# Check if address matches any of this machine's IPs
return addr in _get_local_ips()
async def _run_local_command(
@@ -53,12 +81,14 @@ async def _run_local_command(
*,
is_stderr: bool = False,
) -> None:
output = sys.stderr if is_stderr else sys.stdout
console = _err_console if is_stderr else _console
while True:
line = await reader.readline()
if not line:
break
print(f"[{prefix}] {line.decode()}", end="", file=output, flush=True)
text = line.decode()
if text.strip(): # Skip empty lines
console.print(f"[cyan]\\[{prefix}][/] {escape(text)}", end="")
await asyncio.gather(
read_stream(proc.stdout, service),
@@ -80,7 +110,7 @@ async def _run_local_command(
stderr=stderr_data.decode() if stderr_data else "",
)
except OSError as e:
print(f"[{service}] Local error: {e}", file=sys.stderr)
_err_console.print(f"[cyan]\\[{service}][/] [red]Local error:[/] {e}")
return CommandResult(service=service, exit_code=1, success=False)
@@ -111,9 +141,10 @@ async def _run_ssh_command(
*,
is_stderr: bool = False,
) -> None:
output = sys.stderr if is_stderr else sys.stdout
console = _err_console if is_stderr else _console
async for line in reader:
print(f"[{prefix}] {line}", end="", file=output, flush=True)
if line.strip(): # Skip empty lines
console.print(f"[cyan]\\[{prefix}][/] {escape(line)}", end="")
await asyncio.gather(
read_stream(proc.stdout, service),
@@ -135,7 +166,7 @@ async def _run_ssh_command(
stderr=stderr_data,
)
except (OSError, asyncssh.Error) as e:
print(f"[{service}] SSH error: {e}", file=sys.stderr)
_err_console.print(f"[cyan]\\[{service}][/] [red]SSH error:[/] {e}")
return CommandResult(service=service, exit_code=1, success=False)
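The local-IP auto-detection introduced in this diff can be exercised on its own. Below is a minimal standalone sketch of the same technique (hostname lookup plus the UDP-connect trick, which sends no packets for `SOCK_DGRAM`); function names here are illustrative, not the module's actual API:

```python
import socket

LOCAL_NAMES = frozenset({"local", "localhost", "127.0.0.1", "::1"})

def get_local_ips() -> frozenset[str]:
    """Collect this machine's IP addresses (mirrors _get_local_ips)."""
    ips: set[str] = set()
    try:
        # All addresses the hostname resolves to (may include IPv6).
        for info in socket.getaddrinfo(socket.gethostname(), None):
            ips.add(info[4][0])
        # UDP "connect" trick: no packet is sent, but the OS selects the
        # default outbound interface, revealing its IP.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            ips.add(s.getsockname()[0])
    except OSError:
        pass  # Offline or DNS misconfigured: fall back to whatever we found.
    return frozenset(ips)

def is_local(address: str) -> bool:
    """True if `address` refers to this machine (name or IP match)."""
    addr = address.lower()
    return addr in LOCAL_NAMES or addr in get_local_ips()

print(is_local("localhost"))  # True
```

The payoff is that a host configured by its LAN IP (e.g. `192.168.1.10`) skips SSH entirely when compose-farm happens to be running on that very machine.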

View File

@@ -2,25 +2,20 @@
from __future__ import annotations
from pathlib import Path
from typing import Any
from typing import TYPE_CHECKING, Any
import yaml
def _get_state_path() -> Path:
"""Get the path to the state file."""
state_dir = Path.home() / ".config" / "compose-farm"
state_dir.mkdir(parents=True, exist_ok=True)
return state_dir / "state.yaml"
if TYPE_CHECKING:
from .config import Config
def load_state() -> dict[str, str]:
def load_state(config: Config) -> dict[str, str]:
"""Load the current deployment state.
Returns a dict mapping service names to host names.
"""
state_path = _get_state_path()
state_path = config.get_state_path()
if not state_path.exists():
return {}
@@ -31,28 +26,28 @@ def load_state() -> dict[str, str]:
return deployed
def save_state(deployed: dict[str, str]) -> None:
def save_state(config: Config, deployed: dict[str, str]) -> None:
"""Save the deployment state."""
state_path = _get_state_path()
state_path = config.get_state_path()
with state_path.open("w") as f:
yaml.safe_dump({"deployed": deployed}, f, sort_keys=False)
def get_service_host(service: str) -> str | None:
def get_service_host(config: Config, service: str) -> str | None:
"""Get the host where a service is currently deployed."""
state = load_state()
state = load_state(config)
return state.get(service)
def set_service_host(service: str, host: str) -> None:
def set_service_host(config: Config, service: str, host: str) -> None:
"""Record that a service is deployed on a host."""
state = load_state()
state = load_state(config)
state[service] = host
save_state(state)
save_state(config, state)
def remove_service(service: str) -> None:
def remove_service(config: Config, service: str) -> None:
"""Remove a service from the state (after down)."""
state = load_state()
state = load_state(config)
state.pop(service, None)
save_state(state)
save_state(config, state)
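The refactor above threads `Config` through every state function so the state file location is derived from the config rather than hard-coded under `~/.config`. A minimal standalone sketch of the pattern, assuming `get_state_path()` resolves next to the config file (as the `compose-farm-state.yaml` rename commit suggests), and using `json` here to keep the example dependency-free where the real code serializes with PyYAML:

```python
import json
from pathlib import Path

def state_path(config_path: Path) -> Path:
    # State lives beside the config file, not in a hidden home directory.
    return config_path.parent / "compose-farm-state.yaml"

def load_state(config_path: Path) -> dict[str, str]:
    path = state_path(config_path)
    if not path.exists():
        return {}
    data = json.loads(path.read_text()) or {}
    return data.get("deployed", {})

def save_state(config_path: Path, deployed: dict[str, str]) -> None:
    state_path(config_path).write_text(json.dumps({"deployed": deployed}))

def set_service_host(config_path: Path, service: str, host: str) -> None:
    state = load_state(config_path)
    state[service] = host
    save_state(config_path, state)
```

Keeping state next to the config means two checkouts with different configs no longer share (and clobber) one global state file.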

View File

@@ -15,6 +15,8 @@ from typing import TYPE_CHECKING, Any
import yaml
from .ssh import LOCAL_ADDRESSES
if TYPE_CHECKING:
from pathlib import Path
@@ -27,7 +29,6 @@ class PortMapping:
target: int
published: int | None
protocol: str | None = None
@dataclass
@@ -119,7 +120,7 @@ def _parse_ports(raw: Any, env: dict[str, str]) -> list[PortMapping]: # noqa: P
for item in items:
if isinstance(item, str):
interpolated = _interpolate(item, env)
port_spec, _, protocol = interpolated.partition("/")
port_spec, _, _ = interpolated.partition("/")
parts = port_spec.split(":")
published: int | None = None
target: int | None = None
@@ -134,9 +135,7 @@ def _parse_ports(raw: Any, env: dict[str, str]) -> list[PortMapping]: # noqa: P
target = int(parts[-1])
if target is not None:
mappings.append(
PortMapping(target=target, published=published, protocol=protocol or None)
)
mappings.append(PortMapping(target=target, published=published))
elif isinstance(item, dict):
target_raw = item.get("target")
if isinstance(target_raw, str):
@@ -156,14 +155,7 @@ def _parse_ports(raw: Any, env: dict[str, str]) -> list[PortMapping]: # noqa: P
published_val = int(str(published_raw)) if published_raw is not None else None
except (TypeError, ValueError):
published_val = None
protocol_val = item.get("protocol")
mappings.append(
PortMapping(
target=target_val,
published=published_val,
protocol=str(protocol_val) if protocol_val else None,
)
)
mappings.append(PortMapping(target=target_val, published=published_val))
return mappings
@@ -289,8 +281,8 @@ def _finalize_http_services(
if published_port is None:
warnings.append(
f"[{source.stack}/{source.compose_service}] "
f"No host-published port found for Traefik service '{traefik_service}'. "
"Traefik will require L3 reachability to container IPs."
f"No published port found for Traefik service '{traefik_service}'. "
"Add a ports: mapping (e.g., '8080:8080') for cross-host routing."
)
continue
@@ -398,10 +390,28 @@ def _process_service_label(
source.scheme = str(_parse_value(key_without_prefix, label_value))
def _get_ports_for_service(
definition: dict[str, Any],
all_services: dict[str, Any],
env: dict[str, str],
) -> list[PortMapping]:
"""Get ports for a service, following network_mode: service:X if present."""
network_mode = definition.get("network_mode", "")
if isinstance(network_mode, str) and network_mode.startswith("service:"):
# Service uses another service's network - get ports from that service
ref_service = network_mode[len("service:") :]
if ref_service in all_services:
ref_def = all_services[ref_service]
if isinstance(ref_def, dict):
return _parse_ports(ref_def.get("ports"), env)
return _parse_ports(definition.get("ports"), env)
def _process_service_labels(
stack: str,
compose_service: str,
definition: dict[str, Any],
all_services: dict[str, Any],
host_address: str,
env: dict[str, str],
dynamic: dict[str, Any],
@@ -415,7 +425,7 @@ def _process_service_labels(
if enable_raw is not None and _parse_value("enable", enable_raw) is False:
return
ports = _parse_ports(definition.get("ports"), env)
ports = _get_ports_for_service(definition, all_services, env)
routers: dict[str, bool] = {}
service_names: set[str] = set()
@@ -450,17 +460,41 @@ def _process_service_labels(
def generate_traefik_config(
config: Config,
services: list[str],
*,
check_all: bool = False,
) -> tuple[dict[str, Any], list[str]]:
"""Generate Traefik dynamic config from compose labels.
Args:
config: The compose-farm config.
services: List of service names to process.
check_all: If True, check all services for warnings (ignore host filtering).
Used by the check command to validate all traefik labels.
Returns (config_dict, warnings).
"""
dynamic: dict[str, Any] = {}
warnings: list[str] = []
sources: dict[str, TraefikServiceSource] = {}
# Determine Traefik's host from service assignment
traefik_host = None
if config.traefik_service and not check_all:
traefik_host = config.services.get(config.traefik_service)
for stack in services:
raw_services, env, host_address = _load_stack(config, stack)
stack_host = config.services.get(stack)
# Skip services on Traefik's host - docker provider handles them directly
# (unless check_all is True, for validation purposes)
if not check_all:
if host_address.lower() in LOCAL_ADDRESSES:
continue
if traefik_host and stack_host == traefik_host:
continue
for compose_service, definition in raw_services.items():
if not isinstance(definition, dict):
continue
@@ -468,6 +502,7 @@ def generate_traefik_config(
stack,
compose_service,
definition,
raw_services,
host_address,
env,
dynamic,
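The `network_mode: service:X` lookup added above can be illustrated in isolation. This simplified sketch uses plain dicts in place of the real `_parse_ports`/`PortMapping` machinery:

```python
from typing import Any

def ports_for_service(definition: dict[str, Any],
                      all_services: dict[str, Any]) -> list[str]:
    """Return the ports list, following network_mode: service:X one hop."""
    network_mode = definition.get("network_mode", "")
    if isinstance(network_mode, str) and network_mode.startswith("service:"):
        ref = all_services.get(network_mode[len("service:"):])
        if isinstance(ref, dict):
            # The referenced service owns the network namespace, so its
            # published ports are the ones reachable from other hosts.
            return ref.get("ports", [])
    return definition.get("ports", [])

services = {
    "vpn": {"image": "gluetun", "ports": ["5080:5080"]},
    "qbittorrent": {"image": "qbittorrent", "network_mode": "service:vpn"},
}
print(ports_for_service(services["qbittorrent"], services))  # ['5080:5080']
```

This matches the common gluetun setup: containers sharing a VPN container's network stack publish nothing themselves, so Traefik must be pointed at the VPN container's published ports.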

View File

@@ -75,7 +75,8 @@ class TestConfig:
services={"plex": "nas01"},
)
path = config.get_compose_path("plex")
assert path == Path("/opt/compose/plex/docker-compose.yml")
# Defaults to compose.yaml when no file exists
assert path == Path("/opt/compose/plex/compose.yaml")
class TestLoadConfig:

View File

@@ -4,7 +4,7 @@ from pathlib import Path
import pytest
from compose_farm import state as state_module
from compose_farm.config import Config, Host
from compose_farm.state import (
get_service_host,
load_state,
@@ -15,52 +15,51 @@ from compose_farm.state import (
@pytest.fixture
def state_dir(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
"""Create a temporary state directory and patch _get_state_path."""
state_path = tmp_path / ".config" / "compose-farm"
state_path.mkdir(parents=True)
def mock_get_state_path() -> Path:
return state_path / "state.yaml"
monkeypatch.setattr(state_module, "_get_state_path", mock_get_state_path)
return state_path
def config(tmp_path: Path) -> Config:
"""Create a config with a temporary config path for state storage."""
config_path = tmp_path / "compose-farm.yaml"
config_path.write_text("") # Create empty file
return Config(
compose_dir=tmp_path / "compose",
hosts={"nas01": Host(address="192.168.1.10")},
services={"plex": "nas01"},
config_path=config_path,
)
class TestLoadState:
"""Tests for load_state function."""
def test_load_state_empty(self, state_dir: Path) -> None:
def test_load_state_empty(self, config: Config) -> None:
"""Returns empty dict when state file doesn't exist."""
_ = state_dir # Fixture activates the mock
result = load_state()
result = load_state(config)
assert result == {}
def test_load_state_with_data(self, state_dir: Path) -> None:
def test_load_state_with_data(self, config: Config) -> None:
"""Loads existing state from file."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n jellyfin: nas02\n")
result = load_state()
result = load_state(config)
assert result == {"plex": "nas01", "jellyfin": "nas02"}
def test_load_state_empty_file(self, state_dir: Path) -> None:
def test_load_state_empty_file(self, config: Config) -> None:
"""Returns empty dict for empty file."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("")
result = load_state()
result = load_state(config)
assert result == {}
class TestSaveState:
"""Tests for save_state function."""
def test_save_state(self, state_dir: Path) -> None:
def test_save_state(self, config: Config) -> None:
"""Saves state to file."""
save_state({"plex": "nas01", "jellyfin": "nas02"})
save_state(config, {"plex": "nas01", "jellyfin": "nas02"})
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
assert state_file.exists()
content = state_file.read_text()
assert "plex: nas01" in content
@@ -70,65 +69,64 @@ class TestSaveState:
class TestGetServiceHost:
"""Tests for get_service_host function."""
def test_get_existing_service(self, state_dir: Path) -> None:
def test_get_existing_service(self, config: Config) -> None:
"""Returns host for existing service."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
host = get_service_host("plex")
host = get_service_host(config, "plex")
assert host == "nas01"
def test_get_nonexistent_service(self, state_dir: Path) -> None:
def test_get_nonexistent_service(self, config: Config) -> None:
"""Returns None for service not in state."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
host = get_service_host("unknown")
host = get_service_host(config, "unknown")
assert host is None
class TestSetServiceHost:
"""Tests for set_service_host function."""
def test_set_new_service(self, state_dir: Path) -> None:
def test_set_new_service(self, config: Config) -> None:
"""Adds new service to state."""
_ = state_dir # Fixture activates the mock
set_service_host("plex", "nas01")
set_service_host(config, "plex", "nas01")
result = load_state()
result = load_state(config)
assert result["plex"] == "nas01"
def test_update_existing_service(self, state_dir: Path) -> None:
def test_update_existing_service(self, config: Config) -> None:
"""Updates host for existing service."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
set_service_host("plex", "nas02")
set_service_host(config, "plex", "nas02")
result = load_state()
result = load_state(config)
assert result["plex"] == "nas02"
class TestRemoveService:
"""Tests for remove_service function."""
def test_remove_existing_service(self, state_dir: Path) -> None:
def test_remove_existing_service(self, config: Config) -> None:
"""Removes service from state."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n jellyfin: nas02\n")
remove_service("plex")
remove_service(config, "plex")
result = load_state()
result = load_state(config)
assert "plex" not in result
assert result["jellyfin"] == "nas02"
def test_remove_nonexistent_service(self, state_dir: Path) -> None:
def test_remove_nonexistent_service(self, config: Config) -> None:
"""Removing nonexistent service doesn't error."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
remove_service("unknown") # Should not raise
remove_service(config, "unknown") # Should not raise
result = load_state()
result = load_state(config)
assert result["plex"] == "nas01"

View File

@@ -171,4 +171,4 @@ class TestReportSyncChanges:
)
captured = capsys.readouterr()
assert "Services on different hosts (1)" in captured.out
assert "~ plex: nas01 -> nas02" in captured.out
assert "~ plex: nas01 → nas02" in captured.out

View File

@@ -76,7 +76,7 @@ def test_generate_traefik_config_without_published_port_warns(tmp_path: Path) ->
dynamic, warnings = generate_traefik_config(cfg, ["app"])
assert dynamic["http"]["routers"]["app"]["rule"] == "Host(`app.lab.mydomain.org`)"
assert any("No host-published port found" in warning for warning in warnings)
assert any("No published port found" in warning for warning in warnings)
def test_generate_interpolates_env_and_infers_router_service(tmp_path: Path) -> None:
@@ -193,3 +193,51 @@ def test_generate_skips_services_with_enable_false(tmp_path: Path) -> None:
assert dynamic == {}
assert warnings == []
def test_generate_follows_network_mode_service_for_ports(tmp_path: Path) -> None:
"""Services using network_mode: service:X should use ports from service X."""
cfg = Config(
compose_dir=tmp_path,
hosts={"nas01": Host(address="192.168.1.10")},
services={"vpn-stack": "nas01"},
)
compose_path = tmp_path / "vpn-stack" / "docker-compose.yml"
_write_compose(
compose_path,
{
"services": {
"vpn": {
"image": "gluetun",
"ports": ["5080:5080", "9696:9696"],
},
"qbittorrent": {
"image": "qbittorrent",
"network_mode": "service:vpn",
"labels": [
"traefik.enable=true",
"traefik.http.routers.torrent.rule=Host(`torrent.example.com`)",
"traefik.http.services.torrent.loadbalancer.server.port=5080",
],
},
"prowlarr": {
"image": "prowlarr",
"network_mode": "service:vpn",
"labels": [
"traefik.enable=true",
"traefik.http.routers.prowlarr.rule=Host(`prowlarr.example.com`)",
"traefik.http.services.prowlarr.loadbalancer.server.port=9696",
],
},
}
},
)
dynamic, warnings = generate_traefik_config(cfg, ["vpn-stack"])
assert warnings == []
# Both services should get their ports from the vpn service
torrent_servers = dynamic["http"]["services"]["torrent"]["loadbalancer"]["servers"]
assert torrent_servers == [{"url": "http://192.168.1.10:5080"}]
prowlarr_servers = dynamic["http"]["services"]["prowlarr"]["loadbalancer"]["servers"]
assert prowlarr_servers == [{"url": "http://192.168.1.10:9696"}]

uv.lock (generated)
View File

@@ -131,6 +131,7 @@ dependencies = [
{ name = "asyncssh" },
{ name = "pydantic" },
{ name = "pyyaml" },
{ name = "rich" },
{ name = "typer" },
]
@@ -151,6 +152,7 @@ requires-dist = [
{ name = "asyncssh", specifier = ">=2.14.0" },
{ name = "pydantic", specifier = ">=2.0.0" },
{ name = "pyyaml", specifier = ">=6.0" },
{ name = "rich", specifier = ">=13.0.0" },
{ name = "typer", specifier = ">=0.9.0" },
]