Compare commits

35 Commits

Author SHA1 Message Date
Bas Nijholt
27f17a2451 Remove unused PortMapping.protocol field 2025-12-14 00:52:47 -08:00
Bas Nijholt
98c2492d21 docs: Add cf alias and check command to README 2025-12-14 00:41:26 -08:00
Bas Nijholt
04339cbb9a Group CLI commands into Lifecycle, Monitoring, Configuration 2025-12-14 00:37:18 -08:00
Bas Nijholt
cdb3b1d257 Show friendly error when config file not found
Instead of a Python traceback, display a clean error message with
the red ✗ symbol when the config file cannot be found.
2025-12-14 00:31:36 -08:00
Bas Nijholt
0913769729 Fix check command to validate all services with check_all flag 2025-12-14 00:23:23 -08:00
Bas Nijholt
3a1d5b77b5 Add traefik port validation to check command 2025-12-14 00:19:17 -08:00
Bas Nijholt
e12002ce86 Add test for network_mode: service:X port lookup 2025-12-14 00:03:11 -08:00
Bas Nijholt
676a6fe72d Support network_mode: service:X for port lookup in traefik config 2025-12-14 00:02:07 -08:00
Bas Nijholt
f29f8938fe Add -h as alias for --help 2025-12-13 23:56:33 -08:00
Bas Nijholt
4c0e147786 Escape log output to prevent Rich markup errors 2025-12-13 23:55:44 -08:00
Bas Nijholt
cba61118de Add cf alias for compose-farm command 2025-12-13 23:54:00 -08:00
Bas Nijholt
32dc6b3665 Skip empty lines in streaming output 2025-12-13 23:50:35 -08:00
Bas Nijholt
7d98e664e9 Auto-detect local IPs to skip SSH when on target host 2025-12-13 23:48:28 -08:00
Bas Nijholt
6763403700 Fix duplicate prefix before traefik config message 2025-12-13 23:46:41 -08:00
Bas Nijholt
feb0e13bfd Add check command to find missing services 2025-12-13 23:43:47 -08:00
Bas Nijholt
b86f6d190f Add Rich styling to CLI output
- Service names in cyan, host names in magenta
- Success checkmarks, warning/error symbols
- Colored sync diff indicators (+/-/~)
- Unicode arrows for migrations
2025-12-13 23:40:07 -08:00
Bas Nijholt
5ed15b5445 docs: Add Docker Swarm overlay network notes 2025-12-13 23:16:09 -08:00
Bas Nijholt
761b6dd2d1 Rename state file to compose-farm-state.yaml (not hidden) 2025-12-13 23:01:40 -08:00
Bas Nijholt
e86c2b6d47 docs: Simplify Traefik port requirement note 2025-12-13 22:59:50 -08:00
basnijholt
9353b74c35 chore(docs): update TOC 2025-12-14 06:58:15 +00:00
Bas Nijholt
b7e8e0f3a9 docs: Add limitations and best practices section
Documents cross-host networking limitations:
- Docker DNS doesn't work across hosts
- Dependent services should stay in same compose file
- Ports must be published for cross-host communication
2025-12-13 22:58:01 -08:00
Bas Nijholt
b6c02587bc Rename traefik_host to traefik_service
Instead of specifying the host directly, specify the service name
that runs Traefik. The host is then looked up from the services
mapping, avoiding redundancy.
2025-12-13 22:43:33 -08:00
Bas Nijholt
d412c42ca4 Store state file alongside config file
State is now stored at .compose-farm-state.yaml in the same
directory as the config file. This allows multiple compose-farm
setups with independent state.

State functions now require a Config parameter to locate the
state file via config.get_state_path().
2025-12-13 22:38:11 -08:00
Bas Nijholt
13e0adbbb9 Add traefik_host config to skip local services
When traefik_host is set, services on that host are skipped in
file-provider generation since Traefik's docker provider handles
them directly. This allows running compose-farm from any host
while still generating correct file-provider config.
2025-12-13 22:34:20 -08:00
Bas Nijholt
68c41eb37c Improve missing ports warning message
Replace technical "L3 reachability" phrasing with actionable
guidance: "Add a ports: mapping for cross-host routing."
2025-12-13 22:29:20 -08:00
Bas Nijholt
8af088bb5d Add traefik_file config for auto-regeneration
When traefik_file is set in config, compose-farm automatically
regenerates the Traefik file-provider config after up, down,
restart, and update commands. Eliminates the need to manually
run traefik-file after service changes.
2025-12-13 22:24:29 -08:00
Bas Nijholt
1308eeca12 fix: Skip local services in traefik-file generation
Local services (localhost, local, 127.0.0.1) are handled by Traefik's
docker provider directly. Generating file-provider entries for them
creates conflicting routes with broken localhost URLs (since Traefik
runs in a container where localhost is isolated).

Now traefik-file only generates config for remote services.
2025-12-13 19:51:57 -08:00
Bas Nijholt
a66a68f395 docs: Clarify no merge commits to main rule 2025-12-13 19:44:30 -08:00
Bas Nijholt
6ea25c862e docs: Add traefik directory merging instructions 2025-12-13 19:41:16 -08:00
Bas Nijholt
280524b546 docs: Use GitHub admonition for TL;DR 2025-12-13 19:34:31 -08:00
Bas Nijholt
db9360771b docs: Shorten TL;DR 2025-12-13 19:33:28 -08:00
Bas Nijholt
c7590ed0b7 docs: Move TOC below TL;DR 2025-12-13 19:32:41 -08:00
Bas Nijholt
bb563b9d4b docs: Add TL;DR to README 2025-12-13 19:31:05 -08:00
Bas Nijholt
fe160ee116 fix: Move traefik import to top-level 2025-12-13 17:07:29 -08:00
Bas Nijholt
4c7f49414f docs: Update README for sync command and auto-migration
- Replace snapshot with sync command
- Add auto-migration documentation
- Update compose file naming convention
2025-12-13 16:55:07 -08:00
13 changed files with 588 additions and 166 deletions

@@ -31,7 +31,7 @@ compose_farm/
## Git Safety
- Never amend commits.
- Never merge into a branch; prefer fast-forward or rebase as directed.
- **NEVER merge anything into main.** Always commit directly or use fast-forward/rebase.
- Never force push.
## Commands Quick Reference

170
README.md

@@ -1,23 +1,30 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Compose Farm](#compose-farm)
- [Why Compose Farm?](#why-compose-farm)
- [Key Assumption: Shared Storage](#key-assumption-shared-storage)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Traefik Multihost Ingress (File Provider)](#traefik-multihost-ingress-file-provider)
- [Requirements](#requirements)
- [How It Works](#how-it-works)
- [License](#license)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
# Compose Farm
A minimal CLI tool to run Docker Compose commands across multiple hosts via SSH.
> [!NOTE]
> Run `docker compose` commands across multiple hosts via SSH. One YAML maps services to hosts. Change the mapping, run `up`, and it auto-migrates. No Kubernetes, no Swarm, no magic.
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Why Compose Farm?](#why-compose-farm)
- [Key Assumption: Shared Storage](#key-assumption-shared-storage)
- [Limitations & Best Practices](#limitations--best-practices)
- [What breaks when you move a service](#what-breaks-when-you-move-a-service)
- [Best practices](#best-practices)
- [What Compose Farm doesn't do](#what-compose-farm-doesnt-do)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [Auto-Migration](#auto-migration)
- [Traefik Multihost Ingress (File Provider)](#traefik-multihost-ingress-file-provider)
- [Requirements](#requirements)
- [How It Works](#how-it-works)
- [License](#license)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Why Compose Farm?
I run 100+ Docker Compose stacks on an LXC container that frequently runs out of memory. I needed a way to distribute services across multiple machines without the complexity of:
@@ -44,6 +51,37 @@ nas:/volume1/compose → /opt/compose (on nas03)
Compose Farm simply runs `docker compose -f /opt/compose/{service}/docker-compose.yml` on the appropriate host—it doesn't copy or sync files.
## Limitations & Best Practices
Compose Farm moves containers between hosts but **does not provide cross-host networking**. Docker's internal DNS and networks don't span hosts.
### What breaks when you move a service
- **Docker DNS** - `http://redis:6379` won't resolve from another host
- **Docker networks** - Containers can't reach each other via network names
- **Environment variables** - `DATABASE_URL=postgres://db:5432` stops working
### Best practices
1. **Keep dependent services together** - If an app needs a database, redis, or worker, keep them in the same compose file on the same host
2. **Only migrate standalone services** - Services that don't talk to other containers (or only talk to external APIs) are safe to move
3. **Expose ports for cross-host communication** - If services must communicate across hosts, publish ports and use IP addresses instead of container names:
   ```yaml
   # Instead of: DATABASE_URL=postgres://db:5432
   # Use: DATABASE_URL=postgres://192.168.1.66:5432
   ```
   This includes Traefik routing: containers need published ports for the file-provider to reach them.
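As a minimal sketch of the third practice, a connection URL that names a container can be rewritten to point at the host that publishes the port. The helper name `use_host_address` and the sample addresses are illustrative assumptions, not part of Compose Farm.

```python
from urllib.parse import urlsplit, urlunsplit


def use_host_address(url: str, host_ip: str) -> str:
    """Swap the hostname in `url` for `host_ip`, keeping scheme, credentials, port, and path."""
    parts = urlsplit(url)
    netloc = host_ip
    if parts.port is not None:
        netloc = f"{host_ip}:{parts.port}"
    if parts.username:
        creds = parts.username
        if parts.password:
            creds += f":{parts.password}"
        netloc = f"{creds}@{netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Container name "db" only resolves on the same Docker network;
# the host IP works from any machine that can reach the published port.
print(use_host_address("postgres://db:5432", "192.168.1.66"))
```

This only helps if the target service actually publishes the port in its compose file.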
### What Compose Farm doesn't do
- No overlay networking (use Docker Swarm or Kubernetes for that)
- No service discovery across hosts
- No automatic dependency tracking between compose files
If you need containers on different hosts to communicate seamlessly, you need Docker Swarm, Kubernetes, or a service mesh—which adds the complexity Compose Farm is designed to avoid.
## Installation
```bash
@@ -75,37 +113,59 @@ services:
radarr: local # Runs on the machine where you invoke compose-farm
```
Compose files are expected at `{compose_dir}/{service}/docker-compose.yml`.
Compose files are expected at `{compose_dir}/{service}/compose.yaml` (also supports `compose.yml`, `docker-compose.yml`, `docker-compose.yaml`).
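The filename lookup described above can be sketched roughly as follows. The function name `find_compose_file` and the search order are assumptions for illustration; only the list of accepted filenames and the `compose.yaml` fallback come from this change.

```python
from pathlib import Path

# Accepted filenames, in the order the README lists them (order is an assumption).
COMPOSE_FILENAMES = (
    "compose.yaml",
    "compose.yml",
    "docker-compose.yml",
    "docker-compose.yaml",
)


def find_compose_file(compose_dir: Path, service: str) -> Path:
    """Return the first existing compose file for `service`, else the default path."""
    service_dir = compose_dir / service
    for name in COMPOSE_FILENAMES:
        candidate = service_dir / name
        if candidate.exists():
            return candidate
    # Nothing found: fall back to compose.yaml and let the caller report the error.
    return service_dir / "compose.yaml"
```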
## Usage
The CLI is available as both `compose-farm` and the shorter `cf` alias.
```bash
# Start services
compose-farm up plex jellyfin
compose-farm up --all
# Start services (auto-migrates if host changed in config)
cf up plex jellyfin
cf up --all
# Stop services
compose-farm down plex
cf down plex
# Pull latest images
compose-farm pull --all
cf pull --all
# Restart (down + up)
compose-farm restart plex
cf restart plex
# Update (pull + down + up) - the end-to-end update command
compose-farm update --all
cf update --all
# Capture image digests to a TOML log (per service or all)
compose-farm snapshot plex
compose-farm snapshot --all # writes ~/.config/compose-farm/dockerfarm-log.toml
# Sync state with reality (discovers running services + captures image digests)
cf sync # updates compose-farm-state.yaml and dockerfarm-log.toml
cf sync --dry-run # preview without writing
# Check config vs disk (find missing services, validate traefik labels)
cf check
# View logs
compose-farm logs plex
compose-farm logs -f plex # follow
cf logs plex
cf logs -f plex # follow
# Show status
compose-farm ps
cf ps
```
### Auto-Migration
When you change a service's host assignment in config and run `up`, Compose Farm automatically:
1. Runs `down` on the old host
2. Runs `up -d` on the new host
3. Updates state tracking
```yaml
# Before: plex runs on nas01
services:
  plex: nas01
# After: change to nas02, then run `cf up plex`
services:
  plex: nas02  # Compose Farm will migrate automatically
```
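The migration steps can be sketched as a pure planning function. This is a simplified illustration, not the tool's implementation: `plan_up` is a hypothetical name, and the real command also skips the `up` when the `down` fails and records state only after a successful `up`.

```python
from __future__ import annotations


def plan_up(
    service: str,
    target_host: str,
    current_host: str | None,
    known_hosts: set[str],
) -> list[str]:
    """List the actions `up` would take for one service, in order."""
    steps: list[str] = []
    if current_host and current_host != target_host:
        if current_host in known_hosts:
            # Service is recorded on a different, still-configured host: stop it there first.
            steps.append(f"down {service} on {current_host}")
        else:
            # The old host is no longer in the config, so it cannot be reached.
            steps.append(f"skip down: {current_host} not in config")
    steps.append(f"up -d {service} on {target_host}")
    steps.append(f"record {service} -> {target_host}")
    return steps


print(plan_up("plex", "nas02", "nas01", {"nas01", "nas02"}))
```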
## Traefik Multihost Ingress (File Provider)
@@ -156,12 +216,58 @@ providers:
**Generate the fragment**
```bash
compose-farm traefik-file --output /mnt/data/traefik/dynamic.d/compose-farm.generated.yml
cf traefik-file --all --output /mnt/data/traefik/dynamic.d/compose-farm.yml
```
Rerun this after changing Traefik labels, moving a service to another host, or changing
published ports.
**Auto-regeneration**
To automatically regenerate the Traefik config after `up`, `down`, `restart`, or `update`,
add `traefik_file` to your config:
```yaml
compose_dir: /opt/compose
traefik_file: /opt/traefik/dynamic.d/compose-farm.yml # auto-regenerate on up/down/restart/update
traefik_service: traefik # skip services on same host (docker provider handles them)
hosts:
  # ...
services:
  traefik: nas01  # Traefik runs here
  plex: nas02     # Services on other hosts get file-provider entries
  # ...
```
The `traefik_service` option specifies which service runs Traefik. Services on the same host
are skipped in the file-provider config since Traefik's docker provider handles them directly.
Now `cf up plex` will update the Traefik config automatically—no separate
`traefik-file` command needed.
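The skip rule can be pictured with a small filter over the `services` mapping. This is a sketch under assumptions: `services_needing_file_provider` is a hypothetical helper, and the real logic in `generate_traefik_config` is not shown in this diff.

```python
from __future__ import annotations


def services_needing_file_provider(
    services: dict[str, str],
    traefik_service: str | None,
) -> list[str]:
    """Return services that are not on Traefik's host and so need file-provider entries."""
    if traefik_service is None:
        return list(services)
    # Look up the host from the services mapping instead of configuring it twice.
    traefik_host = services.get(traefik_service)
    return [name for name, host in services.items() if host != traefik_host]


mapping = {"traefik": "nas01", "sonarr": "nas01", "plex": "nas02"}
print(services_needing_file_provider(mapping, "traefik"))
```

With the mapping above, only `plex` would get a file-provider entry; `sonarr` shares a host with Traefik and is left to the docker provider.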
**Combining with existing config**
If you already have a `dynamic.yml` with manual routes, middlewares, etc., move it into the
directory and Traefik will merge all files:
```bash
mkdir -p /opt/traefik/dynamic.d
mv /opt/traefik/dynamic.yml /opt/traefik/dynamic.d/manual.yml
cf traefik-file --all -o /opt/traefik/dynamic.d/compose-farm.yml
```
Update your Traefik config to use directory watching instead of a single file:
```yaml
# Before
- --providers.file.filename=/dynamic.yml
# After
- --providers.file.directory=/dynamic.d
- --providers.file.watch=true
```
## Requirements
- Python 3.11+
@@ -171,7 +277,7 @@ published ports.
## How It Works
1. You run `compose-farm up plex`
1. You run `cf up plex`
2. Compose Farm looks up which host runs `plex` (e.g., `nas01`)
3. It SSHs to `nas01` (or runs locally if `localhost`)
4. It executes `docker compose -f /opt/compose/plex/docker-compose.yml up -d`

@@ -0,0 +1,90 @@
# Docker Swarm Overlay Networks with Compose Farm
Notes from testing Docker Swarm's attachable overlay networks as a way to get cross-host container networking while still using `docker compose`.
## The Idea
Docker Swarm overlay networks can be made "attachable", allowing regular `docker compose` containers (not just swarm services) to join them. This would give us:
- Cross-host Docker DNS (containers find each other by name)
- No need to publish ports for inter-container communication
- Keep using `docker compose up` instead of `docker stack deploy`
## Setup Steps
```bash
# On manager node
docker swarm init --advertise-addr <manager-ip>
# On worker nodes (use token from init output)
docker swarm join --token <token> <manager-ip>:2377
# Create attachable overlay network (on manager)
docker network create --driver overlay --attachable my-network
# In compose files, add the network (YAML, not a shell command):
#   networks:
#     my-network:
#       external: true
```
## Required Ports
Docker Swarm requires these ports open **bidirectionally** between all nodes:
| Port | Protocol | Purpose |
|------|----------|---------|
| 2377 | TCP | Cluster management |
| 7946 | TCP + UDP | Node communication |
| 4789 | UDP | Overlay network traffic (VXLAN) |
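A quick way to check the TCP side of this table from one node to another is a plain socket connect. This is a hedged sketch: `tcp_port_open` and `check_swarm_tcp_ports` are hypothetical helpers, and the UDP ports (7946/udp and 4789/udp) cannot be verified this way because UDP has no connection handshake.

```python
import socket

# TCP ports from the table above; the UDP ports need a different kind of test.
SWARM_TCP_PORTS = (2377, 7946)


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_swarm_tcp_ports(host: str) -> dict[int, bool]:
    """Map each swarm TCP port to whether it is reachable on `host`."""
    return {port: tcp_port_open(host, port) for port in SWARM_TCP_PORTS}
```

Run `check_swarm_tcp_ports("<node-ip>")` from each node against every other node, since the ports must be open in both directions.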
## Test Results (2025-12-13)
- docker-debian (192.168.1.66) as manager
- dev-lxc (192.168.1.167) as worker
### What worked
- Swarm init and join
- Overlay network creation
- Nodes showed as Ready
### What failed
- Container on dev-lxc couldn't attach to overlay network
- Error: `attaching to network failed... context deadline exceeded`
- Cause: Port 7946 blocked from docker-debian → dev-lxc
### Root cause
Firewall on dev-lxc wasn't configured to allow swarm ports. Opening these ports requires sudo access on each node.
## Conclusion
Docker Swarm overlay networks are **not plug-and-play**. Requirements:
1. Swarm init/join on all nodes
2. Firewall rules on all nodes (needs sudo/root)
3. All nodes must have bidirectional connectivity on 3 ports
For a simpler alternative, consider:
- **Tailscale**: VPN mesh, containers use host's Tailscale IP
- **Host networking + published ports**: What compose-farm does today
- **Keep dependent services together**: Avoid cross-host networking entirely
## Future Work
If we decide to support overlay networks:
1. Add a `compose-farm network create` command that:
   - Initializes swarm if needed
   - Creates attachable overlay network
   - Documents required firewall rules
2. Add network config to compose-farm.yaml:
   ```yaml
   overlay_network: compose-farm-net
   ```
3. Auto-inject network into compose files (or document manual setup)
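To make the auto-inject idea concrete, here is a rough sketch that operates on an already-parsed compose document (a plain `dict`). Nothing like this exists in compose-farm today: `inject_external_network` is a hypothetical name, and keeping services on the implicit `default` network is a design assumption.

```python
import copy


def inject_external_network(compose: dict, network: str) -> dict:
    """Return a copy of `compose` with `network` declared external and attached to services."""
    result = copy.deepcopy(compose)
    result.setdefault("networks", {})[network] = {"external": True}
    for svc in result.get("services", {}).values():
        if "network_mode" in svc:
            # Compose does not allow `networks` together with `network_mode`.
            continue
        if "networks" not in svc:
            # Listing networks explicitly drops the implicit default, so keep it.
            svc["networks"] = ["default", network]
        elif isinstance(svc["networks"], list):
            if network not in svc["networks"]:
                svc["networks"].append(network)
        else:
            # Mapping form: add the network with no extra options.
            svc["networks"].setdefault(network, None)
    return result


doc = {"services": {"app": {"image": "nginx"}}}
print(inject_external_network(doc, "compose-farm-net"))
```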

@@ -12,10 +12,12 @@ dependencies = [
"pydantic>=2.0.0",
"asyncssh>=2.14.0",
"pyyaml>=6.0",
"rich>=13.0.0",
]
[project.scripts]
compose-farm = "compose_farm.cli:app"
cf = "compose_farm.cli:app"
[build-system]
requires = ["hatchling", "hatch-vcs"]

@@ -8,6 +8,7 @@ from typing import TYPE_CHECKING, Annotated, TypeVar
import typer
import yaml
from rich.console import Console
from . import __version__
from .config import Config, load_config
@@ -21,12 +22,42 @@ from .ssh import (
run_sequential_on_services,
)
from .state import get_service_host, load_state, remove_service, save_state, set_service_host
from .traefik import generate_traefik_config
if TYPE_CHECKING:
from collections.abc import Coroutine
T = TypeVar("T")
console = Console(highlight=False)
err_console = Console(stderr=True, highlight=False)
def _load_config_or_exit(config_path: Path | None) -> Config:
    """Load config or exit with a friendly error message."""
    try:
        return load_config(config_path)
    except FileNotFoundError as e:
        err_console.print(f"[red]✗[/] {e}")
        raise typer.Exit(1) from e


def _maybe_regenerate_traefik(cfg: Config) -> None:
    """Regenerate traefik config if traefik_file is configured."""
    if cfg.traefik_file is None:
        return
    try:
        dynamic, warnings = generate_traefik_config(cfg, list(cfg.services.keys()))
        cfg.traefik_file.parent.mkdir(parents=True, exist_ok=True)
        cfg.traefik_file.write_text(yaml.safe_dump(dynamic, sort_keys=False))
        console.print()  # Ensure we're on a new line after streaming output
        console.print(f"[green]✓[/] Traefik config updated: {cfg.traefik_file}")
        for warning in warnings:
            err_console.print(f"[yellow]![/] {warning}")
    except (FileNotFoundError, ValueError) as exc:
        err_console.print(f"[yellow]![/] Failed to update traefik config: {exc}")
def _version_callback(value: bool) -> None:
"""Print version and exit."""
@@ -39,6 +70,7 @@ app = typer.Typer(
name="compose-farm",
help="Compose Farm - run docker compose commands across multiple hosts",
no_args_is_help=True,
context_settings={"help_option_names": ["-h", "--help"]},
)
@@ -64,12 +96,12 @@ def _get_services(
config_path: Path | None,
) -> tuple[list[str], Config]:
"""Resolve service list and load config."""
config = load_config(config_path)
config = _load_config_or_exit(config_path)
if all_services:
return list(config.services.keys()), config
if not services:
typer.echo("Error: Specify services or use --all", err=True)
err_console.print("[red]✗[/] Specify services or use --all")
raise typer.Exit(1)
return list(services), config
@@ -84,7 +116,9 @@ def _report_results(results: list[CommandResult]) -> None:
failed = [r for r in results if not r.success]
if failed:
for r in failed:
typer.echo(f"[{r.service}] Failed with exit code {r.exit_code}", err=True)
err_console.print(
f"[cyan]\\[{r.service}][/] [red]Failed with exit code {r.exit_code}[/]"
)
raise typer.Exit(1)
@@ -115,20 +149,23 @@ async def _up_with_migration(
for service in services:
target_host = cfg.services[service]
current_host = get_service_host(service)
current_host = get_service_host(cfg, service)
# If service is deployed elsewhere, migrate it
if current_host and current_host != target_host:
if current_host in cfg.hosts:
typer.echo(f"[{service}] Migrating from {current_host} to {target_host}...")
console.print(
f"[cyan]\\[{service}][/] Migrating from "
f"[magenta]{current_host}[/] → [magenta]{target_host}[/]..."
)
down_result = await run_compose_on_host(cfg, service, current_host, "down")
if not down_result.success:
results.append(down_result)
continue
else:
typer.echo(
f"[{service}] Warning: was on {current_host} (not in config), skipping down",
err=True,
err_console.print(
f"[cyan]\\[{service}][/] [yellow]![/] was on "
f"[magenta]{current_host}[/] (not in config), skipping down"
)
# Start on target host
@@ -137,12 +174,12 @@ async def _up_with_migration(
# Update state on success
if up_result.success:
set_service_host(service, target_host)
set_service_host(cfg, service, target_host)
return results
@app.command()
@app.command(rich_help_panel="Lifecycle")
def up(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -151,10 +188,11 @@ def up(
"""Start services (docker compose up -d). Auto-migrates if host changed."""
svc_list, cfg = _get_services(services or [], all_services, config)
results = _run_async(_up_with_migration(cfg, svc_list))
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def down(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -167,12 +205,13 @@ def down(
# Remove from state on success
for result in results:
if result.success:
remove_service(result.service)
remove_service(cfg, result.service)
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def pull(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -184,7 +223,7 @@ def pull(
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def restart(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -193,10 +232,11 @@ def restart(
"""Restart services (down + up)."""
svc_list, cfg = _get_services(services or [], all_services, config)
results = _run_async(run_sequential_on_services(cfg, svc_list, ["down", "up -d"]))
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Lifecycle")
def update(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -205,10 +245,11 @@ def update(
"""Update services (pull + down + up)."""
svc_list, cfg = _get_services(services or [], all_services, config)
results = _run_async(run_sequential_on_services(cfg, svc_list, ["pull", "down", "up -d"]))
_maybe_regenerate_traefik(cfg)
_report_results(results)
@app.command()
@app.command(rich_help_panel="Monitoring")
def logs(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -225,17 +266,17 @@ def logs(
_report_results(results)
@app.command()
@app.command(rich_help_panel="Monitoring")
def ps(
config: ConfigOption = None,
) -> None:
"""Show status of all services."""
cfg = load_config(config)
cfg = _load_config_or_exit(config)
results = _run_async(run_on_services(cfg, list(cfg.services.keys()), "ps"))
_report_results(results)
@app.command("traefik-file")
@app.command("traefik-file", rich_help_panel="Configuration")
def traefik_file(
services: ServicesArg = None,
all_services: AllOption = False,
@@ -250,13 +291,11 @@ def traefik_file(
config: ConfigOption = None,
) -> None:
"""Generate a Traefik file-provider fragment from compose Traefik labels."""
from .traefik import generate_traefik_config
svc_list, cfg = _get_services(services or [], all_services, config)
try:
dynamic, warnings = generate_traefik_config(cfg, svc_list)
except (FileNotFoundError, ValueError) as exc:
typer.echo(str(exc), err=True)
err_console.print(f"[red]✗[/] {exc}")
raise typer.Exit(1) from exc
rendered = yaml.safe_dump(dynamic, sort_keys=False)
@@ -264,12 +303,12 @@ def traefik_file(
if output:
output.parent.mkdir(parents=True, exist_ok=True)
output.write_text(rendered)
typer.echo(f"Traefik config written to {output}")
console.print(f"[green]✓[/] Traefik config written to {output}")
else:
typer.echo(rendered)
console.print(rendered)
for warning in warnings:
typer.echo(warning, err=True)
err_console.print(f"[yellow]![/] {warning}")
async def _discover_running_services(cfg: Config) -> dict[str, str]:
@@ -305,22 +344,27 @@ def _report_sync_changes(
) -> None:
"""Report sync changes to the user."""
if added:
typer.echo(f"\nNew services found ({len(added)}):")
console.print(f"\nNew services found ({len(added)}):")
for service in sorted(added):
typer.echo(f" + {service} on {discovered[service]}")
console.print(f" [green]+[/] [cyan]{service}[/] on [magenta]{discovered[service]}[/]")
if changed:
typer.echo(f"\nServices on different hosts ({len(changed)}):")
console.print(f"\nServices on different hosts ({len(changed)}):")
for service, old_host, new_host in sorted(changed):
typer.echo(f" ~ {service}: {old_host} -> {new_host}")
console.print(
f" [yellow]~[/] [cyan]{service}[/]: "
f"[magenta]{old_host}[/] → [magenta]{new_host}[/]"
)
if removed:
typer.echo(f"\nServices no longer running ({len(removed)}):")
console.print(f"\nServices no longer running ({len(removed)}):")
for service in sorted(removed):
typer.echo(f" - {service} (was on {current_state[service]})")
console.print(
f" [red]-[/] [cyan]{service}[/] (was on [magenta]{current_state[service]}[/])"
)
@app.command()
@app.command(rich_help_panel="Configuration")
def sync(
config: ConfigOption = None,
log_path: LogPathOption = None,
@@ -335,10 +379,10 @@ def sync(
file, and captures image digests. Combines service discovery with
image snapshot into a single command.
"""
cfg = load_config(config)
current_state = load_state()
cfg = _load_config_or_exit(config)
current_state = load_state(cfg)
typer.echo("Discovering running services...")
console.print("Discovering running services...")
discovered = _run_async(_discover_running_services(cfg))
# Calculate changes
@@ -355,25 +399,69 @@ def sync(
if state_changed:
_report_sync_changes(added, removed, changed, discovered, current_state)
else:
typer.echo("State is already in sync.")
console.print("[green]✓[/] State is already in sync.")
if dry_run:
typer.echo("\n(dry-run: no changes made)")
console.print("\n[dim](dry-run: no changes made)[/]")
return
# Update state file
if state_changed:
save_state(discovered)
typer.echo(f"\nState updated: {len(discovered)} services tracked.")
save_state(cfg, discovered)
console.print(f"\n[green]✓[/] State updated: {len(discovered)} services tracked.")
# Capture image digests for running services
if discovered:
typer.echo("\nCapturing image digests...")
console.print("\nCapturing image digests...")
try:
path = _run_async(snapshot_services(cfg, list(discovered.keys()), log_path=log_path))
typer.echo(f"Digests written to {path}")
console.print(f"[green]✓[/] Digests written to {path}")
except RuntimeError as exc:
typer.echo(f"Warning: {exc}", err=True)
err_console.print(f"[yellow]![/] {exc}")
@app.command(rich_help_panel="Configuration")
def check(
    config: ConfigOption = None,
) -> None:
    """Check for compose directories not in config (and vice versa)."""
    cfg = _load_config_or_exit(config)
    configured = set(cfg.services.keys())
    on_disk = cfg.discover_compose_dirs()
    missing_from_config = sorted(on_disk - configured)
    missing_from_disk = sorted(configured - on_disk)
    if missing_from_config:
        console.print(f"\n[yellow]Not in config[/] ({len(missing_from_config)}):")
        for name in missing_from_config:
            console.print(f" [yellow]+[/] [cyan]{name}[/]")
    if missing_from_disk:
        console.print(f"\n[red]No compose file found[/] ({len(missing_from_disk)}):")
        for name in missing_from_disk:
            console.print(f" [red]-[/] [cyan]{name}[/]")
    if not missing_from_config and not missing_from_disk:
        console.print("[green]✓[/] All compose directories are in config.")
    elif missing_from_config:
        console.print(f"\n[dim]To add missing services, append to {cfg.config_path}:[/]")
        for name in missing_from_config:
            console.print(f"[dim] {name}: docker-debian[/]")
    # Check traefik labels have matching ports
    try:
        _, traefik_warnings = generate_traefik_config(
            cfg, list(cfg.services.keys()), check_all=True
        )
        if traefik_warnings:
            console.print(f"\n[yellow]Traefik issues[/] ({len(traefik_warnings)}):")
            for warning in traefik_warnings:
                console.print(f" [yellow]![/] {warning}")
        elif not missing_from_config and not missing_from_disk:
            console.print("[green]✓[/] All traefik services have published ports.")
    except (FileNotFoundError, ValueError):
        pass  # Skip traefik check if config can't be loaded
if __name__ == "__main__":

@@ -23,6 +23,13 @@ class Config(BaseModel):
compose_dir: Path = Path("/opt/compose")
hosts: dict[str, Host]
services: dict[str, str] # service_name -> host_name
traefik_file: Path | None = None # Auto-regenerate traefik config after up/down
traefik_service: str | None = None # Service name for Traefik (skip its host in file-provider)
config_path: Path = Path() # Set by load_config()
def get_state_path(self) -> Path:
"""Get the state file path (stored alongside config)."""
return self.config_path.parent / "compose-farm-state.yaml"
@model_validator(mode="after")
def validate_service_hosts(self) -> Config:
@@ -58,6 +65,25 @@ class Config(BaseModel):
        # Default to compose.yaml if none exist (will error later)
        return service_dir / "compose.yaml"

    def discover_compose_dirs(self) -> set[str]:
        """Find all directories in compose_dir that contain a compose file."""
        compose_filenames = {
            "compose.yaml",
            "compose.yml",
            "docker-compose.yml",
            "docker-compose.yaml",
        }
        found: set[str] = set()
        if not self.compose_dir.exists():
            return found
        for subdir in self.compose_dir.iterdir():
            if subdir.is_dir():
                for filename in compose_filenames:
                    if (subdir / filename).exists():
                        found.add(subdir.name)
                        break
        return found
def _parse_hosts(raw_hosts: dict[str, str | dict[str, str | int]]) -> dict[str, Host]:
"""Parse hosts from config, handling both simple and full forms."""
@@ -103,5 +129,6 @@ def load_config(path: Path | None = None) -> Config:
# Parse hosts with flexible format support
raw["hosts"] = _parse_hosts(raw.get("hosts", {}))
raw["config_path"] = config_path.resolve()
return Config(**raw)

@@ -3,18 +3,42 @@
from __future__ import annotations
import asyncio
import sys
import socket
from dataclasses import dataclass
from functools import lru_cache
from typing import TYPE_CHECKING, Any
import asyncssh
from rich.console import Console
from rich.markup import escape
if TYPE_CHECKING:
from .config import Config, Host
_console = Console(highlight=False)
_err_console = Console(stderr=True, highlight=False)
LOCAL_ADDRESSES = frozenset({"local", "localhost", "127.0.0.1", "::1"})
@lru_cache(maxsize=1)
def _get_local_ips() -> frozenset[str]:
    """Get all IP addresses of the current machine."""
    ips: set[str] = set()
    try:
        hostname = socket.gethostname()
        # Get all addresses for hostname
        for info in socket.getaddrinfo(hostname, None):
            ips.add(info[4][0])
        # Also try getting the default outbound IP
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            ips.add(s.getsockname()[0])
    except OSError:
        pass
    return frozenset(ips)
@dataclass
class CommandResult:
"""Result of a command execution."""
@@ -28,7 +52,11 @@ class CommandResult:
def _is_local(host: Host) -> bool:
"""Check if host should run locally (no SSH)."""
return host.address.lower() in LOCAL_ADDRESSES
addr = host.address.lower()
if addr in LOCAL_ADDRESSES:
return True
# Check if address matches any of this machine's IPs
return addr in _get_local_ips()
async def _run_local_command(
@@ -53,12 +81,14 @@ async def _run_local_command(
*,
is_stderr: bool = False,
) -> None:
output = sys.stderr if is_stderr else sys.stdout
console = _err_console if is_stderr else _console
while True:
line = await reader.readline()
if not line:
break
print(f"[{prefix}] {line.decode()}", end="", file=output, flush=True)
text = line.decode()
if text.strip(): # Skip empty lines
console.print(f"[cyan]\\[{prefix}][/] {escape(text)}", end="")
await asyncio.gather(
read_stream(proc.stdout, service),
@@ -80,7 +110,7 @@ async def _run_local_command(
stderr=stderr_data.decode() if stderr_data else "",
)
except OSError as e:
print(f"[{service}] Local error: {e}", file=sys.stderr)
_err_console.print(f"[cyan]\\[{service}][/] [red]Local error:[/] {e}")
return CommandResult(service=service, exit_code=1, success=False)
@@ -111,9 +141,10 @@ async def _run_ssh_command(
*,
is_stderr: bool = False,
) -> None:
output = sys.stderr if is_stderr else sys.stdout
console = _err_console if is_stderr else _console
async for line in reader:
print(f"[{prefix}] {line}", end="", file=output, flush=True)
if line.strip(): # Skip empty lines
console.print(f"[cyan]\\[{prefix}][/] {escape(line)}", end="")
await asyncio.gather(
read_stream(proc.stdout, service),
@@ -135,7 +166,7 @@ async def _run_ssh_command(
stderr=stderr_data,
)
except (OSError, asyncssh.Error) as e:
print(f"[{service}] SSH error: {e}", file=sys.stderr)
_err_console.print(f"[cyan]\\[{service}][/] [red]SSH error:[/] {e}")
return CommandResult(service=service, exit_code=1, success=False)


@@ -2,25 +2,20 @@
from __future__ import annotations
from pathlib import Path
from typing import Any
from typing import TYPE_CHECKING, Any
import yaml
def _get_state_path() -> Path:
"""Get the path to the state file."""
state_dir = Path.home() / ".config" / "compose-farm"
state_dir.mkdir(parents=True, exist_ok=True)
return state_dir / "state.yaml"
if TYPE_CHECKING:
from .config import Config
def load_state() -> dict[str, str]:
def load_state(config: Config) -> dict[str, str]:
"""Load the current deployment state.
Returns a dict mapping service names to host names.
"""
state_path = _get_state_path()
state_path = config.get_state_path()
if not state_path.exists():
return {}
@@ -31,28 +26,28 @@ def load_state() -> dict[str, str]:
return deployed
def save_state(deployed: dict[str, str]) -> None:
def save_state(config: Config, deployed: dict[str, str]) -> None:
"""Save the deployment state."""
state_path = _get_state_path()
state_path = config.get_state_path()
with state_path.open("w") as f:
yaml.safe_dump({"deployed": deployed}, f, sort_keys=False)
def get_service_host(service: str) -> str | None:
def get_service_host(config: Config, service: str) -> str | None:
"""Get the host where a service is currently deployed."""
state = load_state()
state = load_state(config)
return state.get(service)
def set_service_host(service: str, host: str) -> None:
def set_service_host(config: Config, service: str, host: str) -> None:
"""Record that a service is deployed on a host."""
state = load_state()
state = load_state(config)
state[service] = host
save_state(state)
save_state(config, state)
def remove_service(service: str) -> None:
def remove_service(config: Config, service: str) -> None:
"""Remove a service from the state (after down)."""
state = load_state()
state = load_state(config)
state.pop(service, None)
save_state(state)
save_state(config, state)
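
Note on the signature change above: every state helper now receives the config explicitly and asks it for the state path, instead of reading a fixed location under the home directory. The sketch below illustrates that pattern under stated assumptions. `SketchConfig` is a hypothetical stand-in for `Config`, the rule that the state file sits next to the config file is assumed (the body of `get_state_path` is not part of this diff), and JSON is used in place of YAML only to keep the example free of third-party imports.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchConfig:
    """Hypothetical stand-in for Config; only the config file path matters."""

    config_path: Path

    def get_state_path(self) -> Path:
        # Assumption: the state file lives beside the config file.
        return self.config_path.parent / "compose-farm-state.json"


def load_state(config: SketchConfig) -> dict[str, str]:
    """Return the service -> host mapping, or {} if nothing is stored yet."""
    path = config.get_state_path()
    if not path.exists():
        return {}
    text = path.read_text()
    data = json.loads(text) if text.strip() else {}
    return dict(data.get("deployed", {}))


def save_state(config: SketchConfig, deployed: dict[str, str]) -> None:
    """Write the mapping under a top-level 'deployed' key."""
    config.get_state_path().write_text(json.dumps({"deployed": deployed}, indent=2))


def set_service_host(config: SketchConfig, service: str, host: str) -> None:
    """Record that a service is deployed on a host."""
    state = load_state(config)
    state[service] = host
    save_state(config, state)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = SketchConfig(config_path=Path(tmp) / "compose-farm.yaml")
        set_service_host(cfg, "plex", "nas01")
        print(load_state(cfg))
```

Because the path comes from the config object, a test can point it at a temporary directory without monkeypatching a module-level helper.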


@@ -15,6 +15,8 @@ from typing import TYPE_CHECKING, Any
import yaml
from .ssh import LOCAL_ADDRESSES
if TYPE_CHECKING:
from pathlib import Path
@@ -27,7 +29,6 @@ class PortMapping:
target: int
published: int | None
protocol: str | None = None
@dataclass
@@ -119,7 +120,7 @@ def _parse_ports(raw: Any, env: dict[str, str]) -> list[PortMapping]: # noqa: P
for item in items:
if isinstance(item, str):
interpolated = _interpolate(item, env)
port_spec, _, protocol = interpolated.partition("/")
port_spec, _, _ = interpolated.partition("/")
parts = port_spec.split(":")
published: int | None = None
target: int | None = None
@@ -134,9 +135,7 @@ def _parse_ports(raw: Any, env: dict[str, str]) -> list[PortMapping]: # noqa: P
target = int(parts[-1])
if target is not None:
mappings.append(
PortMapping(target=target, published=published, protocol=protocol or None)
)
mappings.append(PortMapping(target=target, published=published))
elif isinstance(item, dict):
target_raw = item.get("target")
if isinstance(target_raw, str):
@@ -156,14 +155,7 @@ def _parse_ports(raw: Any, env: dict[str, str]) -> list[PortMapping]: # noqa: P
published_val = int(str(published_raw)) if published_raw is not None else None
except (TypeError, ValueError):
published_val = None
protocol_val = item.get("protocol")
mappings.append(
PortMapping(
target=target_val,
published=published_val,
protocol=str(protocol_val) if protocol_val else None,
)
)
mappings.append(PortMapping(target=target_val, published=published_val))
return mappings
@@ -289,8 +281,8 @@ def _finalize_http_services(
if published_port is None:
warnings.append(
f"[{source.stack}/{source.compose_service}] "
f"No host-published port found for Traefik service '{traefik_service}'. "
"Traefik will require L3 reachability to container IPs."
f"No published port found for Traefik service '{traefik_service}'. "
"Add a ports: mapping (e.g., '8080:8080') for cross-host routing."
)
continue
@@ -398,10 +390,28 @@ def _process_service_label(
source.scheme = str(_parse_value(key_without_prefix, label_value))
def _get_ports_for_service(
definition: dict[str, Any],
all_services: dict[str, Any],
env: dict[str, str],
) -> list[PortMapping]:
"""Get ports for a service, following network_mode: service:X if present."""
network_mode = definition.get("network_mode", "")
if isinstance(network_mode, str) and network_mode.startswith("service:"):
# Service uses another service's network - get ports from that service
ref_service = network_mode[len("service:") :]
if ref_service in all_services:
ref_def = all_services[ref_service]
if isinstance(ref_def, dict):
return _parse_ports(ref_def.get("ports"), env)
return _parse_ports(definition.get("ports"), env)
def _process_service_labels(
stack: str,
compose_service: str,
definition: dict[str, Any],
all_services: dict[str, Any],
host_address: str,
env: dict[str, str],
dynamic: dict[str, Any],
@@ -415,7 +425,7 @@ def _process_service_labels(
if enable_raw is not None and _parse_value("enable", enable_raw) is False:
return
ports = _parse_ports(definition.get("ports"), env)
ports = _get_ports_for_service(definition, all_services, env)
routers: dict[str, bool] = {}
service_names: set[str] = set()
@@ -450,17 +460,41 @@ def _process_service_labels(
def generate_traefik_config(
config: Config,
services: list[str],
*,
check_all: bool = False,
) -> tuple[dict[str, Any], list[str]]:
"""Generate Traefik dynamic config from compose labels.
Args:
config: The compose-farm config.
services: List of service names to process.
check_all: If True, check all services for warnings (ignore host filtering).
Used by the check command to validate all traefik labels.
Returns (config_dict, warnings).
"""
dynamic: dict[str, Any] = {}
warnings: list[str] = []
sources: dict[str, TraefikServiceSource] = {}
# Determine Traefik's host from service assignment
traefik_host = None
if config.traefik_service and not check_all:
traefik_host = config.services.get(config.traefik_service)
for stack in services:
raw_services, env, host_address = _load_stack(config, stack)
stack_host = config.services.get(stack)
# Skip services on Traefik's host - docker provider handles them directly
# (unless check_all is True, for validation purposes)
if not check_all:
if host_address.lower() in LOCAL_ADDRESSES:
continue
if traefik_host and stack_host == traefik_host:
continue
for compose_service, definition in raw_services.items():
if not isinstance(definition, dict):
continue
@@ -468,6 +502,7 @@ def generate_traefik_config(
stack,
compose_service,
definition,
raw_services,
host_address,
env,
dynamic,
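
Note on `_get_ports_for_service` above: a container declared with `network_mode: service:X` shares X's network namespace, so any published ports are declared on X rather than on the container itself. The sketch below shows the lookup in isolation. It is a simplified illustration, not the module's code: `parse_short_ports` and `ports_for_service` are hypothetical names, only the short string form of `ports:` is handled, and environment interpolation is left out.

```python
from __future__ import annotations

from typing import Any


def parse_short_ports(raw: Any) -> list[tuple[int | None, int]]:
    """Parse short-form port strings into (published, target) pairs."""
    result: list[tuple[int | None, int]] = []
    for item in raw or []:
        if not isinstance(item, str):
            continue  # long (dict) syntax is out of scope for this sketch
        spec, _, _ = item.partition("/")  # drop an optional "/tcp" or "/udp"
        parts = spec.split(":")
        try:
            target = int(parts[-1])
            published = int(parts[-2]) if len(parts) >= 2 else None
        except ValueError:
            continue  # e.g. port ranges, which this sketch does not handle
        result.append((published, target))
    return result


def ports_for_service(
    definition: dict[str, Any],
    all_services: dict[str, Any],
) -> list[tuple[int | None, int]]:
    """Return a service's ports, following network_mode: service:X if set."""
    network_mode = definition.get("network_mode", "")
    if isinstance(network_mode, str) and network_mode.startswith("service:"):
        ref_def = all_services.get(network_mode[len("service:"):])
        if isinstance(ref_def, dict):
            return parse_short_ports(ref_def.get("ports"))
    # No usable reference: fall back to the service's own ports.
    return parse_short_ports(definition.get("ports"))


if __name__ == "__main__":
    services = {
        "vpn": {"ports": ["5080:5080", "9696:9696"]},
        "qbittorrent": {"network_mode": "service:vpn"},
    }
    print(ports_for_service(services["qbittorrent"], services))
```

As in the diff, a reference to a service that does not exist falls back to the service's own `ports:` rather than raising.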


@@ -4,7 +4,7 @@ from pathlib import Path
import pytest
from compose_farm import state as state_module
from compose_farm.config import Config, Host
from compose_farm.state import (
get_service_host,
load_state,
@@ -15,52 +15,51 @@ from compose_farm.state import (
@pytest.fixture
def state_dir(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path:
"""Create a temporary state directory and patch _get_state_path."""
state_path = tmp_path / ".config" / "compose-farm"
state_path.mkdir(parents=True)
def mock_get_state_path() -> Path:
return state_path / "state.yaml"
monkeypatch.setattr(state_module, "_get_state_path", mock_get_state_path)
return state_path
def config(tmp_path: Path) -> Config:
"""Create a config with a temporary config path for state storage."""
config_path = tmp_path / "compose-farm.yaml"
config_path.write_text("") # Create empty file
return Config(
compose_dir=tmp_path / "compose",
hosts={"nas01": Host(address="192.168.1.10")},
services={"plex": "nas01"},
config_path=config_path,
)
class TestLoadState:
"""Tests for load_state function."""
def test_load_state_empty(self, state_dir: Path) -> None:
def test_load_state_empty(self, config: Config) -> None:
"""Returns empty dict when state file doesn't exist."""
_ = state_dir # Fixture activates the mock
result = load_state()
result = load_state(config)
assert result == {}
def test_load_state_with_data(self, state_dir: Path) -> None:
def test_load_state_with_data(self, config: Config) -> None:
"""Loads existing state from file."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n jellyfin: nas02\n")
result = load_state()
result = load_state(config)
assert result == {"plex": "nas01", "jellyfin": "nas02"}
def test_load_state_empty_file(self, state_dir: Path) -> None:
def test_load_state_empty_file(self, config: Config) -> None:
"""Returns empty dict for empty file."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("")
result = load_state()
result = load_state(config)
assert result == {}
class TestSaveState:
"""Tests for save_state function."""
def test_save_state(self, state_dir: Path) -> None:
def test_save_state(self, config: Config) -> None:
"""Saves state to file."""
save_state({"plex": "nas01", "jellyfin": "nas02"})
save_state(config, {"plex": "nas01", "jellyfin": "nas02"})
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
assert state_file.exists()
content = state_file.read_text()
assert "plex: nas01" in content
@@ -70,65 +69,64 @@ class TestSaveState:
class TestGetServiceHost:
"""Tests for get_service_host function."""
def test_get_existing_service(self, state_dir: Path) -> None:
def test_get_existing_service(self, config: Config) -> None:
"""Returns host for existing service."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
host = get_service_host("plex")
host = get_service_host(config, "plex")
assert host == "nas01"
def test_get_nonexistent_service(self, state_dir: Path) -> None:
def test_get_nonexistent_service(self, config: Config) -> None:
"""Returns None for service not in state."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
host = get_service_host("unknown")
host = get_service_host(config, "unknown")
assert host is None
class TestSetServiceHost:
"""Tests for set_service_host function."""
def test_set_new_service(self, state_dir: Path) -> None:
def test_set_new_service(self, config: Config) -> None:
"""Adds new service to state."""
_ = state_dir # Fixture activates the mock
set_service_host("plex", "nas01")
set_service_host(config, "plex", "nas01")
result = load_state()
result = load_state(config)
assert result["plex"] == "nas01"
def test_update_existing_service(self, state_dir: Path) -> None:
def test_update_existing_service(self, config: Config) -> None:
"""Updates host for existing service."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
set_service_host("plex", "nas02")
set_service_host(config, "plex", "nas02")
result = load_state()
result = load_state(config)
assert result["plex"] == "nas02"
class TestRemoveService:
"""Tests for remove_service function."""
def test_remove_existing_service(self, state_dir: Path) -> None:
def test_remove_existing_service(self, config: Config) -> None:
"""Removes service from state."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n jellyfin: nas02\n")
remove_service("plex")
remove_service(config, "plex")
result = load_state()
result = load_state(config)
assert "plex" not in result
assert result["jellyfin"] == "nas02"
def test_remove_nonexistent_service(self, state_dir: Path) -> None:
def test_remove_nonexistent_service(self, config: Config) -> None:
"""Removing nonexistent service doesn't error."""
state_file = state_dir / "state.yaml"
state_file = config.get_state_path()
state_file.write_text("deployed:\n plex: nas01\n")
remove_service("unknown") # Should not raise
remove_service(config, "unknown") # Should not raise
result = load_state()
result = load_state(config)
assert result["plex"] == "nas01"


@@ -171,4 +171,4 @@ class TestReportSyncChanges:
)
captured = capsys.readouterr()
assert "Services on different hosts (1)" in captured.out
assert "~ plex: nas01 -> nas02" in captured.out
assert "~ plex: nas01 → nas02" in captured.out


@@ -76,7 +76,7 @@ def test_generate_traefik_config_without_published_port_warns(tmp_path: Path) ->
dynamic, warnings = generate_traefik_config(cfg, ["app"])
assert dynamic["http"]["routers"]["app"]["rule"] == "Host(`app.lab.mydomain.org`)"
assert any("No host-published port found" in warning for warning in warnings)
assert any("No published port found" in warning for warning in warnings)
def test_generate_interpolates_env_and_infers_router_service(tmp_path: Path) -> None:
@@ -193,3 +193,51 @@ def test_generate_skips_services_with_enable_false(tmp_path: Path) -> None:
assert dynamic == {}
assert warnings == []
def test_generate_follows_network_mode_service_for_ports(tmp_path: Path) -> None:
"""Services using network_mode: service:X should use ports from service X."""
cfg = Config(
compose_dir=tmp_path,
hosts={"nas01": Host(address="192.168.1.10")},
services={"vpn-stack": "nas01"},
)
compose_path = tmp_path / "vpn-stack" / "docker-compose.yml"
_write_compose(
compose_path,
{
"services": {
"vpn": {
"image": "gluetun",
"ports": ["5080:5080", "9696:9696"],
},
"qbittorrent": {
"image": "qbittorrent",
"network_mode": "service:vpn",
"labels": [
"traefik.enable=true",
"traefik.http.routers.torrent.rule=Host(`torrent.example.com`)",
"traefik.http.services.torrent.loadbalancer.server.port=5080",
],
},
"prowlarr": {
"image": "prowlarr",
"network_mode": "service:vpn",
"labels": [
"traefik.enable=true",
"traefik.http.routers.prowlarr.rule=Host(`prowlarr.example.com`)",
"traefik.http.services.prowlarr.loadbalancer.server.port=9696",
],
},
}
},
)
dynamic, warnings = generate_traefik_config(cfg, ["vpn-stack"])
assert warnings == []
# Both services should get their ports from the vpn service
torrent_servers = dynamic["http"]["services"]["torrent"]["loadbalancer"]["servers"]
assert torrent_servers == [{"url": "http://192.168.1.10:5080"}]
prowlarr_servers = dynamic["http"]["services"]["prowlarr"]["loadbalancer"]["servers"]
assert prowlarr_servers == [{"url": "http://192.168.1.10:9696"}]

uv.lock (generated)

@@ -131,6 +131,7 @@ dependencies = [
{ name = "asyncssh" },
{ name = "pydantic" },
{ name = "pyyaml" },
{ name = "rich" },
{ name = "typer" },
]
@@ -151,6 +152,7 @@ requires-dist = [
{ name = "asyncssh", specifier = ">=2.14.0" },
{ name = "pydantic", specifier = ">=2.0.0" },
{ name = "pyyaml", specifier = ">=6.0" },
{ name = "rich", specifier = ">=13.0.0" },
{ name = "typer", specifier = ">=0.9.0" },
]