Previously, Pydantic validation errors like "Extra inputs are not
permitted" didn't show which field caused the error. Now the error
message includes the field location (e.g., "unknown_key: Extra inputs
are not permitted").
* perf: Optimize stray detection to use 1 SSH call per host
Previously, stray detection checked each stack on each host individually,
resulting in (stacks * hosts) SSH calls. For 50 stacks across 4 hosts,
this meant ~200 parallel SSH connections, causing "Connection lost" errors.
Now queries each host once for all running compose projects using:
docker ps --format '{{.Label "com.docker.compose.project"}}' | sort -u
This reduces SSH calls from ~200 to just 4 (one per host).
Changes:
- Add get_running_stacks_on_host() in executor.py
- Add discover_all_stacks_on_all_hosts() in operations.py
- Update _discover_stacks_full() to use the batch approach
* Remove unused function and add tests
- Remove discover_stack_on_all_hosts() which is no longer used
- Add tests for get_running_stacks_on_host()
- Add tests for discover_all_stacks_on_all_hosts()
- Verifies it returns correct StackDiscoveryResult
- Verifies stray detection works
- Verifies it makes only 1 call per host (not per stack)
* Add self-healing: detect and stop rogue containers
Adds the ability to detect and stop "rogue" containers - stacks running
on hosts they shouldn't be according to config.
Changes:
- `cf refresh`: Now scans ALL hosts and warns about rogues/duplicates
- `cf apply`: Stops rogue containers before migrations (new phase)
- New `--no-rogues` flag to skip rogue detection
Implementation:
- Add StackDiscoveryResult for full host scanning results
- Add discover_stack_on_all_hosts() to check all hosts in parallel
- Add stop_rogue_stacks() to stop containers on unauthorized hosts
- Update tests to include new no_rogues parameter
* Update README.md
* fix: Update refresh tests for _discover_stacks_full return type
The function now returns a tuple (discovered, rogues, duplicates)
for rogue/duplicate detection. Update test mocks accordingly.
* Rename "rogue" terminology to "stray" for consistency
Terminology update across the codebase:
- rogue_hosts -> stray_hosts
- is_rogue -> is_stray
- stop_rogue_stacks -> stop_stray_stacks
- _discover_rogues -> _discover_strays
- --no-rogues -> --no-strays
- _report_rogue_stacks -> _report_stray_stacks
"Stray" better complements "orphaned" (both evoke lost things)
while clearly indicating the stack is running somewhere it
shouldn't be.
* Update README.md
* Move asyncio import to top level
* Fix remaining rogue -> stray in docstrings and README
* Refactor: Extract shared helpers to reduce duplication
1. Extract _stop_stacks_on_hosts helper in operations.py
- Shared by stop_orphaned_stacks and stop_stray_stacks
- Reduces ~50 lines of duplicated code
2. Refactor _discover_strays to reuse _discover_stacks_full
- Removes duplicate discovery logic from lifecycle.py
- Calls management._discover_stacks_full and merges duplicates
* Add PR review prompt
* Fix typos in PR review prompt
* Move import to top level (no in-function imports)
* Update README.md
* Remove obvious comments
* refactor(web): store backups in XDG config directory
Move file backups from `.backups/` alongside the file to
`~/.config/compose-farm/backups/` (respecting XDG_CONFIG_HOME).
The original file path is mirrored inside to avoid name collisions.
* docs(web): document automatic backup location
* refactor(paths): extract shared config_dir() function
* fix(web): use path anchor for Windows compatibility
* feat(web): add Open Website button and command for stacks with Traefik labels
Parse traefik.http.routers.*.rule labels to extract Host() rules and
display "Open Website" button(s) on stack pages. Also adds the command
to the command palette.
- Add extract_website_urls() function to compose.py
- Determine scheme (http/https) from entrypoint (websecure/web)
- Prefer HTTPS when same host has both protocols
- Support environment variable interpolation
- Add external_link icon from Lucide
- Add comprehensive tests for URL extraction
* refactor: move extract_website_urls to traefik.py and reuse existing parsing
Instead of duplicating the Traefik label parsing logic in compose.py,
reuse generate_traefik_config() with check_all=True to get the parsed
router configuration, then extract Host() rules from it.
- Move extract_website_urls from compose.py to traefik.py
- Reuse generate_traefik_config for label parsing
- Move tests from test_compose.py to test_traefik.py
- Update import in pages.py
* test: add comprehensive tests for extract_website_urls
Cover real-world patterns found in stacks:
- Multiple Host() in one rule with || operator
- Host() combined with PathPrefix (e.g., && PathPrefix(`/api`))
- Multiple services in one stack (like arr stack)
- Labels in list format (- key=value)
- No entrypoints (defaults to http)
- Multiple entrypoints including websecure
* fix: Skip buildable images in pull command
Add --ignore-buildable flag to pull command, matching the behavior
of the update command. This prevents pull from failing when a stack
contains services with local build directives (no remote image).
* test: Fix flaky command palette close detection
Use state="hidden" instead of :not([open]) selector when waiting
for the command palette to close. The old approach failed because
wait_for_selector defaults to waiting for visibility, but a closed
<dialog> element is hidden by design.
- Add "Shell: {service}" commands to the command palette when on a stack page
- Allows quick shell access to containers via `Cmd+K` → type "shell" → select service
- Add `get_container_name()` helper in `compose.py` for consistent container name resolution (used by both api.py and pages.py)
- Enable `-n auto` for all test commands in justfile (parallel execution)
- Add redis stack to test fixtures (missing stack was causing test failure)
- Replace hardcoded timeouts with constants: `TIMEOUT` (10s) and `SHORT_TIMEOUT` (5s)
- Rename `test-unit` → `test-cli` and `test-browser` → `test-web`
- Skip CLI startup test when running in parallel mode (`-n auto`)
- Update test assertions for 5 stacks (was 4)
- Add service-level commands to the command palette when viewing a stack detail page
- Services are extracted from the compose file and exposed via a `data-services` attribute
- Commands are grouped by action (all Logs together, all Pull together, etc.) with services sorted alphabetically
- Service commands appear with a teal indicator to distinguish from stack-level commands (green)
- Implement word-boundary fuzzy matching for better filtering UX:
- `rest plex` matches `Restart: plex-server`
- `server` matches `plex-server` (hyphenated names split into words)
- Query words must match the START of command words (prevents false positives like `r ba` matching `Logs: bazarr`)
Available service commands:
- `Restart: <service>` - Restart a specific service
- `Pull: <service>` - Pull image for a service
- `Logs: <service>` - View logs for a service
- `Stop: <service>` - Stop a service
- `Up: <service>` - Start a service
Add Astral's ty type checker (written in Rust, 10-100x faster than mypy)
as a second type checking layer. Both run in pre-commit and CI.
Fixed type issues caught by ty:
- config.py: explicit Host constructor to avoid dict unpacking issues
- executor.py: wrap subprocess.run in closure for asyncio.to_thread
- api.py: use getattr for Jinja TemplateModule macro access
- test files: fix playwright driver_path tuple handling, pytest rootpath typing
One-shot containers (like CLI tools) were showing a perpetual loading
spinner because they weren't in `docker compose ps` output. Now we:
- Use `ps -a` to include stopped/exited containers
- Display exit code: neutral badge for clean exit (0), error badge for failures
- Show "created" state for containers that were never started
- Create paths.py module with lightweight path utilities (no pydantic)
- Move Config imports to TYPE_CHECKING blocks in CLI modules
- Lazy import load_config only when needed
Combined with asyncssh lazy loading, cf --help now starts in ~150ms
instead of ~500ms (70% faster).
When --full is passed, apply also runs 'docker compose up' on all
services (not just missing/migrating ones) to pick up any config
changes (compose file, .env, etc).
- cf apply # Fast: state reconciliation only
- cf apply --full # Thorough: also refresh all running services
Previously, apply only handled:
1. Stopping orphans (in state, not in config)
2. Migrating services (in state, wrong host)
Now it also handles:
3. Starting missing services (in config, not in state)
This fixes the case where a service was stopped as an orphan, then
re-added to config - apply would say "nothing to do" instead of
starting it.
Added get_services_not_in_state() to state.py and updated tests.
Separate "read state from reality" from "write config to reality":
- Rename `sync` to `refresh` (updates local state from running services)
- Add `apply` command (makes reality match config: migrate + stop orphans)
- Add `down --orphaned` flag (stops services removed from config)
- Modify `up --migrate` to only handle migrations (not orphans)
The new mental model:
- `refresh` = Reality → State (discover what's running)
- `apply` = Config → Reality (reconcile: migrate services + stop orphans)
Also extract private helper functions for reporting to match codebase style.
When services are removed from config but still tracked in state,
`cf up --migrate` now stops them automatically. This makes the
config truly declarative - comment out a service, run migrate,
and it stops.
Changes:
- Add get_orphaned_services() to state.py for detecting orphans
- Add stop_orphaned_services() to operations.py for cleanup
- Update lifecycle.py to call stop_orphaned_services on --migrate
- Refactor _report_orphaned_services to use shared function
- Rename "missing_from_config" to "unmanaged" for clarity
- Add tests for get_orphaned_services
- Only remove from state on successful down (not on failure)
Use `pull --ignore-buildable` to skip images that have `build:` defined
in the compose file, preventing pull failures for locally-built images
like gitea-runner-custom. The build step then handles these images.
Remove functions that were replaced by _with_progress variants in cli.py:
- discover_running_services, check_mounts_on_configured_hosts,
check_networks_on_configured_hosts, _check_resources from operations.py
- snapshot_services from logs.py
- get_service_hosts from state.py
Make previously private functions public (remove underscore prefix):
- is_local in executor.py
- isoformat, collect_service_entries, load_existing_entries,
merge_entries, write_toml in logs.py
- load_env, interpolate, parse_ports in compose.py
Update tests to use renamed public functions.
- Add --host/-H option to filter logs to services on a specific host
- Default --tail to 20 lines when showing multiple services (--all, --host, or >1 service)
- Default to 100 lines for single service
- Add tests for contextual default and host filtering
- Extract compose.py from traefik.py for generic compose parsing
(env loading, interpolation, ports, volumes, networks)
- Rename ssh.py to executor.py for clarity
- Extract operations.py from cli.py for business logic
(up_services, discover_running_services, preflight checks)
- Update CLAUDE.md with new architecture diagram
- Add docs/dev/future-improvements.md for low-priority items
CLI is now a thin layer that delegates to operations module.
All 70 tests pass.