Commit Graph

59 Commits

Author SHA1 Message Date
Bas Nijholt
2a923e6e81 fix: Include field name in config validation error messages (#131)
Previously, Pydantic validation errors like "Extra inputs are not
permitted" didn't show which field caused the error. Now the error
message includes the field location (e.g., "unknown_key: Extra inputs
are not permitted").
2025-12-22 22:35:19 -08:00
Bas Nijholt
5f2e081298 perf: Batch snapshot collection to 1 SSH call per host (#130)
## Summary

Optimize `cf refresh` SSH calls from O(stacks) to O(hosts):
- Discovery: 1 SSH call per host (unchanged)
- Snapshots: 1 SSH call per host (was 1 per stack)

For 50 stacks across 4 hosts: 54 → 8 SSH calls.

## Changes

**Performance:**
- Use `docker ps` + `docker image inspect` instead of `docker compose images` per stack
- Batch snapshot collection by host in `collect_stacks_entries_on_host()`

**Architecture:**
- Add `build_discovery_results()` to `operations.py` (business logic)
- Keep progress bar wrapper in `cli/management.py` (presentation)
- Remove dead code: `discover_all_stacks_on_all_hosts()`, `collect_all_stacks_entries()`
2025-12-22 22:19:32 -08:00
Bas Nijholt
6fbc7430cb perf: Optimize stray detection to use 1 SSH call per host (#129)
* perf: Optimize stray detection to use 1 SSH call per host

Previously, stray detection checked each stack on each host individually,
resulting in (stacks * hosts) SSH calls. For 50 stacks across 4 hosts,
this meant ~200 parallel SSH connections, causing "Connection lost" errors.

Now queries each host once for all running compose projects using:
  docker ps --format '{{.Label "com.docker.compose.project"}}' | sort -u

This reduces SSH calls from ~200 to just 4 (one per host).

Changes:
- Add get_running_stacks_on_host() in executor.py
- Add discover_all_stacks_on_all_hosts() in operations.py
- Update _discover_stacks_full() to use the batch approach

* Remove unused function and add tests

- Remove discover_stack_on_all_hosts() which is no longer used
- Add tests for get_running_stacks_on_host()
- Add tests for discover_all_stacks_on_all_hosts()
  - Verifies it returns correct StackDiscoveryResult
  - Verifies stray detection works
  - Verifies it makes only 1 call per host (not per stack)
2025-12-22 12:09:59 -08:00
Bas Nijholt
6fdb43e1e9 Add self-healing: detect and stop stray containers (#128)
* Add self-healing: detect and stop rogue containers

Adds the ability to detect and stop "rogue" containers - stacks running
on hosts they shouldn't be according to config.

Changes:
- `cf refresh`: Now scans ALL hosts and warns about rogues/duplicates
- `cf apply`: Stops rogue containers before migrations (new phase)
- New `--no-rogues` flag to skip rogue detection

Implementation:
- Add StackDiscoveryResult for full host scanning results
- Add discover_stack_on_all_hosts() to check all hosts in parallel
- Add stop_rogue_stacks() to stop containers on unauthorized hosts
- Update tests to include new no_rogues parameter

* Update README.md

* fix: Update refresh tests for _discover_stacks_full return type

The function now returns a tuple (discovered, rogues, duplicates)
for rogue/duplicate detection. Update test mocks accordingly.

* Rename "rogue" terminology to "stray" for consistency

Terminology update across the codebase:
- rogue_hosts -> stray_hosts
- is_rogue -> is_stray
- stop_rogue_stacks -> stop_stray_stacks
- _discover_rogues -> _discover_strays
- --no-rogues -> --no-strays
- _report_rogue_stacks -> _report_stray_stacks

"Stray" better complements "orphaned" (both evoke lost things)
while clearly indicating the stack is running somewhere it
shouldn't be.

* Update README.md

* Move asyncio import to top level

* Fix remaining rogue -> stray in docstrings and README

* Refactor: Extract shared helpers to reduce duplication

1. Extract _stop_stacks_on_hosts helper in operations.py
   - Shared by stop_orphaned_stacks and stop_stray_stacks
   - Reduces ~50 lines of duplicated code

2. Refactor _discover_strays to reuse _discover_stacks_full
   - Removes duplicate discovery logic from lifecycle.py
   - Calls management._discover_stacks_full and merges duplicates

* Add PR review prompt

* Fix typos in PR review prompt

* Move import to top level (no in-function imports)

* Update README.md

* Remove obvious comments
2025-12-22 10:22:09 -08:00
Bas Nijholt
fdb00e7655 refactor(web): store backups in XDG config directory (#113)
* refactor(web): store backups in XDG config directory

Move file backups from `.backups/` alongside the file to
`~/.config/compose-farm/backups/` (respecting XDG_CONFIG_HOME).
The original file path is mirrored inside to avoid name collisions.

* docs(web): document automatic backup location

* refactor(paths): extract shared config_dir() function

* fix(web): use path anchor for Windows compatibility
2025-12-21 15:08:15 -08:00
Bas Nijholt
612242eea9 feat(web): add Open Website button and command for stacks with Traefik labels (#110)
* feat(web): add Open Website button and command for stacks with Traefik labels

Parse traefik.http.routers.*.rule labels to extract Host() rules and
display "Open Website" button(s) on stack pages. Also adds the command
to the command palette.

- Add extract_website_urls() function to compose.py
- Determine scheme (http/https) from entrypoint (websecure/web)
- Prefer HTTPS when same host has both protocols
- Support environment variable interpolation
- Add external_link icon from Lucide
- Add comprehensive tests for URL extraction

* refactor: move extract_website_urls to traefik.py and reuse existing parsing

Instead of duplicating the Traefik label parsing logic in compose.py,
reuse generate_traefik_config() with check_all=True to get the parsed
router configuration, then extract Host() rules from it.

- Move extract_website_urls from compose.py to traefik.py
- Reuse generate_traefik_config for label parsing
- Move tests from test_compose.py to test_traefik.py
- Update import in pages.py

* test: add comprehensive tests for extract_website_urls

Cover real-world patterns found in stacks:
- Multiple Host() in one rule with || operator
- Host() combined with PathPrefix (e.g., && PathPrefix(`/api`))
- Multiple services in one stack (like arr stack)
- Labels in list format (- key=value)
- No entrypoints (defaults to http)
- Multiple entrypoints including websecure
2025-12-21 14:16:46 -08:00
Bas Nijholt
ea650bff8a fix: Skip buildable images in pull command (#109)
* fix: Skip buildable images in pull command

Add --ignore-buildable flag to pull command, matching the behavior
of the update command. This prevents pull from failing when a stack
contains services with local build directives (no remote image).

* test: Fix flaky command palette close detection

Use state="hidden" instead of :not([open]) selector when waiting
for the command palette to close. The old approach failed because
wait_for_selector defaults to waiting for visibility, but a closed
<dialog> element is hidden by design.
2025-12-21 10:28:10 -08:00
Bas Nijholt
36e4bef46d feat(web): add shell command to command palette for services (#104)
- Add "Shell: {service}" commands to the command palette when on a stack page
- Allows quick shell access to containers via `Cmd+K` → type "shell" → select service
- Add `get_container_name()` helper in `compose.py` for consistent container name resolution (used by both api.py and pages.py)
2025-12-21 01:23:54 -08:00
Bas Nijholt
0f67c17281 test: parallel execution and timeout constants (#101)
- Enable `-n auto` for all test commands in justfile (parallel execution)
- Add redis stack to test fixtures (missing stack was causing test failure)
- Replace hardcoded timeouts with constants: `TIMEOUT` (10s) and `SHORT_TIMEOUT` (5s)
- Rename `test-unit` → `test-cli` and `test-browser` → `test-web`
- Skip CLI startup test when running in parallel mode (`-n auto`)
- Update test assertions for 5 stacks (was 4)
2025-12-21 00:48:52 -08:00
Bas Nijholt
f71e5cffd6 feat(web): add service commands to command palette with fuzzy matching (#95)
- Add service-level commands to the command palette when viewing a stack detail page
- Services are extracted from the compose file and exposed via a `data-services` attribute
- Commands are grouped by action (all Logs together, all Pull together, etc.) with services sorted alphabetically
- Service commands appear with a teal indicator to distinguish from stack-level commands (green)
- Implement word-boundary fuzzy matching for better filtering UX:
  - `rest plex` matches `Restart: plex-server`
  - `server` matches `plex-server` (hyphenated names split into words)
  - Query words must match the START of command words (prevents false positives like `r ba` matching `Logs: bazarr`)

Available service commands:
- `Restart: <service>` - Restart a specific service
- `Pull: <service>` - Pull image for a service
- `Logs: <service>` - View logs for a service
- `Stop: <service>` - Stop a service
- `Up: <service>` - Start a service
2025-12-20 23:23:53 -08:00
Bas Nijholt
b0b501fa98 docs: update example services in documentation and tests (#96) 2025-12-20 22:45:13 -08:00
Bas Nijholt
f0cd85b5f5 fix: prevent terminal reconnection to wrong page after navigation (#81) 2025-12-20 16:41:28 -08:00
Bas Nijholt
350947ad12 Rename services to stacks terminology (#79) 2025-12-20 16:00:41 -08:00
Bas Nijholt
bb019bcae6 feat: add ty type checker alongside mypy (#77)
Add Astral's ty type checker (written in Rust, 10-100x faster than mypy)
as a second type checking layer. Both run in pre-commit and CI.

Fixed type issues caught by ty:
- config.py: explicit Host constructor to avoid dict unpacking issues
- executor.py: wrap subprocess.run in closure for asyncio.to_thread
- api.py: use getattr for Jinja TemplateModule macro access
- test files: fix playwright driver_path tuple handling, pytest rootpath typing
2025-12-20 15:43:51 -08:00
Bas Nijholt
de46c3ff0f feat: add web UI demo recording system (#69) 2025-12-20 15:00:03 -08:00
Bas Nijholt
187f83b61d feat: add service arguments to refresh command (#70) 2025-12-20 13:14:09 -08:00
Bas Nijholt
8b16484ce2 feat(web): add theme switcher with 35 DaisyUI themes (#61) 2025-12-19 22:33:10 -08:00
Bas Nijholt
61a845fad8 test: add comprehensive browser tests for HTMX/JS functionality (#59) 2025-12-19 14:27:00 -08:00
Bas Nijholt
1fa17b4e07 feat(web): Auto-refresh dashboard and clean up HTMX inheritance (#49) 2025-12-18 20:07:31 -08:00
Bas Nijholt
cd25a1914c fix(web): Show exit code for stopped containers instead of loading spinner (#51)
One-shot containers (like CLI tools) were showing a perpetual loading
spinner because they weren't in `docker compose ps` output. Now we:
- Use `ps -a` to include stopped/exited containers
- Display exit code: neutral badge for clean exit (0), error badge for failures
- Show "created" state for containers that were never started
2025-12-18 20:03:12 -08:00
Bas Nijholt
a71200b199 feat(test): Add Playwright browser tests for web UI (#48) 2025-12-18 18:26:23 -08:00
Bas Nijholt
2e6146a94b feat(ps): Add service filtering to ps command (#33) 2025-12-18 13:31:18 -08:00
Bas Nijholt
282de12336 feat(cli): Add ssh subcommand for SSH key management (#22) 2025-12-18 11:58:33 -08:00
Bas Nijholt
a6e491575a feat(web): Add Console page with terminal and editor (#17) 2025-12-18 10:29:15 -08:00
Bas Nijholt
1278d0b3af fix(web): Remove config caching so changes are detected immediately
Config was cached with @lru_cache, causing the web UI to show stale
sync status after external config file edits.
2025-12-17 23:17:25 -08:00
Bas Nijholt
5afda8cbb2 Add web UI with FastAPI + HTMX + xterm.js (#13) 2025-12-17 22:52:40 -08:00
Bas Nijholt
6b684b19f2 Run startup time test 6 times 2025-12-17 09:08:45 -08:00
Bas Nijholt
5bf65d3849 Raise Linux CLI startup threshold to 0.25s for CI headroom 2025-12-17 09:05:53 -08:00
Bas Nijholt
e49ad29999 Use OS-specific thresholds for CLI startup test (Linux: 0.2s, macOS: 0.35s, Windows: 2s) 2025-12-17 08:57:50 -08:00
Bas Nijholt
cdbe74ed89 Return early from CLI startup test when under threshold 2025-12-17 08:56:35 -08:00
Bas Nijholt
129970379c Increase CLI startup threshold to 0.35s for macOS/Windows CI 2025-12-17 08:55:40 -08:00
Bas Nijholt
c5c47d14dd Add CLI startup time test to catch slow imports
Runs `cf --help` and fails if startup exceeds 200ms. Shows timing
info in CI logs on both pass and failure.
2025-12-17 08:53:32 -08:00
Bas Nijholt
6048f37ad5 Lazy import pydantic for faster CLI startup
- Create paths.py module with lightweight path utilities (no pydantic)
- Move Config imports to TYPE_CHECKING blocks in CLI modules
- Lazy import load_config only when needed

Combined with asyncssh lazy loading, cf --help now starts in ~150ms
instead of ~500ms (70% faster).
2025-12-17 00:07:15 -08:00
Bas Nijholt
c720170f26 Add --full flag to apply command
When --full is passed, apply also runs 'docker compose up' on all
services (not just missing/migrating ones) to pick up any config
changes (compose file, .env, etc).

- cf apply          # Fast: state reconciliation only
- cf apply --full   # Thorough: also refresh all running services
2025-12-16 23:54:19 -08:00
Bas Nijholt
af9c760fb8 Add missing service detection to apply command
Previously, apply only handled:
1. Stopping orphans (in state, not in config)
2. Migrating services (in state, wrong host)

Now it also handles:
3. Starting missing services (in config, not in state)

This fixes the case where a service was stopped as an orphan, then
re-added to config - apply would say "nothing to do" instead of
starting it.

Added get_services_not_in_state() to state.py and updated tests.
2025-12-16 23:21:09 -08:00
Bas Nijholt
90656b05e3 Add tests for apply command and down --orphaned flag
Tests cover:
- apply: nothing to do, dry-run preview, migrations, orphan cleanup, --no-orphans
- down --orphaned: no orphans, stops services, error on invalid combinations

Lifecycle.py coverage improved from 20% to 61%.
2025-12-16 23:15:46 -08:00
Bas Nijholt
be6b391121 Refactor CLI commands for clearer UX
Separate "read state from reality" from "write config to reality":
- Rename `sync` to `refresh` (updates local state from running services)
- Add `apply` command (makes reality match config: migrate + stop orphans)
- Add `down --orphaned` flag (stops services removed from config)
- Modify `up --migrate` to only handle migrations (not orphans)

The new mental model:
- `refresh` = Reality → State (discover what's running)
- `apply` = Config → Reality (reconcile: migrate services + stop orphans)

Also extract private helper functions for reporting to match codebase style.
2025-12-16 23:06:42 -08:00
Bas Nijholt
7f56ba6a41 Add orphaned service detection and cleanup
When services are removed from config but still tracked in state,
`cf up --migrate` now stops them automatically. This makes the
config truly declarative - comment out a service, run migrate,
and it stops.

Changes:
- Add get_orphaned_services() to state.py for detecting orphans
- Add stop_orphaned_services() to operations.py for cleanup
- Update lifecycle.py to call stop_orphaned_services on --migrate
- Refactor _report_orphaned_services to use shared function
- Rename "missing_from_config" to "unmanaged" for clarity
- Add tests for get_orphaned_services
- Only remove from state on successful down (not on failure)
2025-12-16 22:53:26 -08:00
Bas Nijholt
4b3d7a861e Fix migration and update for services with buildable images
Use `pull --ignore-buildable` to skip images that have `build:` defined
in the compose file, preventing pull failures for locally-built images
like gitea-runner-custom. The build step then handles these images.
2025-12-16 19:42:24 -08:00
Bas Nijholt
b7315d255a refactor: Split CLI into modular subpackage (#11) 2025-12-16 13:08:08 -08:00
Bas Nijholt
cbbcec0d14 Add config subcommand for managing configuration files (#10) 2025-12-16 12:00:44 -08:00
Bas Nijholt
790e32e96b Fix test_load_config_not_found for CF_CONFIG env var 2025-12-16 11:13:44 -08:00
Bas Nijholt
a95f6309b0 Remove dead code and make internal APIs public
Remove functions that were replaced by _with_progress variants in cli.py:
- discover_running_services, check_mounts_on_configured_hosts,
  check_networks_on_configured_hosts, _check_resources from operations.py
- snapshot_services from logs.py
- get_service_hosts from state.py

Make previously private functions public (remove underscore prefix):
- is_local in executor.py
- isoformat, collect_service_entries, load_existing_entries,
  merge_entries, write_toml in logs.py
- load_env, interpolate, parse_ports in compose.py

Update tests to use renamed public functions.
2025-12-15 20:19:28 -08:00
Bas Nijholt
8aa019e25f fix: Move imports to top-level in test file 2025-12-15 10:29:29 -08:00
Bas Nijholt
76aa6e11d2 logs: Make --all and --host mutually exclusive
These options conflict conceptually - --all means all services across
all hosts, while --host means all services on a specific host.
2025-12-14 20:10:28 -08:00
Bas Nijholt
d377df15b4 logs: Add --host filter and contextual --tail default
- Add --host/-H option to filter logs to services on a specific host
- Default --tail to 20 lines when showing multiple services (--all, --host, or >1 service)
- Default to 100 lines for single service
- Add tests for contextual default and host filtering
2025-12-14 20:04:40 -08:00
Bas Nijholt
dc541c0298 test: Skip shell-dependent tests on Windows/Mac 2025-12-14 14:28:31 -08:00
Bas Nijholt
566a07d3a4 Refactor: separate concerns into dedicated modules
- Extract compose.py from traefik.py for generic compose parsing
  (env loading, interpolation, ports, volumes, networks)
- Rename ssh.py to executor.py for clarity
- Extract operations.py from cli.py for business logic
  (up_services, discover_running_services, preflight checks)
- Update CLAUDE.md with new architecture diagram
- Add docs/dev/future-improvements.md for low-priority items

CLI is now a thin layer that delegates to operations module.
All 70 tests pass.
2025-12-14 12:49:24 -08:00
Bas Nijholt
04154b84f6 Add tests for network and path checking
- test_traefik: Tests for parse_external_networks()
- test_ssh: Tests for check_paths_exist() and check_networks_exist()
2025-12-14 12:08:35 -08:00
Bas Nijholt
e12002ce86 Add test for network_mode: service:X port lookup 2025-12-14 00:03:11 -08:00