* fix: --host filter now limits multi-host stack operations to single host
Previously, when using `-H host` with a multi-host stack like `glances: all`,
the command would find the stack (correct) but then operate on ALL hosts for
that stack (incorrect). For example, `cf down -H nas` with `glances` would
stop glances on all 5 hosts instead of just nas.
Now, when `--host` is specified:
- `cf down -H nas` only stops stacks on nas, including only the nas instance
of multi-host stacks
- `cf up -H nas` only starts stacks on nas (skips migration logic since
host is explicitly specified)
Added tests for the new filter_host behavior in both executor and CLI.
* fix: Apply filter_host to logs and ps commands as well
Same bug as up/down: when using `-H host` with multi-host stacks,
logs and ps would show results from all hosts instead of just the
filtered host.
* fix: Don't remove multi-host stacks from state when host-filtered
When using `-H host` with a multi-host stack, we only stop one instance.
The stack is still running on other hosts, so we shouldn't remove it
from state entirely.
This prevents issues where:
- `cf apply` would try to re-start the stack
- `cf ps` would show incorrect running status
- Orphan detection would be confused
Added tests to verify state is preserved for host-filtered multi-host
operations and removed for full stack operations.
* refactor: Introduce StackSelection dataclass for cleaner context passing
Instead of passing filter_host separately through multiple layers,
bundle the selection context into a StackSelection dataclass:
- stacks: list of selected stack names
- config: the loaded Config
- host_filter: optional host filter from -H flag
This provides:
1. Cleaner APIs - context travels together instead of being scattered
2. is_instance_level() method - encapsulates the check for whether
this is an instance-level operation (host-filtered multi-host stack)
3. Future extensibility - can add more context (dry_run, verbose, etc.)
Updated all callers of get_stacks() to use the new return type.
* Revert "refactor: Introduce StackSelection dataclass for cleaner context passing"
This reverts commit e6e9eed93e.
* feat: Proper per-host state tracking for multi-host stacks
- Add `remove_stack_host()` to remove a single host from a multi-host stack's state
- Add `add_stack_host()` to add a single host to a stack's state
- Update `down` command to use `remove_stack_host` for host-filtered multi-host stacks
- Update `up` command to use `add_stack_host` for host-filtered operations
This ensures the state file accurately reflects which hosts each stack is running on,
rather than just tracking if it's running at all.
* fix: Use set comparisons for host list tests
Host lists may be reordered during YAML save/load, so test for
set equality rather than list equality.
* refactor: Merge remove_stack_host into remove_stack as optional parameter
Instead of a separate function, `remove_stack` now takes an optional
`host` parameter. When specified, it removes only that host from
multi-host stacks. This reduces API surface and follows the existing
pattern.
* fix: Restore deterministic host list sorting and add filter_host test
- Restore sorting of list values in _sorted_dict for consistent YAML output
- Add test for logs --host passing filter_host to run_on_stacks
* Add self-healing: detect and stop rogue containers
Adds the ability to detect and stop "rogue" containers - stacks running
on hosts they shouldn't be according to config.
Changes:
- `cf refresh`: Now scans ALL hosts and warns about rogues/duplicates
- `cf apply`: Stops rogue containers before migrations (new phase)
- New `--no-rogues` flag to skip rogue detection
Implementation:
- Add StackDiscoveryResult for full host scanning results
- Add discover_stack_on_all_hosts() to check all hosts in parallel
- Add stop_rogue_stacks() to stop containers on unauthorized hosts
- Update tests to include new no_rogues parameter
* Update README.md
* fix: Update refresh tests for _discover_stacks_full return type
The function now returns a tuple (discovered, rogues, duplicates)
for rogue/duplicate detection. Update test mocks accordingly.
* Rename "rogue" terminology to "stray" for consistency
Terminology update across the codebase:
- rogue_hosts -> stray_hosts
- is_rogue -> is_stray
- stop_rogue_stacks -> stop_stray_stacks
- _discover_rogues -> _discover_strays
- --no-rogues -> --no-strays
- _report_rogue_stacks -> _report_stray_stacks
"Stray" better complements "orphaned" (both evoke lost things)
while clearly indicating the stack is running somewhere it
shouldn't be.
* Update README.md
* Move asyncio import to top level
* Fix remaining rogue -> stray in docstrings and README
* Refactor: Extract shared helpers to reduce duplication
1. Extract _stop_stacks_on_hosts helper in operations.py
- Shared by stop_orphaned_stacks and stop_stray_stacks
- Reduces ~50 lines of duplicated code
2. Refactor _discover_strays to reuse _discover_stacks_full
- Removes duplicate discovery logic from lifecycle.py
- Calls management._discover_stacks_full and merges duplicates
* Add PR review prompt
* Fix typos in PR review prompt
* Move import to top level (no in-function imports)
* Update README.md
* Remove obvious comments
Add Astral's ty type checker (written in Rust, 10-100x faster than mypy)
as a second type checking layer. Both run in pre-commit and CI.
Fixed type issues caught by ty:
- config.py: explicit Host constructor to avoid dict unpacking issues
- executor.py: wrap subprocess.run in closure for asyncio.to_thread
- api.py: use getattr for Jinja TemplateModule macro access
- test files: fix playwright driver_path tuple handling, pytest rootpath typing
When --full is passed, apply also runs 'docker compose up' on all
services (not just missing/migrating ones) to pick up any config
changes (compose file, .env, etc).
- cf apply # Fast: state reconciliation only
- cf apply --full # Thorough: also refresh all running services
Previously, apply only handled:
1. Stopping orphans (in state, not in config)
2. Migrating services (in state, wrong host)
Now it also handles:
3. Starting missing services (in config, not in state)
This fixes the case where a service was stopped as an orphan, then
re-added to config - apply would say "nothing to do" instead of
starting it.
Added get_services_not_in_state() to state.py and updated tests.