Add self-healing: detect and stop stray containers (#128)

* Add self-healing: detect and stop rogue containers

Adds the ability to detect and stop "rogue" containers - stacks running
on hosts they shouldn't be according to config.

Changes:
- `cf refresh`: Now scans ALL hosts and warns about rogues/duplicates
- `cf apply`: Stops rogue containers before migrations (new phase)
- New `--no-rogues` flag to skip rogue detection

Implementation:
- Add StackDiscoveryResult for full host scanning results
- Add discover_stack_on_all_hosts() to check all hosts in parallel
- Add stop_rogue_stacks() to stop containers on unauthorized hosts
- Update tests to include new no_rogues parameter

* Update README.md

* fix: Update refresh tests for _discover_stacks_full return type

The function now returns a tuple (discovered, rogues, duplicates)
for rogue/duplicate detection. Update test mocks accordingly.

* Rename "rogue" terminology to "stray" for consistency

Terminology update across the codebase:
- rogue_hosts -> stray_hosts
- is_rogue -> is_stray
- stop_rogue_stacks -> stop_stray_stacks
- _discover_rogues -> _discover_strays
- --no-rogues -> --no-strays
- _report_rogue_stacks -> _report_stray_stacks

"Stray" better complements "orphaned" (both evoke lost things)
while clearly indicating the stack is running somewhere it
shouldn't be.

* Update README.md

* Move asyncio import to top level

* Fix remaining rogue -> stray in docstrings and README

* Refactor: Extract shared helpers to reduce duplication

1. Extract _stop_stacks_on_hosts helper in operations.py
   - Shared by stop_orphaned_stacks and stop_stray_stacks
   - Reduces ~50 lines of duplicated code

2. Refactor _discover_strays to reuse _discover_stacks_full
   - Removes duplicate discovery logic from lifecycle.py
   - Calls management._discover_stacks_full and merges duplicates

* Add PR review prompt

* Fix typos in PR review prompt

* Move import to top level (no in-function imports)

* Update README.md

* Remove obvious comments
This commit is contained in:
Bas Nijholt
2025-12-22 10:22:09 -08:00
committed by GitHub
parent 620e797671
commit 6fdb43e1e9
7 changed files with 341 additions and 80 deletions

15
.prompts/pr-review.md Normal file
View File

@@ -0,0 +1,15 @@
Review the pull request for:
- **Code cleanliness**: Is the implementation clean and well-structured?
- **DRY principle**: Does it avoid duplication?
- **Code reuse**: Are there parts that should be reused from other places?
- **Organization**: Is everything in the right place?
- **Consistency**: Is it in the same style as other parts of the codebase?
- **Simplicity**: Is it not over-engineered? Remember KISS and YAGNI. No dead code paths and NO defensive programming.
- **User experience**: Does it provide a good user experience?
- **PR**: Is the PR description and title clear and informative?
- **Tests**: Are there tests, and do they cover the changes adequately? Are they testing something meaningful or are they just trivial?
- **Live tests**: Test the changes in a REAL live environment to ensure they work as expected, use the config in `/opt/stacks/compose-farm.yaml`.
- **Rules**: Does the code follow the project's coding standards and guidelines as laid out in @CLAUDE.md?
Look at `git diff origin/main..HEAD` for the changes made in this pull request.