Files
signoz/docs
Jatinderjit Singh a27b7d3d8e move planned maintenance to alertmanager pipeline (#11130)
* add maintenanceMuteStage to move planned maintenance to alertmanager

Rules previously skipped rule.Eval() entirely during maintenance windows.
This change moves suppression to MaintenanceMuter, injected as a Stage
in the alertmanager notification pipeline. Now rules always evaluate and
everys suppression is handled by alertmanager.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: wrap routing pipeline once instead of per-route injection

Replace the per-route-entry loop with a single MultiStage wrap so
maintenance suppression runs once per dispatch group before routing.

* refactor: move maintenance mute stage into custom pipelineBuilder

Copy notify.PipelineBuilder locally so we can inject mms between the
silence stage and the receiver stage (GossipSettle → Inhibit →
TimeActive → TimeMute → Silence → mms → Receiver), matching the
correct suppression order the team requires.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add license header to pipeline_builder.go

Copied code originates from Apache-2.0 licensed Prometheus Alertmanager;
add dual copyright + SPDX identifier following the repo's convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: replace SPDX tag with full Apache 2.0 license boilerplate

The full license text is unambiguously compliant with Apache 2.0 Section 4(a),
which requires giving recipients "a copy of this License".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: pass MaintenanceMuter directly to pipelineBuilder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: remove dead orgID param from task constructors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* rename buildReceiverStage -> createReceiverStage

* refactor: replace maintenanceMuteStage with notify.NewMuteStage

MaintenanceMuter already satisfies types.Muter, and pipelineBuilder has
its own pb.metrics, so the hand-rolled maintenanceMuteStage wrapper is
redundant. Use notify.NewMuteStage(pb.muter, pb.metrics) directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: hoist MuteStage construction out of the receiver loop

MuteStage holds no per-receiver state, so one instance shared across
all receivers is sufficient — matching how is/ss are handled upstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: always initialize maintenanceStore; remove nil guards

Tests now use a real sqlrulestore-backed MaintenanceMuter instead of
passing nil. With nil no longer a valid input, remove the nil guards
in server.go and pipeline_builder.go.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: move MaintenanceMuter to Server and pass it to pipelineBuilder.New

- Remove muter from pipelineBuilder struct and newPipelineBuilder();
  pass it as a parameter to New() instead, consistent with inhibitor/silencer
- Store muter on Server so GetAlerts can call Mutes() alongside the
  inhibitor and silencer, ensuring maintenance-suppressed alerts show
  the correct muted status in API responses

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* remove redundant MemMarker wrapper

* feat: surface maintenance-suppressed alerts via mutedBy in GetAlerts

Alerts suppressed by an active maintenance window were being correctly
muted in the notification pipeline but appeared as state=active in the
v2 GetAlerts response, since MaintenanceMuter.Mutes had no marker
side-effect (unlike inhibitor/silencer).

Add MaintenanceMuter.MutedBy returning the matching window IDs, and
plumb a mutedByFunc callback through NewGettableAlertsFromAlertProvider
into AlertToOpenAPIAlert. The upstream v2 API forces state=suppressed
when mutedBy is non-empty, so the frontend's existing state-based
rendering picks it up without further changes.

Use the dedicated mutedBy field rather than SilencedBy to avoid
violating the "complete set of silence IDs" contract that anything
querying silences by ID would rely on.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* code cleanup

* refactor: move maintenance (planned downtime) to alertmanager packages

Types move from pkg/types/ruletypes/ to pkg/types/alertmanagertypes/:
- maintenance.go, recurrence.go, schedule.go (+ tests)

Store impl moves from pkg/ruler/rulestore/sqlrulestore/ to
pkg/alertmanager/alertmanagerstore/sqlalertmanagerstore/.

Maintenance windows mute alerts, so they belong with alertmanager
rather than the rule types.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: add unit tests for MaintenanceMuter

Covers Mutes/MutedBy semantics (empty label, rule match, empty-RuleIDs
matches-all, future windows, multi-window) and the result cache
(single-fetch within TTL, stale-cache fallback on store error,
re-fetch after expiry, concurrency safety).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update schema changes

* Re-add marker

* fix NewMaintenanceStore in tests

* Go lint fixes

* test: use mockery-generated mock for MaintenanceStore in muter tests

Replace hand-written fakeMaintenanceStore with a mockery-generated
MockMaintenanceStore, consistent with the alertmanagertest pattern.
Also adds MaintenanceStore to .mockery.yml so the mock stays in sync.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: regenerate mocks via make gen-mocks

Picks up new MockHandler for the Handler interface in pkg/alertmanager
and regenerates MockMaintenanceStore with canonical mockery formatting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* cleanup test

* test: add e2e muting tests for maintenance window behaviour

* fix updates: omit empty endTime from serialization

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 04:20:16 +00:00
..