Files
signoz/cmd
Pandey c9610df66d
Some checks failed
build-staging / prepare (push) Has been cancelled
build-staging / js-build (push) Has been cancelled
build-staging / go-build (push) Has been cancelled
build-staging / staging (push) Has been cancelled
Release Drafter / update_release_draft (push) Has been cancelled
refactor(ruler): move rules and planned maintenance handlers to signozapiserver (#10957)
* refactor(ruler): define Ruler and Handler interfaces with signozruler implementation

Expand the Ruler interface with rule management and planned maintenance
methods matching rules.Manager signatures. Add Handler interface for
HTTP endpoints. Implement handler in signozruler wrapping ruler.Ruler,
and update provider to embed *rules.Manager for interface satisfaction.

* refactor(ruler): move eval_delay from query-service constants to ruler config

Replace constants.GetEvalDelay() with config.EvalDelay on ruler.Config,
defaulting to 2m. This removes the signozruler dependency on
pkg/query-service/constants.

* refactor(ruler): use time.Duration for eval_delay config

Match the convention used by all other configs in the codebase.
TextDuration is for preserving human-readable text through JSON
round-trips in user-facing rule definitions, not for internal config.

* refactor(ruler): add godoc comments and spacing to Ruler interface

* refactor(ruler): wire ruler handler through signoz.New and signozapiserver

- Add Start/Stop to Ruler interface for lifecycle management
- Add rulerCallback to signoz.New() for EE customization
- Wire ruler.Handler through Handlers, signozapiserver provider
- Register 12 routes in signozapiserver/ruler.go (7 rules, 5 downtime)
- Update cmd/community and cmd/enterprise to pass rulerCallback
- Move rules.Manager creation from server.go to signoz.New via callback
- Change APIHandler.ruleManager type from *rules.Manager to ruler.Ruler
- Remove makeRulesManager from both OSS and EE server.go

* refactor(ruler): remove old rules and downtime_schedules routes from http_handler

Remove 7 rules CRUD routes and 5 downtime_schedules routes plus their
handler methods from http_handler.go. These are now served by
signozapiserver/ruler.go via handler.New() with OpenAPIDef.

The 4 v1 history routes (stats, timeline, top_contributors,
overall_status) remain in http_handler.go as they depend on
interfaces.Reader and have v2 equivalents already in signozapiserver.

* refactor(ruler): use ProviderFactory pattern and register in factory.Registry

Replace the rulerCallback with rulerProviderFactories following the
standard ProviderFactory pattern (like auditorProviderFactories). The
ruler is now created via factory.NewProviderFromNamedMap and registered
in factory.Registry for lifecycle management. Start/Stop are no longer
called manually in server.go.

- Ruler interface embeds factory.Service (Start/Stop return error)
- signozruler.NewFactory accepts all deps including EE task funcs
- provider uses named field (not embedding) with explicit delegation
- cmd/community passes nil task funcs, cmd/enterprise passes EE funcs
- Remove NewRulerProviderFactories (replaced by callback from cmd/)
- Remove manual Start/Stop from both OSS and EE server.go

* fix(ruler): make Start block on stopC per factory.Service contract

rules.Manager.Start is non-blocking (run() just closes a channel).
Add stopC to provider so Start blocks until Stop closes it, matching
the factory.Service contract used by the Registry.

* refactor(ruler): remove unused RM() accessor from EE APIHandler

* refactor(ruler): remove RuleManager from APIHandlerOpts

Use Signoz.Ruler directly instead of passing it through opts.

* refactor(ruler): add /api/v1/rules/test and mark /api/v1/testRule as deprecated

* refactor(ruler): use binding.JSON.BindBody for downtime schedule decode

* refactor(ruler): add TODOs for raw string params on Ruler interface

Mark CreateRule, EditRule, PatchRule, TestNotification, and DeleteRule
with TODOs to accept typed params instead of raw JSON strings. Requires
changing the storage model since the manager stores raw JSON as Data.

* refactor(ruler): add TODO on MaintenanceStore to not expose store directly

* docs: regenerate OpenAPI spec and frontend API clients with ruler routes

* refactor(ruler): rename downtime_schedules tag to downtimeschedules

* refactor(ruler): add query params to ListDowntimeSchedules OpenAPIDef

Add ListPlannedMaintenanceParams struct with active/recurring fields.
Use binding.Query.BindQuery in the handler instead of raw URL parsing.
Add RequestQuery to the OpenAPIDef so params appear in the OpenAPI spec
and generated frontend client.

* refactor(ruler): add GettableTestRule response type to TestRule endpoint

Define GettableTestRule struct with AlertCount and Message fields.
Use it as the Response in TestRule OpenAPIDef so the generated frontend
client has a proper response type instead of string.

* refactor(ruler): tighten schema with oneOf unions and required fields

Surface the polymorphism in RuleThresholdData and EvaluationEnvelope via
JSONSchemaOneOf (the same pattern as QueryEnvelope), so the generated
TS types are discriminated unions with typed `spec` instead of unknown.
Also mark `alert`, `ruleType`, and `condition` required on PostableRule
so the generated TS types are non-optional for callers.

* refactor(ruler): add Enum() on EvaluationKind, ScheduleType, ThresholdKind

Surface the fixed set of accepted values for these valuer-wrapped kind
types so OpenAPI emits proper string-enum schemas and the generated TS
types become string-literal unions instead of plain string.

* refactor(ruler): mark required fields on nested rule and maintenance types

Surface fields already enforced by Validate()/UnmarshalJSON as required
in the OpenAPI schema so the generated TS types match runtime behavior.

Touches RuleCondition (compositeQuery, op, matchType), RuleThresholdData
(kind, spec), BasicRuleThreshold (name, target, op, matchType),
RollingWindow (evalWindow, frequency), CumulativeWindow (schedule,
frequency, timezone), EvaluationEnvelope (kind, spec), Schedule
(timezone), GettablePlannedMaintenance (name, schedule).

Does not mark server-populated fields (id, createdAt, updatedAt, status,
kind) on GettablePlannedMaintenance required, since the same struct is
reused for request bodies in MaintenanceStore.CreatePlannedMaintenance.

* refactor(ruler): tighten AlertCompositeQuery, QueryType, PanelType schema

Missed in the earlier tightening pass. AlertCompositeQuery.queries,
panelType, queryType are all required for a valid composite query;
QueryType and PanelType are valuer-wrapped with fixed value sets, so
expose them as enums in the OpenAPI schema.

* refactor(ruler): wrap sql.ErrNoRows as TypeNotFound in by-ID lookups

GetStoredRule and GetPlannedMaintenanceByID previously returned bun's
raw Scan error, so a missing ID leaked "sql: no rows in result set" to
the HTTP response with a 500 status. WrapNotFoundErrf converts
sql.ErrNoRows into TypeNotFound so render.Error emits 404 with a stable
`not_found` code, and passes other errors through unchanged.

* refactor(ruler): move migrated rules routes to /api/v2/rules

The 7 rules routes now live at /api/v2/rules, /api/v2/rules/{id}, and
/api/v2/rules/test — served via handler.New with render.Success and
render.Error. The legacy /api/v1/rules paths will be restored in the
query-service http handler in a follow-up so existing clients keep
receiving the SuccessResponse envelope unchanged.

Drop the /api/v1/testRule deprecated alias from signozapiserver; the
original lives on main's http_handler.go and is restored alongside the
other v1 paths.

Downtime schedule routes stay at /api/v1/downtime_schedules — single
track, no legacy restore planned.

* refactor(ruler): restore /api/v1/rules legacy handlers for back-compat

Bring the 7 rule CRUD/test handlers and their router.HandleFunc lines
back to http_handler.go so /api/v1/rules, /api/v1/rules/{id}, and
/api/v1/testRule continue to emit the legacy SuccessResponse envelope.
The v2 versions under signozapiserver are the new home for the render
envelope used by generated clients.

Delegation uses aH.ruleManager (populated from opts.Signoz.Ruler in
NewAPIHandler), so a single ruler.Ruler instance serves both paths — no
second rules.Manager is instantiated.

Downtime schedules stay single-track under signozapiserver; the 5
downtime handlers are not restored.

* docs: regenerate OpenAPI spec and frontend clients for /api/v2/rules

* refactor(ruler): return 201 Created on POST /api/v2/rules

A successful create now responds with 201 Created and the full
GettableRule body, matching REST convention for resource creation.
Regenerates the OpenAPI spec and frontend clients to reflect the new
status code.

* refactor(ruler): restore dropped sorter TODO in legacy listRules

The legacy listRules handler was copied verbatim from main during the
v1 back-compat restore, but an inner blank line and the load-bearing
`// todo(amol): need to add sorter` comment were stripped. Put them
back so the legacy block round-trips cleanly against main.

* refactor(ruler): return 201 Created on POST /api/v1/downtime_schedules

Match the REST convention already applied to POST /api/v2/rules:
successful creates respond with 201 Created. Response body remains
empty (nil); the generated frontend client surface is unchanged since
no response type was declared.

A richer "return the created resource" response body is a separate
follow-up — holding off until the ruletypes naming cleanup lands.

* fix(ruler): signal Healthy only after manager.Start closes m.block

The ruler provider didn't implement factory.Healthy, so the registry
fell back to factory.closedC and marked the service StateRunning the
instant its Start goroutine spawned — before rules.Manager.Start had
closed m.block. /api/v2/healthz therefore returned 200 while rule
evaluation was still gated, and integration tests that POSTed a rule
immediately after the readiness check saw their task goroutines stuck
on <-m.block until the next frequency tick.

Add a healthyC channel and close it inside Start only after
manager.Start returns; implement factory.Healthy so the registry and
/api/v2/healthz wait on the real readiness signal.

* fix: add the withhealthy interface

* fix(ruler): alias legacy RULES_EVAL_DELAY env var in backward-compat

The eval_delay config was moved from query-service constants (read from
RULES_EVAL_DELAY) onto ruler.Config (read via mapstructure from
SIGNOZ_RULER_EVAL__DELAY). That silently broke the legacy env var for
any existing deployment — notably the alerts integration-test fixture
which sets RULES_EVAL_DELAY=0s to let rules evaluate against just-
inserted data. The resulting default 2m delay pushed the query window
far enough back that the fixture's rate spike fell outside it, causing
8 of 24 parametrize cases in 02_basic_alert_conditions.py to fail with
"Expected N alerts to be fired but got 0 alerts".

Add RULES_EVAL_DELAY to mergeAndEnsureBackwardCompatibility alongside
the ~10 other aliased legacy env vars. Emits the standard deprecation
warning and overrides config.Ruler.EvalDelay.
2026-04-18 08:25:16 +00:00
..