* chore: added phase counts feature
* chore: added queries for pod phase counts in custom group by
* chore: added unknown phase count
* fix: isPodUIDInGroupBy in buildPodRecords
* chore: 3 cte --> 2 cte
* chore: pod phase with local table of time series as counts
* chore: comment correction
* chore: corrected comment
* chore: value column for samples table added
* chore: removed query G for phase counts
* chore: rename variable
* chore: added PodPhaseNum constants to types
* feat(global): add mcp_url to global config
Adds an optional mcp_url field to the global config so the frontend can
gate the MCP settings page on its presence. When unset the API returns
"mcp_url": null (pointer + nullable:"true"); when set it emits the
parsed URL as a string.
* feat(global): surface mcp_url in frontend types
Adds mcp_url to the manual GlobalConfigData type and refreshes the
generated OpenAPI client so consumers can read the new field.
* docs(global): use <unset> placeholder for mcp_url example
Matches the style of external_url and ingestion_url above it.
* style(global): separate mcp_url prep from return in GetConfig
Adds a blank line between the nullable-conversion block and the return
statement so the two logical phases read as distinct blocks.
* feat(global): mark endpoint fields as required in the API schema
The backend always emits external_url, ingestion_url and mcp_url on
GET /api/v1/global/config (mcp_url as literal null when unset), so the
JSON keys are always present. Add required:"true" to all three and
regenerate the OpenAPI + frontend client so consumers get non-optional
types.
* revert(global): drop mcp_url from legacy GlobalConfigData type
The legacy hand-written type for the non-Orval getGlobalConfig client
should be left alone; consumers that need mcp_url go through the
generated Orval client.
* feat: setup types and interface for waterfall v3
v3 is required for udpating the response json of
the waterfall api. There wont' be any logical change.
Using this requirement as an opportunity to move
waterfall api to provider codebase architecture from
older query-service
* refactor: move type conversion logic to types pkg
* chore: add reason for using snake case in response
* fix: update span.attributes to map of string to any
To support otel format of diffrent types of attributes
* fix: remove unused fields and rename span type
To avoid confusing with otel span
* refactor: convert waterfall api to modules format
* chore: add same test cases as for old waterfall api
* chore: avoid sorting on every traversal
* fix: remove unused fields and rename span type
To avoid confusing with otel span
* fix: rename timestamp to milli for readability
* fix: add timeout to module context
* fix: use typed paramter field in logs
* chore: generate openapi spec for v3 waterfall
* fix: remove timeout since waterfall take longer
* fix: use int16 for status code as per db schema
* fix: update openapi specs
* refactor: break down GetWaterfall method for readability
* chore: avoid returning nil, nil
* refactor: move type creation and constants to types package
- Move DB/table/cache/windowing constants to tracedetailtypes package
- Add NewWaterfallTrace and NewWaterfallResponse constructors in types
- Use constructors in module.go instead of inline struct literals
- Reorder waterfall.go so public functions precede private ones
* refactor: extract ClickHouse queries into a store abstraction
Move GetTraceSummary and GetTraceSpans out of module.go into a
traceStore interface backed by clickhouseTraceStore in store.go.
The module struct now holds a traceStore instead of a raw
telemetrystore.TelemetryStore, keeping DB access separate from
business logic.
* refactor: move error to types as well
* refactor: separate out store calls and computations
* refactor: breakdown GetSelectedSpans for readability
* refactor: return 404 on missing trace and other cleanup
* refactor: use same method for cache key creation
* chore: remove unused duration nano field
* chore: use sqlbuilder in clickhouse store where possible
* refactor: move waterfall traverse logic to types
and extract out auto expanded span calculation
* chore: convert all timestamp to nano for consitancy
* chore: rename waterfall response to gettableX format
* chore: fix method calls in test after refactoring
* refactor: remove unused methods
* chore: fix openapi spec
* chore: better names for methods and vars
* chore: remove caching to match from v2
* chore: update openapi client
* refactor: move selection decision to types
* chore: move types to the top
* refactor: avoid passing the whole telementry store in a module
* refactor: move waterfall constants to module config
* chore: update openapi specs
* chore: update openapi clints
* chore: baseline setup
* chore: endpoint detail update
* chore: added logic for hosts v3 api
* fix: bug fix
* chore: disk usage
* chore: added validate function
* chore: added some unit tests
* chore: return status as a string
* chore: yarn generate api
* chore: removed isSendingK8sAgentsMetricsCode
* chore: moved funcs
* chore: added validation on order by
* chore: updated spec
* chore: nil pointer dereference fix in req.Filter
* chore: added temporalities of metrics
* chore: unified composite key function
* chore: code improvements
* chore: hostStatusNone added for clarity that this field can be left empty as well in payload
* chore: yarn generate api
* chore: return errors from getMetadata and lint fix
* chore: return errors from getMetadata and lint fix
* chore: added hostName logic
* chore: modified getMetadata query
* chore: add type for response and files rearrange
* chore: warnings added passing from queryResponse warning to host lists response struct
* chore: added better metrics existence check
* chore: added a TODO remark
* chore: added required metrics check
* chore: distributed samples table to local table change for get metadata
* chore: frontend fix
* chore: endpoint correction
* chore: endpoint modification openapi
* chore: escape backtick to prevent sql injection
* chore: rearrage
* chore: improvements
* chore: validate order by to validate function
* chore: improved description
* chore: added TODOs and made filterByStatus a part of filter struct
* chore: ignore empty string hosts in get active hosts
* feat(infra-monitoring): v2 hosts list - return counts of active & inactive hosts for custom group by attributes (#10956)
* chore: add functionality for showing active and inactive counts in custom group by
* chore: bug fix
* chore: added subquery for active and total count
* chore: ignore empty string hosts in get active hosts
* fix: sinceUnixMilli for determining active hosts compute once per request
* chore: refactor code
* chore: rename HostsList -> ListHosts
* chore: rearrangement
* chore: inframonitoring types renaming
* chore: added types package
* chore: file structure further breakdown for clarity
* chore: comments correction
* chore: removed temporalities
* chore: comments resolve
* chore: added json tag required: true
* chore: added status unauthorized
* chore: remove a defensive nil map check, the function ensure non-nil map when err nil
* chore: make sort stable in case of tiebreaker by comparing composite group by keys
* chore: regen api client for inframonitoring
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: add functionality for showing active and inactive counts in custom group by
* chore: bug fix
* chore: added subquery for active and total count
* chore: ignore empty string hosts in get active hosts
* fix: sinceUnixMilli for determining active hosts compute once per request
* chore: refactor code
* refactor(ruler): add Rule v2 read type and rename storage Rule to StorableRule
* refactor(ruler): map GettableRule to Rule before responding on v2 routes
* docs(openapi): regenerate spec with RuletypesRule on v2 rules routes
* docs(frontend): regenerate API clients with RuletypesRuleDTO
* refactor(ruler): validate uuid-v7 on delete rule handler
* refactor(ruler): add Enum() on AlertType
* refactor(ruler): convert RepeatType and RepeatOn to valuer.String with Enum()
* refactor(ruler): mark required fields on Recurrence
* refactor(ruler): mark required tag on CumulativeSchedule.Type
* refactor(ruler): rename GettablePlannedMaintenance to PlannedMaintenance
* docs: regenerate OpenAPI spec and frontend clients with tightened schema
* refactor(ruler): add PostablePlannedMaintenance input type with Validate
* refactor(ruler): rename EditPlannedMaintenance to Update and GetAll to List
* refactor(ruler): switch Create/Update to *PostablePlannedMaintenance
* refactor(ruler): convert PlannedMaintenance.Id string to ID valuer.UUID
* refactor(ruler): return *PlannedMaintenance from CreatePlannedMaintenance
* docs: regenerate OpenAPI spec and frontend clients for Postable/ID changes
* refactor(ruler): type PlannedMaintenance.Status as MaintenanceStatus enum
* refactor(ruler): type PlannedMaintenance.Kind as MaintenanceKind enum
* refactor(ruler): mark GettableRule.Id required
* refactor(ruler): mark GettableRule.State required
* refactor(ruler): make GettableRule timestamps non-pointer and users nullable
* refactor(ruler): return bare array from v2 ListRules instead of wrapped object
* docs: regenerate OpenAPI spec and frontend clients for schema pass
* refactor(ruler): define Ruler and Handler interfaces with signozruler implementation
Expand the Ruler interface with rule management and planned maintenance
methods matching rules.Manager signatures. Add Handler interface for
HTTP endpoints. Implement handler in signozruler wrapping ruler.Ruler,
and update provider to embed *rules.Manager for interface satisfaction.
* refactor(ruler): move eval_delay from query-service constants to ruler config
Replace constants.GetEvalDelay() with config.EvalDelay on ruler.Config,
defaulting to 2m. This removes the signozruler dependency on
pkg/query-service/constants.
* refactor(ruler): use time.Duration for eval_delay config
Match the convention used by all other configs in the codebase.
TextDuration is for preserving human-readable text through JSON
round-trips in user-facing rule definitions, not for internal config.
* refactor(ruler): add godoc comments and spacing to Ruler interface
* refactor(ruler): wire ruler handler through signoz.New and signozapiserver
- Add Start/Stop to Ruler interface for lifecycle management
- Add rulerCallback to signoz.New() for EE customization
- Wire ruler.Handler through Handlers, signozapiserver provider
- Register 12 routes in signozapiserver/ruler.go (7 rules, 5 downtime)
- Update cmd/community and cmd/enterprise to pass rulerCallback
- Move rules.Manager creation from server.go to signoz.New via callback
- Change APIHandler.ruleManager type from *rules.Manager to ruler.Ruler
- Remove makeRulesManager from both OSS and EE server.go
* refactor(ruler): remove old rules and downtime_schedules routes from http_handler
Remove 7 rules CRUD routes and 5 downtime_schedules routes plus their
handler methods from http_handler.go. These are now served by
signozapiserver/ruler.go via handler.New() with OpenAPIDef.
The 4 v1 history routes (stats, timeline, top_contributors,
overall_status) remain in http_handler.go as they depend on
interfaces.Reader and have v2 equivalents already in signozapiserver.
* refactor(ruler): use ProviderFactory pattern and register in factory.Registry
Replace the rulerCallback with rulerProviderFactories following the
standard ProviderFactory pattern (like auditorProviderFactories). The
ruler is now created via factory.NewProviderFromNamedMap and registered
in factory.Registry for lifecycle management. Start/Stop are no longer
called manually in server.go.
- Ruler interface embeds factory.Service (Start/Stop return error)
- signozruler.NewFactory accepts all deps including EE task funcs
- provider uses named field (not embedding) with explicit delegation
- cmd/community passes nil task funcs, cmd/enterprise passes EE funcs
- Remove NewRulerProviderFactories (replaced by callback from cmd/)
- Remove manual Start/Stop from both OSS and EE server.go
* fix(ruler): make Start block on stopC per factory.Service contract
rules.Manager.Start is non-blocking (run() just closes a channel).
Add stopC to provider so Start blocks until Stop closes it, matching
the factory.Service contract used by the Registry.
* refactor(ruler): remove unused RM() accessor from EE APIHandler
* refactor(ruler): remove RuleManager from APIHandlerOpts
Use Signoz.Ruler directly instead of passing it through opts.
* refactor(ruler): add /api/v1/rules/test and mark /api/v1/testRule as deprecated
* refactor(ruler): use binding.JSON.BindBody for downtime schedule decode
* refactor(ruler): add TODOs for raw string params on Ruler interface
Mark CreateRule, EditRule, PatchRule, TestNotification, and DeleteRule
with TODOs to accept typed params instead of raw JSON strings. Requires
changing the storage model since the manager stores raw JSON as Data.
* refactor(ruler): add TODO on MaintenanceStore to not expose store directly
* docs: regenerate OpenAPI spec and frontend API clients with ruler routes
* refactor(ruler): rename downtime_schedules tag to downtimeschedules
* refactor(ruler): add query params to ListDowntimeSchedules OpenAPIDef
Add ListPlannedMaintenanceParams struct with active/recurring fields.
Use binding.Query.BindQuery in the handler instead of raw URL parsing.
Add RequestQuery to the OpenAPIDef so params appear in the OpenAPI spec
and generated frontend client.
* refactor(ruler): add GettableTestRule response type to TestRule endpoint
Define GettableTestRule struct with AlertCount and Message fields.
Use it as the Response in TestRule OpenAPIDef so the generated frontend
client has a proper response type instead of string.
* refactor(ruler): tighten schema with oneOf unions and required fields
Surface the polymorphism in RuleThresholdData and EvaluationEnvelope via
JSONSchemaOneOf (the same pattern as QueryEnvelope), so the generated
TS types are discriminated unions with typed `spec` instead of unknown.
Also mark `alert`, `ruleType`, and `condition` required on PostableRule
so the generated TS types are non-optional for callers.
* refactor(ruler): add Enum() on EvaluationKind, ScheduleType, ThresholdKind
Surface the fixed set of accepted values for these valuer-wrapped kind
types so OpenAPI emits proper string-enum schemas and the generated TS
types become string-literal unions instead of plain string.
* refactor(ruler): mark required fields on nested rule and maintenance types
Surface fields already enforced by Validate()/UnmarshalJSON as required
in the OpenAPI schema so the generated TS types match runtime behavior.
Touches RuleCondition (compositeQuery, op, matchType), RuleThresholdData
(kind, spec), BasicRuleThreshold (name, target, op, matchType),
RollingWindow (evalWindow, frequency), CumulativeWindow (schedule,
frequency, timezone), EvaluationEnvelope (kind, spec), Schedule
(timezone), GettablePlannedMaintenance (name, schedule).
Does not mark server-populated fields (id, createdAt, updatedAt, status,
kind) on GettablePlannedMaintenance required, since the same struct is
reused for request bodies in MaintenanceStore.CreatePlannedMaintenance.
* refactor(ruler): tighten AlertCompositeQuery, QueryType, PanelType schema
Missed in the earlier tightening pass. AlertCompositeQuery.queries,
panelType, queryType are all required for a valid composite query;
QueryType and PanelType are valuer-wrapped with fixed value sets, so
expose them as enums in the OpenAPI schema.
* refactor(ruler): wrap sql.ErrNoRows as TypeNotFound in by-ID lookups
GetStoredRule and GetPlannedMaintenanceByID previously returned bun's
raw Scan error, so a missing ID leaked "sql: no rows in result set" to
the HTTP response with a 500 status. WrapNotFoundErrf converts
sql.ErrNoRows into TypeNotFound so render.Error emits 404 with a stable
`not_found` code, and passes other errors through unchanged.
* refactor(ruler): move migrated rules routes to /api/v2/rules
The 7 rules routes now live at /api/v2/rules, /api/v2/rules/{id}, and
/api/v2/rules/test — served via handler.New with render.Success and
render.Error. The legacy /api/v1/rules paths will be restored in the
query-service http handler in a follow-up so existing clients keep
receiving the SuccessResponse envelope unchanged.
Drop the /api/v1/testRule deprecated alias from signozapiserver; the
original lives on main's http_handler.go and is restored alongside the
other v1 paths.
Downtime schedule routes stay at /api/v1/downtime_schedules — single
track, no legacy restore planned.
* refactor(ruler): restore /api/v1/rules legacy handlers for back-compat
Bring the 7 rule CRUD/test handlers and their router.HandleFunc lines
back to http_handler.go so /api/v1/rules, /api/v1/rules/{id}, and
/api/v1/testRule continue to emit the legacy SuccessResponse envelope.
The v2 versions under signozapiserver are the new home for the render
envelope used by generated clients.
Delegation uses aH.ruleManager (populated from opts.Signoz.Ruler in
NewAPIHandler), so a single ruler.Ruler instance serves both paths — no
second rules.Manager is instantiated.
Downtime schedules stay single-track under signozapiserver; the 5
downtime handlers are not restored.
* docs: regenerate OpenAPI spec and frontend clients for /api/v2/rules
* refactor(ruler): return 201 Created on POST /api/v2/rules
A successful create now responds with 201 Created and the full
GettableRule body, matching REST convention for resource creation.
Regenerates the OpenAPI spec and frontend clients to reflect the new
status code.
* refactor(ruler): restore dropped sorter TODO in legacy listRules
The legacy listRules handler was copied verbatim from main during the
v1 back-compat restore, but an inner blank line and the load-bearing
`// todo(amol): need to add sorter` comment were stripped. Put them
back so the legacy block round-trips cleanly against main.
* refactor(ruler): return 201 Created on POST /api/v1/downtime_schedules
Match the REST convention already applied to POST /api/v2/rules:
successful creates respond with 201 Created. Response body remains
empty (nil); the generated frontend client surface is unchanged since
no response type was declared.
A richer "return the created resource" response body is a separate
follow-up — holding off until the ruletypes naming cleanup lands.
* fix(ruler): signal Healthy only after manager.Start closes m.block
The ruler provider didn't implement factory.Healthy, so the registry
fell back to factory.closedC and marked the service StateRunning the
instant its Start goroutine spawned — before rules.Manager.Start had
closed m.block. /api/v2/healthz therefore returned 200 while rule
evaluation was still gated, and integration tests that POSTed a rule
immediately after the readiness check saw their task goroutines stuck
on <-m.block until the next frequency tick.
Add a healthyC channel and close it inside Start only after
manager.Start returns; implement factory.Healthy so the registry and
/api/v2/healthz wait on the real readiness signal.
* fix: add the withhealthy interface
* fix(ruler): alias legacy RULES_EVAL_DELAY env var in backward-compat
The eval_delay config was moved from query-service constants (read from
RULES_EVAL_DELAY) onto ruler.Config (read via mapstructure from
SIGNOZ_RULER_EVAL__DELAY). That silently broke the legacy env var for
any existing deployment — notably the alerts integration-test fixture
which sets RULES_EVAL_DELAY=0s to let rules evaluate against just-
inserted data. The resulting default 2m delay pushed the query window
far enough back that the fixture's rate spike fell outside it, causing
8 of 24 parametrize cases in 02_basic_alert_conditions.py to fail with
"Expected N alerts to be fired but got 0 alerts".
Add RULES_EVAL_DELAY to mergeAndEnsureBackwardCompatibility alongside
the ~10 other aliased legacy env vars. Emits the standard deprecation
warning and overrides config.Ruler.EvalDelay.
* fix(member): better UX for pending invite users
* fix(member): add integration tests and reuse timezone util
* fix(member): rename deprecated and remove dead files
* fix(member): do not use hypened endpoints
* fix(member): user friendly button text
* fix(member): update the API endpoints and integration tests
* fix(member): simplify handler naming convention
* fix(member): added v2 API for update my password
* fix(member): remove more dead code
* fix(member): fix integration tests
* fix(member): fix integration tests
* refactor(alertmanager): move API handlers to signozapiserver
Extract Handler interface in pkg/alertmanager/handler.go and move
the implementation from api.go to signozalertmanager/handler.go.
Register all alertmanager routes (channels, route policies, alerts)
in signozapiserver via handler.New() with OpenAPIDef. Remove
AlertmanagerAPI injection from http_handler.go.
This enables future AuditDef instrumentation on these routes.
* fix(review): rename param, add /api/v1/channels/test endpoint
- Rename `am` to `alertmanagerService` in NewHandlers
- Add /api/v1/channels/test as the canonical test endpoint
- Mark /api/v1/testChannel as deprecated
- Regenerate OpenAPI spec
* fix(review): use camelCase for channel orgId json tag
* fix(review): remove section comments from alertmanager routes
* fix(review): use routepolicies tag without hyphen
* chore: regenerate frontend API clients for alertmanager routes
* fix: add required/nullable/enum tags to alertmanager OpenAPI types
- PostableRoutePolicy: mark expression, name, channels as required
- GettableRoutePolicy: change CreatedAt/UpdatedAt from pointer to value
- Channel: mark name, type, data, orgId as required
- ExpressionKind: add Enum() for rule/policy values
- Regenerate OpenAPI spec and frontend clients
* fix: use typed response for GetAlerts endpoint
* fix: add Receiver request type to channel mutation endpoints
CreateChannel, UpdateChannelByID, TestChannel, and TestChannelDeprecated
all read req.Body as a Receiver config. The OpenAPIDef must declare
the request type so the generated SDK includes the body parameter.
* fix: change CreateChannel access from EditAccess to AdminAccess
Aligns CreateChannel endpoint with the rest of the channel mutation
endpoints (update/delete) which all require admin access. This is
consistent with the frontend where notifications are not accessible
to editors.
* chore: initial commit
* chore: added metricNamespace as a new param
* chore: go generate openapi, update spec
* chore: frontend yarn generate:api
* chore: added metricnamespace support in /fields/values as well as added integration tests
* chore: corrected comment
* chore: added unit tests for getMetricsKeys and getMeterSourceMetricKeys
---------
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>