* feat: add store methods for minimal trace fetch
* feat: break down waterfall module to handle large spans
Handling large traces in two steps to avoid high
memory allocation
* refactor: keep the waterfall changes in new api version
This is to avoid the contract change in existing v3
* chore: avoid unnecessary diffs
* refactor: move conversion logic to types
* chore: update openapi specs
* refactor: use sqlbuilder for queries
* chore: fix comment
* chore: avoid passing request type to module
* refactor: avoid passing whole summary object around
* chore: remove trace_id from querying since it's already known
* chore: remove unused reference column from query
* chore: update openapi specs
* chore: added changes for crosshair sync for tooltip
* chore: minor cleanup
* chore: updated the core structure
* chore: updated the types
* chore: minor cleanup
* feat: added changes for series highlighting on crosshair sync
* test: added test for crosshair series highlight changes
* chore: pr review fixes
* chore: handled other cases of groupby
* chore: updated tests
* chore: added migration setup
* feat(sqlmigration): add integration_dashboards table (migration 079)
Adds the `integration_dashboards` relations table that stores the
integration-specific identity for dashboards provisioned from cloud
or builtin integrations. Columns: id, org_id, dashboard_id, provider,
slug, created_at, updated_at. Includes a unique index on dashboard_id.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(sqlmigration): backfill cloud integration dashboards to DB (migration 080)
One-time idempotent migration that provisions dashboard rows for all
orgs with existing cloud integration services where metrics are enabled.
Each dashboard is inserted into the `dashboard` table with
source="integration" and locked=true, and a companion row is added to
`integration_dashboards` with provider="cloud_integrations" and
slug="{provider}-{service}-{dashboard}" (e.g. aws-alb-overview).
Idempotency is enforced by checking (org_id, provider, slug) on
integration_dashboards before each insert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
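The idempotency rule in the migration 080 message above can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the `integration_dashboards` table; `slugKey`, `store`, `buildSlug`, and `insertIfMissing` are hypothetical names for illustration, not the migration's actual code.

```go
package main

import "fmt"

// slugKey mirrors the (org_id, provider, slug) tuple that the message says is
// checked before each insert.
type slugKey struct {
	OrgID    string
	Provider string
	Slug     string
}

// buildSlug follows the "{provider}-{service}-{dashboard}" format from the message.
func buildSlug(provider, service, dashboard string) string {
	return fmt.Sprintf("%s-%s-%s", provider, service, dashboard)
}

// store is a hypothetical in-memory stand-in for the integration_dashboards table.
type store struct {
	rows map[slugKey]bool
}

// insertIfMissing writes the row only when the key is absent and reports
// whether a write happened, so re-running the backfill changes nothing.
func (s *store) insertIfMissing(k slugKey) bool {
	if s.rows[k] {
		return false
	}
	s.rows[k] = true
	return true
}

func main() {
	s := &store{rows: map[slugKey]bool{}}
	k := slugKey{
		OrgID:    "org-1",
		Provider: "cloud_integrations",
		Slug:     buildSlug("aws", "alb", "overview"),
	}
	fmt.Println(s.insertIfMissing(k)) // first run writes the row
	fmt.Println(s.insertIfMissing(k)) // second run is a no-op
}
```

A real migration would perform the existence check and the insert against the database, presumably inside one transaction; the sketch shows only the check-then-insert shape.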
* chore(sqlmigration): clean up stale 079 artifacts, add 079 schema migration
Remove the pre-rename 079_migrate_cloud_integration_dashboards.go and
079_cloud_integration_dashboards/ directory that were left behind when
the backfill migration was renumbered to 080. Add the missing
079_add_integration_dashboards.go (schema-only migration creating the
integration_dashboards table) which provider.go already references.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: adding comment for fk
* refactor: renaming table name
* refactor: rename and restructure cloud integration dashboard migration types
* chore: file rename
* refactor: dashboard creation and listing flow change
* refactor: removing loose strings
* refactor: adding DeleteBySource on dashboard module
* refactor: review changes and update service flow change
* refactor: simplify comments
* ci: lint staticcheck fix
* refactor: renaming migration and adding integration tests
* ci: py fmt lint fixes
* feat: adding ListSharedServices store method
* ci: golangci-lint fix
* feat(integrations): persist installed integration dashboards in DB
Provisions dashboard DB rows when an integration is installed and
deprovisions them on uninstall. Adds a backfill migration (087) for
users with already-installed integrations. Removes the on-the-fly
filesystem serving path from http_handler in favor of the standard
dashboard module.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: changing dashboard ID and other cleanup
* chore: update code structure for better readability and maintainability
* refactor: removing deprecated cloud integrations and merging
integration types
* refactor: renaming migration files and removing deprecated tests
* refactor: using BunDBCtx method instead
* ci: fix py fmt lint
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: custom notifiers in alert manager
* chore: lint fixes
* chore: fix email linter
* chore: added tracing to msteamsv2 notifier
* feat: alert manager template to template title and notification body
* chore: updated test name + code for timeout errors
* chore: added utils for using variables with $ notation
* chore: exposed templates for alertmanager types
* feat: added preprocessor for alert templater
* chore: hooked preProcess function in expandTitle and body, added labels and annotations in alertdata
* chore: fix lint issues
* chore: added handling for missing variable used in template
* feat: converted alerttemplater to interface and updated tests
* refactor: added extractCommonKV instead of 2 different functions
* test: fix preprocessor test case
* feat: added support for `and` in templating
* chore: lint fix
* chore: renamed the interface
* chore: added test for missing function
* refactor: test case and sb related changes
* refactor: comments and test improvements
* chore: lint fix
* chore: updated comments
* feat: added basic html markdown templater
* chore: updated newline to markdown format
* feat: slack blockkit renderer using goldmark
* test: added test for html rendering
* feat: integrated slack blockkit in markdownrenderer package and removed plaintext format
* chore: updated br with new line in test and logs added
* refactor: alert manager templater
* feat: added no-op formatter in markdown renderer
* chore: return missing variables as sorted list
* feat: alert notification processor
* chore: refactor notification processor and send processor in ReceiverIntegrations
* chore: return isDefaultTemplated true even in case of blank default template
* feat: updated email notifier
* feat: update ms team notifier with notification processor
* refactor: ms teams notifier
* chore: msteams note
* feat: added notification processor in opsgenie notifier
* feat: added notification processor in slack notifier
* feat: added notification processor in pagerduty notifier
* chore: added IsCustomTemplated helper function in result struct
* feat: added notification processor in webhook notifier
* chore: updated alertmanagernotify package with updated notifier signature
* feat: slack mrkdwn renderer
* feat: added new format in markdown renderer
* test: simplify TestRenderSlackMrkdwn
* test: add new test cases for Slack MRKDWN rendering
* feat: updated slack notifier with slack mrkdwn format
* fix: webhook notifier update annotations before preparing data
* fix: added handling for labels and annotations with `.` and `-`
* fix: handled <no value> in templated response
* test: added test in notification processor for no value
* refactor: review comments
* refactor: lint fixes
* chore: updated licenses for notifiers
* chore: updated email notifier from upstream
* chore: lint fixes
* feat: added no value extension to render <no value> in html
* feat: email rendering with custom template in notification processor
* chore: integration of custom templating in rule manager
* chore: added action links to email and slack notifiers
* chore: fix linter and merge conflict issues
* feat: added `Literal` for CompareOperator and MatchType and expose from ruleManager
* chore: error logging + NoOp type definition
* feat: return single templating result with flag for template type
* fix: variables with symbols in template
* feat: slack mrkdwn renderer
* feat: custom raw html renderer to escape <no value>
* chore: integrated slack mrkdwn renderer and added NoOp formatter
* fix: email template directory for notification processor
* chore: remove static templates from pagerduty notifications
* chore: removed notifier test files
* fix: concurrent rendering in markdown renderer
* refactor: changes as per internal review
* chore: lint issue
* chore: removed special handling for softline break
* refactor: removed logger as markdown renderer dependency
* refactor: changed markdown renderer from interface to package-level functions
* refactor: changes as per internal review
* chore: removed notification processor
* chore: updated webhook notifier to send templated title and body in notification
* refactor: msteams skip logs and traces as factsset, slack code refactor
* chore: remove private annotations from pagerduty notifier
* chore: updated email template based on new template struct
* chore: update receiver integrations
* chore: outdated comment
* chore: move to templates/alertmanager
* chore: address comments
* chore: add example for templates
---------
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
* feat(planned-downtime): explicit toggle for all vs specific alert rules
Replace the implicit "empty alert list silences everything" behavior
with a Radio toggle ("All alert rules" / "Specific alert rules") so
users can't accidentally silence every alert by forgetting to select
rules. The list view now displays an explicit "All alert rules" tag
instead of a dash for schedules that silence everything.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: remove redundant messaging
* chore: reuse existing variable
* chore: fix typo
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
* add maintenanceMuteStage to move planned maintenance to alertmanager
Rules previously skipped rule.Eval() entirely during maintenance windows.
This change moves suppression to MaintenanceMuter, injected as a Stage
in the alertmanager notification pipeline. Now rules always evaluate and
every suppression is handled by alertmanager.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: wrap routing pipeline once instead of per-route injection
Replace the per-route-entry loop with a single MultiStage wrap so
maintenance suppression runs once per dispatch group before routing.
* refactor: move maintenance mute stage into custom pipelineBuilder
Copy notify.PipelineBuilder locally so we can inject mms between the
silence stage and the receiver stage (GossipSettle → Inhibit →
TimeActive → TimeMute → Silence → mms → Receiver), matching the
correct suppression order the team requires.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
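The stage order named in the message above can be sketched as an ordered slice of filters. This is a simplified model under stated assumptions: `alert`, `stage`, `muteRules`, and `buildPipeline` are hypothetical types and helpers, not Alertmanager's `notify` package, and every stage other than the maintenance mute stage is a pass-through.

```go
package main

import "fmt"

// alert and stage are hypothetical simplifications for illustration only.
type alert struct {
	Name   string
	RuleID string
}

type stage struct {
	Name string
	// Filter returns the alerts that continue to the next stage.
	Filter func([]alert) []alert
}

func passThrough(in []alert) []alert { return in }

// muteRules drops alerts whose rule ID is currently under maintenance.
func muteRules(muted map[string]bool) func([]alert) []alert {
	return func(in []alert) []alert {
		out := make([]alert, 0, len(in))
		for _, a := range in {
			if !muted[a.RuleID] {
				out = append(out, a)
			}
		}
		return out
	}
}

// buildPipeline places the maintenance mute stage between Silence and
// Receiver, following the order given in the commit message.
func buildPipeline(muted map[string]bool) []stage {
	return []stage{
		{Name: "GossipSettle", Filter: passThrough},
		{Name: "Inhibit", Filter: passThrough},
		{Name: "TimeActive", Filter: passThrough},
		{Name: "TimeMute", Filter: passThrough},
		{Name: "Silence", Filter: passThrough},
		{Name: "MaintenanceMute", Filter: muteRules(muted)},
		{Name: "Receiver", Filter: passThrough},
	}
}

// run feeds the alerts through every stage in order.
func run(p []stage, in []alert) []alert {
	for _, s := range p {
		in = s.Filter(in)
	}
	return in
}

func main() {
	p := buildPipeline(map[string]bool{"rule-1": true})
	out := run(p, []alert{
		{Name: "HighCPU", RuleID: "rule-1"},
		{Name: "DiskFull", RuleID: "rule-2"},
	})
	fmt.Println(len(out), out[0].Name)
}
```

Because the mute stage sits before the receiver stage, a muted alert never reaches a notifier, while the rule that produced it still evaluates as usual.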
* chore: add license header to pipeline_builder.go
Copied code originates from Apache-2.0 licensed Prometheus Alertmanager;
add dual copyright + SPDX identifier following the repo's convention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: replace SPDX tag with full Apache 2.0 license boilerplate
The full license text is unambiguously compliant with Apache 2.0 Section 4(a),
which requires giving recipients "a copy of this License".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: pass MaintenanceMuter directly to pipelineBuilder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: remove dead orgID param from task constructors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* rename buildReceiverStage -> createReceiverStage
* refactor: replace maintenanceMuteStage with notify.NewMuteStage
MaintenanceMuter already satisfies types.Muter, and pipelineBuilder has
its own pb.metrics, so the hand-rolled maintenanceMuteStage wrapper is
redundant. Use notify.NewMuteStage(pb.muter, pb.metrics) directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: hoist MuteStage construction out of the receiver loop
MuteStage holds no per-receiver state, so one instance shared across
all receivers is sufficient — matching how is/ss are handled upstream.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: always initialize maintenanceStore; remove nil guards
Tests now use a real sqlrulestore-backed MaintenanceMuter instead of
passing nil. With nil no longer a valid input, remove the nil guards
in server.go and pipeline_builder.go.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: move MaintenanceMuter to Server and pass it to pipelineBuilder.New
- Remove muter from pipelineBuilder struct and newPipelineBuilder();
pass it as a parameter to New() instead, consistent with inhibitor/silencer
- Store muter on Server so GetAlerts can call Mutes() alongside the
inhibitor and silencer, ensuring maintenance-suppressed alerts show
the correct muted status in API responses
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* remove redundant MemMarker wrapper
* feat: surface maintenance-suppressed alerts via mutedBy in GetAlerts
Alerts suppressed by an active maintenance window were being correctly
muted in the notification pipeline but appeared as state=active in the
v2 GetAlerts response, since MaintenanceMuter.Mutes had no marker
side-effect (unlike inhibitor/silencer).
Add MaintenanceMuter.MutedBy returning the matching window IDs, and
plumb a mutedByFunc callback through NewGettableAlertsFromAlertProvider
into AlertToOpenAPIAlert. The upstream v2 API forces state=suppressed
when mutedBy is non-empty, so the frontend's existing state-based
rendering picks it up without further changes.
Use the dedicated mutedBy field rather than SilencedBy to avoid
violating the "complete set of silence IDs" contract that anything
querying silences by ID would rely on.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
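A minimal sketch of the mutedBy idea described above, assuming the list of windows has already been narrowed to the currently active ones. `window`, `mutedBy`, and `alertState` are hypothetical names; the real state derivation also accounts for inhibition and silences, which this sketch leaves out.

```go
package main

import "fmt"

// window is a hypothetical, already-active maintenance window.
type window struct {
	ID      string
	RuleIDs []string // empty means the window applies to every rule
}

// mutedBy returns the IDs of all windows that cover the given rule.
func mutedBy(windows []window, ruleID string) []string {
	var ids []string
	for _, w := range windows {
		if len(w.RuleIDs) == 0 {
			ids = append(ids, w.ID)
			continue
		}
		for _, id := range w.RuleIDs {
			if id == ruleID {
				ids = append(ids, w.ID)
				break
			}
		}
	}
	return ids
}

// alertState reports "suppressed" whenever mutedBy is non-empty, mirroring
// the behaviour the message attributes to the upstream v2 API.
func alertState(mutedByIDs []string) string {
	if len(mutedByIDs) > 0 {
		return "suppressed"
	}
	return "active"
}

func main() {
	windows := []window{
		{ID: "w-all"},
		{ID: "w-rule-1", RuleIDs: []string{"rule-1"}},
	}
	ids := mutedBy(windows, "rule-1")
	fmt.Println(ids, alertState(ids))
	fmt.Println(alertState(mutedBy(nil, "rule-1")))
}
```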
* code cleanup
* refactor: move maintenance (planned downtime) to alertmanager packages
Types move from pkg/types/ruletypes/ to pkg/types/alertmanagertypes/:
- maintenance.go, recurrence.go, schedule.go (+ tests)
Store impl moves from pkg/ruler/rulestore/sqlrulestore/ to
pkg/alertmanager/alertmanagerstore/sqlalertmanagerstore/.
Maintenance windows mute alerts, so they belong with alertmanager
rather than the rule types.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test: add unit tests for MaintenanceMuter
Covers Mutes/MutedBy semantics (empty label, rule match, empty-RuleIDs
matches-all, future windows, multi-window) and the result cache
(single-fetch within TTL, stale-cache fallback on store error,
re-fetch after expiry, concurrency safety).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
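The cache behaviours these tests cover (single fetch within the TTL, stale fallback on store error, re-fetch after expiry, concurrency safety) can be sketched as below. This is one possible shape, not the MaintenanceMuter implementation: `cachedFetcher`, `fakeClock`, and `errStoreDown` are hypothetical names, and the cached value is a plain string slice.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errStoreDown = errors.New("store unavailable")

// fakeClock lets the example control time without sleeping.
type fakeClock struct{ t time.Time }

func (f *fakeClock) now() time.Time          { return f.t }
func (f *fakeClock) advance(d time.Duration) { f.t = f.t.Add(d) }

// cachedFetcher caches the last successful fetch for ttl.
type cachedFetcher struct {
	mu      sync.Mutex // serialises access so concurrent callers are safe
	ttl     time.Duration
	now     func() time.Time
	fetch   func() ([]string, error)
	value   []string
	fetched time.Time
	valid   bool
}

func newCachedFetcher(ttl time.Duration, now func() time.Time, fetch func() ([]string, error)) *cachedFetcher {
	return &cachedFetcher{ttl: ttl, now: now, fetch: fetch}
}

// get returns the cached value while it is fresh, re-fetches after expiry,
// and falls back to the stale value when the fetch fails.
func (c *cachedFetcher) get() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.valid && c.now().Sub(c.fetched) < c.ttl {
		return c.value
	}
	v, err := c.fetch()
	if err != nil {
		return c.value // stale fallback; nil if nothing was ever fetched
	}
	c.value, c.fetched, c.valid = v, c.now(), true
	return c.value
}

func main() {
	clock := &fakeClock{}
	calls, fail := 0, false
	c := newCachedFetcher(time.Minute, clock.now, func() ([]string, error) {
		calls++
		if fail {
			return nil, errStoreDown
		}
		return []string{"window-1"}, nil
	})
	c.get()
	c.get() // served from cache
	clock.advance(2 * time.Minute)
	fail = true
	fmt.Println(c.get(), calls) // stale value survives the store error
}
```

A failed fetch does not refresh the timestamp here, so the next call retries the store; whether the real muter retries immediately or backs off is not stated in the message.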
* Update schema changes
* Re-add marker
* fix NewMaintenanceStore in tests
* Go lint fixes
* test: use mockery-generated mock for MaintenanceStore in muter tests
Replace hand-written fakeMaintenanceStore with a mockery-generated
MockMaintenanceStore, consistent with the alertmanagertest pattern.
Also adds MaintenanceStore to .mockery.yml so the mock stays in sync.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: regenerate mocks via make gen-mocks
Picks up new MockHandler for the Handler interface in pkg/alertmanager
and regenerates MockMaintenanceStore with canonical mockery formatting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* cleanup test
* test: add e2e muting tests for maintenance window behaviour
* Add label expression support to planned downtime
Alert instances can now be scoped by label expression
(e.g. env == "prod"), scoping suppression below the rule level.
A window with no rule IDs and a label expression silences any
alert whose labels match, regardless of which rule fired it;
when rule IDs are also present, the expression is evaluated only
within the matched rules.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Remove redundant || undefined from labelExpression assignment
* Move label expression evaluation into ShouldSkip
ShouldSkip now owns all three suppression checks in sequence:
rule ID match → schedule active → label expression.
IsActive passes nil labels so the expression check is skipped
(no instance labels available for UI status).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
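The three checks can be sketched in the order the message gives them. This is a simplified model: `maintenance` and `shouldSkip` are hypothetical, times are plain Unix seconds, the scope is a Go predicate rather than a parsed expression, and treating the window end as exclusive is an assumption.

```go
package main

import "fmt"

// maintenance is a hypothetical, simplified window.
type maintenance struct {
	RuleIDs []string // empty means every rule is in scope
	Start   int64    // Unix seconds, inclusive
	End     int64    // Unix seconds, exclusive (assumption)
	Scope   func(labels map[string]string) bool
}

// shouldSkip runs the checks in sequence:
// rule ID match, then schedule active, then label expression.
func (m maintenance) shouldSkip(ruleID string, now int64, labels map[string]string) bool {
	if len(m.RuleIDs) > 0 {
		found := false
		for _, id := range m.RuleIDs {
			if id == ruleID {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	if now < m.Start || now >= m.End {
		return false
	}
	// Nil labels mean no instance labels are available (the IsActive case),
	// so the expression check is skipped.
	if m.Scope != nil && labels != nil {
		return m.Scope(labels)
	}
	return true
}

func main() {
	m := maintenance{
		Start: 100,
		End:   200,
		Scope: func(l map[string]string) bool { return l["env"] == "prod" },
	}
	fmt.Println(m.shouldSkip("any-rule", 150, map[string]string{"env": "prod"}))
	fmt.Println(m.shouldSkip("any-rule", 150, map[string]string{"env": "dev"}))
	fmt.Println(m.shouldSkip("any-rule", 150, nil))
}
```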
* remove redundant LabelSet->map conversion
* implement Down migration to drop label_expression column
* fix lset type and update openapi spec
* fix(tests): resolve envprovider env isolation, factory name length, and ShouldSkip signature
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* remove unused function `evaluateLabelExpression`
* chore: rename label expression to scope
* test(maintenance): add tests for scope label expression filtering
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(e2e): verify scope-based maintenance muting in alertmanager flow
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Use `AND` instead of `&&`
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
* fix(maintenance): consolidate label-set-to-env conversion to avoid expr panic
Move ConvertLabelSetToEnv to alertmanagertypes so both the maintenance
scope evaluator and the route-policy evaluator share one implementation.
Dotted label keys (e.g. kubernetes.node) are expanded into nested maps,
preventing the expr-lang panic that occurs when one key is a prefix of
another.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
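A sketch of the dotted-key expansion described above, with a conflict flag for the prefix case. `convertLabelsToEnv` is a hypothetical stand-in for ConvertLabelSetToEnv; in particular, keeping the shorter key and dropping the conflicting dotted key is an assumption, since the message does not say how a conflict is resolved.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// convertLabelsToEnv expands dotted keys such as "kubernetes.node" into
// nested maps. It reports conflict=true when one key is a prefix of another
// (for example "kubernetes" and "kubernetes.node"), in which case the shorter
// key wins and the dotted key is dropped (assumption).
func convertLabelsToEnv(labels map[string]string) (map[string]any, bool) {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order; a prefix sorts before its extensions

	env := map[string]any{}
	conflict := false
	for _, k := range keys {
		parts := strings.Split(k, ".")
		cur := env
		blocked := false
		for _, p := range parts[:len(parts)-1] {
			next, exists := cur[p]
			if !exists {
				m := map[string]any{}
				cur[p] = m
				cur = m
				continue
			}
			m, isMap := next.(map[string]any)
			if !isMap {
				// A scalar already sits where a nested map is needed.
				conflict, blocked = true, true
				break
			}
			cur = m
		}
		if blocked {
			continue
		}
		leaf := parts[len(parts)-1]
		if _, isMap := cur[leaf].(map[string]any); isMap {
			// A nested map already sits where a scalar is needed.
			conflict = true
			continue
		}
		cur[leaf] = labels[k]
	}
	return env, conflict
}

func main() {
	env, conflict := convertLabelsToEnv(map[string]string{
		"kubernetes.node": "node-1",
		"env":             "prod",
	})
	fmt.Println(env, conflict)
}
```

Detecting the conflict while building the map avoids a separate pass that compares every pair of keys.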
* chore(planned-downtime): use clickable learn more link in scope tooltip
* chore(sqlmigration): renumber add_scope_to_planned_maintenance to 086
078 collided with add_sa_managed_role_txn; bumped to the next free number
and reordered registration after add_source_to_dashboard (085).
* refactor(maintenance): extract scope expression eval and surface errors
- Move ConvertLabelSetToEnv and EvalScopeExpression into expression.go
with companion tests in expression_test.go.
- EvalScopeExpression now returns (bool, error) instead of swallowing
compile/run failures and non-bool outputs; ShouldSkip logs the error
via slog.Default() and falls back to not suppressing (safety-first).
- Update test fixtures to the SQL-style operator form (`=`, AND, OR)
matching the placeholder and reviewer suggestions.
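The error-surfacing contract in the list above can be sketched as follows. The evaluator is abstracted as a hypothetical `run` callback instead of a real expression engine, and the sketch uses the standard library's `errors` and `fmt` for brevity rather than the repository's own error helpers; `evalScope`, `shouldSuppress`, and `errInvalidScope` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
)

var errInvalidScope = errors.New("invalid scope expression")

// evalScope returns an error when evaluation fails or yields a non-bool
// result, instead of silently treating either case as "no match".
func evalScope(run func() (any, error)) (bool, error) {
	out, err := run()
	if err != nil {
		return false, fmt.Errorf("%w: %v", errInvalidScope, err)
	}
	b, ok := out.(bool)
	if !ok {
		return false, fmt.Errorf("%w: got %T, want bool", errInvalidScope, out)
	}
	return b, nil
}

// shouldSuppress falls back to not suppressing on any error, so a broken
// expression can never hide an alert.
func shouldSuppress(run func() (any, error)) bool {
	matched, err := evalScope(run)
	if err != nil {
		return false
	}
	return matched
}

func main() {
	fmt.Println(shouldSuppress(func() (any, error) { return true, nil }))
	fmt.Println(shouldSuppress(func() (any, error) { return "yes", nil }))
	_, err := evalScope(func() (any, error) { return 42, nil })
	fmt.Println(err)
}
```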
* chore: use `=` instead of `==` in expressions
* fix(maintenance): satisfy forbidigo/sloglint in scope eval
- Replace fmt.Errorf with pkg/errors Wrapf/Newf using a new
ErrCodeInvalidScopeExpression code.
- Use slog ErrorContext (with context.Background()) instead of Error to
satisfy sloglint.
* perf(maintenance): fold prefix-conflict detection into ConvertLabelSetToEnv
ConvertLabelSetToEnv now returns (env, conflict). The rulebased provider
drops its O(n^2) pre-scan and logs based on the flag, restoring the
previous O(n*d) cost while keeping the shared helper.
* chore: add docs URL for invalid scope
* refactor: don't log in types package
* remove down migration
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
The startTimeText and endTimeText useMemo hooks did not reference
timezone in their callback bodies, so including it in the dependency
arrays caused unnecessary recomputations whenever the timezone form
field changed.
* feat(web): add support for generating settings type
* refactor: rename generate settings to generate config web-settings
- Rename cmd/settings.go to cmd/genconfig.go
- Restructure command as `generate config web-settings`
- Move schema output to docs/config/web-settings.json
- Update frontend script to generate:config:web-settings
- Update CI checks to match new command names
- Strip Web prefix from generated JSON Schema definitions
* feat(ai-assistant): collapse thinking + tool-call steps into one row
Long sequences of thinking and tool-call rows in the chat were noisy and
pushed the actual answer below the fold. ActivityGroup folds any run of
consecutive thinking + tool events behind a single "Worked through N steps"
summary that expands on click. While streaming, the trailing group reads
"Working… · Xs · N steps" with a live elapsed-time tick that re-stamps on
approval/clarification resume.
ThinkingStep now reads "Thinking…" while live and "Thought for a few
seconds" once done. Liveness is derived purely from render position
(trailing item in a trailing live group); persisted history blocks default
to not-live so they render the same wording without depending on
server-stored timing.
* refactor(ai-assistant): tighten ActivityGroup after review
- Bare-render lone activity items: a single thinking or tool step no
longer renders as "Worked through 1 step" — the underlying chevron is
enough disclosure.
- Memoize the group partition in MessageBubble and StreamingMessage so
store updates that don't touch the message's blocks/events don't churn
the underlying step children.
- Bump the elapsed-timer tick from 500ms to 1000ms (display is
integer-second precision) and suppress the elapsed token until ≥ 1s.
- Add aria-expanded + aria-controls on the disclosure button and rename
the SCSS keyframe to activityGroupPulse to avoid a global collision.
- Document the same-instance invariant ActivityGroup's timer relies on
in groupStreamingEvents.
* refactor(ai-assistant): apply PR review feedback on ActivityGroup
- Reuse formatTime() from utils/timeUtils for the elapsed-time label
instead of a local formatter.
- Tighten the isLive JSDoc on ActivityGroup and ThinkingStep so the doc
only captures the non-obvious "why" (timer re-stamp on resume; vague
copy because the API doesn't persist precise timing).
* refactor(ai-assistant): unify activity rows under ActivityGroup
Drop the bare-render shortcut for single-item activity groups and route
every "what the agent did" row through ActivityGroup. The summary now
adapts to the item count and kind — single-item groups read
"Thinking… / Thought for a few seconds" or the tool's display text
instead of the awkward "Worked through 1 step", and single-item
expansion renders the underlying content body directly (no second
chevron disclosure).
Extracts ThinkingContent / ToolCallContent body sub-components and a
small thinkingLabel / getToolDisplayLabel helper so ActivityGroup can
reuse them without duplicating markup.
* refactor(ai-assistant): apply ActivityGroup review feedback
- Rename common CSS module class names to scoped variants
(.group → .activityGroup, .header → .activityHeader, etc.) so
matches against module classes carry intent and don't collide
with future styles in adjacent files.
- Swap the disclosure <button> for a <div> with onClick — drops the
signoz Button component option since its action-button defaults
(focus ring, base padding, hover background) visually regressed
the quiet full-width row. Matches the existing ThinkingStep /
ToolCallStep disclosure pattern.
- Drop the useId + aria-controls plumbing; with the disclosure
collapsed inline beneath the header, the id wasn't carrying
semantic weight beyond what aria-expanded already provides.
- Thread stable id fields through ActivityItem and the RenderGroup
types so list keys come from typed data rather than the loop
index. Persisted tool blocks key off the server-assigned
toolCallId; streaming items and thinking blocks key off their
position in the append-only source array.
* chore: added changes to migrate alert chart component to new charts
* chore: minor changes
* chore: minor changes
* chore: pr review changes
* chore: minor refactor
* chore: added migration setup
* feat(sqlmigration): add integration_dashboards table (migration 079)
Adds the `integration_dashboards` relations table that stores the
integration-specific identity for dashboards provisioned from cloud
or builtin integrations. Columns: id, org_id, dashboard_id, provider,
slug, created_at, updated_at. Includes a unique index on dashboard_id.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(sqlmigration): backfill cloud integration dashboards to DB (migration 080)
One-time idempotent migration that provisions dashboard rows for all
orgs with existing cloud integration services where metrics are enabled.
Each dashboard is inserted into the `dashboard` table with
source="integration" and locked=true, and a companion row is added to
`integration_dashboards` with provider="cloud_integrations" and
slug="{provider}-{service}-{dashboard}" (e.g. aws-alb-overview).
Idempotency is enforced by checking (org_id, provider, slug) on
integration_dashboards before each insert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(sqlmigration): clean up stale 079 artifacts, add 079 schema migration
Remove the pre-rename 079_migrate_cloud_integration_dashboards.go and
079_cloud_integration_dashboards/ directory that were left behind when
the backfill migration was renumbered to 080. Add the missing
079_add_integration_dashboards.go (schema-only migration creating the
integration_dashboards table) which provider.go already references.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: adding comment for fk
* refactor: renaming table name
* refactor: rename and restructure cloud integration dashboard migration types
* chore: file rename
* refactor: dashboard creation and listing flow change
* refactor: removing loose strings
* refactor: adding DeleteBySource on dashboard module
* refactor: review changes and update service flow change
* refactor: simplify comments
* ci: lint staticcheck fix
* refactor: renaming migration and adding integration tests
* ci: py fmt lint fixes
* feat: adding ListSharedServices store method
* ci: golangci-lint fix
* refactor: code cleanup
* chore: revert changes due to js lint
* refactor: test assertion changes
* refactor: using bindparam for sql generation
* chore: migrate integration dashboards json to v5 (#11419)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
IsDotMetricsEnabled always returns true so the _dot.json variants were
always served. Replace each non-dot dashboard JSON with the dot content,
delete the _dot.json files, and remove the dead flag-check logic from
HydrateFileUris.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: replace Ant Design Switch with Signoz UI Switch across multiple components
* fix: update snapshot of failing test
* feat: update snapshot
* refactor: update imports to use Signoz UI Switch from the new path across multiple components
* refactor: update banned components to use Signoz UI imports for Typography and Switch
* refactor: replace Ant Design Switch with Signoz UI Switch
* chore: updated the signozhq version and removed ts-expect-error from button
* chore: renamed authz test with authz.test.tsx
* chore: remove error from useAuthZ public API and fallbackOnError from GuardAuthZ
* chore: updated test cases
* chore: updated test cases
* chore: restore error to useAuthZ API with fail-open default in GuardAuthZ
* chore: updated test cases
* fix(user-info): surfaced errors for reset password and fixed issues
* fix(user-info): removed notification from antd and used toast and showerrormodal in userinfo
* fix(user-info): refactor and added tests
* fix(user-info): code refactor
* chore: baseline setup
* chore: endpoint detail update
* chore: added logic for hosts v3 api
* fix: bug fix
* chore: disk usage
* chore: added validate function
* chore: added some unit tests
* chore: return status as a string
* chore: yarn generate api
* chore: removed isSendingK8sAgentsMetricsCode
* chore: moved funcs
* chore: added validation on order by
* chore: added pods list logic
* chore: updated openapi yml
* chore: updated spec
* chore: pods api meta start time
* chore: nil pointer check
* chore: nil pointer dereference fix in req.Filter
* chore: added temporalities of metrics
* chore: added pods metrics temporality
* chore: unified composite key function
* chore: code improvements
* chore: added pods list api updates
* chore: hostStatusNone added for clarity that this field can be left empty as well in payload
* chore: yarn generate api
* chore: return errors from getMetadata and lint fix
* chore: added hostName logic
* chore: modified getMetadata query
* chore: add type for response and files rearrange
* chore: pass warnings from queryResponse through to the hosts list response struct
* chore: added better metrics existence check
* chore: added a TODO remark
* chore: added required metrics check
* chore: distributed samples table to local table change for get metadata
* chore: frontend fix
* chore: endpoint correction
* chore: endpoint modification openapi
* chore: escape backtick to prevent sql injection
* chore: rearrange
* chore: improvements
* chore: validate order by to validate function
* chore: improved description
* chore: added TODOs and made filterByStatus a part of filter struct
* chore: ignore empty string hosts in get active hosts
* feat(infra-monitoring): v2 hosts list - return counts of active & inactive hosts for custom group by attributes (#10956)
* chore: add functionality for showing active and inactive counts in custom group by
* chore: bug fix
* chore: added subquery for active and total count
* chore: ignore empty string hosts in get active hosts
* fix: sinceUnixMilli for determining active hosts compute once per request
* chore: refactor code
* chore: rename HostsList -> ListHosts
* chore: rearrangement
* chore: inframonitoring types renaming
* chore: added types package
* chore: file structure further breakdown for clarity
* chore: comments correction
* chore: removed temporalities
* chore: pods code restructuring
* chore: comments resolve
* chore: added json tag required: true
* chore: removed pod metric temporalities
* chore: removed internal server error
* chore: added status unauthorized
* chore: remove a defensive nil map check; the function ensures a non-nil map when err is nil
* chore: cleanup and rename
* chore: make sort stable in case of tiebreaker by comparing composite group by keys
* chore: regen api client for inframonitoring
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: added phase counts feature
* chore: added queries for pod phase counts in custom group by
* chore: added required tags
* chore: added support for pod phase unknown
* chore: removed pods - order by phase
* chore: improved api description to document -1 as no data in numeric fields
* fix: rebase fixes
* chore: added unknown phase count
* fix: isPodUIDInGroupBy in buildPodRecords
* chore: 3 cte --> 2 cte
* chore: pod phase with local table of time series as counts
* chore: comment correction
* chore: corrected comment
* chore: value column for samples table added
* chore: removed query G for phase counts
* chore: rename variable
* chore: added PodPhaseNum constants to types
* feat(infra-monitoring): v2 pods list apis - phase counts when custom grouping (#11088)
* chore: added phase counts feature
* chore: added queries for pod phase counts in custom group by
* chore: added unknown phase count
* fix: isPodUIDInGroupBy in buildPodRecords
* chore: 3 cte --> 2 cte
* chore: pod phase with local table of time series as counts
* chore: comment correction
* chore: corrected comment
* chore: value column for samples table added
* chore: removed query G for phase counts
* chore: rename variable
* chore: added PodPhaseNum constants to types
* chore: nodes list v2 full blown
* chore: metadata fix
* chore: updated comment
* chore: namespaces code
* chore: v2 nodes api
* chore: rename
* chore: v2 clusters list api
* chore: namespaces code
* chore: rename
* chore: review clusters PR
* chore: pvcs code added
* chore: updated endpoint and spec
* chore: pvcs todo
* chore: added condition
* chore: added filter
* chore: added code for deployments
* chore: query nit
* chore: statefulsets code added
* chore: base filter added
* chore: added base deployments change
* chore: added base condition
* chore: v2 jobs list api added
* chore: added daemonsets api
* chore: added pod phase counts
* chore: for pods and nodes, replace none with no_data
* chore: node and pod counts structs added
* chore: namespace record uses PodCountsByPhase
* chore: cluster record uses PodCountsByPhase, NodeCountsByReadiness
* chore: deployment record uses PodCountsByPhase
* chore: statefulset record uses PodCountsByPhase
* chore: job record uses PodCountsByPhase
* chore: daemonset record uses PodCountsByPhase
* chore: added remaining metrics to check
* chore: metrics existence check
* chore: statefulset metrics added
* chore: added jobs metrics
* chore: added metrics
* chore: feature added
* chore: cosmetic changes
* chore: replaced common order by key with entity specific attr key
* chore: moved paginateByName to types and added unit tests
* chore: added pageGroups
* chore: assert added instead of require
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ashwin Bhatkal <ashwin96@gmail.com>
* feat(no-auth): wire preflight global-config check and gate AppRoutes render & cleanAuthStorage util
* feat(no-auth): setup interceptor and ui hiding for no auth mode
* feat(no-auth): replace hide pattern with disable+tooltip via NoAuthGuard
* feat(no-auth): replace localstorage approach with module-level singleton
* feat(no-auth): added no-auth announcement banner and added authguard on sa page
* feat(no-auth): added more authguard
* feat(no-auth): fixes and refactor after rebase
* feat(no-auth): added noauth guard at more places and added tests
* feat(no-auth): refactor and feedback fix
* feat(no-auth): added noauth guard at more places and refactor
* feat(no-auth): changed banner text and code refactor
* feat(no-auth): added doc link under learn more text
* feat(no-auth): removed ui guards and special handling for the no auth mode
* feat(no-auth): updated test case
* feat(onboarding): add Cert Manager, GraphQL, Railway, ASP.NET Core Metrics, Istio, log/slog, Scala, Apache Druid, Azure CDN FrontDoor datasources and update Fly.io, Azure Blob Storage
- Add new onboarding entries for Cert Manager, GraphQL, Railway, ASP.NET Core Metrics,
Istio Metrics, log/slog, Scala, Apache Druid, and Azure CDN / Front Door
- Add SVG logos for all new datasources
- Update Fly.io entry with logs support and new docs link
- Add One Click Azure option to Azure Blob Storage entry
- Azure CDN FrontDoor links directly to integrations page
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: format onboarding config with oxfmt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(alerts-new): show tabs and breadcrumbs on create alert
* fix(pr): address comments
* fix(composite-query): not automatically showing the create alerts when have this query param
* fix(breadcrumb): align ui with periscope
* fix: backend changes for message key postprocessing
* fix: message postprocessing
* chore: update in e2e tests
* fix: table view
* fix: support body as json in FE
* chore: separate frontend from backend changes
* chore: remove dead code
* add maintenanceMuteStage to move planned maintenance to alertmanager
Rules previously skipped rule.Eval() entirely during maintenance windows.
This change moves suppression to MaintenanceMuter, injected as a Stage
in the alertmanager notification pipeline. Now rules always evaluate and
all suppression is handled by alertmanager.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: wrap routing pipeline once instead of per-route injection
Replace the per-route-entry loop with a single MultiStage wrap so
maintenance suppression runs once per dispatch group before routing.
* refactor: move maintenance mute stage into custom pipelineBuilder
Copy notify.PipelineBuilder locally so we can inject mms between the
silence stage and the receiver stage (GossipSettle → Inhibit →
TimeActive → TimeMute → Silence → mms → Receiver), matching the
correct suppression order the team requires.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
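A minimal sketch of the stage ordering described above, reduced to the silence, maintenance mute, and receiver steps. The `Alert`, `Stage`, `stageChain`, and `buildPipeline` names are hypothetical stand-ins written for illustration; this is not the Alertmanager `notify` package API and not the code in pipeline_builder.go.

```go
package main

// Alert is a hypothetical, minimal stand-in for an alert in the pipeline.
type Alert struct {
	RuleID string
}

// Stage is a hypothetical pipeline step: it returns the alerts that survive it.
type Stage interface {
	Exec(alerts []Alert) []Alert
}

// StageFunc adapts a plain function to the Stage interface.
type StageFunc func(alerts []Alert) []Alert

func (f StageFunc) Exec(alerts []Alert) []Alert { return f(alerts) }

// stageChain runs its stages in order and stops once no alerts are left.
type stageChain []Stage

func (c stageChain) Exec(alerts []Alert) []Alert {
	for _, s := range c {
		if len(alerts) == 0 {
			return alerts
		}
		alerts = s.Exec(alerts)
	}
	return alerts
}

// muteStage drops every alert for which muted returns true.
func muteStage(muted func(Alert) bool) Stage {
	return StageFunc(func(alerts []Alert) []Alert {
		kept := make([]Alert, 0, len(alerts))
		for _, a := range alerts {
			if !muted(a) {
				kept = append(kept, a)
			}
		}
		return kept
	})
}

// buildPipeline places the maintenance mute step after the silence step and
// before the receiver, so the receiver only sees alerts that passed both.
func buildPipeline(silenced, inMaintenance func(Alert) bool, receiver Stage) Stage {
	return stageChain{
		muteStage(silenced),
		muteStage(inMaintenance),
		receiver,
	}
}
```

Because suppression happens inside the chain, rule evaluation upstream stays unconditional; only delivery is affected.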
* chore: add license header to pipeline_builder.go
Copied code originates from Apache-2.0 licensed Prometheus Alertmanager;
add dual copyright + SPDX identifier following the repo's convention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: replace SPDX tag with full Apache 2.0 license boilerplate
The full license text is unambiguously compliant with Apache 2.0 Section 4(a),
which requires giving recipients "a copy of this License".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: pass MaintenanceMuter directly to pipelineBuilder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: remove dead orgID param from task constructors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* rename buildReceiverStage -> createReceiverStage
* refactor: replace maintenanceMuteStage with notify.NewMuteStage
MaintenanceMuter already satisfies types.Muter, and pipelineBuilder has
its own pb.metrics, so the hand-rolled maintenanceMuteStage wrapper is
redundant. Use notify.NewMuteStage(pb.muter, pb.metrics) directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: hoist MuteStage construction out of the receiver loop
MuteStage holds no per-receiver state, so one instance shared across
all receivers is sufficient — matching how is/ss are handled upstream.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: always initialize maintenanceStore; remove nil guards
Tests now use a real sqlrulestore-backed MaintenanceMuter instead of
passing nil. With nil no longer a valid input, remove the nil guards
in server.go and pipeline_builder.go.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: move MaintenanceMuter to Server and pass it to pipelineBuilder.New
- Remove muter from pipelineBuilder struct and newPipelineBuilder();
pass it as a parameter to New() instead, consistent with inhibitor/silencer
- Store muter on Server so GetAlerts can call Mutes() alongside the
inhibitor and silencer, ensuring maintenance-suppressed alerts show
the correct muted status in API responses
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* remove redundant MemMarker wrapper
* feat: surface maintenance-suppressed alerts via mutedBy in GetAlerts
Alerts suppressed by an active maintenance window were being correctly
muted in the notification pipeline but appeared as state=active in the
v2 GetAlerts response, since MaintenanceMuter.Mutes had no marker
side-effect (unlike inhibitor/silencer).
Add MaintenanceMuter.MutedBy returning the matching window IDs, and
plumb a mutedByFunc callback through NewGettableAlertsFromAlertProvider
into AlertToOpenAPIAlert. The upstream v2 API forces state=suppressed
when mutedBy is non-empty, so the frontend's existing state-based
rendering picks it up without further changes.
Use the dedicated mutedBy field rather than SilencedBy to avoid
violating the "complete set of silence IDs" contract that anything
querying silences by ID would rely on.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
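A rough sketch of the mutedBy idea, assuming a simplified window type with millisecond timestamps. `Window`, `mutedBy`, and `alertState` are hypothetical names for illustration; the real MaintenanceMuter and the upstream v2 API conversion are more involved. The "empty RuleIDs matches all rules" behaviour is taken from the muter test description elsewhere in this log.

```go
package main

// Window is a hypothetical, simplified maintenance window.
type Window struct {
	ID          string
	RuleIDs     []string // assumption: empty means the window covers every rule
	StartMillis int64
	EndMillis   int64
}

// active reports whether nowMillis falls inside [StartMillis, EndMillis).
func (w Window) active(nowMillis int64) bool {
	return nowMillis >= w.StartMillis && nowMillis < w.EndMillis
}

// covers reports whether the window applies to ruleID.
func (w Window) covers(ruleID string) bool {
	if len(w.RuleIDs) == 0 {
		return true
	}
	for _, id := range w.RuleIDs {
		if id == ruleID {
			return true
		}
	}
	return false
}

// mutedBy returns the IDs of all currently active windows that cover ruleID.
func mutedBy(windows []Window, ruleID string, nowMillis int64) []string {
	var ids []string
	for _, w := range windows {
		if w.active(nowMillis) && w.covers(ruleID) {
			ids = append(ids, w.ID)
		}
	}
	return ids
}

// alertState follows the rule stated above: a non-empty mutedBy list
// means the alert is reported as suppressed rather than active.
func alertState(windowIDs []string) string {
	if len(windowIDs) > 0 {
		return "suppressed"
	}
	return "active"
}
```

Returning the window IDs, rather than a bare boolean, is what lets the response carry them in a dedicated field without touching the silence ID list.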
* code cleanup
* refactor: move maintenance (planned downtime) to alertmanager packages
Types move from pkg/types/ruletypes/ to pkg/types/alertmanagertypes/:
- maintenance.go, recurrence.go, schedule.go (+ tests)
Store impl moves from pkg/ruler/rulestore/sqlrulestore/ to
pkg/alertmanager/alertmanagerstore/sqlalertmanagerstore/.
Maintenance windows mute alerts, so they belong with alertmanager
rather than the rule types.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test: add unit tests for MaintenanceMuter
Covers Mutes/MutedBy semantics (empty label, rule match, empty-RuleIDs
matches-all, future windows, multi-window) and the result cache
(single-fetch within TTL, stale-cache fallback on store error,
re-fetch after expiry, concurrency safety).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
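The cache behaviour those tests cover can be sketched roughly as follows, assuming a millisecond clock injected for testability. `cachedFetcher`, `fetchFunc`, and `errStoreUnavailable` are hypothetical names; this is an illustration of single-fetch within TTL, re-fetch after expiry, and stale fallback on error, not the actual MaintenanceMuter code.

```go
package main

import (
	"errors"
	"sync"
)

// errStoreUnavailable is a hypothetical error used to simulate a failing store.
var errStoreUnavailable = errors.New("store unavailable")

// fetchFunc is a hypothetical loader standing in for a store call.
type fetchFunc func() ([]string, error)

// cachedFetcher keeps the last successful result for ttlMillis.
type cachedFetcher struct {
	mu        sync.Mutex
	fetch     fetchFunc
	ttlMillis int64
	now       func() int64 // injectable clock, in milliseconds
	value     []string
	fetchedAt int64
	loaded    bool
}

func newCachedFetcher(fetch fetchFunc, ttlMillis int64, now func() int64) *cachedFetcher {
	return &cachedFetcher{fetch: fetch, ttlMillis: ttlMillis, now: now}
}

// Get returns the cached value while it is fresh and re-fetches once it has
// expired. If the re-fetch fails and an older value exists, that stale value
// is returned instead of the error. The mutex is held across the fetch, which
// keeps the sketch simple and serializes concurrent callers.
func (c *cachedFetcher) Get() ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.loaded && c.now()-c.fetchedAt < c.ttlMillis {
		return c.value, nil
	}
	v, err := c.fetch()
	if err != nil {
		if c.loaded {
			return c.value, nil // stale-cache fallback
		}
		return nil, err
	}
	c.value, c.fetchedAt, c.loaded = v, c.now(), true
	return v, nil
}
```

Serving stale data on a store error trades freshness for availability: a brief store outage does not suddenly unmute alerts that were suppressed a moment earlier.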
* Update schema changes
* Re-add marker
* fix NewMaintenanceStore in tests
* Go lint fixes
* test: use mockery-generated mock for MaintenanceStore in muter tests
Replace hand-written fakeMaintenanceStore with a mockery-generated
MockMaintenanceStore, consistent with the alertmanagertest pattern.
Also adds MaintenanceStore to .mockery.yml so the mock stays in sync.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: regenerate mocks via make gen-mocks
Picks up new MockHandler for the Handler interface in pkg/alertmanager
and regenerates MockMaintenanceStore with canonical mockery formatting.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* cleanup test
* test: add e2e muting tests for maintenance window behaviour
* fix updates: omit empty endTime from serialization
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(ai-assistant): add product analytics events
Wire 14 frontend product-analytics events for the AI Assistant feature
so we can measure the open→send funnel, action conversion, voice usage,
and feature adoption. All events go through the existing `logEvent`
helper, with a shared `useAIAssistantAnalyticsContext` hook providing
`{ threadId, page, mode }`.
Events shipped:
- AI Assistant: Opened (source: icon | shortcut | deeplink)
- AI Assistant: New chat clicked
- AI Assistant: Message sent
- AI Assistant: Suggested prompt clicked
- AI Assistant: Cancel clicked
- AI Assistant: Regenerate clicked
- AI Assistant: Message copied
- AI Assistant: Feedback submitted
- AI Assistant: Resource opened
- AI Assistant: Doc opened
- AI Assistant: Apply filter clicked
- AI Assistant: Thread opened from history
- AI Assistant: Voice input used
- AI Assistant: Voice input failed
Additional changes:
- Suppress duplicate `Opened` fires when expanding drawer/modal to the
full-screen page (markExpandFromInApp / consumeExpandFromInApp flag).
- Toast + analytics + sessionStorage-persisted hide for voice failures
on Chromium derivatives that lack the Google Speech API key.
- Browser info (name, version, platform, userAgent) attached to voice
events to triage browser-specific failures.
Skipped per scope: executionId on Cancel clicked, toolName on action
events, turnCategory on Feedback submitted, promptCategory on suggested
prompts — would require store/DTO changes beyond instrumentation.
* fix(ai-assistant): address review feedback on analytics events
- Replace markExpand module flag with router state so the Opened event
stays correct across StrictMode double-mounts and aborted navigations.
- Guard the voice push-to-talk shortcut on voiceUnavailable so it can't
bypass the persisted hide-after-failure flag.
- Fire SuggestedPromptClicked (category: follow_up) alongside MessageSent
on server-emitted follow_up chips so click-through can be measured.
- Normalize the page/currentPage attribute to its ROUTES template via
matchPath, bounding cardinality and avoiding customer IDs in analytics.
- Pick browsers from userAgentData via a derivative-first priority list,
fall through to UA sniffing for generic Chromium hits, and probe
navigator.brave to distinguish Brave from plain Chrome.
* refactor(ai-assistant): simplify analytics scaffolding
- Trim getBrowserInfo to UA-sniffing + Brave probe; drop the brand
priority list, isGenericBrand gate, and userAgent/platform fields
the backend can derive from request headers anyway.
- Inline the router-state shape at its three call sites instead of
exporting a named interface for { fromInApp?: boolean }.
- Tighten comments across the module — keep the non-obvious "why"
bits, drop the restated ones.
* fix(ai-assistant): apply PR review feedback on analytics events
- HeaderRightSection: rename Opened source 'icon' -> 'header' to reflect
where the icon lives, not how it looks.
- AIAssistantPage: normalize pathname on NewChatClicked so the
conversation id doesn't leak into the page attribute.
- ConversationView: invert the streaming useEffect to an early bailout
when not streaming for readability.
- ActionsSection: extract resource-type case strings into a ResourceType
constants object shared by targetModuleForResource and resourceRoute.
- VirtualizedMessages + ActionsSection: replace 'follow_up' / 'empty_state'
magic strings with a SuggestedPromptCategory constants object in events.ts.
* fix(planned-downtime): timezone handling
Don't convert the start/end times to UTC for the request. Serialize
as per the input timezone instead.
* Fix date/string conversion issues
---------
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
* feat(sqlmigration): add integration_dashboards table (migration 079)
Adds the `integration_dashboards` relations table that stores the
integration-specific identity for dashboards provisioned from cloud
or builtin integrations. Columns: id, org_id, dashboard_id, provider,
slug, created_at, updated_at. Includes a unique index on dashboard_id.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: adding comment for fk
* refactor: renaming table name
* chore: file rename
* refactor: removing org_id column and adding fk relation
* refactor: rename integration dashboards factory to singular
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Bundles four small UX fixes — three regressions from the typography
(#11199) and icons (#11222) migrations, plus the DashboardDescription
fallout from #11352:
- Widget panel title truncates to "Title..." even when the panel has
plenty of horizontal space. The title container had no `flex: 1` /
`min-width: 0`, so it collapsed to content width and the 80% cap
triggered early truncation. Make the title row a real flex item.
- Variable editor "Default Value" label and helper text run together
on one line. `Typography` from `@signozhq/ui` defaults to
`display: inline`, so the helper text sat next to the label. Force
block layout in the default-value-section.
- Cross-Panel Sync info icon was the outline `Info`, inconsistent with
the `SolidInfoCircle` used everywhere else (widget header, threshold,
status message). Swap to the standard icon at size "md".
- After #11352, DeleteButton renders as an antd `<Button>`, but the
DashboardDescription action menu still targeted `.ant-typography`
for the delete entry, so it picked up the list-page module's 8px /
12px styling and went out of sync with its peers. Consolidate the
three near-duplicate `section-1` / `section-2` / `delete-dashboard`
blocks into a single `section .ant-btn` rule, with section dividers
and the danger color as the only per-section overrides.
* fix: maintenance ignores recurrence when fixed times also set
* send empty start/end dates in frontend for recurring windows
* handle zero start and end times in schedule
* Revert "send empty start/end dates in frontend for recurring windows"
This reverts commit 87bc3fae274ccfd9ce98aeae5ac379fadf657df3.
* Remove start and end time from recurrence
* fix display timezone
* remove redundant param `shouldKeepLocalTime`
* handle empty initial start time
* fix CI issues
* Revert "fix CI issues"
This reverts commit 772e6486bb03ec836ebdce436e820aa0d1defdda.
* Revert "handle empty initial start time"
This reverts commit 82e7c72a338b019dea57def1c61795ca749aacc0.
* Revert "remove redundant param `shouldKeepLocalTime`"
This reverts commit ed942426745b8b534cdc47dc8b885beef0d6c2f1.
* Revert "fix display timezone"
This reverts commit 9b2a61674e883f2b47f5bd52413e257ef6f861d3.
* Revert "Remove start and end time from recurrence"
This reverts commit ab0df8e22d6099772eec79af11d2453a9d95e157.
* Revert "Revert "send empty start/end dates in frontend for recurring windows""
This reverts commit 15a4166d3740877b601f16ba208dd3c291b387f2.
* Revert "handle zero start and end times in schedule"
This reverts commit 58a5aecb82f1aa4f8d5549e391f1f2c5c7574be2.
* Revert "send empty start/end dates in frontend for recurring windows"
This reverts commit 0470cc7a84f6e9f91cccd73d7841b884342031d4.
* log maintenance window for schedule-recurrence timestamp mismatch
---------
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
The Delete dashboard entry in the dashboards action menu was rendered with
a `<Flex justify="center">` and a custom `TableLinkText` span. This caused
the icon and label to be center-aligned, sized differently, and spaced
differently from the four sibling entries (View, Open in New Tab, Copy
Link, Export JSON) which use an antd `<Button>` with `.action-btn` styling.
Switch the Delete entry to the same antd `<Button>` structure as the rest
of the menu so the icon size, icon-to-text spacing, and left alignment
all match. While here, collapse the `section-1` / `section-2` wrappers
into a single `.actionContent` and move the action-menu styles into a
co-located CSS module (`DashboardActions.module.scss`) with a `deleteBtn`
modifier that carries the divider and the danger color via the
`--danger-background` semantic token.
* feat(role-fga): added feature flag gate on roles fga - create and details page
* feat(role-fga): updated tests
* feat(role-fga): added is role gate fetching logic including feature flag loading
* feat(role-fga): fix the rolesselect search not working for the dropdown options
* feat(role-fga): updated tests and refactor
Orval v8's pre-generation validator (@scalar/openapi-parser) treats every
`$ref` key as a JSON Reference. Our spec embeds Perses' `common.JSONRef`
struct, which has a property literally named `$ref`, so validation aborts
with `INVALID_REFERENCE`. Set `input.unsafeDisableValidation: true` to
bypass — codegen itself handles the spec correctly, and the spec is
backend-generated and CI-gated.
Closes SigNoz/engineering-pod#4963
* feat: span details floating drawer added
* feat: span details folder rename
* feat: replace draggable package
* feat: fix pinning. fix drag on top
* feat: add bound to drags while floating
* feat: add collapsible sections in trace details
* feat: use resizable for waterfall table as well
* feat: copy link change and url clear on span close
* feat: fix span details header
* feat: key value label style fixes
* feat: linked spans
* feat: style fixes
* feat: setup types and interface for waterfall v3
v3 is required for updating the response json of
the waterfall api. There won't be any logical change.
Using this requirement as an opportunity to move
waterfall api to provider codebase architecture from
older query-service
* refactor: move type conversion logic to types pkg
* chore: add reason for using snake case in response
* fix: update span.attributes to map of string to any
To support otel format of different types of attributes
* fix: remove unused fields and rename span type
To avoid confusion with otel span
* refactor: convert waterfall api to modules format
* chore: add same test cases as for old waterfall api
* chore: avoid sorting on every traversal
* fix: remove unused fields and rename span type
To avoid confusion with otel span
* fix: rename timestamp to milli for readability
* fix: add timeout to module context
* fix: use typed parameter field in logs
* feat: api integration
* feat: add limit
* feat: minor change
* feat: suppress click
* chore: generate openapi spec for v3 waterfall
* feat: fix test
* feat: fix test
* feat: lint fix
* feat: span details ux
* feat: analytics
* feat: add icons
* feat: added loading to flamegraph and timeout to webworker
* feat: sync error and loading state for flamegraph for n/w and computation logic
* feat: auto scroll horizontally to span
* feat: show total span count
* feat: disable analytics span tab for now
* feat: add span details loader
* feat: prevent api call on closing span detail
* fix: remove timeout since waterfall takes longer
* fix: use int16 for status code as per db schema
* fix: update openapi specs
* feat: make filter and search work with flamegraph
* feat: filter ui fix
* feat: remove trace header
* feat: new filter ui
* feat: setup types and interface for waterfall v3
v3 is required for updating the response json of
the waterfall api. There won't be any logical change.
Using this requirement as an opportunity to move
waterfall api to provider codebase architecture from
older query-service
* refactor: move type conversion logic to types pkg
* chore: add reason for using snake case in response
* fix: update span.attributes to map of string to any
To support otel format of different types of attributes
* fix: remove unused fields and rename span type
To avoid confusion with otel span
* refactor: convert waterfall api to modules format
* chore: add same test cases as for old waterfall api
* chore: avoid sorting on every traversal
* fix: remove unused fields and rename span type
To avoid confusion with otel span
* fix: rename timestamp to milli for readability
* fix: add timeout to module context
* fix: use typed parameter field in logs
* chore: generate openapi spec for v3 waterfall
* fix: remove timeout since waterfall takes longer
* fix: use int16 for status code as per db schema
* fix: update openapi specs
* feat: api integration
* feat: automatically scroll left on vertical scroll
* feat: reduce time
* feat: set limit to 100k for flamegraph
* feat: show child count in waterfall
* fix: align timeline and span length in flamegraph and waterfall
* feat: fix flamegraph and waterfall bg color
* feat: show caution on sampled flamegraph
* feat: api integration v3
* feat: disable scroll to view for collapse and uncollapse
* feat: setup types and interface for waterfall v3
v3 is required for updating the response json of
the waterfall api. There won't be any logical change.
Using this requirement as an opportunity to move
waterfall api to provider codebase architecture from
older query-service
* refactor: move type conversion logic to types pkg
* chore: add reason for using snake case in response
* fix: update span.attributes to map of string to any
To support otel format of different types of attributes
* fix: remove unused fields and rename span type
To avoid confusion with otel span
* refactor: convert waterfall api to modules format
* chore: add same test cases as for old waterfall api
* chore: avoid sorting on every traversal
* fix: remove unused fields and rename span type
To avoid confusion with otel span
* fix: rename timestamp to milli for readability
* fix: add timeout to module context
* fix: use typed parameter field in logs
* chore: generate openapi spec for v3 waterfall
* fix: remove timeout since waterfall takes longer
* fix: use int16 for status code as per db schema
* fix: update openapi specs
* refactor: break down GetWaterfall method for readability
* chore: avoid returning nil, nil
* refactor: move type creation and constants to types package
- Move DB/table/cache/windowing constants to tracedetailtypes package
- Add NewWaterfallTrace and NewWaterfallResponse constructors in types
- Use constructors in module.go instead of inline struct literals
- Reorder waterfall.go so public functions precede private ones
* refactor: extract ClickHouse queries into a store abstraction
Move GetTraceSummary and GetTraceSpans out of module.go into a
traceStore interface backed by clickhouseTraceStore in store.go.
The module struct now holds a traceStore instead of a raw
telemetrystore.TelemetryStore, keeping DB access separate from
business logic.
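A small sketch of the store split described above. The `traceStore` method names come from this commit, but the signatures, the trimmed `Span` and `TraceSummary` types, `memoryTraceStore`, `errTraceNotFound`, and `rootSpans` are assumptions made for illustration; the real store is ClickHouse-backed and takes richer arguments.

```go
package main

import "errors"

// Span and TraceSummary are hypothetical, trimmed-down types.
type Span struct {
	SpanID   string
	ParentID string
}

type TraceSummary struct {
	TraceID   string
	SpanCount int
}

// errTraceNotFound is a hypothetical sentinel for a missing trace.
var errTraceNotFound = errors.New("trace not found")

// traceStore isolates data access behind two methods (signatures assumed).
type traceStore interface {
	GetTraceSummary(traceID string) (TraceSummary, error)
	GetTraceSpans(traceID string) ([]Span, error)
}

// memoryTraceStore is an in-memory stand-in for a database-backed store.
type memoryTraceStore struct {
	spans map[string][]Span
}

func (s memoryTraceStore) GetTraceSummary(traceID string) (TraceSummary, error) {
	spans, ok := s.spans[traceID]
	if !ok {
		return TraceSummary{}, errTraceNotFound
	}
	return TraceSummary{TraceID: traceID, SpanCount: len(spans)}, nil
}

func (s memoryTraceStore) GetTraceSpans(traceID string) ([]Span, error) {
	spans, ok := s.spans[traceID]
	if !ok {
		return nil, errTraceNotFound
	}
	return spans, nil
}

// module depends only on the interface, so its logic never sees a DB client.
type module struct {
	store traceStore
}

// rootSpans is example business logic: it returns the spans without a parent.
func (m module) rootSpans(traceID string) ([]Span, error) {
	if _, err := m.store.GetTraceSummary(traceID); err != nil {
		return nil, err
	}
	spans, err := m.store.GetTraceSpans(traceID)
	if err != nil {
		return nil, err
	}
	var roots []Span
	for _, s := range spans {
		if s.ParentID == "" {
			roots = append(roots, s)
		}
	}
	return roots, nil
}
```

With this shape, module tests can swap in a fake store, and query changes stay confined to the store implementation.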
* refactor: move error to types as well
* refactor: separate out store calls and computations
* refactor: breakdown GetSelectedSpans for readability
* refactor: return 404 on missing trace and other cleanup
* refactor: use same method for cache key creation
* chore: remove unused duration nano field
* chore: use sqlbuilder in clickhouse store where possible
* feat: dropdown added to span details
* feat: fix color duplications
* feat: no data screen
* feat: old trace btn added
* feat: minor fix
* feat: rename copy to copy value
* feat: delete unused file
* feat: use semantic tokens
* feat: use semantic tokens
* feat: add crosshair
* feat: fix test
* feat: disable crosshair in waterfall
* feat: fix colors
* feat: minor fix
* feat: add status codes
* feat: load all spans in waterfall under limit
* feat: uncollapse spans on select from flamegraph
* feat: style fix
* feat: add service name
* feat: open in new tab
* feat: add trace details header
* feat: add trace details header styles
* feat: add trace details header styles
* feat: minor changes
* feat: floating fields set
* feat: filters init
* feat: filter toggle added
* feat: fix color
* fix: scroll to span in frontend mode
* feat: delete waterfall go
* feat: minor change
* feat: minor change
* feat: lint fix
* feat: analytics spans
* feat: color by field
* feat: save color by pref in user pref
* feat: migrate v2 pinned attr
* feat: preview fields
* feat: minor refactors
* feat: minor refactors
* feat: v3 behind feature flag
* feat: minor refactors
* feat: packages remove
* feat: packages remove
* feat: remove common component
* feat: remove antd component usage
* feat: leaf node indent fix
* feat: fix mouse wheel in json view
* feat: update signoz ui
* feat: remove feature flag
* feat: fixed the waterfall span hover card
* feat: fix hidden filters
* feat: trace details always visible
* feat: correct status code
* fix: pagination calls in waterfall
* feat: fix failing test
* feat: show error count
* feat: fix waterfall child sibling indent
* feat: change how we show span hover data in waterfall
* feat: fix logs in span details styles
* feat: minor fixes
* feat: make trace id copyable
* feat: add status message to highlight section
* feat: persist user choosing old view
* feat: add more fields in color by
* feat: add llm as fast filter
* feat: show api error correctly
* feat: update test cases
* feat: revert route change
* feat: revert route change
* feat: replace antd btns
* feat: allow removing all fields in preview
* feat: send selected span when flamegraph is sampled
* feat: only scroll when span is not in view
* feat: auto expand on highlight errors
* feat: move analytics panel
* feat: additional check
* feat: minor fix
* feat: minor fix
* feat: dont use antd button and tooltip
* feat: dont use antd button and tooltip
* feat: update icons
* feat: minor change
* feat: minor change
* feat: move to zustand
* feat: update test cases
* feat: update border color
* feat: add icons
* feat: support filter on parent keys
* feat: add links to non filterable keys
* feat: minor fix
* feat: use pinned attributes across views
* feat: update tests
* feat: hide v3
* feat: migrate to css modules
* feat: fix minor style
* feat: fix test
* feat: enable new trace details
* feat: remove unnecessary waterfall api calls if span already in the list
* feat: minor change
---------
Co-authored-by: Nikhil Soni <nikhil.soni@signoz.io>
* feat(role-sa-fga): updated roles detail permission panel with the new allowedVerb gate
* feat(role-sa-fga): added anonymous in roles, sa routes to allow user access without managed role
* feat(role-sa-fga): gated roles create and details page behind a valid license check
* feat(role-sa-fga): added test and some refactor
* fix: order by ignored in formula query
* fix: order by ignored in formula query
* fix: added integration test
* fix: revert integration test changes
* fix: added an independent integration test
* fix: make py-fmt
* fix: removed comment
---------
Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
Co-authored-by: Pandey <vibhupandey28@gmail.com>
* fix: added cost() to cloneable interface
* fix: added a new metrics and converted into counters
* fix: address comments
* fix: simplify test
* fix: use assert instead of require
* feat(sa-fga): changed the id from kind to kind+type
* feat(sa-fga): service account fga changes with common components for errors
* feat(sa-fga): added fga at more places in service account
* feat(sa-fga): refactor based on feedbacks
* feat(sa-fga): refactor and role page fga
* fix(authz): add attach detach permissions on metaresource
* feat(sa-fga): refactor and role page fga
* feat(sa-fga): test case fixes
* feat(sa-fga): enabled role detail page and remove the config flag
* feat(sa-fga): test case fixes
* feat(sa-fga): updated the role details metaresource condition to list/create
* feat(sa-fga): test case fixes
* feat(sa-fga): feedback fixes from the copilot comments
* feat(sa-fga): feedback fixes from the review comments and authztooltip upgrade
* feat(sa-fga): feedback fixes from the testing and refactors
* feat(sa-fga): test cases fixes
* feat(sa-fga): added beta for the roles page
* feat(sa-fga): added roles doc and roles read check with name in the url param
* Revert "fix(authz): add attach detach permissions on metaresource"
This reverts commit 34938bb4ce.
---------
Co-authored-by: vikrantgupta25 <vikrant@signoz.io>
* fix(deps): upgrade dependencies to resolve high/critical security alerts
Upgrade pgx/v5 (v5.8.0→v5.9.2), prometheus (v0.310.0→v0.311.3),
gosaml2 (v0.9.0→v0.11.0), goxmldsig (v1.2.0→v1.6.0), and
urllib3 (2.6.3→2.7.0) to fix all open high/critical Dependabot alerts.
Adapt parser.ParseExpr calls to use the new Parser interface introduced
in prometheus v0.311.x.
* refactor: reuse a single PromQL parser instance instead of creating per call
Add Parser() to the prometheus.Prometheus interface so a single
parser.Parser is created at startup and shared across all consumers.
For the legacy v2 querier and PromQLFilterExtractor (which don't have
access to the Prometheus interface), store a parser instance on the
struct, created once during construction.
* refactor: centralize PromQL parser creation via prometheus.NewParser()
Add pkg/prometheus/parser.go with a Parser type alias and NewParser()
factory function, mirroring the existing Engine/NewEngine pattern.
All consumers now create parsers through this single entry point
instead of calling parser.NewParser(parser.Options{}) directly.
* fix(infra-monitoring): error due to invalid operators on query builder
* fix(query-builder): keep not_in and transform to nin, same for other operators
* chore(code-cleanup): clean the duplicated code and bugs
* feat(k8s-base-details): migrate logs/traces/events to query builder v5
* feat(infra-monitoring-details): migrate metrics to query range v5 (#11161)
* fix(query-builder): not updating query on hit enter / better context organization
* fix(hooks): missing cancel param
* fix(infra-monitoring): not invalidating queries on refresh button
* refactor(infra-monitoring): handle keys not found & avoid re-renders on query search change (#11312)
* fix(infra-monitoring): do not render error when key not found
* fix(query-search): reduce amount of re-renders due to need of initial expression
* chore: baseline setup
* chore: endpoint detail update
* chore: added logic for hosts v3 api
* fix: bug fix
* chore: disk usage
* chore: added validate function
* chore: added some unit tests
* chore: return status as a string
* chore: yarn generate api
* chore: removed isSendingK8sAgentsMetricsCode
* chore: moved funcs
* chore: added validation on order by
* chore: added pods list logic
* chore: updated openapi yml
* chore: updated spec
* chore: pods api meta start time
* chore: nil pointer check
* chore: nil pointer dereference fix in req.Filter
* chore: added temporalities of metrics
* chore: added pods metrics temporality
* chore: unified composite key function
* chore: code improvements
* chore: added pods list api updates
* chore: hostStatusNone added for clarity that this field can be left empty as well in payload
* chore: yarn generate api
* chore: return errors from getMetadata and lint fix
* chore: return errors from getMetadata and lint fix
* chore: added hostName logic
* chore: modified getMetadata query
* chore: add type for response and files rearrange
* chore: pass warnings from queryResponse through to the hosts list response struct
* chore: added better metrics existence check
* chore: added a TODO remark
* chore: added required metrics check
* chore: distributed samples table to local table change for get metadata
* chore: frontend fix
* chore: endpoint correction
* chore: endpoint modification openapi
* chore: escape backtick to prevent sql injection
* chore: rearrange
* chore: improvements
* chore: move order by validation into validate function
* chore: improved description
* chore: added TODOs and made filterByStatus a part of filter struct
* chore: ignore empty string hosts in get active hosts
* feat(infra-monitoring): v2 hosts list - return counts of active & inactive hosts for custom group by attributes (#10956)
* chore: add functionality for showing active and inactive counts in custom group by
* chore: bug fix
* chore: added subquery for active and total count
* chore: ignore empty string hosts in get active hosts
* fix: sinceUnixMilli for determining active hosts compute once per request
* chore: refactor code
* chore: rename HostsList -> ListHosts
* chore: rearrangement
* chore: inframonitoring types renaming
* chore: added types package
* chore: file structure further breakdown for clarity
* chore: comments correction
* chore: removed temporalities
* chore: pods code restructuring
* chore: comments resolve
* chore: added json tag required: true
* chore: removed pod metric temporalities
* chore: removed internal server error
* chore: added status unauthorized
* chore: remove a defensive nil map check, the function ensure non-nil map when err nil
* chore: cleanup and rename
* chore: make sort stable in case of tiebreaker by comparing composite group by keys
* chore: regen api client for inframonitoring
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: added phase counts feature
* chore: added queries for pod phase counts in custom group by
* chore: added required tags
* chore: added support for pod phase unknown
* chore: removed pods - order by phase
* chore: improved api description to document -1 as no data in numeric fields
* fix: rebase fixes
* chore: added unknown phase count
* fix: isPodUIDInGroupBy in buildPodRecords
* chore: 3 cte --> 2 cte
* chore: pod phase with local table of time series as counts
* chore: comment correction
* chore: corrected comment
* chore: value column for samples table added
* chore: removed query G for phase counts
* chore: rename variable
* chore: added PodPhaseNum constants to types
* feat(infra-monitoring): v2 pods list apis - phase counts when custom grouping (#11088)
* chore: added phase counts feature
* chore: added queries for pod phase counts in custom group by
* chore: added unknown phase count
* fix: isPodUIDInGroupBy in buildPodRecords
* chore: 3 cte --> 2 cte
* chore: pod phase with local table of time series as counts
* chore: comment correction
* chore: corrected comment
* chore: value column for samples table added
* chore: removed query G for phase counts
* chore: rename variable
* chore: added PodPhaseNum constants to types
* chore: nodes list v2 full blown
* chore: metadata fix
* chore: updated comment
* chore: namespaces code
* chore: v2 nodes api
* chore: rename
* chore: v2 clusters list api
* chore: namespaces code
* chore: rename
* chore: review clusters PR
* chore: pvcs code added
* chore: updated endpoint and spec
* chore: pvcs todo
* chore: added condition
* chore: added filter
* chore: added code for deployments
* chore: query nit
* chore: statefulsets code added
* chore: base filter added
* chore: added base deployments change
* chore: added base condition
* chore: v2 jobs list api added
* chore: added daemonsets api
* chore: added pod phase counts
* chore: for pods and nodes, replace none with no_data
* chore: node and pod counts structs added
* chore: namespace record uses PodCountsByPhase
* chore: cluster record uses PodCountsByPhase, NodeCountsByReadiness
* chore: deployment record uses PodCountsByPhase
* chore: statefulset record uses PodCountsByPhase
* chore: job record uses PodCountsByPhase
* chore: daemonset record uses PodCountsByPhase
* chore: added remaining metrics to check
* chore: metrics existence check
* chore: statefulset metrics added
* chore: added jobs metrics
* chore: added metrics
* chore: updated PR things
* chore: changes to generated files
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Ashwin Bhatkal <ashwin96@gmail.com>
* fix: query fix in conditionFor
* fix: update test suite
* revert: stmt builder test changes
* test: add unit test for resource tags in json enabled flagger
* fix: package tests
* chore: run non body tests in json enabled
* chore: fmt py
* chore: comment fix
* fix: uvx checks
* chore: compressing tests into max 5
* fix: fmt py
* chore: bring in new fixture for building raw query
* fix: comment remove
* fix: comment fixed
* feat(alerts): add docs and agent skill info banner to ClickHouse query editor
Shows a contextual info banner when creating alert rules using ClickHouse
query mode, with doc links that vary by alert type (logs/traces/metrics).
Agent skill link is shown for logs and traces but skipped for metrics.
* chore: change allow referrer and add noopener
2026-05-14 19:21:55 +00:00
944 changed files with 52683 additions and 69807 deletions
git diff --compact-summary --exit-code || (echo; echo "Unexpected difference in authz permissions. Run go run cmd/enterprise/*.go generate authz locally and commit."; exit 1)
go run cmd/enterprise/*.go generate config web-settings
git diff --compact-summary --exit-code || (echo; echo "Unexpected difference in web settings schema. Run go run cmd/enterprise/*.go generate config web-settings locally and commit."; exit 1)
git diff --compact-summary --exit-code || (echo; echo "Unexpected difference in generated api clients. Run pnpm generate:api in frontend/ locally and commit."; exit 1)
git diff --compact-summary --exit-code || (echo; echo "Unexpected difference in generated web settings types. Run pnpm generate:config:web-settings in frontend/ locally and commit."; exit 1)
# The directory containing the static build files.
directory: /etc/signoz/web
# Settings exposed to the web.
settings:
  posthog:
    # Whether to enable PostHog in web.
    enabled: true
  appcues:
    # Whether to enable Appcues in web.
    enabled: true
##################### Cache #####################
cache:
@@ -174,6 +182,11 @@ alertmanager:
poll_interval: 1m
# The URL under which Alertmanager is externally reachable (for example, if Alertmanager is served via a reverse proxy). Used for generating relative and absolute links back to Alertmanager itself.
external_url: http://localhost:8080
# The list of globs from which SigNoz's alertmanager notification templates are loaded (e.g. the email.signoz.html layout).
# This mirrors the upstream alertmanager `templates` config option. The upstream default templates (default.tmpl, email.tmpl)
# are always loaded from the embedded alertmanager assets, so only SigNoz's own templates need to be listed here.
# The global configuration for the alertmanager. The exhaustive list of fields can be found upstream: https://github.com/prometheus/alertmanager/blob/efa05feffd644ba4accb526e98a8c6545d26a783/config/config.go#L833
global:
# ResolveTimeout is the time after which an alert is declared resolved if it has not been updated.
@@ -78,12 +78,13 @@ All tables follow a consistent primary key pattern using a `id` column (referenc
## How to write migrations?
For schema migrations, use the [SQLMigration](/pkg/sqlmigration/sqlmigration.go) interface. The migrations are split into multiple packages based on the starting number of the series of the migration. For example, migrations with starting number `100` are in the `s100sqlmigration` package (read as series 100 sql migrations), migrations with starting number `200` are in the `s200sqlmigration` package, and so on. When creating migrations, adhere to these guidelines:
For schema migrations, use the [SQLMigration](/pkg/sqlmigration/sqlmigration.go) interface and write the migration in the same package. When creating migrations, adhere to these guidelines:
- Use the [SQLSchema](/pkg/sqlschema/sqlschema.go) interface to write migrations. SQLSchema is responsible for generating idempotent SQL statements to alter the database schema. For instance, if you want to add a column to the `users` table, you can use the `AddColumn` method to add the column. If the column already exists, the method will return no SQL statements.
- Do not implement **`ON CASCADE` foreign key constraints**. Deletion operations should be handled explicitly in application logic rather than delegated to the database.
- Do not **import types from the types package** in the `sqlmigration` package. Instead, define the required types within the migration package itself. This practice ensures migration stability as the core types evolve over time.
- Do not implement **`Down` migrations**. As the codebase matures, we may introduce this capability, but for now, the `Down` function should remain empty.
- Always write **idempotent** migrations. This means that if the migration is run multiple times, it should not cause an error.
- A migration which is **dependent on the underlying dialect** (sqlite, postgres, etc) should be written as part of the [SQLDialect](/pkg/sqlstore/sqlstore.go) interface. The implementation needs to go in the dialect specific package of the respective database.
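The idempotency guideline above can be illustrated with a minimal sketch. It does not use the real `SQLSchema` or `SQLMigration` interfaces: the `schema` type and the `addColumnStatements` and `apply` helpers are hypothetical stand-ins, assumed here only to show the "no statements when the column already exists" behavior the guideline describes.

```go
package main

import "fmt"

// schema is a hypothetical in-memory stand-in for the current database
// schema: table name -> set of column names. It is not part of SigNoz.
type schema map[string]map[string]bool

// addColumnStatements returns the SQL needed to add a column, or no
// statements at all when the column already exists. Returning nothing
// in the second case is what makes re-running the migration safe.
func addColumnStatements(s schema, table, column, sqlType string) []string {
	if s[table][column] {
		return nil
	}
	return []string{fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, column, sqlType)}
}

// apply records the new column in the in-memory schema so that later
// runs observe it, mimicking the effect of executing the statement.
func apply(s schema, table, column string) {
	if s[table] == nil {
		s[table] = map[string]bool{}
	}
	s[table][column] = true
}

func main() {
	s := schema{"users": {"id": true}}
	// Run the same migration twice: the second run is a no-op.
	for run := 1; run <= 2; run++ {
		stmts := addColumnStatements(s, "users", "display_name", "TEXT")
		fmt.Printf("run %d: %d statement(s)\n", run, len(stmts))
		if len(stmts) > 0 {
			apply(s, "users", "display_name")
		}
	}
}
```

The design point is that the check lives in the statement generator rather than in each migration, so a migration author cannot forget it.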
return errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
return errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
return errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
return nil, errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
return nil, errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
return errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
return errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
return errors.New(errors.TypeLicenseUnavailable, errors.CodeLicenseUnavailable, "a valid license is not available").WithAdditional("this feature requires a valid license").WithAdditional(err.Error())
* Returns a paginated list of Kubernetes DaemonSets with key aggregated pod metrics: CPU usage and memory working set summed across pods owned by the daemonset, plus average CPU/memory request and limit utilization (daemonSetCPURequest, daemonSetCPULimit, daemonSetMemoryRequest, daemonSetMemoryLimit). Each row also reports the latest known node-level counters from kube-state-metrics: desiredNodes (k8s.daemonset.desired_scheduled_nodes, the number of nodes the daemonset wants to run on) and currentNodes (k8s.daemonset.current_scheduled_nodes, the number of nodes the daemonset currently runs on) — note these are node counts, not pod counts. It also reports per-group podCountsByPhase ({ pending, running, succeeded, failed, unknown } from each pod's latest k8s.pod.phase value). Each daemonset includes metadata attributes (k8s.daemonset.name, k8s.namespace.name, k8s.cluster.name). The response type is 'list' for the default k8s.daemonset.name grouping or 'grouped_list' for custom groupBy keys; in both modes every row aggregates pods owned by daemonsets in the group. Supports filtering via a filter expression, custom groupBy, ordering by cpu / cpu_request / cpu_limit / memory / memory_request / memory_limit / desired_nodes / current_nodes, and pagination via offset/limit. Also reports missing required metrics and whether the requested time range falls before the data retention boundary. Numeric metric fields (daemonSetCPU, daemonSetCPURequest, daemonSetCPULimit, daemonSetMemory, daemonSetMemoryRequest, daemonSetMemoryLimit, desiredNodes, currentNodes) return -1 as a sentinel when no data is available for that field.
* Returns a paginated list of Kubernetes Deployments with key aggregated pod metrics: CPU usage and memory working set summed across pods owned by the deployment, plus average CPU/memory request and limit utilization (deploymentCPURequest, deploymentCPULimit, deploymentMemoryRequest, deploymentMemoryLimit). Each row also reports the latest known desiredPods (k8s.deployment.desired) and availablePods (k8s.deployment.available) replica counts and per-group podCountsByPhase ({ pending, running, succeeded, failed, unknown } from each pod's latest k8s.pod.phase value). Each deployment includes metadata attributes (k8s.deployment.name, k8s.namespace.name, k8s.cluster.name). The response type is 'list' for the default k8s.deployment.name grouping or 'grouped_list' for custom groupBy keys; in both modes every row aggregates pods owned by deployments in the group. Supports filtering via a filter expression, custom groupBy, ordering by cpu / cpu_request / cpu_limit / memory / memory_request / memory_limit / desired_pods / available_pods, and pagination via offset/limit. Also reports missing required metrics and whether the requested time range falls before the data retention boundary. Numeric metric fields (deploymentCPU, deploymentCPURequest, deploymentCPULimit, deploymentMemory, deploymentMemoryRequest, deploymentMemoryLimit, desiredPods, availablePods) return -1 as a sentinel when no data is available for that field.
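The `-1` sentinel described above means a consumer should not treat every numeric field as a real measurement. A minimal sketch of one way to handle it follows; `noData`, `metricValue`, and `formatMetric` are hypothetical names introduced for illustration and are not part of the generated API client.

```go
package main

import "fmt"

// noData is the sentinel the API description documents for numeric
// metric fields when no data is available.
const noData = -1

// metricValue splits a raw numeric field into (value, ok). ok is false
// when the field carries the -1 sentinel, so callers cannot mistake the
// sentinel for a real measurement.
func metricValue(raw float64) (float64, bool) {
	if raw == noData {
		return 0, false
	}
	return raw, true
}

// formatMetric renders a field for display, showing "n/a" for missing data.
func formatMetric(raw float64) string {
	v, ok := metricValue(raw)
	if !ok {
		return "n/a"
	}
	return fmt.Sprintf("%.2f", v)
}

func main() {
	fmt.Println(formatMetric(0.5)) // prints "0.50"
	fmt.Println(formatMetric(-1))  // prints "n/a"
}
```

Checking the sentinel in one helper keeps the `-1` convention out of sorting, averaging, and rendering code, where it would otherwise skew results.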
* Returns the waterfall view of spans: all spans if the total span count is under a limit, otherwise up to a maximum count. Unlike v3, aggregations are not returned.
Some files were not shown because too many files have changed in this diff