mirror of
https://github.com/SigNoz/signoz.git
synced 2026-02-25 09:42:25 +00:00
Compare commits
4 Commits
ns/waterfa
...
query-rang
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
f05c574606 | ||
|
|
4e4c9ce5af | ||
|
|
7605775a38 | ||
|
|
cb1a2a8a13 |
@@ -82,6 +82,12 @@ exporters:
|
||||
timeout: 45s
|
||||
sending_queue:
|
||||
enabled: false
|
||||
metadataexporter:
|
||||
cache:
|
||||
provider: in_memory
|
||||
dsn: tcp://clickhouse:9000/signoz_metadata
|
||||
enabled: true
|
||||
timeout: 45s
|
||||
service:
|
||||
telemetry:
|
||||
logs:
|
||||
@@ -93,19 +99,19 @@ service:
|
||||
traces:
|
||||
receivers: [otlp]
|
||||
processors: [signozspanmetrics/delta, batch]
|
||||
exporters: [clickhousetraces, signozmeter]
|
||||
exporters: [clickhousetraces, metadataexporter, signozmeter]
|
||||
metrics:
|
||||
receivers: [otlp]
|
||||
processors: [batch]
|
||||
exporters: [signozclickhousemetrics, signozmeter]
|
||||
exporters: [signozclickhousemetrics, metadataexporter, signozmeter]
|
||||
metrics/prometheus:
|
||||
receivers: [prometheus]
|
||||
processors: [batch]
|
||||
exporters: [signozclickhousemetrics, signozmeter]
|
||||
exporters: [signozclickhousemetrics, metadataexporter, signozmeter]
|
||||
logs:
|
||||
receivers: [otlp]
|
||||
processors: [batch]
|
||||
exporters: [clickhouselogsexporter, signozmeter]
|
||||
exporters: [clickhouselogsexporter, metadataexporter, signozmeter]
|
||||
metrics/meter:
|
||||
receivers: [signozmeter]
|
||||
processors: [batch/meter]
|
||||
|
||||
@@ -82,6 +82,12 @@ exporters:
|
||||
timeout: 45s
|
||||
sending_queue:
|
||||
enabled: false
|
||||
metadataexporter:
|
||||
cache:
|
||||
provider: in_memory
|
||||
dsn: tcp://clickhouse:9000/signoz_metadata
|
||||
enabled: true
|
||||
timeout: 45s
|
||||
service:
|
||||
telemetry:
|
||||
logs:
|
||||
@@ -93,19 +99,19 @@ service:
|
||||
traces:
|
||||
receivers: [otlp]
|
||||
processors: [signozspanmetrics/delta, batch]
|
||||
exporters: [clickhousetraces, signozmeter]
|
||||
exporters: [clickhousetraces, metadataexporter, signozmeter]
|
||||
metrics:
|
||||
receivers: [otlp]
|
||||
processors: [batch]
|
||||
exporters: [signozclickhousemetrics, signozmeter]
|
||||
exporters: [signozclickhousemetrics, metadataexporter, signozmeter]
|
||||
metrics/prometheus:
|
||||
receivers: [prometheus]
|
||||
processors: [batch]
|
||||
exporters: [signozclickhousemetrics, signozmeter]
|
||||
exporters: [signozclickhousemetrics, metadataexporter, signozmeter]
|
||||
logs:
|
||||
receivers: [otlp]
|
||||
processors: [batch]
|
||||
exporters: [clickhouselogsexporter, signozmeter]
|
||||
exporters: [clickhouselogsexporter, metadataexporter, signozmeter]
|
||||
metrics/meter:
|
||||
receivers: [signozmeter]
|
||||
processors: [batch/meter]
|
||||
|
||||
216
docs/contributing/query-range.md
Normal file
216
docs/contributing/query-range.md
Normal file
@@ -0,0 +1,216 @@
|
||||
# Query Range v5 — Design Principles & Architectural Contracts
|
||||
|
||||
## Purpose of This Document
|
||||
|
||||
This document defines the design principles, invariants, and architectural contracts of the Query Range v5 system. It is intended for the authors working on the querier and querier related parts codebase. Any change to the system must align with the principles described here. If a change would violate a principle, it must be flagged and discussed.
|
||||
|
||||
---
|
||||
|
||||
## Core Architectural Principle
|
||||
|
||||
**The user speaks OpenTelemetry. The storage speaks ClickHouse. The system translates between them. These two worlds must never leak into each other.**
|
||||
|
||||
Every design choice in Query Range flows from this separation. The user-facing API surface deals exclusively in `TelemetryFieldKey`: a representation of fields as they exist in the OpenTelemetry data model. The storage layer deals in ClickHouse column expressions, table names, and SQL fragments. The translation between them is mediated by a small set of composable abstractions with strict boundaries.
|
||||
|
||||
---
|
||||
|
||||
## The Central Type: `TelemetryFieldKey`
|
||||
|
||||
`TelemetryFieldKey` is the atomic unit of the entire query system. Every filter, aggregation, group-by, order-by, and select operation is expressed in terms of field keys. Understanding its design contracts is non-negotiable.
|
||||
|
||||
### Identity
|
||||
|
||||
A field key is identified by three dimensions:
|
||||
|
||||
- **Name** — the field name as the user knows it (`service.name`, `http.method`, `trace_id`)
|
||||
- **FieldContext** — where the field lives in the OTel model (`resource`, `attribute`, `span`, `log`, `body`, `scope`, `event`, `metric`)
|
||||
- **FieldDataType** — the data type (`string`, `bool`, `number`/`float64`/`int64`, array variants)
|
||||
|
||||
### Invariant: Same name does not mean same field
|
||||
|
||||
`status` as an attribute, `status` as a body JSON key, and `status` as a span-level field are **three different fields**. The context disambiguates. Code that resolves or compares field keys must always consider all three dimensions, never just the name.
|
||||
|
||||
### Invariant: Normalization happens once, at the boundary
|
||||
|
||||
`TelemetryFieldKey.Normalize()` is called during JSON unmarshaling. After normalization, the text representation `resource.service.name:string` and the programmatic construction `{Name: "service.name", FieldContext: Resource, FieldDataType: String}` are identical.
|
||||
|
||||
**Consequence:** Downstream code must never re-parse or re-normalize field keys. If you find yourself splitting on `.` or `:` deep in the pipeline, something is wrong — the normalization should have already happened.
|
||||
|
||||
### Invariant: The text format is `context.name:datatype`
|
||||
|
||||
Parsing rules (implemented in `GetFieldKeyFromKeyText` and `Normalize`):
|
||||
|
||||
1. Data type is extracted from the right, after the last `:`.
|
||||
2. Field context is extracted from the left, before the first `.`, if it matches a known context prefix.
|
||||
3. Everything remaining is the name.
|
||||
|
||||
Special case: `log.body.X` normalizes to `{FieldContext: body, Name: X}` — the `log.body.` prefix collapses because body fields under log are a nested context.
|
||||
|
||||
### Invariant: Historical aliases must be preserved
|
||||
|
||||
The `fieldContexts` map includes aliases (`tag` -> `attribute`, `spanfield` -> `span`, `logfield` -> `log`). These exist because older database entries use these names. Removing or changing these aliases will break existing saved queries and dashboard configurations.
|
||||
|
||||
---
|
||||
|
||||
## The Abstraction Stack
|
||||
|
||||
The query pipeline is built from four interfaces that compose vertically. Each layer has a single responsibility. Each layer depends only on the layers below it. This layering is intentional and must be preserved.
|
||||
|
||||
```
|
||||
StatementBuilder <- Orchestrates everything into executable SQL
|
||||
├── AggExprRewriter <- Rewrites aggregation expressions (maps field refs to columns)
|
||||
├── ConditionBuilder <- Builds WHERE predicates (field + operator + value -> SQL)
|
||||
└── FieldMapper <- Maps TelemetryFieldKey -> ClickHouse column expression
|
||||
```
|
||||
|
||||
### FieldMapper
|
||||
|
||||
**Contract:** Given a `TelemetryFieldKey`, return a ClickHouse column expression that yields the value for that field when used in a SELECT.
|
||||
|
||||
**Principle:** This is the *only* place where field-to-column translation happens. No other layer should contain knowledge of how fields map to storage. If you need a column expression, go through the FieldMapper.
|
||||
|
||||
**Why:** The user says `http.request.method`. ClickHouse might store it as `attributes_string['http.request.method']`, or as a materialized column `` `attribute_string_http$$request$$method` ``, or via a JSON access path in a body column. This variation is entirely contained within the FieldMapper. Everything above it is storage-agnostic.
|
||||
|
||||
### ConditionBuilder
|
||||
|
||||
**Contract:** Given a field key, an operator, and a value, produce a valid SQL predicate for a WHERE clause.
|
||||
|
||||
**Dependency:** Uses FieldMapper for the left-hand side of the condition.
|
||||
|
||||
**Principle:** The ConditionBuilder owns all the complexity of operator semantics, i.e type casting, array operators (`hasAny`/`hasAll` vs `=`), existence checks, and negative operator behavior. This complexity must not leak upward into the StatementBuilder.
|
||||
|
||||
### AggExprRewriter
|
||||
|
||||
**Contract:** Given a user-facing aggregation expression like `sum(duration_nano)`, resolve field references within it and produce valid ClickHouse SQL.
|
||||
|
||||
**Dependency:** Uses FieldMapper to resolve field names within expressions.
|
||||
|
||||
**Principle:** Aggregation expressions are user-authored strings that contain field references. The rewriter parses them, identifies field references, resolves each through the FieldMapper, and reassembles the expression.
|
||||
|
||||
### StatementBuilder
|
||||
|
||||
**Contract:** Given a complete `QueryBuilderQuery`, a time range, and a request type, produces an executable SQL statement.
|
||||
|
||||
**Dependency:** Uses all three abstractions above.
|
||||
|
||||
**Principle:** This is the composition layer. It does not contain field mapping logic, condition building logic, or expression rewriting logic. It orchestrates the other abstractions. If you find storage-specific logic creeping into the StatementBuilder, push it down into the appropriate abstraction.
|
||||
|
||||
### Invariant: No layer skipping
|
||||
|
||||
The StatementBuilder must not call FieldMapper directly to build conditions, it goes through the ConditionBuilder. The AggExprRewriter must not hardcode column names, it goes through the FieldMapper. Skipping layers creates hidden coupling and makes the system fragile to storage changes.
|
||||
|
||||
---
|
||||
|
||||
## Design Decisions as Constraints
|
||||
|
||||
### Constraint: Formula evaluation happens in Go, not in ClickHouse
|
||||
|
||||
Formulas (`A + B`, `A / B`, `sqrt(A*A + B*B)`) are evaluated application-side by `FormulaEvaluator`, not via ClickHouse JOINs.
|
||||
|
||||
**Why this is a constraint, not just an implementation choice:** The original JOIN-based approach was abandoned because ClickHouse evaluates joins right-to-left, serializing execution unnecessarily. Running queries independently allows parallelism and caching of intermediate results. Any future optimization must not reintroduce the JOIN pattern without solving the serialization problem.
|
||||
|
||||
**Consequence:** Individual query results must be independently cacheable. Formula evaluation must handle label matching, timestamp alignment, and missing values without requiring the queries to coordinate at the SQL level.
|
||||
|
||||
### Constraint: Zero-defaulting is aggregation-dependent
|
||||
|
||||
Only additive/counting aggregations (`count`, `count_distinct`, `sum`, `rate`) default missing values to zero. Statistical aggregations (`avg`, `min`, `max`, percentiles) must show gaps.
|
||||
|
||||
**Why:** Absence of data has different meanings. No error requests in a time bucket means error count = 0. No requests at all means average latency is *unknown*, not 0. Conflating these is a correctness bug, not a display preference.
|
||||
|
||||
**Enforcement:** `GetQueriesSupportingZeroDefault` determines which queries can default to zero. The `FormulaEvaluator` consumes this via `canDefaultZero`. Changes to aggregation handling must preserve this distinction.
|
||||
|
||||
### Constraint: Existence semantics differ for positive vs negative operators
|
||||
|
||||
- **Positive operators** (`=`, `>`, `LIKE`, `IN`, etc.) implicitly assert field existence. `http.method = GET` means "the field exists AND equals GET".
|
||||
- **Negative operators** (`!=`, `NOT IN`, `NOT LIKE`, etc.) do **not** add an existence check. `http.method != GET` includes records where the field doesn't exist at all.
|
||||
|
||||
**Why:** The user's intent with negative operators is ambiguous. Rather than guess, we take the broader interpretation. Users can add an explicit `EXISTS` filter if they want the narrower one. This is documented in `AddDefaultExistsFilter`.
|
||||
|
||||
**Consequence:** Any new operator must declare its existence behavior in `AddDefaultExistsFilter`. Do not add operators without considering this.
|
||||
|
||||
### Constraint: Post-processing functions operate on result sets, not in SQL
|
||||
|
||||
Functions like `cutOffMin`, `ewma`, `median`, `timeShift`, `fillZero`, `runningDiff`, and `cumulativeSum` are applied in Go on the returned time series, not pushed into ClickHouse SQL.
|
||||
|
||||
**Why:** These are sequential time-series transformations that require complete, ordered result sets. Pushing them into SQL would complicate query generation, prevent caching of raw results, and make the functions harder to test. They are applied via `ApplyFunctions` after query execution.
|
||||
|
||||
**Consequence:** New time-series transformation functions should follow this pattern i.e implement them as Go functions on `*TimeSeries`, not as SQL modifications.
|
||||
|
||||
### Constraint: The API surface rejects unknown fields with suggestions
|
||||
|
||||
All request types use custom `UnmarshalJSON` that calls `DisallowUnknownFields`. Unknown fields trigger error messages with Levenshtein-based suggestions ("did you mean: 'groupBy'?").
|
||||
|
||||
**Why:** Silent acceptance of unknown fields causes subtle bugs. A misspelled `groupBy` results in ungrouped data with no indication of what went wrong. Failing fast with suggestions turns errors into actionable feedback.
|
||||
|
||||
**Consequence:** Any new request type or query spec struct must implement custom unmarshaling with `UnmarshalJSONWithContext`. Do not use default `json.Unmarshal` for user-facing types.
|
||||
|
||||
### Constraint: Validation is context-sensitive to request type
|
||||
|
||||
What's valid depends on the `RequestType`. For aggregation requests (`time_series`, `scalar`, `distribution`), fields like `groupBy`, `aggregations`, `having`, and aggregation-referenced `orderBy` are validated. For non-aggregation requests (`raw`, `raw_stream`, `trace`), these fields are ignored.
|
||||
|
||||
**Why:** A raw log query doesn't have aggregations, so requiring `aggregations` would be wrong. But a time-series query without aggregations is meaningless. The validation rules are request-type-aware to avoid both false positives and false negatives.
|
||||
|
||||
**Consequence:** When adding new fields to query specs, consider which request types they apply to and gate validation accordingly.
|
||||
|
||||
---
|
||||
|
||||
## The Composite Query Model
|
||||
|
||||
### Structure
|
||||
|
||||
A `QueryRangeRequest` contains a `CompositeQuery` which holds `[]QueryEnvelope`. Each envelope is a discriminated union: a `Type` field determines how `Spec` is decoded.
|
||||
|
||||
### Invariant: Query names are unique within a composite query
|
||||
|
||||
Builder queries must have unique names. Formulas reference queries by name (`A`, `B`, `A.0`, `A.my_alias`). Duplicate names would make formula evaluation ambiguous.
|
||||
|
||||
### Invariant: Multi-aggregation uses indexed or aliased references
|
||||
|
||||
A single builder query can have multiple aggregations. They are accessed in formulas via:
|
||||
- Index: `A.0`, `A.1` (zero-based)
|
||||
- Alias: `A.total`, `A.error_count`
|
||||
|
||||
The default (just `A`) resolves to index 0. This is the formula evaluation contract and must be preserved.
|
||||
|
||||
### Invariant: Type-specific decoding through signal detection
|
||||
|
||||
Builder queries are decoded by first peeking at the `signal` field in the raw JSON, then unmarshaling into the appropriate generic type (`QueryBuilderQuery[TraceAggregation]`, `QueryBuilderQuery[LogAggregation]`, `QueryBuilderQuery[MetricAggregation]`). This two-pass decoding is intentional — it allows each signal to have its own aggregation schema while sharing the query structure.
|
||||
|
||||
---
|
||||
|
||||
## The Metadata Layer
|
||||
|
||||
### MetadataStore
|
||||
|
||||
The `MetadataStore` interface provides runtime field discovery and type resolution. It answers questions like "what fields exist for this signal?" and "what are the data types of field X?".
|
||||
|
||||
### Principle: Fields can be ambiguous until resolved
|
||||
|
||||
The same name can map to multiple `TelemetryFieldKey` variants (different contexts, different types). The metadata store returns *all* variants. Resolution to a single field happens during query building, using the query's signal and any explicit context/type hints from the user.
|
||||
|
||||
**Consequence:** Code that calls `GetKey` or `GetKeys` must handle multiple results. Do not assume a name maps to a single field.
|
||||
|
||||
### Principle: Materialized fields are a performance optimization, not a semantic distinction
|
||||
|
||||
A materialized field and its non-materialized equivalent represent the same logical field. The `Materialized` flag tells the FieldMapper to generate a simpler column expression. The user should never need to know whether a field is materialized.
|
||||
|
||||
### Principle: JSON body fields require access plans
|
||||
|
||||
Fields inside JSON body columns (`body.response.errors[].code`) need pre-computed `JSONAccessPlan` trees that encode the traversal path, including branching at array boundaries between `Array(JSON)` and `Array(Dynamic)` representations. These plans are computed during metadata resolution, not during query execution.
|
||||
|
||||
---
|
||||
|
||||
## Summary of Inviolable Rules
|
||||
|
||||
1. **User-facing types never contain ClickHouse column names or SQL fragments.**
|
||||
2. **Field-to-column translation only happens in FieldMapper.**
|
||||
3. **Normalization happens once at the API boundary, never deeper.**
|
||||
4. **Historical aliases in fieldContexts and fieldDataTypes must not be removed.**
|
||||
5. **Formula evaluation stays in Go — do not push it into ClickHouse JOINs.**
|
||||
6. **Zero-defaulting is aggregation-type-dependent — do not universally default to zero.**
|
||||
7. **Positive operators imply existence, negative operators do not.**
|
||||
8. **Post-processing functions operate on Go result sets, not in SQL.**
|
||||
9. **All user-facing types reject unknown JSON fields with suggestions.**
|
||||
10. **Validation rules are gated by request type.**
|
||||
11. **Query names must be unique within a composite query.**
|
||||
12. **The four-layer abstraction stack (FieldMapper -> ConditionBuilder -> AggExprRewriter -> StatementBuilder) must not be bypassed or flattened.**
|
||||
@@ -308,3 +308,15 @@ export const PublicDashboardPage = Loadable(
|
||||
/* webpackChunkName: "Public Dashboard Page" */ 'pages/PublicDashboard'
|
||||
),
|
||||
);
|
||||
|
||||
export const AlertTypeSelectionPage = Loadable(
|
||||
() =>
|
||||
import(
|
||||
/* webpackChunkName: "Alert Type Selection Page" */ 'pages/AlertTypeSelection'
|
||||
),
|
||||
);
|
||||
|
||||
export const MeterExplorerPage = Loadable(
|
||||
() =>
|
||||
import(/* webpackChunkName: "Meter Explorer Page" */ 'pages/MeterExplorer'),
|
||||
);
|
||||
|
||||
@@ -1,12 +1,10 @@
|
||||
import { RouteProps } from 'react-router-dom';
|
||||
import ROUTES from 'constants/routes';
|
||||
import AlertTypeSelectionPage from 'pages/AlertTypeSelection';
|
||||
import MessagingQueues from 'pages/MessagingQueues';
|
||||
import MeterExplorer from 'pages/MeterExplorer';
|
||||
|
||||
import {
|
||||
AlertHistory,
|
||||
AlertOverview,
|
||||
AlertTypeSelectionPage,
|
||||
AllAlertChannels,
|
||||
AllErrors,
|
||||
ApiMonitoring,
|
||||
@@ -29,6 +27,8 @@ import {
|
||||
LogsExplorer,
|
||||
LogsIndexToFields,
|
||||
LogsSaveViews,
|
||||
MessagingQueuesMainPage,
|
||||
MeterExplorerPage,
|
||||
MetricsExplorer,
|
||||
OldLogsExplorer,
|
||||
Onboarding,
|
||||
@@ -399,28 +399,28 @@ const routes: AppRoutes[] = [
|
||||
{
|
||||
path: ROUTES.MESSAGING_QUEUES_KAFKA,
|
||||
exact: true,
|
||||
component: MessagingQueues,
|
||||
component: MessagingQueuesMainPage,
|
||||
key: 'MESSAGING_QUEUES_KAFKA',
|
||||
isPrivate: true,
|
||||
},
|
||||
{
|
||||
path: ROUTES.MESSAGING_QUEUES_CELERY_TASK,
|
||||
exact: true,
|
||||
component: MessagingQueues,
|
||||
component: MessagingQueuesMainPage,
|
||||
key: 'MESSAGING_QUEUES_CELERY_TASK',
|
||||
isPrivate: true,
|
||||
},
|
||||
{
|
||||
path: ROUTES.MESSAGING_QUEUES_OVERVIEW,
|
||||
exact: true,
|
||||
component: MessagingQueues,
|
||||
component: MessagingQueuesMainPage,
|
||||
key: 'MESSAGING_QUEUES_OVERVIEW',
|
||||
isPrivate: true,
|
||||
},
|
||||
{
|
||||
path: ROUTES.MESSAGING_QUEUES_KAFKA_DETAIL,
|
||||
exact: true,
|
||||
component: MessagingQueues,
|
||||
component: MessagingQueuesMainPage,
|
||||
key: 'MESSAGING_QUEUES_KAFKA_DETAIL',
|
||||
isPrivate: true,
|
||||
},
|
||||
@@ -463,21 +463,21 @@ const routes: AppRoutes[] = [
|
||||
{
|
||||
path: ROUTES.METER,
|
||||
exact: true,
|
||||
component: MeterExplorer,
|
||||
component: MeterExplorerPage,
|
||||
key: 'METER',
|
||||
isPrivate: true,
|
||||
},
|
||||
{
|
||||
path: ROUTES.METER_EXPLORER,
|
||||
exact: true,
|
||||
component: MeterExplorer,
|
||||
component: MeterExplorerPage,
|
||||
key: 'METER_EXPLORER',
|
||||
isPrivate: true,
|
||||
},
|
||||
{
|
||||
path: ROUTES.METER_EXPLORER_VIEWS,
|
||||
exact: true,
|
||||
component: MeterExplorer,
|
||||
component: MeterExplorerPage,
|
||||
key: 'METER_EXPLORER_VIEWS',
|
||||
isPrivate: true,
|
||||
},
|
||||
|
||||
@@ -1216,19 +1216,12 @@ func (r *ClickHouseReader) GetFlamegraphSpansForTrace(ctx context.Context, orgID
|
||||
}
|
||||
|
||||
processingPostCache := time.Now()
|
||||
selectedSpansForRequest := selectedSpans
|
||||
limit := min(req.Limit, tracedetail.MaxLimitWithoutSampling)
|
||||
totalSpanCount := tracedetail.GetTotalSpanCount(selectedSpans)
|
||||
if totalSpanCount > uint64(limit) {
|
||||
selectedSpansForRequest = tracedetail.GetSelectedSpansForFlamegraphForRequest(req.SelectedSpanID, selectedSpans, startTime, endTime)
|
||||
}
|
||||
zap.L().Info("getFlamegraphSpansForTrace: processing post cache", zap.Duration("duration", time.Since(processingPostCache)), zap.String("traceID", traceID),
|
||||
zap.Uint64("totalSpanCount", totalSpanCount))
|
||||
selectedSpansForRequest := tracedetail.GetSelectedSpansForFlamegraphForRequest(req.SelectedSpanID, selectedSpans, startTime, endTime)
|
||||
zap.L().Info("getFlamegraphSpansForTrace: processing post cache", zap.Duration("duration", time.Since(processingPostCache)), zap.String("traceID", traceID))
|
||||
|
||||
trace.Spans = selectedSpansForRequest
|
||||
trace.StartTimestampMillis = startTime / 1000000
|
||||
trace.EndTimestampMillis = endTime / 1000000
|
||||
trace.HasMore = totalSpanCount > uint64(limit)
|
||||
return trace, nil
|
||||
}
|
||||
|
||||
|
||||
@@ -10,8 +10,6 @@ var (
|
||||
SPAN_LIMIT_PER_REQUEST_FOR_FLAMEGRAPH float64 = 50
|
||||
SPAN_LIMIT_PER_LEVEL int = 100
|
||||
TIMESTAMP_SAMPLING_BUCKET_COUNT int = 50
|
||||
|
||||
MaxLimitWithoutSampling uint = 120_000
|
||||
)
|
||||
|
||||
func ContainsFlamegraphSpan(slice []*model.FlamegraphSpan, item *model.FlamegraphSpan) bool {
|
||||
@@ -186,12 +184,3 @@ func GetSelectedSpansForFlamegraphForRequest(selectedSpanID string, selectedSpan
|
||||
|
||||
return selectedSpansForRequest
|
||||
}
|
||||
|
||||
func GetTotalSpanCount(spans [][]*model.FlamegraphSpan) uint64 {
|
||||
levelCount := len(spans)
|
||||
spanCount := uint64(0)
|
||||
for i := range levelCount {
|
||||
spanCount += uint64(len(spans[i]))
|
||||
}
|
||||
return spanCount
|
||||
}
|
||||
|
||||
@@ -337,7 +337,6 @@ type GetWaterfallSpansForTraceWithMetadataParams struct {
|
||||
|
||||
type GetFlamegraphSpansForTraceParams struct {
|
||||
SelectedSpanID string `json:"selectedSpanId"`
|
||||
Limit uint `json:"limit"`
|
||||
}
|
||||
|
||||
type SpanFilterParams struct {
|
||||
|
||||
@@ -336,7 +336,6 @@ type GetFlamegraphSpansForTraceResponse struct {
|
||||
EndTimestampMillis uint64 `json:"endTimestampMillis"`
|
||||
DurationNano uint64 `json:"durationNano"`
|
||||
Spans [][]*FlamegraphSpan `json:"spans"`
|
||||
HasMore bool `json:"hasMore"`
|
||||
}
|
||||
|
||||
type OtelSpanRef struct {
|
||||
|
||||
@@ -169,6 +169,7 @@ func NewSQLMigrationProviderFactories(
|
||||
sqlmigration.NewAddAnonymousPublicDashboardTransactionFactory(sqlstore),
|
||||
sqlmigration.NewAddRootUserFactory(sqlstore, sqlschema),
|
||||
sqlmigration.NewAddUserEmailOrgIDIndexFactory(sqlstore, sqlschema),
|
||||
sqlmigration.NewMigrateRulesV4ToV5Factory(sqlstore, telemetryStore),
|
||||
)
|
||||
}
|
||||
|
||||
|
||||
209
pkg/sqlmigration/066_migrate_rules_v4_to_v5_post_deprecation.go
Normal file
209
pkg/sqlmigration/066_migrate_rules_v4_to_v5_post_deprecation.go
Normal file
@@ -0,0 +1,209 @@
|
||||
package sqlmigration
|
||||
|
||||
import (
|
||||
"context"
|
||||
"database/sql"
|
||||
"encoding/json"
|
||||
"log/slog"
|
||||
|
||||
"github.com/SigNoz/signoz/pkg/factory"
|
||||
"github.com/SigNoz/signoz/pkg/sqlstore"
|
||||
"github.com/SigNoz/signoz/pkg/telemetrystore"
|
||||
"github.com/SigNoz/signoz/pkg/transition"
|
||||
"github.com/uptrace/bun"
|
||||
"github.com/uptrace/bun/migrate"
|
||||
)
|
||||
|
||||
type migrateRulesV4ToV5 struct {
|
||||
store sqlstore.SQLStore
|
||||
telemetryStore telemetrystore.TelemetryStore
|
||||
logger *slog.Logger
|
||||
}
|
||||
|
||||
func NewMigrateRulesV4ToV5Factory(
|
||||
store sqlstore.SQLStore,
|
||||
telemetryStore telemetrystore.TelemetryStore,
|
||||
) factory.ProviderFactory[SQLMigration, Config] {
|
||||
return factory.NewProviderFactory(
|
||||
factory.MustNewName("migrate_rules_post_deprecation"),
|
||||
func(ctx context.Context, ps factory.ProviderSettings, c Config) (SQLMigration, error) {
|
||||
return &migrateRulesV4ToV5{
|
||||
store: store,
|
||||
telemetryStore: telemetryStore,
|
||||
logger: ps.Logger,
|
||||
}, nil
|
||||
})
|
||||
}
|
||||
|
||||
func (migration *migrateRulesV4ToV5) Register(migrations *migrate.Migrations) error {
|
||||
if err := migrations.Register(migration.Up, migration.Down); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (migration *migrateRulesV4ToV5) getLogDuplicateKeys(ctx context.Context) ([]string, error) {
|
||||
query := `
|
||||
SELECT name
|
||||
FROM (
|
||||
SELECT DISTINCT name FROM signoz_logs.distributed_logs_attribute_keys
|
||||
INTERSECT
|
||||
SELECT DISTINCT name FROM signoz_logs.distributed_logs_resource_keys
|
||||
)
|
||||
ORDER BY name
|
||||
`
|
||||
|
||||
rows, err := migration.telemetryStore.ClickhouseDB().Query(ctx, query)
|
||||
if err != nil {
|
||||
migration.logger.WarnContext(ctx, "failed to query log duplicate keys", "error", err)
|
||||
return nil, nil
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
var keys []string
|
||||
for rows.Next() {
|
||||
var key string
|
||||
if err := rows.Scan(&key); err != nil {
|
||||
migration.logger.WarnContext(ctx, "failed to scan log duplicate key", "error", err)
|
||||
continue
|
||||
}
|
||||
keys = append(keys, key)
|
||||
}
|
||||
|
||||
return keys, nil
|
||||
}
|
||||
|
||||
func (migration *migrateRulesV4ToV5) getTraceDuplicateKeys(ctx context.Context) ([]string, error) {
|
||||
query := `
|
||||
SELECT tagKey
|
||||
FROM signoz_traces.distributed_span_attributes_keys
|
||||
WHERE tagType IN ('tag', 'resource')
|
||||
GROUP BY tagKey
|
||||
HAVING COUNT(DISTINCT tagType) > 1
|
||||
ORDER BY tagKey
|
||||
`
|
||||
|
||||
rows, err := migration.telemetryStore.ClickhouseDB().Query(ctx, query)
|
||||
if err != nil {
|
||||
migration.logger.WarnContext(ctx, "failed to query trace duplicate keys", "error", err)
|
||||
return nil, nil
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
var keys []string
|
||||
for rows.Next() {
|
||||
var key string
|
||||
if err := rows.Scan(&key); err != nil {
|
||||
migration.logger.WarnContext(ctx, "failed to scan trace duplicate key", "error", err)
|
||||
continue
|
||||
}
|
||||
keys = append(keys, key)
|
||||
}
|
||||
|
||||
return keys, nil
|
||||
}
|
||||
|
||||
func (migration *migrateRulesV4ToV5) Up(ctx context.Context, db *bun.DB) error {
|
||||
logsKeys, err := migration.getLogDuplicateKeys(ctx)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
tracesKeys, err := migration.getTraceDuplicateKeys(ctx)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
tx, err := db.BeginTx(ctx, nil)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
defer func() {
|
||||
_ = tx.Rollback()
|
||||
}()
|
||||
|
||||
var rules []struct {
|
||||
ID string `bun:"id"`
|
||||
Data map[string]any `bun:"data"`
|
||||
}
|
||||
|
||||
err = tx.NewSelect().
|
||||
Table("rule").
|
||||
Column("id", "data").
|
||||
Scan(ctx, &rules)
|
||||
if err != nil {
|
||||
if err == sql.ErrNoRows {
|
||||
return nil
|
||||
}
|
||||
return err
|
||||
}
|
||||
|
||||
alertsMigrator := transition.NewAlertMigrateV5(migration.logger, logsKeys, tracesKeys)
|
||||
|
||||
count := 0
|
||||
|
||||
for _, rule := range rules {
|
||||
version, _ := rule.Data["version"].(string)
|
||||
|
||||
if version == "v5" {
|
||||
continue
|
||||
}
|
||||
|
||||
if version == "" {
|
||||
migration.logger.WarnContext(ctx, "unexpected empty version for rule", "rule_id", rule.ID)
|
||||
}
|
||||
|
||||
migration.logger.InfoContext(ctx, "migrating rule v4 to v5", "rule_id", rule.ID, "current_version", version)
|
||||
|
||||
// Check if the queries envelope already exists and is non-empty
|
||||
hasQueriesEnvelope := false
|
||||
if condition, ok := rule.Data["condition"].(map[string]any); ok {
|
||||
if compositeQuery, ok := condition["compositeQuery"].(map[string]any); ok {
|
||||
if queries, ok := compositeQuery["queries"].([]any); ok && len(queries) > 0 {
|
||||
hasQueriesEnvelope = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if hasQueriesEnvelope {
|
||||
// already has queries envelope, just bump version
|
||||
// this is because user made a mistake of choosing version
|
||||
migration.logger.InfoContext(ctx, "rule already has queries envelope, bumping version", "rule_id", rule.ID)
|
||||
rule.Data["version"] = "v5"
|
||||
} else {
|
||||
// old format, run full migration
|
||||
migration.logger.InfoContext(ctx, "rule has old format, running full migration", "rule_id", rule.ID)
|
||||
updated := alertsMigrator.Migrate(ctx, rule.Data)
|
||||
if !updated {
|
||||
migration.logger.WarnContext(ctx, "expected updated to be true but got false", "rule_id", rule.ID)
|
||||
continue
|
||||
}
|
||||
rule.Data["version"] = "v5"
|
||||
}
|
||||
|
||||
dataJSON, err := json.Marshal(rule.Data)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
_, err = tx.NewUpdate().
|
||||
Table("rule").
|
||||
Set("data = ?", string(dataJSON)).
|
||||
Where("id = ?", rule.ID).
|
||||
Exec(ctx)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
count++
|
||||
}
|
||||
if count != 0 {
|
||||
migration.logger.InfoContext(ctx, "migrate v4 alerts", "count", count)
|
||||
}
|
||||
|
||||
return tx.Commit()
|
||||
}
|
||||
|
||||
func (migration *migrateRulesV4ToV5) Down(ctx context.Context, db *bun.DB) error {
|
||||
return nil
|
||||
}
|
||||
@@ -355,6 +355,10 @@ func (r *PostableRule) validate() error {
|
||||
errs = append(errs, signozError.NewInvalidInputf(signozError.CodeInvalidInput, "composite query is required"))
|
||||
}
|
||||
|
||||
if r.Version != "v5" {
|
||||
errs = append(errs, signozError.NewInvalidInputf(signozError.CodeInvalidInput, "only version v5 is supported, got %q", r.Version))
|
||||
}
|
||||
|
||||
if isAllQueriesDisabled(r.RuleCondition.CompositeQuery) {
|
||||
errs = append(errs, signozError.NewInvalidInputf(signozError.CodeInvalidInput, "all queries are disabled in rule condition"))
|
||||
}
|
||||
|
||||
@@ -108,6 +108,7 @@ func TestParseIntoRule(t *testing.T) {
|
||||
"ruleType": "threshold_rule",
|
||||
"evalWindow": "5m",
|
||||
"frequency": "1m",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -150,6 +151,7 @@ func TestParseIntoRule(t *testing.T) {
|
||||
content: []byte(`{
|
||||
"alert": "DefaultsRule",
|
||||
"ruleType": "threshold_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -187,6 +189,7 @@ func TestParseIntoRule(t *testing.T) {
|
||||
initRule: PostableRule{},
|
||||
content: []byte(`{
|
||||
"alert": "PromQLRule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "promql",
|
||||
@@ -256,6 +259,7 @@ func TestParseIntoRuleSchemaVersioning(t *testing.T) {
|
||||
content: []byte(`{
|
||||
"alert": "SeverityLabelTest",
|
||||
"schemaVersion": "v1",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -344,6 +348,7 @@ func TestParseIntoRuleSchemaVersioning(t *testing.T) {
|
||||
content: []byte(`{
|
||||
"alert": "NoLabelsTest",
|
||||
"schemaVersion": "v1",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -384,6 +389,7 @@ func TestParseIntoRuleSchemaVersioning(t *testing.T) {
|
||||
content: []byte(`{
|
||||
"alert": "OverwriteTest",
|
||||
"schemaVersion": "v1",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -474,6 +480,7 @@ func TestParseIntoRuleSchemaVersioning(t *testing.T) {
|
||||
content: []byte(`{
|
||||
"alert": "V2Test",
|
||||
"schemaVersion": "v2",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -517,6 +524,7 @@ func TestParseIntoRuleSchemaVersioning(t *testing.T) {
|
||||
initRule: PostableRule{},
|
||||
content: []byte(`{
|
||||
"alert": "DefaultSchemaTest",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -569,6 +577,7 @@ func TestParseIntoRuleSchemaVersioning(t *testing.T) {
|
||||
func TestParseIntoRuleThresholdGeneration(t *testing.T) {
|
||||
content := []byte(`{
|
||||
"alert": "TestThresholds",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -639,6 +648,7 @@ func TestParseIntoRuleMultipleThresholds(t *testing.T) {
|
||||
"schemaVersion": "v2",
|
||||
"alert": "MultiThresholdAlert",
|
||||
"ruleType": "threshold_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -732,6 +742,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "AnomalyBelowTest",
|
||||
"ruleType": "anomaly_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -766,6 +777,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "AnomalyBelowTest",
|
||||
"ruleType": "anomaly_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -799,6 +811,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "AnomalyAboveTest",
|
||||
"ruleType": "anomaly_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -833,6 +846,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "AnomalyAboveTest",
|
||||
"ruleType": "anomaly_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -866,6 +880,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "AnomalyBelowAllTest",
|
||||
"ruleType": "anomaly_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -901,6 +916,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "AnomalyBelowAllTest",
|
||||
"ruleType": "anomaly_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -935,6 +951,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "AnomalyOutOfBoundsTest",
|
||||
"ruleType": "anomaly_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -969,6 +986,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "ThresholdTest",
|
||||
"ruleType": "threshold_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
@@ -1003,6 +1021,7 @@ func TestAnomalyNegationEval(t *testing.T) {
|
||||
ruleJSON: []byte(`{
|
||||
"alert": "ThresholdTest",
|
||||
"ruleType": "threshold_rule",
|
||||
"version": "v5",
|
||||
"condition": {
|
||||
"compositeQuery": {
|
||||
"queryType": "builder",
|
||||
|
||||
Reference in New Issue
Block a user