Mirror of https://github.com/SigNoz/signoz.git, synced 2026-02-18 06:52:34 +00:00.

Compare commits: 5 commits on branch `platform-p...ns/claude-`
| Author | SHA1 | Date |
|---|---|---|
| | c295ef386d | |
| | bf0394cc28 | |
| | fa08ca2fac | |
| | 08c53fe7e8 | |
| | c1fac00d2e | |
.claude/CLAUDE.md (new file, 136 lines)
@@ -0,0 +1,136 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

SigNoz is an open-source observability platform (APM, logs, metrics, traces) built on OpenTelemetry and ClickHouse. It provides a unified solution for monitoring applications with features including distributed tracing, log management, metrics dashboards, and alerting.

## Build and Development Commands

### Development Environment Setup
```bash
make devenv-up                     # Start ClickHouse and OTel Collector for local dev
make devenv-clickhouse             # Start only ClickHouse
make devenv-signoz-otel-collector  # Start only OTel Collector
make devenv-clickhouse-clean       # Clean ClickHouse data
```

### Backend (Go)
```bash
make go-run-community            # Run community backend server
make go-run-enterprise           # Run enterprise backend server
make go-test                     # Run all Go unit tests
go test -race ./pkg/...          # Run tests for a specific package tree
go test -race ./pkg/querier/...  # Example: run querier tests
```

### Integration Tests (Python)
```bash
cd tests/integration
uv sync                # Install dependencies
make py-test-setup     # Start test environment (keep running with --reuse)
make py-test           # Run all integration tests
make py-test-teardown  # Stop test environment

# Run a specific test
uv run pytest --basetemp=./tmp/ -vv --reuse src/<suite>/<file>.py::test_name
```

### Code Quality
```bash
# Go linting (golangci-lint)
golangci-lint run

# Python formatting/linting
make py-fmt   # Format with black
make py-lint  # Run isort, autoflake, pylint
```

### OpenAPI Generation
```bash
go run cmd/enterprise/*.go generate openapi
```

## Architecture Overview

### Backend Structure

The Go backend follows a **provider pattern** for dependency injection:

- **`pkg/signoz/`** - IoC container that wires all providers together
- **`pkg/modules/`** - Business logic modules (user, organization, dashboard, etc.)
- **`pkg/<provider>/`** - Provider implementations following a consistent structure:
  - `<name>.go` - Interface definition
  - `config.go` - Configuration (implements `factory.Config`)
  - `<implname><name>/provider.go` - Implementation
  - `<name>test/` - Mock implementations for testing
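For orientation, a hypothetical provider following this layout might look like the sketch below. The `cache` example and every name in it are invented for illustration; they are not actual SigNoz code.

```go
// Hypothetical sketch of the provider layout described above, using an
// imaginary "cache" provider. The file split is indicated in comments.
package cache

import (
	"context"
	"time"
)

// cache.go - the provider's interface definition.
type Cache interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Set(ctx context.Context, key string, value []byte, ttl time.Duration) error
}

// config.go - configuration; in the real pattern this would implement
// factory.Config.
type Config struct {
	DefaultTTL time.Duration `mapstructure:"default_ttl"`
}

// memorycache/provider.go would hold an in-memory implementation returned by
// a constructor, and cachetest/ would hold mocks for tests.
```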
### Key Packages
- **`pkg/querier/`** - Query engine for telemetry data (logs, traces, metrics)
- **`pkg/telemetrystore/`** - ClickHouse telemetry storage interface
- **`pkg/sqlstore/`** - Relational database (SQLite/PostgreSQL) for metadata
- **`pkg/apiserver/`** - HTTP API server with OpenAPI integration
- **`pkg/alertmanager/`** - Alert management
- **`pkg/authn/`, `pkg/authz/`** - Authentication and authorization
- **`pkg/flagger/`** - Feature flags (OpenFeature-based)
- **`pkg/errors/`** - Structured error handling

### Enterprise vs Community
- **`cmd/community/`** - Community edition entry point
- **`cmd/enterprise/`** - Enterprise edition entry point
- **`ee/`** - Enterprise-only features

## Code Conventions

### Error Handling
Use the custom `pkg/errors` package instead of the standard library:
```go
errors.New(typ, code, message)           // Instead of errors.New()
errors.Newf(typ, code, message, args...) // Instead of fmt.Errorf()
errors.Wrapf(err, typ, code, msg)        // Wrap with context
```

Define domain-specific error codes:
```go
var CodeThingNotFound = errors.MustNewCode("thing_not_found")
```
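Putting the two conventions together, a lookup function might read as follows. This is a sketch: the `errors.TypeNotFound` constant and the store call are assumptions for illustration, while the `MustNewCode`/`Wrapf` signatures are the ones listed above.

```go
var CodeUserNotFound = errors.MustNewCode("user_not_found")

func (m *Module) GetUser(ctx context.Context, id string) (*User, error) {
	user, err := m.store.Get(ctx, id)
	if err != nil {
		// Wrap the underlying error with a type, a domain code, and context.
		// errors.TypeNotFound is an assumed type constant.
		return nil, errors.Wrapf(err, errors.TypeNotFound, CodeUserNotFound,
			"user with id %s does not exist", id)
	}
	return user, nil
}
```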
### HTTP Handlers
Handlers are thin adapters in modules that:
1. Extract the auth context from the request
2. Decode the request body using the `binding` package
3. Call module functions
4. Return responses using the `render` package

Register routes in `pkg/apiserver/signozapiserver/` with `handler.New()` and `OpenAPIDef`.
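The shape of such a handler is sketched below using only the standard library; in SigNoz the decode and respond steps go through the `binding` and `render` packages instead, so treat all names here as illustrative.

```go
package dashboard

import (
	"context"
	"encoding/json"
	"net/http"
)

// Illustrative request/response types and module interface.
type PostableDashboard struct {
	Name string `json:"name"`
}

type Dashboard struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

type Module interface {
	Create(ctx context.Context, req *PostableDashboard) (*Dashboard, error)
}

type Handler struct {
	module Module
}

func (h *Handler) Create(rw http.ResponseWriter, r *http.Request) {
	// 1. Extract the auth context from the request (elided here; SigNoz
	// reads claims from the request context).
	ctx := r.Context()

	// 2. Decode the request body (the binding package in SigNoz).
	var req PostableDashboard
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(rw, err.Error(), http.StatusBadRequest)
		return
	}

	// 3. Call the module function; all business logic lives there.
	dashboard, err := h.module.Create(ctx, &req)
	if err != nil {
		http.Error(rw, err.Error(), http.StatusInternalServerError)
		return
	}

	// 4. Return the response (the render package in SigNoz).
	rw.Header().Set("Content-Type", "application/json")
	rw.WriteHeader(http.StatusCreated)
	_ = json.NewEncoder(rw).Encode(dashboard)
}
```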
### SQL/Database
- Use Bun ORM via `sqlstore.BunDBCtx(ctx)` (see the sketch below)
- Star schema with `organizations` as the central entity
- All tables have `id`, `created_at`, `updated_at`, `org_id` columns
- Write idempotent migrations in `pkg/sqlmigration/`
- No `ON CASCADE` deletes - handle deletion in application logic
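For example, a module function might read rows like this. The `Thing` model and the surrounding method are made up for illustration; `NewSelect().Model(...).Where(...).Scan(ctx)` is standard Bun usage.

```go
// Illustrative only: list all rows belonging to an org via the Bun handle.
func (m *Module) ListThings(ctx context.Context, orgID string) ([]*Thing, error) {
	things := make([]*Thing, 0)
	err := m.sqlstore.
		BunDBCtx(ctx).
		NewSelect().
		Model(&things).
		Where("org_id = ?", orgID).
		Scan(ctx)
	if err != nil {
		return nil, err
	}
	return things, nil
}
```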
### REST Endpoints
- Use plural resource names: `/v1/organizations`, `/v1/users`
- Use `me` for the current user/org: `/v1/organizations/me/users`
- Follow RESTful conventions for CRUD operations

### Linting Rules (from .golangci.yml)
- Don't use the standard `errors` package - use `pkg/errors`
- Don't use the `zap` logger - use `slog`
- Don't use `fmt.Errorf` or `fmt.Print*`

## Testing

### Unit Tests
- Run with the race detector: `go test -race ./...`
- Provider mocks are in `<provider>test/` packages

### Integration Tests
- Located in `tests/integration/`
- Use pytest with testcontainers
- Files are prefixed with numbers for execution order (e.g., `01_database.py`)
- Always use the `--reuse` flag during development
- Fixtures are in `tests/integration/fixtures/`
.claude/settings.json (new file, 15 lines)
@@ -0,0 +1,15 @@

{
  "permissions": {
    "allow": [
      "Read",
      "Glob",
      "Grep",
      "Bash(git *)",
      "Bash(make *)",
      "Bash(cd *)",
      "Bash(ls *)",
      "Bash(go run *)",
      "Bash(yarn run *)"
    ]
  }
}
.claude/skills/clickhouse-query/SKILL.md (new file, 21 lines)
@@ -0,0 +1,21 @@

---
description: Write optimised ClickHouse queries for SigNoz dashboards (traces, errors, logs)
user_invocable: true
---

# Writing ClickHouse Queries for SigNoz Dashboards

Read [clickhouse-traces-reference.md](./clickhouse-traces-reference.md) for the full schema and query reference before writing any query. It covers:

- All table schemas (`distributed_signoz_index_v3`, `distributed_traces_v3_resource`, `distributed_signoz_error_index_v2`, etc.)
- The mandatory resource filter CTE pattern and timestamp bucketing
- Attribute access syntax (standard, indexed, resource)
- Dashboard panel query templates (timeseries, value, table)
- Real-world query examples (span counts, error rates, latency, event extraction)

## Workflow

1. **Understand the ask**: What metric/data does the user want? (e.g., error rate, latency, span count)
2. **Pick the panel type**: Timeseries (time-series chart), Value (single number), or Table (rows).
3. **Build the query** following the mandatory patterns from the reference doc.
4. **Validate** that the query uses all required optimizations (resource CTE, ts_bucket_start, indexed columns).
.claude/skills/clickhouse-query/clickhouse-traces-reference.md (new file, 460 lines)
@@ -0,0 +1,460 @@

# ClickHouse Traces Query Reference for SigNoz

Source: https://signoz.io/docs/userguide/writing-clickhouse-traces-query/

All tables live in the `signoz_traces` database.

---

## Table Schemas

### distributed_signoz_index_v3 (Primary Spans Table)

The main table for querying span data. 30+ columns following OpenTelemetry conventions.

```sql
(
    `ts_bucket_start` UInt64 CODEC(DoubleDelta, LZ4),
    `resource_fingerprint` String CODEC(ZSTD(1)),
    `timestamp` DateTime64(9) CODEC(DoubleDelta, LZ4),
    `trace_id` FixedString(32) CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_state` String CODEC(ZSTD(1)),
    `parent_span_id` String CODEC(ZSTD(1)),
    `flags` UInt32 CODEC(T64, ZSTD(1)),
    `name` LowCardinality(String) CODEC(ZSTD(1)),
    `kind` Int8 CODEC(T64, ZSTD(1)),
    `kind_string` String CODEC(ZSTD(1)),
    `duration_nano` UInt64 CODEC(T64, ZSTD(1)),
    `status_code` Int16 CODEC(T64, ZSTD(1)),
    `status_message` String CODEC(ZSTD(1)),
    `status_code_string` String CODEC(ZSTD(1)),
    `attributes_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `attributes_number` Map(LowCardinality(String), Float64) CODEC(ZSTD(1)),
    `attributes_bool` Map(LowCardinality(String), Bool) CODEC(ZSTD(1)),
    `resources_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)), -- deprecated
    `resource` JSON(max_dynamic_paths = 100) CODEC(ZSTD(1)),
    `events` Array(String) CODEC(ZSTD(2)),
    `links` String CODEC(ZSTD(1)),
    `response_status_code` LowCardinality(String) CODEC(ZSTD(1)),
    `external_http_url` LowCardinality(String) CODEC(ZSTD(1)),
    `http_url` LowCardinality(String) CODEC(ZSTD(1)),
    `external_http_method` LowCardinality(String) CODEC(ZSTD(1)),
    `http_method` LowCardinality(String) CODEC(ZSTD(1)),
    `http_host` LowCardinality(String) CODEC(ZSTD(1)),
    `db_name` LowCardinality(String) CODEC(ZSTD(1)),
    `db_operation` LowCardinality(String) CODEC(ZSTD(1)),
    `has_error` Bool CODEC(T64, ZSTD(1)),
    `is_remote` LowCardinality(String) CODEC(ZSTD(1)),
    -- Pre-indexed "selected" columns (use these instead of map access when available):
    `resource_string_service$$name` LowCardinality(String) DEFAULT resources_string['service.name'] CODEC(ZSTD(1)),
    `attribute_string_http$$route` LowCardinality(String) DEFAULT attributes_string['http.route'] CODEC(ZSTD(1)),
    `attribute_string_messaging$$system` LowCardinality(String) DEFAULT attributes_string['messaging.system'] CODEC(ZSTD(1)),
    `attribute_string_messaging$$operation` LowCardinality(String) DEFAULT attributes_string['messaging.operation'] CODEC(ZSTD(1)),
    `attribute_string_db$$system` LowCardinality(String) DEFAULT attributes_string['db.system'] CODEC(ZSTD(1)),
    `attribute_string_rpc$$system` LowCardinality(String) DEFAULT attributes_string['rpc.system'] CODEC(ZSTD(1)),
    `attribute_string_rpc$$service` LowCardinality(String) DEFAULT attributes_string['rpc.service'] CODEC(ZSTD(1)),
    `attribute_string_rpc$$method` LowCardinality(String) DEFAULT attributes_string['rpc.method'] CODEC(ZSTD(1)),
    `attribute_string_peer$$service` LowCardinality(String) DEFAULT attributes_string['peer.service'] CODEC(ZSTD(1))
)
ORDER BY (ts_bucket_start, resource_fingerprint, has_error, name, timestamp)
```

### distributed_traces_v3_resource (Resource Lookup Table)

Used in the resource filter CTE pattern for efficient filtering by resource attributes.

```sql
(
    `labels` String CODEC(ZSTD(5)),
    `fingerprint` String CODEC(ZSTD(1)),
    `seen_at_ts_bucket_start` Int64 CODEC(Delta(8), ZSTD(1))
)
```

### distributed_signoz_error_index_v2 (Error Events)

```sql
(
    `timestamp` DateTime64(9) CODEC(DoubleDelta, LZ4),
    `errorID` FixedString(32) CODEC(ZSTD(1)),
    `groupID` FixedString(32) CODEC(ZSTD(1)),
    `traceID` FixedString(32) CODEC(ZSTD(1)),
    `spanID` String CODEC(ZSTD(1)),
    `serviceName` LowCardinality(String) CODEC(ZSTD(1)),
    `exceptionType` LowCardinality(String) CODEC(ZSTD(1)),
    `exceptionMessage` String CODEC(ZSTD(1)),
    `exceptionStacktrace` String CODEC(ZSTD(1)),
    `exceptionEscaped` Bool CODEC(T64, ZSTD(1)),
    `resourceTagsMap` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX idx_error_id errorID TYPE bloom_filter GRANULARITY 4,
    INDEX idx_resourceTagsMapKeys mapKeys(resourceTagsMap) TYPE bloom_filter(0.01) GRANULARITY 64,
    INDEX idx_resourceTagsMapValues mapValues(resourceTagsMap) TYPE bloom_filter(0.01) GRANULARITY 64
)
```

### distributed_top_level_operations

```sql
(
    `name` LowCardinality(String) CODEC(ZSTD(1)),
    `serviceName` LowCardinality(String) CODEC(ZSTD(1))
)
```

### distributed_span_attributes_keys

```sql
(
    `tagKey` LowCardinality(String) CODEC(ZSTD(1)),
    `tagType` Enum8('tag' = 1, 'resource' = 2) CODEC(ZSTD(1)),
    `dataType` Enum8('string' = 1, 'bool' = 2, 'float64' = 3) CODEC(ZSTD(1)),
    `isColumn` Bool CODEC(ZSTD(1))
)
```

### distributed_span_attributes

```sql
(
    `timestamp` DateTime CODEC(DoubleDelta, ZSTD(1)),
    `tagKey` LowCardinality(String) CODEC(ZSTD(1)),
    `tagType` Enum8('tag' = 1, 'resource' = 2) CODEC(ZSTD(1)),
    `dataType` Enum8('string' = 1, 'bool' = 2, 'float64' = 3) CODEC(ZSTD(1)),
    `stringTagValue` String CODEC(ZSTD(1)),
    `float64TagValue` Nullable(Float64) CODEC(ZSTD(1)),
    `isColumn` Bool CODEC(ZSTD(1))
)
```

---

## Mandatory Optimization Patterns

### 1. Resource Filter CTE

**Always** use a CTE to pre-filter resource fingerprints when filtering by resource attributes (service.name, environment, etc.). This is the single most impactful optimization.

```sql
WITH __resource_filter AS (
    SELECT fingerprint
    FROM signoz_traces.distributed_traces_v3_resource
    WHERE (simpleJSONExtractString(labels, 'service.name') = 'myservice')
      AND seen_at_ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
)
SELECT ...
FROM signoz_traces.distributed_signoz_index_v3
WHERE resource_fingerprint GLOBAL IN __resource_filter
  AND ...
```

- Multiple resource filters: chain with AND in the CTE WHERE clause.
- Use `simpleJSONExtractString(labels, '<key>')` to extract resource attribute values.

### 2. Timestamp Bucketing

**Always** include `ts_bucket_start` filter alongside `timestamp` filter. Data is bucketed in 30-minute (1800-second) intervals.

```sql
WHERE timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}
  AND ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
```

The `- 1800` on the start ensures spans at bucket boundaries are not missed.

### 3. Use Indexed Columns Over Map Access

When a pre-indexed ("selected") column exists, use it instead of map access:

| Instead of | Use |
|---|---|
| `attributes_string['http.route']` | `attribute_string_http$$route` |
| `attributes_string['db.system']` | `attribute_string_db$$system` |
| `attributes_string['rpc.method']` | `attribute_string_rpc$$method` |
| `attributes_string['peer.service']` | `attribute_string_peer$$service` |
| `resources_string['service.name']` | `resource_string_service$$name` |

The naming convention: replace `.` with `$$` in the attribute name and prefix with `attribute_string_`, `attribute_number_`, or `attribute_bool_`.

### 4. Use Pre-extracted Columns

These top-level columns are faster than map access:
- `http_method`, `http_url`, `http_host`
- `db_name`, `db_operation`
- `has_error`, `duration_nano`, `name`, `kind`
- `response_status_code`

---

## Attribute Access Syntax

### Standard (non-indexed) attributes
```sql
attributes_string['http.status_code']
attributes_number['response_time']
attributes_bool['is_error']
```

### Selected (indexed) attributes — direct column names
```sql
attribute_string_http$$route     -- for http.route
attribute_number_response$$time  -- for response.time
attribute_bool_is$$error         -- for is.error
```

### Resource attributes in SELECT / GROUP BY
```sql
resource.service.name::String
resource.environment::String
```

### Resource attributes in WHERE (via CTE)
```sql
simpleJSONExtractString(labels, 'service.name') = 'myservice'
```

### Checking attribute existence
```sql
mapContains(attributes_string, 'http.method')
```

---

## Dashboard Panel Query Templates

### Timeseries Panel

Aggregates data over time intervals for chart visualization.

```sql
WITH __resource_filter AS (
    SELECT fingerprint
    FROM signoz_traces.distributed_traces_v3_resource
    WHERE (simpleJSONExtractString(labels, 'service.name') = '{{service}}')
      AND seen_at_ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
)
SELECT
    toStartOfInterval(timestamp, INTERVAL 1 MINUTE) AS ts,
    toFloat64(count()) AS value
FROM signoz_traces.distributed_signoz_index_v3
WHERE
    resource_fingerprint GLOBAL IN __resource_filter AND
    timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}} AND
    ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
GROUP BY ts
ORDER BY ts ASC;
```

### Value Panel

Returns a single aggregated number. Wrap the timeseries query and reduce with `avg()`, `sum()`, `min()`, `max()`, or `any()`.

```sql
WITH __resource_filter AS (
    SELECT fingerprint
    FROM signoz_traces.distributed_traces_v3_resource
    WHERE (simpleJSONExtractString(labels, 'service.name') = '{{service}}')
      AND seen_at_ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
)
SELECT
    avg(value) AS value,
    any(ts) AS ts
FROM (
    SELECT
        toStartOfInterval(timestamp, INTERVAL 1 MINUTE) AS ts,
        toFloat64(count()) AS value
    FROM signoz_traces.distributed_signoz_index_v3
    WHERE
        resource_fingerprint GLOBAL IN __resource_filter AND
        timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}} AND
        ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
    GROUP BY ts
    ORDER BY ts ASC
)
```

### Table Panel

Rows grouped by dimensions. Use `now() as ts` instead of a time interval column.

```sql
WITH __resource_filter AS (
    SELECT fingerprint
    FROM signoz_traces.distributed_traces_v3_resource
    WHERE seen_at_ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
)
SELECT
    now() AS ts,
    resource.service.name::String AS `service.name`,
    toFloat64(count()) AS value
FROM signoz_traces.distributed_signoz_index_v3
WHERE
    resource_fingerprint GLOBAL IN __resource_filter AND
    timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}} AND
    ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp AND
    `service.name` IS NOT NULL
GROUP BY `service.name`, ts
ORDER BY value DESC;
```

---

## Query Examples

### Timeseries — Error spans per service per minute

Shows `has_error` filtering, resource attribute in SELECT, and multi-series grouping.

```sql
WITH __resource_filter AS (
    SELECT fingerprint
    FROM signoz_traces.distributed_traces_v3_resource
    WHERE seen_at_ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
)
SELECT
    toStartOfInterval(timestamp, INTERVAL 1 MINUTE) AS ts,
    resource.service.name::String AS `service.name`,
    toFloat64(count()) AS value
FROM signoz_traces.distributed_signoz_index_v3
WHERE
    resource_fingerprint GLOBAL IN __resource_filter AND
    timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}} AND
    has_error = true AND
    `service.name` IS NOT NULL AND
    ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
GROUP BY `service.name`, ts
ORDER BY ts ASC;
```

### Value — Average duration of GET requests

Shows the value-panel wrapping pattern (`avg(value)` / `any(ts)`) with a service resource filter.

```sql
WITH __resource_filter AS (
    SELECT fingerprint
    FROM signoz_traces.distributed_traces_v3_resource
    WHERE (simpleJSONExtractString(labels, 'service.name') = 'api-service')
      AND seen_at_ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
)
SELECT
    avg(value) AS value,
    any(ts) AS ts
FROM (
    SELECT
        toStartOfInterval(timestamp, INTERVAL 1 MINUTE) AS ts,
        toFloat64(avg(duration_nano)) AS value
    FROM signoz_traces.distributed_signoz_index_v3
    WHERE
        resource_fingerprint GLOBAL IN __resource_filter AND
        timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}} AND
        ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp AND
        http_method = 'GET'
    GROUP BY ts
    ORDER BY ts ASC
)
```

### Table — Average duration by HTTP method

Shows `now() as ts` pattern, pre-extracted column usage, and non-null filtering.

```sql
WITH __resource_filter AS (
    SELECT fingerprint
    FROM signoz_traces.distributed_traces_v3_resource
    WHERE (simpleJSONExtractString(labels, 'service.name') = 'api-gateway')
      AND seen_at_ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp
)
SELECT
    now() AS ts,
    http_method,
    toFloat64(avg(duration_nano)) AS avg_duration_nano
FROM signoz_traces.distributed_signoz_index_v3
WHERE
    resource_fingerprint GLOBAL IN __resource_filter AND
    timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}} AND
    ts_bucket_start BETWEEN $start_timestamp - 1800 AND $end_timestamp AND
    http_method IS NOT NULL AND http_method != ''
GROUP BY http_method, ts
ORDER BY avg_duration_nano DESC;
```

### Advanced — Extract values from span events

Shows `arrayFilter`/`arrayMap` pattern for querying the `events` JSON array.

```sql
WITH arrayFilter(x -> JSONExtractString(x, 'name') = 'Getting customer', events) AS filteredEvents
SELECT
    toStartOfInterval(timestamp, INTERVAL 1 MINUTE) AS interval,
    toFloat64(count()) AS count,
    arrayJoin(arrayMap(x -> JSONExtractString(JSONExtractString(x, 'attributeMap'), 'customer_id'), filteredEvents)) AS resultArray
FROM signoz_traces.distributed_signoz_index_v3
WHERE not empty(filteredEvents)
  AND timestamp > toUnixTimestamp(now() - INTERVAL 30 MINUTE)
  AND ts_bucket_start >= toUInt64(toUnixTimestamp(now() - toIntervalMinute(30))) - 1800
GROUP BY (resultArray, interval)
ORDER BY (resultArray, interval) ASC;
```

### Advanced — Average latency between two specific spans

Shows cross-span latency calculation using `minIf()` and indexed service columns.

```sql
SELECT
    interval,
    round(avg(time_diff), 2) AS result
FROM
(
    SELECT
        interval,
        traceID,
        if(startTime1 != 0, if(startTime2 != 0, (toUnixTimestamp64Nano(startTime2) - toUnixTimestamp64Nano(startTime1)) / 1000000, nan), nan) AS time_diff
    FROM
    (
        SELECT
            toStartOfInterval(timestamp, toIntervalMinute(1)) AS interval,
            traceID,
            minIf(timestamp, if(resource_string_service$$name = 'driver', if(name = '/driver.DriverService/FindNearest', if((resources_string['component']) = 'gRPC', true, false), false), false)) AS startTime1,
            minIf(timestamp, if(resource_string_service$$name = 'route', if(name = 'HTTP GET /route', true, false), false)) AS startTime2
        FROM signoz_traces.distributed_signoz_index_v3
        WHERE (timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}})
          AND (ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}})
          AND (resource_string_service$$name IN ('driver', 'route'))
        GROUP BY (interval, traceID)
        ORDER BY (interval, traceID) ASC
    )
)
WHERE isNaN(time_diff) = 0
GROUP BY interval
ORDER BY interval ASC;
```

---

## SigNoz Dashboard Variables

These template variables are automatically replaced by SigNoz when the query runs:

| Variable | Description |
|---|---|
| `{{.start_datetime}}` | Start of selected time range (DateTime64) |
| `{{.end_datetime}}` | End of selected time range (DateTime64) |
| `$start_timestamp` | Start as Unix timestamp (seconds) |
| `$end_timestamp` | End as Unix timestamp (seconds) |

---

## Query Optimization Checklist

Before finalizing any query, verify:

- [ ] **Resource filter CTE** is present when filtering by resource attributes (service.name, environment, etc.)
- [ ] **`ts_bucket_start`** filter is included alongside `timestamp` filter, with `- 1800` on start
- [ ] **`GLOBAL IN`** is used (not just `IN`) for the resource fingerprint subquery
- [ ] **Indexed columns** are used over map access where available (e.g., `http_method` over `attributes_string['http.method']`)
- [ ] **Pre-extracted columns** are used where available (`has_error`, `duration_nano`, `http_method`, `db_name`, etc.)
- [ ] **`seen_at_ts_bucket_start`** filter is included in the resource CTE
- [ ] Aggregation results are cast with `toFloat64()` for dashboard compatibility
- [ ] For timeseries: results are ordered by time column ASC
- [ ] For table panels: `now() as ts` is used instead of time intervals
- [ ] For value panels: outer query uses `avg(value)` / `any(ts)` pattern
.claude/skills/commit/SKILL.md (new file, 37 lines)
@@ -0,0 +1,37 @@

---
name: commit
description: Create a conventional commit with staged changes
allowed-tools: Bash(git:*)
---

# Create Conventional Commit

Commit staged changes using the conventional commit format: `type(scope): description`

## Types

- `feat:` - New feature
- `fix:` - Bug fix
- `chore:` - Maintenance/refactor/tooling
- `test:` - Tests only
- `docs:` - Documentation

## Process

1. Review staged changes: `git diff --cached`
2. Determine the type, optional scope, and description (imperative, <70 chars)
3. Commit using a HEREDOC:
   ```bash
   git commit -m "$(cat <<'EOF'
   type(scope): description
   EOF
   )"
   ```
4. Verify: `git log -1`

## Notes

- Description: imperative mood, lowercase, no period
- Body: explain WHY, not WHAT (the code shows what). Keep it concise.
- Do not include a co-authored-by-Claude trailer in the commit message; ownership and accountability should remain with the human contributor.
- Do not stage files automatically unless asked to.
.claude/skills/dev-server/SKILL.md (new file, 22 lines)
@@ -0,0 +1,22 @@

---
description: How to start SigNoz frontend and backend dev servers
---

# Dev Server Setup

Full guide: [development.md](../../docs/contributing/development.md)

## Start Order

1. **Infra**: Ensure the ClickHouse container is running: `docker ps | grep clickhouse`
2. **Backend**: `make go-run-community` (serves at `localhost:8080`)
3. **Frontend**: `cd frontend && yarn install && yarn dev` (serves at `localhost:3301`)
   - Requires `frontend/.env` with `FRONTEND_API_ENDPOINT=http://localhost:8080`
   - For git worktrees, create `frontend/.env` with: `cp frontend/example.env frontend/.env`

## Verify

- ClickHouse: `curl http://localhost:8123/ping` → "Ok."
- OTel Collector: `curl http://localhost:13133`
- Backend: `curl http://localhost:8080/api/v1/health` → `{"status":"ok"}`
- Frontend: `http://localhost:3301`
.claude/skills/raise-pr/SKILL.md (new file, 55 lines)
@@ -0,0 +1,55 @@

---
name: raise-pr
description: Create a pull request with auto-filled template. Pass 'commit' to commit staged changes first.
allowed-tools: Bash(gh:*, git:*), Read
argument-hint: [commit?]
---

# Raise Pull Request

Create a PR with a template auto-filled from the commits after origin/main.

## Arguments

- No argument: Create a PR with existing commits
- `commit`: Commit staged changes first, then create the PR

## Process

1. **If `$ARGUMENTS` is "commit"**: Review staged changes and commit with a descriptive message
   - Check for staged changes: `git diff --cached --stat`
   - If changes exist:
     - Review the changes: `git diff --cached`
     - Use the commit skill for making the commit, i.e., follow conventional commit practices
     - Commit command: `git commit -m "message"`

2. **Analyze commits since origin/main**:
   - `git log origin/main..HEAD --pretty=format:"%s%n%b"` - get commit messages
   - `git diff origin/main...HEAD --stat` - see changes

3. **Read the template**: `.github/pull_request_template.md`

4. **Generate the PR**:
   - **Title**: Short (<70 chars), from the commit messages or the main change
   - **Body**: Fill template sections based on commits/changes:
     - Summary (why/what/approach) - end with "Closes #<issue_number>" if an issue number is available from the branch name (`git branch --show-current`)
     - Change Type checkboxes
     - Bug Context (if applicable)
     - Testing Strategy
     - Risk Assessment
     - Changelog (if user-facing)
     - Checklist

5. **Create the PR**:
   ```bash
   git push -u origin $(git branch --show-current)
   gh pr create --base main --title "..." --body "..."
   gh pr view
   ```

## Notes

- Analyze ALL commit messages from origin/main to HEAD
- Fill template sections based on code analysis
- Leave template sections as they are if you can't determine the content
- Don't add changes to the git stage; only commit or push what the user has already staged
.claude/skills/review/SKILL.md (new file, 254 lines)
@@ -0,0 +1,254 @@

---
name: review
description: Review code changes for bugs, performance issues, and SigNoz convention compliance
allowed-tools: Bash(git:*, gh:*), Read, Glob, Grep
---

# Review Command

Perform a thorough code review against SigNoz's coding conventions and contributing guidelines, and flag any potential bugs the change introduces.

## Usage

Invoke this command to review code changes, files, or pull requests with actionable and concise feedback.

## Process

1. **Determine scope**:
   - Ask the user what to review if not specified:
     - Specific files or directories
     - Current git diff (staged or unstaged)
     - Specific PR number or commit range
     - All changes since origin/main

2. **Gather context**:
   ```bash
   # For current changes
   git diff --cached            # Staged changes
   git diff                     # Unstaged changes

   # For a commit range
   git diff origin/main...HEAD  # All changes since main

   # For the last commit only
   git diff HEAD~1..HEAD

   # For a specific PR
   gh pr view <number> --json files,additions,deletions
   gh pr diff <number>
   ```

3. **Read all relevant files thoroughly**:
   - Use the Read tool for modified files
   - Understand the context and purpose of changes
   - Check surrounding code for context

4. **Review against SigNoz guidelines**:
   - **Frontend**: Check [Frontend Guidelines](../../frontend/CONTRIBUTIONS.md)
   - **Backend/Architecture**: Check [CLAUDE.md](../CLAUDE.md) for provider pattern, error handling, SQL, REST, and linting conventions
   - **General**: Check [Contributing Guidelines](../../CONTRIBUTING.md)
   - **Commits**: Verify [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/)

5. **Verify feature intent**:
   - Read the PR description, commit message, or linked issue to understand *what* the change claims to do
   - Trace the code path end-to-end to confirm the change actually achieves its stated goal
   - Check that the happy path works as described
   - Identify any scenarios where the feature silently does nothing or produces wrong results

6. **Review for bug introduction**:
   - **Regressions**: Does the change break existing behavior? Check callers of modified functions/interfaces
   - **Edge cases**: Empty inputs, nil/undefined values, boundary conditions, concurrent access
   - **Error paths**: Are all error cases handled? Can errors be swallowed silently?
   - **State management**: Are state transitions correct? Can state become inconsistent?
   - **Race conditions**: Shared mutable state, async operations, missing locks or guards
   - **Type mismatches**: Unsafe casts, implicit conversions, `any` usage hiding real types

7. **Review for performance implications**:
   - **Backend**: N+1 queries, missing indexes, unbounded result sets, large allocations in hot paths, unnecessary DB round-trips
   - **Frontend**: Unnecessary re-renders from inline objects/functions as props, missing memoization on expensive computations, large bundle imports that should be lazy-loaded, unthrottled event handlers
   - **General**: O(n²) or worse algorithms on potentially large datasets, unnecessary network calls, missing pagination or limits

8. **Provide actionable, concise feedback** in the structured format below

## Review Checklist

For coding conventions and style, refer to the linked guideline docs. This checklist focuses on **review-specific concerns** that guidelines alone don't catch.

### Correctness & Intent
- [ ] Change achieves what the PR/commit/issue describes
- [ ] Happy path works end-to-end
- [ ] Edge cases handled (empty, nil, boundary, concurrent)
- [ ] Error paths don't swallow failures silently
- [ ] No regressions to existing callers of modified code

### Security
- [ ] No exposed secrets, API keys, credentials
- [ ] No sensitive data in logs
- [ ] Input validation at system boundaries
- [ ] Authentication/authorization checked for new endpoints
- [ ] No SQL injection or XSS risks

### Performance
- [ ] No N+1 queries or unbounded result sets
- [ ] No unnecessary re-renders (inline objects/functions as props, missing memoization)
- [ ] No large imports that should be lazy-loaded
- [ ] No O(n²) on potentially large datasets
- [ ] Pagination/limits present where needed

### Testing
- [ ] New functionality has tests
- [ ] Edge cases and error paths tested
- [ ] Tests are deterministic (no flakiness)

### Git/Commits
- [ ] Commit messages follow `type(scope): description` ([Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/))
- [ ] Commits are atomic and logical

## Output Format

Provide feedback in this structured format:

```markdown
## Code Review

**Scope**: [What was reviewed]
**Overall**: [1-2 sentence summary and general sentiment]

---

### 🚨 Critical Issues (Must Fix)

1. **[Category]** `file:line`
   **Problem**: [What's wrong]
   **Why**: [Why it matters]
   **Fix**: [Specific solution]
   ```[language]
   // Example fix if helpful
   ```

### ⚠️ Suggestions (Should Consider)

1. **[Category]** `file:line`
   **Issue**: [What could be improved]
   **Suggestion**: [Concrete improvement]

### ✅ Positive Highlights

- [Good practice observed]
- [Well-implemented feature]

---

**References**:
- [Relevant guideline links]
```

## Review Categories

Use these categories for issues:

- **Bug / Regression**: Logic errors, edge cases, race conditions, broken existing behavior
- **Feature Gap**: Change doesn't fully achieve its stated intent
- **Security Risk**: Authentication, authorization, data exposure, injection
- **Performance Issue**: Inefficient queries, unnecessary re-renders, memory leaks, unbounded data
- **Convention Violation**: Style, patterns, architectural guidelines (link to the relevant guideline doc)
- **Code Quality**: Complexity, duplication, naming, type safety
- **Testing**: Missing tests, inadequate coverage, flaky tests

## Example Review

```markdown
## Code Review

**Scope**: Changes in `frontend/src/pages/TraceDetail/` (3 files, 245 additions)
**Overall**: Good implementation of the pagination feature. Found 2 critical issues and 3 suggestions.

---

### 🚨 Critical Issues (Must Fix)

1. **Security Risk** `TraceList.tsx:45`
   **Problem**: API token exposed in client-side code
   **Why**: Security vulnerability - tokens should never be in the frontend
   **Fix**: Move authentication to the backend, use session-based auth

2. **Performance Issue** `TraceList.tsx:89`
   **Problem**: Inline function passed as prop causes unnecessary re-renders
   **Why**: Violates frontend guideline, degrades performance with large lists
   **Fix**:
   ```typescript
   const handleTraceClick = useCallback((traceId: string) => {
     navigate(`/trace/${traceId}`);
   }, [navigate]);
   ```

### ⚠️ Suggestions (Should Consider)

1. **Code Quality** `TraceList.tsx:120-180`
   **Issue**: Function exceeds the 40-line guideline
   **Suggestion**: Extract into smaller functions:
   - `filterTracesByTimeRange()`
   - `aggregateMetrics()`
   - `renderChartData()`

2. **Type Safety** `types.ts:23`
   **Issue**: Using `any` for trace attributes
   **Suggestion**: Define a proper interface for TraceAttributes

3. **Convention** `TraceList.tsx:12`
   **Issue**: File imports not organized
   **Suggestion**: Let simple-import-sort auto-organize (happens on save)

### ✅ Positive Highlights

- Excellent use of virtualization for large trace lists
- Good error boundary implementation
- Well-structured component hierarchy
- Comprehensive unit tests included

---

**References**:
- [Frontend Guidelines](../../frontend/CONTRIBUTIONS.md)
- [useCallback best practices](https://kentcdodds.com/blog/usememo-and-usecallback)
```

## Tone Guidelines

- **Be respectful**: Focus on the code, not the person
- **Be specific**: Always reference exact file:line locations
- **Be concise**: Get to the point, avoid verbosity
- **Be actionable**: Every comment should have a clear resolution path
- **Be balanced**: Acknowledge good work alongside issues
- **Be educational**: Explain why something is an issue, link to guidelines

## Priority Levels

1. **Critical (🚨)**: Security, bugs, data corruption, crashes
2. **Important (⚠️)**: Performance, maintainability, convention violations
3. **Nice to have (💡)**: Style preferences, micro-optimizations

## Important Notes

- **Reference specific guidelines** from the docs when applicable
- **Provide code examples** for fixes when helpful
- **Ask questions** if code intent is unclear
- **Link to external resources** for educational value
- **Distinguish** must-fix from should-consider
- **Be concise** - reviewers value their time

## Critical Rules

- **NEVER** be vague - always specify the file and line number
- **NEVER** just point out problems - suggest solutions
- **NEVER** review without reading the actual code
- **ALWAYS** check against SigNoz's specific guidelines
- **ALWAYS** provide a rationale for each comment
- **ALWAYS** be constructive and respectful

## Reference Documents

- [Frontend Guidelines](../../frontend/CONTRIBUTIONS.md) - React, TypeScript, styling
- [Contributing Guidelines](../../CONTRIBUTING.md) - Workflow, commit conventions
- [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) - Commit format
- [CLAUDE.md](../CLAUDE.md) - Project architecture and conventions
.claude/skills/traces/SKILL.md (new file, 14 lines)
@@ -0,0 +1,14 @@

---
description: Architecture context for the traces module (query building, waterfall, flamegraph)
---

# Traces Module

Read [traces-module.md](./traces-module.md) for full context before working on this module. It covers:

- Storage schema (`signoz_index_v3`, `trace_summary`) and gotchas
- API endpoints (Query Range V5, waterfall, flamegraph, funnels)
- Query building system (statement builder, field mapper, trace operators)
- Backend processing pipelines and caching
- Frontend component map, state flow, and API hooks
- Key file index for backend and frontend
.claude/skills/traces/traces-module.md (new file, 191 lines)
@@ -0,0 +1,191 @@

# SigNoz Traces Module — Developer Guide

## Overview

```
App → OTel SDK → OTLP Receiver → [signozspanmetrics, batch] →
ClickHouse Exporter → signoz_traces DB → Query Service (Go) → Frontend (React)
```

**Query Service layers**: HTTP Handlers (`http_handler.go`) → Querier (`querier.go`, orchestration/cache) → Statement Builders (`pkg/telemetrytraces/`) → ClickHouse

---

## Storage Schema

All tables are in the `signoz_traces` database. Schema DDL: `signoz-otel-collector/cmd/signozschemamigrator/schema_migrator/traces_migrations.go`.

### `distributed_signoz_index_v3` — Primary span storage

- **Engine**: MergeTree (plain — **no deduplication**, use `DISTINCT ON (span_id)`)
- **Key columns**: `ts_bucket_start` (UInt64), `timestamp` (DateTime64(9)), `trace_id` (FixedString(32)), `span_id`, `duration_nano`, `has_error`, `name`, `resource_string_service$$name`, `attributes_string`, `events`, `links`
- **ORDER BY**: `(ts_bucket_start, resource_fingerprint, has_error, name, timestamp)`
- **Partition**: `toDate(timestamp)`

### `distributed_trace_summary` — Pre-aggregated trace metadata

- **Engine**: AggregatingMergeTree. Columns: `trace_id`, `start` (min), `end` (max), `num_spans` (sum)
- **Populated by** `trace_summary_mv` — a materialized view on `signoz_index_v3` that triggers per batch, inserting partial aggregates. ClickHouse merges them asynchronously.
- **CRITICAL**: Always query with `GROUP BY trace_id` (never raw `SELECT *`)

### Other tables

`distributed_tag_attributes_v2` (attribute keys for autocomplete), `distributed_span_attributes_keys` (which attributes exist)

---

## API Endpoints

### 1. Query Range V5 — `POST /api/v5/query_range`

Primary query endpoint for traces (also logs/metrics). Supports query builder queries, trace operators, aggregations, filters, group by. See [QUERY_RANGE_API.md](../../docs/modules/QUERY_RANGE_API.md).

Key files: `pkg/telemetrytraces/statement_builder.go`, `trace_operator_statement_builder.go`, `pkg/querier/trace_operator_query.go`

### 2. Waterfall — `POST /api/v2/traces/waterfall/{traceId}`

Handler: `http_handler.go:1748` → Reader: `clickhouseReader/reader.go:873`

**Request**: `{ "selectedSpanId", "isSelectedSpanIDUnCollapsed", "uncollapsedSpans[]" }`
**Response**: `{ startTimestampMillis, endTimestampMillis, totalSpansCount, totalErrorSpansCount, rootServiceName, rootServiceEntryPoint, serviceNameToTotalDurationMap, spans[], hasMissingSpans, uncollapsedSpans[] }`

**Pipeline**:
1. Query `trace_summary` for the time range → query `signoz_index_v3` with `DISTINCT ON (span_id)` and `ts_bucket_start >= start - 1800`
2. Build the span tree: map spanID→Span, link parents via CHILD_OF refs, create Missing Span nodes for absent parents
3. Cache (key: `getWaterfallSpansForTraceWithMetadata-{traceID}`, TTL: 5 min, skipped if the trace end is within the flux interval of 2 min from now)
4. `GetSelectedSpans` (`tracedetail/waterfall.go:159`): find the path to selectedSpanID, DFS into uncollapsed nodes, compute SubTreeNodeCount, return a sliding window of **500 spans** (40% before, 60% after the selected span; see the sketch after this list)
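The window selection is easy to picture; an illustrative version of the 40/60 split (not the actual implementation, which also handles tree traversal) is:

```go
// Illustrative only: pick a windowSize-span window around the selected span,
// aiming for roughly 40% of the window before it and 60% after it.
func spanWindow[T any](spans []T, selectedIdx int) []T {
	const windowSize = 500
	if len(spans) <= windowSize {
		return spans
	}
	start := selectedIdx - windowSize*40/100 // ~200 spans before the selection
	if start < 0 {
		start = 0
	}
	end := start + windowSize
	if end > len(spans) {
		end = len(spans)
		start = end - windowSize // keep the window full at the tail
	}
	return spans[start:end]
}
```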
### 3. Flamegraph — `POST /api/v2/traces/flamegraph/{traceId}`

Handler: `http_handler.go:1781` → Reader: `reader.go:1091`

**Request**: `{ "selectedSpanId" }` **Response**: `{ startTimestampMillis, endTimestampMillis, durationNano, spans[][] }`

Same DB query as the waterfall, but uses **BFS** (not DFS) to organize spans by level. Returns `[][]*FlamegraphSpan` (a lighter model, no tagMap). Level sampling when > 100 spans/level: top 5 by latency + 50 timestamp buckets (2 each). Window: **50 levels**.

### 4. Other APIs

- **Trace Fields**: `GET/POST /api/v2/traces/fields` (handlers at `http_handler.go:4912-4921`)
- **Trace Funnels**: CRUD at `/api/v1/trace-funnels/*`, analytics at `/{funnel_id}/analytics/*` (`pkg/modules/tracefunnel/`)

---

## Query Building System

### Query Structure

```go
QueryBuilderQuery[TraceAggregation]{
    Signal:       SignalTraces,
    Filter:       &Filter{Expression: "service.name = 'api' AND duration_nano > 1000000"},
    Aggregations: []TraceAggregation{{Expression: "count()", Alias: "total"}},
    GroupBy:      []GroupByKey{{TelemetryFieldKey: TelemetryFieldKey{Name: "service.name"}}},
}
```

### SQL Generation (`statement_builder.go`)

1. **Field resolution** via `field_mapper.go` — maps intrinsic (`trace_id`, `duration_nano`), calculated (`http_method`, `has_error`), and attribute fields (`attributes_string[...]`) to CH columns. Example: `"service.name"` → `"resource_string_service$$name"`
2. **Time optimization** — if `trace_id` is in the filter, queries `trace_summary` first to narrow the range
3. **Filter building** via `condition_builder.go` — supports `=`, `!=`, `IN`, `LIKE`, `ILIKE`, `EXISTS`, `CONTAINS`, comparisons
4. **Build SQL** by request type: `buildListQuery()`, `buildTimeSeriesQuery()`, `buildScalarQuery()`, `buildTraceQuery()`

### Trace Operators (`trace_operator_statement_builder.go`)

Combines multiple trace queries with set operations. Parses the expression (e.g., `"A AND B"`) → builds a CTE per query via `trace_operator_cte_builder.go` → combines with INTERSECT (AND), UNION (OR), EXCEPT (NOT).

---

## Frontend (Trace Detail)

### State Flow
```
TraceDetailsV2 (pages/TraceDetailV2/TraceDetailV2.tsx)
├── uncollapsedNodes, interestedSpanId, selectedSpan
├── useGetTraceV2 → waterfall API
├── TraceMetadata (totalSpans, errors, duration)
├── TraceFlamegraph (separate API via useGetTraceFlamegraph)
└── TraceWaterfall → Success → TableV3 (virtualized)
```

### Components

| Component | File |
|-----------|------|
| TraceDetailsV2 | `pages/TraceDetailV2/TraceDetailV2.tsx` |
| TraceMetadata | `container/TraceMetadata/TraceMetadata.tsx` |
| TraceWaterfall | `container/TraceWaterfall/TraceWaterfall.tsx` |
| Success (waterfall table) | `container/TraceWaterfall/.../Success/Success.tsx` |
| Filters | `container/TraceWaterfall/.../Filters/Filters.tsx` |
| TraceFlamegraph | `container/PaginatedTraceFlamegraph/PaginatedTraceFlamegraph.tsx` |
| SpanDetailsDrawer | `container/SpanDetailsDrawer/SpanDetailsDrawer.tsx` |

### API Hooks

| Hook | API |
|------|-----|
| `useGetTraceV2` (`hooks/trace/useGetTraceV2.tsx`) | POST waterfall |
| `useGetTraceFlamegraph` (`hooks/trace/useGetTraceFlamegraph.tsx`) | POST flamegraph |

Adapter: `api/trace/getTraceV2.tsx`. Types: `types/api/trace/getTraceV2.ts`.

---

## Known Gotchas

1. **trace_summary**: Always `GROUP BY trace_id` — raw reads return partial unmerged rows
2. **signoz_index_v3 dedup**: Plain MergeTree. Waterfall uses `DISTINCT ON (span_id)`. Flamegraph relies on map-key dedup (keeps last-seen)
3. **Flux interval**: Traces ending within 2 min of now bypass the cache → fresh DB query on every interaction
4. **SubTreeNodeCount**: Self-inclusive (root count = total tree nodes)
5. **Waterfall pagination**: Max 500 spans per response (sliding window). The frontend virtual-scrolls and re-fetches at the edges

---

## Extending the Module

- **New calculated field**: Define in `telemetrytraces/const.go` → map in `field_mapper.go` → optionally update `condition_builder.go`
- **New API endpoint**: Handler in `http_handler.go` → register the route → implement in ClickHouseReader or Querier
- **New aggregation**: Update `querybuilder/agg_expr_rewriter.go`
- **New trace operator**: Update `grammar/TraceOperatorGrammar.g4` + `trace_operator_cte_builder.go`

---

## Key File Index

### Backend
| File | Purpose |
|------|---------|
| `pkg/telemetrytraces/statement_builder.go` | Trace SQL generation |
| `pkg/telemetrytraces/field_mapper.go` | Field → CH column mapping |
| `pkg/telemetrytraces/condition_builder.go` | WHERE clause building |
| `pkg/telemetrytraces/trace_operator_statement_builder.go` | Trace operator SQL |
| `pkg/telemetrytraces/trace_operator_cte_builder.go` | Trace operator CTEs |
| `pkg/querier/trace_operator_query.go` | Trace operator execution |
| `pkg/query-service/app/http_handler.go:1748` | Waterfall handler |
| `pkg/query-service/app/http_handler.go:1781` | Flamegraph handler |
| `pkg/query-service/app/clickhouseReader/reader.go:831` | GetSpansForTrace |
| `pkg/query-service/app/clickhouseReader/reader.go:873` | Waterfall logic |
| `pkg/query-service/app/clickhouseReader/reader.go:1091` | Flamegraph logic |
| `pkg/query-service/app/traces/tracedetail/waterfall.go` | DFS traversal, span selection |
| `pkg/query-service/app/traces/tracedetail/flamegraph.go` | BFS traversal, level sampling |
| `pkg/query-service/model/response.go:279` | Span model (waterfall) |
| `pkg/query-service/model/response.go:305` | FlamegraphSpan model |
| `pkg/query-service/model/trace.go` | SpanItemV2, TraceSummary |
| `pkg/query-service/model/cacheable.go` | Cache structures |

### Frontend
| File | Purpose |
|------|---------|
| `pages/TraceDetailV2/TraceDetailV2.tsx` | Page container |
| `container/TraceWaterfall/.../Success/Success.tsx` | Waterfall table |
| `container/PaginatedTraceFlamegraph/PaginatedTraceFlamegraph.tsx` | Flamegraph |
| `hooks/trace/useGetTraceV2.tsx` | Waterfall API hook |
| `hooks/trace/useGetTraceFlamegraph.tsx` | Flamegraph API hook |
| `api/trace/getTraceV2.tsx` | API adapter |
| `types/api/trace/getTraceV2.ts` | TypeScript types |

### Schema DDL
| File | Purpose |
|------|---------|
| `signozschemamigrator/.../traces_migrations.go:10-134` | signoz_index_v3 |
| `signozschemamigrator/.../traces_migrations.go:271-348` | trace_summary + MV |
980
docs/modules/QUERY_RANGE_API.md
Normal file
980
docs/modules/QUERY_RANGE_API.md
Normal file
@@ -0,0 +1,980 @@
|
||||
# Query Range API (V5) - Developer Guide
|
||||
|
||||
This document provides a comprehensive guide to the Query Range API (V5), which is the primary query endpoint for traces, logs, and metrics in SigNoz. It covers architecture, request/response models, code flows, and implementation details.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
2. [API Endpoint](#api-endpoint)
|
||||
3. [Request/Response Models](#requestresponse-models)
|
||||
4. [Query Types](#query-types)
|
||||
5. [Request Types](#request-types)
|
||||
6. [Code Flow](#code-flow)
|
||||
7. [Key Components](#key-components)
|
||||
8. [Query Execution](#query-execution)
|
||||
9. [Caching](#caching)
|
||||
10. [Result Processing](#result-processing)
|
||||
11. [Performance Considerations](#performance-considerations)
|
||||
12. [Extending the API](#extending-the-api)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The Query Range API (V5) is the unified query endpoint for all telemetry signals (traces, logs, metrics) in SigNoz. It provides:
|
||||
|
||||
- **Unified Interface**: Single endpoint for all signal types
|
||||
- **Query Builder**: Visual query builder support
|
||||
- **Multiple Query Types**: Builder queries, PromQL, ClickHouse SQL, Formulas, Trace Operators
|
||||
- **Flexible Response Types**: Time series, scalar, raw data, trace-specific
|
||||
- **Advanced Features**: Aggregations, filters, group by, ordering, pagination
|
||||
- **Caching**: Intelligent caching for performance
|
||||
|
||||
### Key Technologies
|
||||
|
||||
- **Backend**: Go (Golang)
|
||||
- **Storage**: ClickHouse (columnar database)
|
||||
- **Query Language**: Custom query builder + PromQL + ClickHouse SQL
|
||||
- **Protocol**: HTTP/REST API
|
||||
|
||||
---
|
||||
|
||||
## API Endpoint
|
||||
|
||||
### Endpoint Details
|
||||
|
||||
**URL**: `POST /api/v5/query_range`
|
||||
|
||||
**Handler**: `QuerierAPI.QueryRange` → `querier.QueryRange`
|
||||
|
||||
**Location**:
|
||||
- Handler: `pkg/querier/querier.go:122`
|
||||
- Route Registration: `pkg/query-service/app/http_handler.go:480`
|
||||
|
||||
**Authentication**: Requires ViewAccess permission
|
||||
|
||||
**Content-Type**: `application/json`
|
||||
|
||||
### Request Flow
|
||||
|
||||
```
|
||||
HTTP Request (POST /api/v5/query_range)
|
||||
↓
|
||||
HTTP Handler (QuerierAPI.QueryRange)
|
||||
↓
|
||||
Querier.QueryRange (pkg/querier/querier.go)
|
||||
↓
|
||||
Query Execution (Statement Builders → ClickHouse)
|
||||
↓
|
||||
Result Processing & Merging
|
||||
↓
|
||||
HTTP Response (QueryRangeResponse)
|
||||
```
|
||||
|
||||
---

## Request/Response Models

### Request Model

**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/req.go`

```go
type QueryRangeRequest struct {
    Start          uint64                  // Start timestamp (milliseconds)
    End            uint64                  // End timestamp (milliseconds)
    RequestType    RequestType             // Response type (TimeSeries, Scalar, Raw, Trace)
    Variables      map[string]VariableItem // Template variables
    CompositeQuery CompositeQuery          // Container for queries
    NoCache        bool                    // Skip cache flag
}
```

### Composite Query

```go
type CompositeQuery struct {
    Queries []QueryEnvelope // Array of queries to execute
}
```

### Query Envelope

```go
type QueryEnvelope struct {
    Type QueryType // Query type (Builder, PromQL, ClickHouseSQL, Formula, TraceOperator)
    Spec any       // Query specification (type-specific)
}
```

### Response Model

**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/req.go`

```go
type QueryRangeResponse struct {
    Type    RequestType    // Response type
    Data    QueryData      // Query results
    Meta    ExecStats      // Execution statistics
    Warning *QueryWarnData // Warnings (if any)
    QBEvent *QBEvent       // Query builder event metadata
}

type QueryData struct {
    Results []any // Array of result objects (type depends on RequestType)
}

type ExecStats struct {
    RowsScanned   uint64            // Total rows scanned
    BytesScanned  uint64            // Total bytes scanned
    DurationMS    uint64            // Query duration in milliseconds
    StepIntervals map[string]uint64 // Step intervals per query
}
```

---

## Query Types

The API supports multiple query types, each with its own specification format.

### 1. Builder Query (`QueryTypeBuilder`)

Visual query builder queries. Supports traces, logs, and metrics.

**Spec Type**: `QueryBuilderQuery[T]` where T is:
- `TraceAggregation` for traces
- `LogAggregation` for logs
- `MetricAggregation` for metrics

**Example**:
```go
QueryBuilderQuery[TraceAggregation]{
    Name:   "query_name",
    Signal: SignalTraces,
    Filter: &Filter{
        Expression: "service.name = 'api' AND duration_nano > 1000000",
    },
    Aggregations: []TraceAggregation{
        {Expression: "count()", Alias: "total"},
        {Expression: "avg(duration_nano)", Alias: "avg_duration"},
    },
    GroupBy: []GroupByKey{...},
    Order:   []OrderBy{...},
    Limit:   100,
}
```

**Key Files**:
- Traces: `pkg/telemetrytraces/statement_builder.go`
- Logs: `pkg/telemetrylogs/statement_builder.go`
- Metrics: `pkg/telemetrymetrics/statement_builder.go`

### 2. PromQL Query (`QueryTypePromQL`)

Prometheus Query Language queries for metrics.

**Spec Type**: `PromQuery`

**Example**:
```go
PromQuery{
    Query: "rate(http_requests_total[5m])",
    Step:  Step{Duration: time.Minute},
}
```

**Key Files**: `pkg/querier/promql_query.go`

### 3. ClickHouse SQL Query (`QueryTypeClickHouseSQL`)

Direct ClickHouse SQL queries.

**Spec Type**: `ClickHouseQuery`

**Example**:
```go
ClickHouseQuery{
    Query: "SELECT count() FROM signoz_traces.distributed_signoz_index_v3 WHERE ...",
}
```

**Key Files**: `pkg/querier/ch_sql_query.go`

### 4. Formula Query (`QueryTypeFormula`)

Mathematical formulas combining other queries.

**Spec Type**: `QueryBuilderFormula`

**Example**:
```go
QueryBuilderFormula{
    Expression: "A / B * 100", // A and B are query names
}
```

**Key Files**: `pkg/querier/formula_query.go`

### 5. Trace Operator Query (`QueryTypeTraceOperator`)

Set operations on trace queries (AND, OR, NOT).

**Spec Type**: `QueryBuilderTraceOperator`

**Example**:
```go
QueryBuilderTraceOperator{
    Expression: "A AND B", // A and B are query names
    Filter:     &Filter{...},
}
```

**Key Files**:
- `pkg/telemetrytraces/trace_operator_statement_builder.go`
- `pkg/querier/trace_operator_query.go`

---

## Request Types

The `RequestType` determines the format of the response data.

### 1. `RequestTypeTimeSeries`

Returns time series data for charts.

**Response Format**: `TimeSeriesData`

```go
type TimeSeriesData struct {
    QueryName    string
    Aggregations []AggregationBucket
}

type AggregationBucket struct {
    Index  int
    Series []TimeSeries
    Alias  string
    Meta   AggregationMeta
}

type TimeSeries struct {
    Labels map[string]string
    Values []TimeSeriesValue
}

type TimeSeriesValue struct {
    Timestamp int64
    Value     float64
}
```

**Use Case**: Line charts, bar charts, area charts

### 2. `RequestTypeScalar`

Returns scalar values (a single value per query or group).

**Response Format**: `ScalarData`

```go
type ScalarData struct {
    QueryName string
    Data      []ScalarValue
}

type ScalarValue struct {
    Timestamp int64
    Value     float64
}
```

**Use Case**: Single value displays, stat panels

### 3. `RequestTypeRaw`

Returns raw data rows.

**Response Format**: `RawData`

```go
type RawData struct {
    QueryName string
    Columns   []string
    Rows      []RawDataRow
}

type RawDataRow struct {
    Timestamp time.Time
    Data      map[string]any
}
```

**Use Case**: Tables, logs viewer, trace lists

### 4. `RequestTypeTrace`

Returns a trace-specific data structure.

**Response Format**: Trace-specific format (see traces documentation)

**Use Case**: Trace-specific visualizations

---

## Code Flow

### Complete Request Flow

```
1. HTTP Request
   POST /api/v5/query_range
   Body: QueryRangeRequest JSON
       ↓
2. HTTP Handler
   QuerierAPI.QueryRange (pkg/querier/querier.go)
   - Validates request
   - Extracts organization ID from auth context
       ↓
3. Querier.QueryRange (pkg/querier/querier.go:122)
   - Validates QueryRangeRequest
   - Processes each query in CompositeQuery.Queries
   - Identifies dependencies (e.g., trace operators, formulas)
   - Calculates step intervals
   - Fetches metric temporality if needed
       ↓
4. Query Creation
   For each QueryEnvelope:

   a. Builder Query:
      - newBuilderQuery() creates builderQuery instance
      - Selects appropriate statement builder based on signal:
        * Traces  → traceStmtBuilder
        * Logs    → logStmtBuilder
        * Metrics → metricStmtBuilder or meterStmtBuilder

   b. PromQL Query:
      - newPromqlQuery() creates promqlQuery instance
      - Uses Prometheus engine

   c. ClickHouse SQL Query:
      - newchSQLQuery() creates chSQLQuery instance
      - Direct SQL execution

   d. Formula Query:
      - newFormulaQuery() creates formulaQuery instance
      - References other queries by name

   e. Trace Operator Query:
      - newTraceOperatorQuery() creates traceOperatorQuery instance
      - Uses traceOperatorStmtBuilder
       ↓
5. Statement Building (for Builder queries)
   StatementBuilder.Build()
   - Resolves field keys from metadata store
   - Builds SQL based on request type:
     * RequestTypeRaw        → buildListQuery()
     * RequestTypeTimeSeries → buildTimeSeriesQuery()
     * RequestTypeScalar     → buildScalarQuery()
     * RequestTypeTrace      → buildTraceQuery()
   - Returns SQL statement with arguments
       ↓
6. Query Execution
   Query.Execute()
   - Executes SQL/query against ClickHouse or Prometheus
   - Processes results into response format
   - Returns Result with data and statistics
       ↓
7. Caching (if applicable)
   - Checks bucket cache for time series queries
   - Executes queries for missing time ranges
   - Merges cached and fresh results
       ↓
8. Result Processing
   querier.run()
   - Executes all queries (with dependency resolution)
   - Collects results and warnings
   - Merges results from multiple queries
       ↓
9. Post-Processing
   postProcessResults()
   - Applies formulas if present
   - Handles variable substitution
   - Formats results for response
       ↓
10. HTTP Response
    - Returns QueryRangeResponse with results
    - Includes execution statistics
    - Includes warnings if any
```

### Key Decision Points

1. **Query Type Selection**: Based on `QueryEnvelope.Type`
2. **Signal Selection**: For builder queries, based on `Signal` field
3. **Request Type Handling**: Different SQL generation for different request types
4. **Caching Strategy**: Only for time series queries with valid fingerprints
5. **Dependency Resolution**: Trace operators and formulas resolve dependencies first

---

## Key Components

### 1. Querier

**Location**: `pkg/querier/querier.go`

**Purpose**: Orchestrates query execution, caching, and result merging

**Key Methods**:
- `QueryRange()`: Main entry point for query execution
- `run()`: Executes queries and merges results
- `executeWithCache()`: Handles caching logic
- `mergeResults()`: Merges cached and fresh results
- `postProcessResults()`: Applies formulas and variable substitution

**Key Features**:
- Query orchestration across multiple query types
- Intelligent caching with bucket-based strategy
- Result merging from multiple queries
- Formula evaluation
- Time range optimization
- Step interval calculation and validation

### 2. Statement Builder Interface

**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/`

**Purpose**: Converts query builder specifications into executable queries

**Interface**:
```go
type StatementBuilder[T any] interface {
    Build(
        ctx context.Context,
        start uint64,
        end uint64,
        requestType RequestType,
        query QueryBuilderQuery[T],
        variables map[string]VariableItem,
    ) (*Statement, error)
}
```

**Implementations**:
- `traceQueryStatementBuilder` - Traces (`pkg/telemetrytraces/statement_builder.go`)
- `logQueryStatementBuilder` - Logs (`pkg/telemetrylogs/statement_builder.go`)
- `metricQueryStatementBuilder` - Metrics (`pkg/telemetrymetrics/statement_builder.go`)

**Key Features**:
- Field resolution via metadata store
- SQL generation for different request types
- Filter, aggregation, group by, ordering support
- Time range optimization
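
A minimal sketch of how a caller might drive a statement builder and hand the result to the telemetry store. The `Statement` field names (`Query`, `Args`) and the `telemetryStore.Query` signature are assumptions for illustration; check the actual types in `querybuildertypesv5` and `pkg/telemetrystore/`.

```go
// Illustrative only: build a SQL statement for a traces time series
// query, then execute it against ClickHouse.
stmt, err := traceStmtBuilder.Build(
    ctx,
    startMs, endMs,
    qbtypes.RequestTypeTimeSeries,
    query,     // QueryBuilderQuery[TraceAggregation]
    variables, // map[string]qbtypes.VariableItem
)
if err != nil {
    return err
}
rows, err := telemetryStore.Query(ctx, stmt.Query, stmt.Args...) // assumed fields
```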

### 3. Query Interface

**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/`

**Purpose**: Represents an executable query

**Interface**:
```go
type Query interface {
    Execute(ctx context.Context) (*Result, error)
    Fingerprint() string      // For caching
    Window() (uint64, uint64) // Time range
}
```

**Implementations**:
- `builderQuery[T]` - Builder queries (`pkg/querier/builder_query.go`)
- `promqlQuery` - PromQL queries (`pkg/querier/promql_query.go`)
- `chSQLQuery` - ClickHouse SQL queries (`pkg/querier/ch_sql_query.go`)
- `formulaQuery` - Formula queries (`pkg/querier/formula_query.go`)
- `traceOperatorQuery` - Trace operator queries (`pkg/querier/trace_operator_query.go`)

### 4. Telemetry Store

**Location**: `pkg/telemetrystore/`

**Purpose**: Abstraction layer for ClickHouse database access

**Key Methods**:
- `Query()`: Execute SQL query
- `QueryRow()`: Execute query returning a single row
- `Select()`: Execute query returning multiple rows

**Implementation**: `clickhouseTelemetryStore` (`pkg/telemetrystore/clickhousetelemetrystore/`)

### 5. Metadata Store

**Location**: `pkg/types/telemetrytypes/`

**Purpose**: Provides metadata about available fields, keys, and attributes

**Key Methods**:
- `GetKeysMulti()`: Get field keys for multiple selectors
- `FetchTemporalityMulti()`: Get metric temporality information

**Implementation**: `telemetryMetadataStore` (`pkg/telemetrymetadata/`)

### 6. Bucket Cache

**Location**: `pkg/querier/`

**Purpose**: Caches query results by time buckets for performance

**Key Methods**:
- `GetMissRanges()`: Get time ranges not in cache
- `Put()`: Store query result in cache

**Features**:
- Bucket-based caching (aligned to step intervals)
- Automatic cache invalidation
- Parallel query execution for missing ranges
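
To make the bucket idea concrete, here is a small, self-contained sketch (not the actual implementation) of computing which sub-ranges of a requested window are missing from a set of cached buckets:

```go
// Illustrative only: given cached buckets and a requested [start, end)
// window in milliseconds, return the sub-ranges that still need to be
// queried. Assumes buckets are sorted and non-overlapping.
type timeRange struct{ Start, End uint64 }

func missingRanges(start, end uint64, cached []timeRange) []timeRange {
    var miss []timeRange
    cursor := start
    for _, b := range cached {
        if b.End <= cursor || b.Start >= end {
            continue // bucket entirely outside the requested window
        }
        if b.Start > cursor {
            miss = append(miss, timeRange{cursor, b.Start}) // gap before bucket
        }
        if b.End > cursor {
            cursor = b.End // advance past the cached bucket
        }
    }
    if cursor < end {
        miss = append(miss, timeRange{cursor, end}) // trailing gap
    }
    return miss
}
```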

---

## Query Execution

### Builder Query Execution

**Location**: `pkg/querier/builder_query.go`

**Process**:
1. Statement builder generates SQL
2. SQL executed against ClickHouse via TelemetryStore
3. Results processed based on RequestType:
   - TimeSeries: Grouped by time buckets and labels
   - Scalar: Single value extraction
   - Raw: Row-by-row processing
4. Statistics collected (rows scanned, bytes scanned, duration)

### PromQL Query Execution

**Location**: `pkg/querier/promql_query.go`

**Process**:
1. Query parsed by Prometheus engine
2. Executed against Prometheus-compatible data
3. Results converted to QueryRangeResponse format

### ClickHouse SQL Query Execution

**Location**: `pkg/querier/ch_sql_query.go`

**Process**:
1. SQL query executed directly
2. Results processed based on RequestType
3. Variable substitution applied

### Formula Query Execution

**Location**: `pkg/querier/formula_query.go`

**Process**:
1. Referenced queries executed first
2. Formula expression evaluated using govaluate
3. Results computed from query results

### Trace Operator Query Execution

**Location**: `pkg/querier/trace_operator_query.go`

**Process**:
1. Expression parsed to find dependencies
2. Referenced queries executed
3. Set operations applied (INTERSECT, UNION, EXCEPT)
4. Results combined
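
Conceptually, `A AND B` narrows the result to traces matched by both sub-queries. A toy sketch of the set semantics (the real implementation pushes these operations into SQL as INTERSECT/UNION/EXCEPT):

```go
// Illustrative only: set semantics of the AND trace operator over
// trace IDs returned by two sub-queries.
func intersect(a, b map[string]struct{}) map[string]struct{} {
    out := make(map[string]struct{})
    for id := range a {
        if _, ok := b[id]; ok {
            out[id] = struct{}{} // trace matched by both A and B
        }
    }
    return out
}
```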

---

## Caching

### Caching Strategy

**Location**: `pkg/querier/querier.go:642`

**When Caching Applies**:
- Time series queries only
- Queries with valid fingerprints
- `NoCache` flag not set

**How It Works**:
1. Query fingerprint generated from the query structure, aggregations, and filters (see Cache Key Generation below)
2. Cache checked for existing results
3. Missing time ranges identified
4. Queries executed only for missing ranges (parallel execution)
5. Fresh results merged with cached results
6. Merged result stored in cache

### Cache Key Generation

**Location**: `pkg/querier/builder_query.go:52`

The fingerprint includes:
- Signal type
- Source type
- Step interval
- Aggregations
- Filters
- Group by fields

The time range is part of the cache key but not the fingerprint, so the same fingerprint can serve multiple time windows.
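
A compressed sketch of the idea (not the real code): serialize the semantically significant parts of the query and hash them. Field names follow the structs shown earlier; the serialization format here is purely illustrative.

```go
// Illustrative only: derive a stable fingerprint from the parts of a
// builder query that affect its results (time range excluded).
// Uses crypto/sha256, encoding/hex, and fmt.
func fingerprint(q qbtypes.QueryBuilderQuery[qbtypes.TraceAggregation]) string {
    h := sha256.New()
    fmt.Fprintf(h, "signal=%v;step=%v;", q.Signal, q.StepInterval)
    if q.Filter != nil {
        fmt.Fprintf(h, "filter=%s;", q.Filter.Expression)
    }
    for _, agg := range q.Aggregations {
        fmt.Fprintf(h, "agg=%s as %s;", agg.Expression, agg.Alias)
    }
    for _, g := range q.GroupBy {
        fmt.Fprintf(h, "group=%s;", g.Name)
    }
    return hex.EncodeToString(h.Sum(nil))
}
```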

### Cache Benefits

- **Performance**: Avoids re-executing identical queries
- **Efficiency**: Only queries missing time ranges
- **Parallelism**: Multiple missing ranges queried in parallel

---

## Result Processing

### Result Merging

**Location**: `pkg/querier/querier.go:795`

**Process**:
1. Results from multiple queries collected
2. For time series: series merged by labels
3. For raw data: rows combined
4. Statistics aggregated (rows scanned, bytes scanned, duration)
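
For intuition, merging two result sets by label set might look like this simplified sketch (the real merge lives in `mergeResults`):

```go
// Illustrative only: merge two slices of TimeSeries, concatenating
// values for series that share the same label set. Uses sort and strings.
func mergeSeries(a, b []qbtypes.TimeSeries) []qbtypes.TimeSeries {
    // Build a canonical key from a label map: sorted "k=v;" pairs.
    key := func(labels map[string]string) string {
        keys := make([]string, 0, len(labels))
        for k := range labels {
            keys = append(keys, k)
        }
        sort.Strings(keys)
        var sb strings.Builder
        for _, k := range keys {
            sb.WriteString(k + "=" + labels[k] + ";")
        }
        return sb.String()
    }
    byLabels := map[string]*qbtypes.TimeSeries{}
    for _, s := range append(append([]qbtypes.TimeSeries{}, a...), b...) {
        s := s
        k := key(s.Labels)
        if existing, ok := byLabels[k]; ok {
            existing.Values = append(existing.Values, s.Values...)
        } else {
            byLabels[k] = &s
        }
    }
    out := make([]qbtypes.TimeSeries, 0, len(byLabels))
    for _, s := range byLabels {
        out = append(out, *s)
    }
    return out
}
```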

### Formula Evaluation

**Location**: `pkg/querier/formula_query.go`

**Process**:
1. Formula expression parsed
2. Referenced query results retrieved
3. Expression evaluated using the govaluate library
4. Result computed and formatted
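
For reference, evaluating a formula like `A / B * 100` with govaluate looks roughly like this standalone sketch; the querier wires actual query results into the parameter map per time bucket:

```go
// Illustrative only: evaluate "A / B * 100" for one data point
// using github.com/Knetic/govaluate.
expr, err := govaluate.NewEvaluableExpression("A / B * 100")
if err != nil {
    return err
}
result, err := expr.Evaluate(map[string]interface{}{
    "A": 42.0, // value of query A at this timestamp
    "B": 84.0, // value of query B at this timestamp
})
// result holds 50.0 as an interface{} (float64 underneath)
```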

### Variable Substitution

**Location**: `pkg/querier/querier.go`

**Process**:
1. Variables extracted from the request
2. Variable values substituted in queries
3. Applied to filters, aggregations, and other query parts

---

## Performance Considerations

### Query Optimization

1. **Time Range Optimization**:
   - For trace queries with a `trace_id` filter, query `trace_summary` first to narrow the time range
   - Use appropriate time ranges to limit data scanned

2. **Step Interval Calculation** (a simplified sketch follows this list):
   - Automatic step interval calculation based on time range
   - Minimum step interval enforcement
   - Warnings for suboptimal intervals

3. **Index Usage**:
   - Queries use time bucket columns (`ts_bucket_start`) for efficient filtering
   - Proper filter placement for index utilization

4. **Limit Enforcement**:
   - Raw data queries should include limits
   - Pagination support via offset/cursor
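
The exact heuristic lives in the querier; the sketch below only illustrates the common shape of such a calculation (target a bounded number of points, enforce a minimum step). The constants `maxPoints` and `minStep` are assumptions, not the real values.

```go
// Illustrative only: pick a step interval that keeps the number of
// points per series bounded. Uses the time package.
func stepInterval(startMs, endMs uint64) time.Duration {
    const (
        maxPoints = 300              // assumed target resolution
        minStep   = 60 * time.Second // assumed minimum step
    )
    window := time.Duration(endMs-startMs) * time.Millisecond
    step := window / maxPoints
    if step < minStep {
        step = minStep
    }
    // Round to whole seconds for bucket alignment.
    return step.Round(time.Second)
}
```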

### Best Practices

1. **Use Query Builder**: Prefer the query builder over raw SQL for better optimization
2. **Limit Time Ranges**: Always specify reasonable time ranges
3. **Use Aggregations**: For large datasets, use aggregations instead of raw data
4. **Cache Awareness**: Be mindful of cache TTLs when testing
5. **Parallel Queries**: Multiple independent queries execute in parallel
6. **Step Intervals**: Let the system calculate optimal step intervals

### Monitoring

Execution statistics are included in the response:
- `RowsScanned`: Total rows scanned
- `BytesScanned`: Total bytes scanned
- `DurationMS`: Query execution time
- `StepIntervals`: Step intervals per query
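
Clients can use these fields to flag expensive queries; a trivial sketch (the thresholds are arbitrary assumptions):

```go
// Illustrative only: log when a query_range call scanned a lot of data.
if resp.Meta.BytesScanned > 1<<30 || resp.Meta.DurationMS > 5000 {
    logger.WarnContext(ctx, "expensive query_range call",
        "rows", resp.Meta.RowsScanned,
        "bytes", resp.Meta.BytesScanned,
        "duration_ms", resp.Meta.DurationMS)
}
```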

---

## Extending the API

### Adding a New Query Type

1. **Define Query Type** (`pkg/types/querybuildertypes/querybuildertypesv5/query.go`):
```go
const (
    QueryTypeMyNewType QueryType = "my_new_type"
)
```

2. **Define Query Spec**:
```go
type MyNewQuerySpec struct {
    Name string
    // ... your fields
}
```

3. **Update QueryEnvelope Unmarshaling** (`pkg/types/querybuildertypes/querybuildertypesv5/query.go`):
```go
case QueryTypeMyNewType:
    var spec MyNewQuerySpec
    if err := UnmarshalJSONWithContext(shadow.Spec, &spec, "my new query spec"); err != nil {
        return wrapUnmarshalError(err, "invalid my new query spec: %v", err)
    }
    q.Spec = spec
```

4. **Implement Query Interface** (`pkg/querier/my_new_query.go`):
```go
type myNewQuery struct {
    spec MyNewQuerySpec
    // ... other fields
}

func (q *myNewQuery) Execute(ctx context.Context) (*qbtypes.Result, error) {
    // Implementation
}

func (q *myNewQuery) Fingerprint() string {
    // Generate fingerprint for caching
}

func (q *myNewQuery) Window() (uint64, uint64) {
    // Return time range
}
```

5. **Update Querier** (`pkg/querier/querier.go`):
```go
case QueryTypeMyNewType:
    myQuery, ok := query.Spec.(MyNewQuerySpec)
    if !ok {
        return nil, errors.NewInvalidInputf(...)
    }
    queries[myQuery.Name] = newMyNewQuery(myQuery, ...)
```

### Adding a New Request Type

1. **Define Request Type** (`pkg/types/querybuildertypes/querybuildertypesv5/req.go`):
```go
const (
    RequestTypeMyNewType RequestType = "my_new_type"
)
```

2. **Update Statement Builders**: Add handling in the `Build()` method
3. **Update Query Execution**: Add result processing for the new type
4. **Update Response Models**: Add a response data structure

### Adding a New Aggregation Function

1. **Update the Aggregation Rewriter** (`pkg/querybuilder/agg_expr_rewriter.go`):
```go
func (r *aggExprRewriter) RewriteAggregation(expr string) (string, error) {
    if strings.HasPrefix(expr, "my_function(") {
        // Parse arguments
        // Return ClickHouse SQL expression
        return "myClickHouseFunction(...)", nil
    }
    // ... existing functions
}
```

2. **Update Documentation**: Document the new function

---

## Common Patterns

### Pattern 1: Simple Time Series Query

```go
req := qbtypes.QueryRangeRequest{
    Start:       startMs,
    End:         endMs,
    RequestType: qbtypes.RequestTypeTimeSeries,
    CompositeQuery: qbtypes.CompositeQuery{
        Queries: []qbtypes.QueryEnvelope{
            {
                Type: qbtypes.QueryTypeBuilder,
                Spec: qbtypes.QueryBuilderQuery[qbtypes.MetricAggregation]{
                    Name:   "A",
                    Signal: telemetrytypes.SignalMetrics,
                    Aggregations: []qbtypes.MetricAggregation{
                        {Expression: "sum(rate)", Alias: "total"},
                    },
                    StepInterval: qbtypes.Step{Duration: time.Minute},
                },
            },
        },
    },
}
```

### Pattern 2: Query with Filter and Group By

```go
req := qbtypes.QueryRangeRequest{
    Start:       startMs,
    End:         endMs,
    RequestType: qbtypes.RequestTypeTimeSeries,
    CompositeQuery: qbtypes.CompositeQuery{
        Queries: []qbtypes.QueryEnvelope{
            {
                Type: qbtypes.QueryTypeBuilder,
                Spec: qbtypes.QueryBuilderQuery[qbtypes.TraceAggregation]{
                    Name:   "A",
                    Signal: telemetrytypes.SignalTraces,
                    Filter: &qbtypes.Filter{
                        Expression: "service.name = 'api' AND duration_nano > 1000000",
                    },
                    Aggregations: []qbtypes.TraceAggregation{
                        {Expression: "count()", Alias: "total"},
                    },
                    GroupBy: []qbtypes.GroupByKey{
                        {TelemetryFieldKey: telemetrytypes.TelemetryFieldKey{
                            Name:         "service.name",
                            FieldContext: telemetrytypes.FieldContextResource,
                        }},
                    },
                },
            },
        },
    },
}
```

### Pattern 3: Formula Query

```go
req := qbtypes.QueryRangeRequest{
    Start:       startMs,
    End:         endMs,
    RequestType: qbtypes.RequestTypeTimeSeries,
    CompositeQuery: qbtypes.CompositeQuery{
        Queries: []qbtypes.QueryEnvelope{
            {
                Type: qbtypes.QueryTypeBuilder,
                Spec: qbtypes.QueryBuilderQuery[qbtypes.MetricAggregation]{
                    Name: "A",
                    // ... query A definition
                },
            },
            {
                Type: qbtypes.QueryTypeBuilder,
                Spec: qbtypes.QueryBuilderQuery[qbtypes.MetricAggregation]{
                    Name: "B",
                    // ... query B definition
                },
            },
            {
                Type: qbtypes.QueryTypeFormula,
                Spec: qbtypes.QueryBuilderFormula{
                    Name:       "C",
                    Expression: "A / B * 100",
                },
            },
        },
    },
}
```

---

## Testing

### Unit Tests

- `pkg/querier/querier_test.go` - Querier tests
- `pkg/querier/builder_query_test.go` - Builder query tests
- `pkg/querier/formula_query_test.go` - Formula query tests

### Integration Tests

- `tests/integration/` - End-to-end API tests

### Running Tests

```bash
# Run all querier tests (with the race detector, per repo convention)
go test -race ./pkg/querier/...

# Run with verbose output
go test -race -v ./pkg/querier/...

# Run a specific test
go test -race -v ./pkg/querier/ -run TestQueryRange
```

---

## Debugging

### Enable Debug Logging

```go
// In querier.go
q.logger.DebugContext(ctx, "Executing query",
    "query", queryName,
    "start", start,
    "end", end)
```

### Common Issues

1. **Query Not Found**: Check query name matches in CompositeQuery
2. **SQL Errors**: Check generated SQL in logs, verify ClickHouse syntax
3. **Performance**: Check execution statistics, optimize time ranges
4. **Cache Issues**: Set `NoCache: true` to bypass cache
5. **Formula Errors**: Check formula expression syntax and referenced query names

## References

### Key Files

- `pkg/querier/querier.go` - Main query orchestration
- `pkg/querier/builder_query.go` - Builder query execution
- `pkg/types/querybuildertypes/querybuildertypesv5/` - Request/response models
- `pkg/telemetrystore/` - ClickHouse interface
- `pkg/telemetrymetadata/` - Metadata store

### Signal-Specific Documentation

- [Traces Module](./TRACES_MODULE.md) - Trace-specific details
- Logs module documentation (when available)
- Metrics module documentation (when available)

### Related Documentation

- [ClickHouse Documentation](https://clickhouse.com/docs)
- [PromQL Documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/)

---

## Contributing

When contributing to the Query Range API:

1. **Follow Existing Patterns**: Match the style of existing query types
2. **Add Tests**: Include unit tests for new functionality
3. **Update Documentation**: Update this doc for significant changes
4. **Consider Performance**: Optimize queries and use caching appropriately
5. **Handle Errors**: Provide meaningful error messages

For questions or help, reach out to the maintainers or open an issue.