Mirror of https://github.com/SigNoz/signoz.git (synced 2026-02-07 10:22:12 +00:00)

Compare commits: 2 commits, `test/uplot` … `ns/claude-` — 08c53fe7e8, c1fac00d2e
.claude/CLAUDE.md (new file, 136 lines)
@@ -0,0 +1,136 @@
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## Project Overview
|
||||
|
||||
SigNoz is an open-source observability platform (APM, logs, metrics, traces) built on OpenTelemetry and ClickHouse. It provides a unified solution for monitoring applications with features including distributed tracing, log management, metrics dashboards, and alerting.
|
||||
|
||||
## Build and Development Commands
|
||||
|
||||
### Development Environment Setup
|
||||
```bash
|
||||
make devenv-up # Start ClickHouse and OTel Collector for local dev
|
||||
make devenv-clickhouse # Start only ClickHouse
|
||||
make devenv-signoz-otel-collector # Start only OTel Collector
|
||||
make devenv-clickhouse-clean # Clean ClickHouse data
|
||||
```
|
||||
|
||||
### Backend (Go)
|
||||
```bash
|
||||
make go-run-community # Run community backend server
|
||||
make go-run-enterprise # Run enterprise backend server
|
||||
make go-test # Run all Go unit tests
|
||||
go test -race ./pkg/... # Run tests for specific package
|
||||
go test -race ./pkg/querier/... # Example: run querier tests
|
||||
```
|
||||
|
||||
### Integration Tests (Python)
|
||||
```bash
|
||||
cd tests/integration
|
||||
uv sync # Install dependencies
|
||||
make py-test-setup # Start test environment (keep running with --reuse)
|
||||
make py-test # Run all integration tests
|
||||
make py-test-teardown # Stop test environment
|
||||
|
||||
# Run specific test
|
||||
uv run pytest --basetemp=./tmp/ -vv --reuse src/<suite>/<file>.py::test_name
|
||||
```
|
||||
|
||||
### Code Quality
|
||||
```bash
|
||||
# Go linting (golangci-lint)
|
||||
golangci-lint run
|
||||
|
||||
# Python formatting/linting
|
||||
make py-fmt # Format with black
|
||||
make py-lint # Run isort, autoflake, pylint
|
||||
```
|
||||
|
||||
### OpenAPI Generation
|
||||
```bash
|
||||
go run cmd/enterprise/*.go generate openapi
|
||||
```
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
### Backend Structure
|
||||
|
||||
The Go backend follows a **provider pattern** for dependency injection:
|
||||
|
||||
- **`pkg/signoz/`** - IoC container that wires all providers together
|
||||
- **`pkg/modules/`** - Business logic modules (user, organization, dashboard, etc.)
|
||||
- **`pkg/<provider>/`** - Provider implementations following consistent structure:
|
||||
- `<name>.go` - Interface definition
|
||||
- `config.go` - Configuration (implements `factory.Config`)
|
||||
- `<implname><name>/provider.go` - Implementation
|
||||
- `<name>test/` - Mock implementations for testing
|
||||
|
||||
### Key Packages
|
||||
- **`pkg/querier/`** - Query engine for telemetry data (logs, traces, metrics)
|
||||
- **`pkg/telemetrystore/`** - ClickHouse telemetry storage interface
|
||||
- **`pkg/sqlstore/`** - Relational database (SQLite/PostgreSQL) for metadata
|
||||
- **`pkg/apiserver/`** - HTTP API server with OpenAPI integration
|
||||
- **`pkg/alertmanager/`** - Alert management
|
||||
- **`pkg/authn/`, `pkg/authz/`** - Authentication and authorization
|
||||
- **`pkg/flagger/`** - Feature flags (OpenFeature-based)
|
||||
- **`pkg/errors/`** - Structured error handling
|
||||
|
||||
### Enterprise vs Community
|
||||
- **`cmd/community/`** - Community edition entry point
|
||||
- **`cmd/enterprise/`** - Enterprise edition entry point
|
||||
- **`ee/`** - Enterprise-only features
|
||||
|
||||
## Code Conventions
|
||||
|
||||
### Error Handling
|
||||
Use the custom `pkg/errors` package instead of standard library:
|
||||
```go
|
||||
errors.New(typ, code, message) // Instead of errors.New()
|
||||
errors.Newf(typ, code, message, args...) // Instead of fmt.Errorf()
|
||||
errors.Wrapf(err, typ, code, msg) // Wrap with context
|
||||
```
|
||||
|
||||
Define domain-specific error codes:
|
||||
```go
|
||||
var CodeThingNotFound = errors.MustNewCode("thing_not_found")
|
||||
```
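As a hedged illustration of how these pieces combine in a module (the import path, `errors.TypeNotFound`, and the store interface below are assumptions for the sketch, not confirmed API):

```go
package thing

import (
	"context"

	"github.com/SigNoz/signoz/pkg/errors" // assumed import path
)

// Assumed domain-specific code for this sketch.
var CodeThingNotFound = errors.MustNewCode("thing_not_found")

// store is a placeholder dependency so the sketch stands alone.
type store interface {
	Get(ctx context.Context, id string) (string, error)
}

type Module struct{ store store }

// GetThing wraps the storage error with a type and code instead of using fmt.Errorf.
func (m *Module) GetThing(ctx context.Context, id string) (string, error) {
	v, err := m.store.Get(ctx, id)
	if err != nil {
		// errors.TypeNotFound is an assumed error type constant.
		return "", errors.Wrapf(err, errors.TypeNotFound, CodeThingNotFound, "thing %q not found", id)
	}
	return v, nil
}
```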
|
||||
|
||||
### HTTP Handlers
|
||||
Handlers are thin adapters in modules that:
|
||||
1. Extract auth context from request
|
||||
2. Decode request body using `binding` package
|
||||
3. Call module functions
|
||||
4. Return responses using `render` package
|
||||
|
||||
Register routes in `pkg/apiserver/signozapiserver/` with `handler.New()` and `OpenAPIDef`.
|
||||
|
||||
### SQL/Database
|
||||
- Use Bun ORM via `sqlstore.BunDBCtx(ctx)`
|
||||
- Star schema with `organizations` as central entity
|
||||
- All tables have `id`, `created_at`, `updated_at`, `org_id` columns
|
||||
- Write idempotent migrations in `pkg/sqlmigration/`
|
||||
- No `ON DELETE CASCADE` - handle cascading deletes in application logic
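A minimal sketch of a read through Bun under these conventions (the `User` model and table are hypothetical; the real code obtains the DB via `sqlstore.BunDBCtx(ctx)`):

```go
package example

import (
	"context"
	"time"

	"github.com/uptrace/bun"
)

// Hypothetical model following the column conventions above.
type User struct {
	bun.BaseModel `bun:"table:users"`

	ID        string    `bun:"id,pk"`
	OrgID     string    `bun:"org_id"`
	CreatedAt time.Time `bun:"created_at"`
	UpdatedAt time.Time `bun:"updated_at"`
}

// listUsers scopes the query by org_id, since organizations are the central entity.
func listUsers(ctx context.Context, db bun.IDB, orgID string) ([]User, error) {
	var users []User
	err := db.NewSelect().
		Model(&users).
		Where("org_id = ?", orgID).
		Order("created_at DESC").
		Scan(ctx)
	return users, err
}
```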
|
||||
|
||||
### REST Endpoints
|
||||
- Use plural resource names: `/v1/organizations`, `/v1/users`
|
||||
- Use `me` for current user/org: `/v1/organizations/me/users`
|
||||
- Follow RESTful conventions for CRUD operations
|
||||
|
||||
### Linting Rules (from .golangci.yml)
|
||||
- Don't use `errors` package - use `pkg/errors`
|
||||
- Don't use `zap` logger - use `slog`
|
||||
- Don't use `fmt.Errorf` or `fmt.Print*`
|
||||
|
||||
## Testing
|
||||
|
||||
### Unit Tests
|
||||
- Run with race detector: `go test -race ./...`
|
||||
- Provider mocks are in `<provider>test/` packages
|
||||
|
||||
### Integration Tests
|
||||
- Located in `tests/integration/`
|
||||
- Use pytest with testcontainers
|
||||
- Files prefixed with numbers for execution order (e.g., `01_database.py`)
|
||||
- Always use `--reuse` flag during development
|
||||
- Fixtures in `tests/integration/fixtures/`
|
||||
.claude/skills/commit/SKILL.md (new file, 36 lines)
@@ -0,0 +1,36 @@
---
|
||||
name: commit
|
||||
description: Create a conventional commit with staged changes
|
||||
disable-model-invocation: true
|
||||
allowed-tools: Bash(git:*)
|
||||
---
|
||||
|
||||
# Create Conventional Commit
|
||||
|
||||
Commit staged changes using conventional commit format: `type(scope): description`
|
||||
|
||||
## Types
|
||||
|
||||
- `feat:` - New feature
|
||||
- `fix:` - Bug fix
|
||||
- `chore:` - Maintenance/refactor/tooling
|
||||
- `test:` - Tests only
|
||||
- `docs:` - Documentation
|
||||
|
||||
## Process
|
||||
|
||||
1. Review staged changes: `git diff --cached`
|
||||
2. Determine type, optional scope, and description (imperative, <70 chars)
|
||||
3. Commit using HEREDOC:
|
||||
```bash
|
||||
git commit -m "$(cat <<'EOF'
|
||||
type(scope): description
|
||||
EOF
|
||||
)"
|
||||
```
|
||||
4. Verify: `git log -1`
|
||||
|
||||
## Notes
|
||||
|
||||
- Description: imperative mood, lowercase, no period
|
||||
- Body: explain WHY, not WHAT (code shows what)
|
||||
.claude/skills/raise-pr/SKILL.md (new file, 55 lines)
@@ -0,0 +1,55 @@
---
|
||||
name: raise-pr
|
||||
description: Create a pull request with auto-filled template. Pass 'commit' to commit staged changes first.
|
||||
disable-model-invocation: true
|
||||
allowed-tools: Bash(gh:*, git:*), Read
|
||||
argument-hint: [commit?]
|
||||
---
|
||||
|
||||
# Raise Pull Request
|
||||
|
||||
Create a PR with auto-filled template from commits after origin/main.
|
||||
|
||||
## Arguments
|
||||
|
||||
- No argument: Create PR with existing commits
|
||||
- `commit`: Commit staged changes first, then create PR
|
||||
|
||||
## Process
|
||||
|
||||
1. **If `$ARGUMENTS` is "commit"**: Review staged changes and commit with descriptive message
|
||||
- Check for staged changes: `git diff --cached --stat`
|
||||
- If changes exist:
|
||||
- Review the changes: `git diff --cached`
|
||||
- Create a short and clear commit message based on the changes
|
||||
- Commit command: `git commit -m "message"`
|
||||
|
||||
2. **Analyze commits since origin/main**:
|
||||
- `git log origin/main..HEAD --pretty=format:"%s%n%b"` - get commit messages
|
||||
- `git diff origin/main...HEAD --stat` - see changes
|
||||
|
||||
3. **Read template**: `.github/pull_request_template.md`
|
||||
|
||||
4. **Generate PR**:
|
||||
- **Title**: Short (<70 chars), from commit messages or main change
|
||||
- **Body**: Fill template sections based on commits/changes:
|
||||
- Summary (why/what/approach) - end with "Closes #<issue_number>" if an issue number can be derived from the branch name (`git branch --show-current`)
|
||||
- Change Type checkboxes
|
||||
- Bug Context (if applicable)
|
||||
- Testing Strategy
|
||||
- Risk Assessment
|
||||
- Changelog (if user-facing)
|
||||
- Checklist
|
||||
|
||||
5. **Create PR**:
|
||||
```bash
|
||||
git push -u origin $(git branch --show-current)
|
||||
gh pr create --base main --title "..." --body "..."
|
||||
gh pr view
|
||||
```
|
||||
|
||||
## Notes
|
||||
|
||||
- Analyze ALL commit messages from origin/main to HEAD
|
||||
- Fill template sections based on code analysis
|
||||
- Leave PR template sections as they are if you cannot determine their content
|
||||
docs/implementation/EXTERNAL_API_MONITORING.md (new file, 292 lines)
@@ -0,0 +1,292 @@
# External API Monitoring - Developer Guide
|
||||
|
||||
## Overview
|
||||
|
||||
External API Monitoring tracks outbound HTTP calls from your services to external APIs. It groups spans by domain (e.g., `api.example.com`) and displays metrics like endpoint count, request rate, error rate, latency, and last seen time.
|
||||
|
||||
**Key Requirement**: Spans must have `kind_string = 'Client'`, at least one of `http.url`/`url.full`, and at least one of `net.peer.name`/`server.address`.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Flow
|
||||
|
||||
```
|
||||
Frontend (DomainList)
|
||||
→ useListOverview hook
|
||||
→ POST /api/v1/third-party-apis/overview/list
|
||||
→ getDomainList handler
|
||||
→ BuildDomainList (7 queries)
|
||||
→ QueryRange (ClickHouse)
|
||||
→ Post-processing (merge semconv, filter IPs)
|
||||
→ formatDataForTable
|
||||
→ UI Display
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key APIs
|
||||
|
||||
### 1. Domain List API
|
||||
|
||||
**Endpoint**: `POST /api/v1/third-party-apis/overview/list`
|
||||
|
||||
**Request**:
|
||||
```json
|
||||
{
|
||||
"start": 1699123456789, // Unix timestamp (ms)
|
||||
"end": 1699127056789,
|
||||
"show_ip": false, // Filter IP addresses
|
||||
"filter": {
|
||||
"expression": "kind_string = 'Client' AND service.name = 'api'"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Response**: Table with columns:
|
||||
- `net.peer.name` (domain name)
|
||||
- `endpoints` (count_distinct with fallback: http.url or url.full)
|
||||
- `rps` (rate())
|
||||
- `error_rate` (formula: error/total_span * 100)
|
||||
- `p99` (p99(duration_nano))
|
||||
- `lastseen` (max(timestamp))
|
||||
|
||||
**Handler**: `pkg/query-service/app/http_handler.go::getDomainList()`
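For quick manual testing, a small Go client sketch against this endpoint (base URL, port, and auth header are placeholders for your deployment):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Placeholder values: point these at your own SigNoz instance.
	baseURL := "http://localhost:8080"

	body := map[string]any{
		"start":   int64(1699123456789), // ms
		"end":     int64(1699127056789), // ms
		"show_ip": false,
		"filter": map[string]any{
			"expression": "kind_string = 'Client' AND service.name = 'api'",
		},
	}
	raw, _ := json.Marshal(body)

	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/third-party-apis/overview/list", bytes.NewReader(raw))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// Auth header name and value are deployment-specific; shown here as a placeholder.
	req.Header.Set("SIGNOZ-API-KEY", "<api-key>")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```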
|
||||
|
||||
---
|
||||
|
||||
### 2. Domain Info API
|
||||
|
||||
**Endpoint**: `POST /api/v1/third-party-apis/overview/domain`
|
||||
|
||||
**Request**: Same as Domain List, but includes `domain` field
|
||||
|
||||
**Response**: Endpoint-level metrics for a specific domain
|
||||
|
||||
**Handler**: `pkg/query-service/app/http_handler.go::getDomainInfo()`
|
||||
|
||||
---
|
||||
|
||||
## Query Building
|
||||
|
||||
### Location
|
||||
`pkg/modules/thirdpartyapi/translator.go`
|
||||
|
||||
### BuildDomainList() - Creates 7 Sub-queries
|
||||
|
||||
1. **endpoints**: `count_distinct(if(http.url exists, http.url, url.full))` - Unique endpoint count (handles both semconv attributes)
|
||||
2. **lastseen**: `max(timestamp)` - Last access time
|
||||
3. **rps**: `rate()` - Requests per second
|
||||
4. **error**: `count() WHERE has_error = true` - Error count
|
||||
5. **total_span**: `count()` - Total spans (for error rate)
|
||||
6. **p99**: `p99(duration_nano)` - 99th percentile latency
|
||||
7. **error_rate**: Formula `(error/total_span)*100`
|
||||
|
||||
### Base Filter
|
||||
```go
|
||||
"(http.url EXISTS OR url.full EXISTS) AND kind_string = 'Client'"
|
||||
```
|
||||
|
||||
### GroupBy
|
||||
- Groups by `server.address` + `net.peer.name` (dual semconv support)
|
||||
|
||||
---
|
||||
|
||||
## Key Files
|
||||
|
||||
### Frontend
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `frontend/src/container/ApiMonitoring/Explorer/Domains/DomainList.tsx` | Main list view component |
|
||||
| `frontend/src/container/ApiMonitoring/Explorer/Domains/DomainDetails/DomainDetails.tsx` | Domain details drawer |
|
||||
| `frontend/src/hooks/thirdPartyApis/useListOverview.ts` | Data fetching hook |
|
||||
| `frontend/src/api/thirdPartyApis/listOverview.ts` | API client |
|
||||
| `frontend/src/container/ApiMonitoring/utils.tsx` | Utilities (formatting, query building) |
|
||||
|
||||
### Backend
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `pkg/query-service/app/http_handler.go` | API handlers (`getDomainList`, `getDomainInfo`) |
|
||||
| `pkg/modules/thirdpartyapi/translator.go` | Query builder & response processing |
|
||||
| `pkg/types/thirdpartyapitypes/thirdpartyapi.go` | Request/response types |
|
||||
|
||||
---
|
||||
|
||||
## Data Tables
|
||||
|
||||
### Primary Table
|
||||
- **Table**: `signoz_traces.distributed_signoz_index_v3`
|
||||
- **Key Columns**:
|
||||
- `kind_string` - Filter for `'Client'` spans
|
||||
- `duration_nano` - For latency calculations
|
||||
- `has_error` - For error rate
|
||||
- `timestamp` - For last seen
|
||||
- `attributes_string` - Map containing `http.url`, `net.peer.name`, etc.
|
||||
- `resources_string` - Map containing `server.address`, `service.name`, etc.
|
||||
|
||||
### Attribute Access
|
||||
```sql
|
||||
-- Check existence
|
||||
mapContains(attributes_string, 'http.url') = 1
|
||||
|
||||
-- Get value
|
||||
attributes_string['http.url']
|
||||
|
||||
-- Materialized (if exists)
|
||||
attribute_string_http$$url
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Post-Processing
|
||||
|
||||
### 1. MergeSemconvColumns()
|
||||
- Merges `server.address` and `net.peer.name` into single column
|
||||
- Location: `pkg/modules/thirdpartyapi/translator.go:117`
|
||||
|
||||
### 2. FilterIntermediateColumns()
|
||||
- Removes intermediate formula columns from response
|
||||
- Location: `pkg/modules/thirdpartyapi/translator.go:70`
|
||||
|
||||
### 3. FilterResponse()
|
||||
- Filters out IP addresses if `show_ip = false`
|
||||
- Uses `net.ParseIP()` to detect IPs
|
||||
- Location: `pkg/modules/thirdpartyapi/translator.go:214`
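The core check is straightforward; a standalone sketch of the idea (the `row` shape here is hypothetical, not the actual response type):

```go
package main

import (
	"fmt"
	"net"
)

// row is a stand-in for one table row keyed by column name.
type row map[string]string

// filterIPs drops rows whose domain parses as an IP address,
// mirroring the show_ip=false behaviour described above.
func filterIPs(rows []row, showIP bool) []row {
	if showIP {
		return rows
	}
	kept := make([]row, 0, len(rows))
	for _, r := range rows {
		if net.ParseIP(r["net.peer.name"]) == nil {
			kept = append(kept, r) // not an IP, keep it
		}
	}
	return kept
}

func main() {
	rows := []row{
		{"net.peer.name": "api.example.com"},
		{"net.peer.name": "10.0.0.12"},
	}
	fmt.Println(filterIPs(rows, false)) // only api.example.com survives
}
```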
|
||||
|
||||
---
|
||||
|
||||
## Required Attributes
|
||||
|
||||
### For Domain Grouping
|
||||
- `net.peer.name` OR `server.address` (required)
|
||||
|
||||
### For Filtering
|
||||
- `http.url` OR `url.full` (required)
|
||||
- `kind_string = 'Client'` (required)
|
||||
|
||||
### Not Required
|
||||
- `http.target` - Not used in external API monitoring
|
||||
|
||||
### Known Bug
|
||||
`buildEndpointsQuery()` uses `count_distinct(http.url)`, but the base filter also allows `url.full`. Spans that only carry `url.full` pass the filter yet do not contribute to the endpoint count.
|
||||
|
||||
**Fix Needed**: Update aggregation to handle both attributes:
|
||||
```go
|
||||
// Current (buggy)
|
||||
{Expression: "count_distinct(http.url)"}
|
||||
|
||||
// Should be
|
||||
{Expression: "count_distinct(coalesce(http.url, url.full))"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Frontend Data Flow
|
||||
|
||||
### 1. Domain List View
|
||||
```
|
||||
DomainList component
|
||||
→ useListOverview({ start, end, show_ip, filter })
|
||||
→ listOverview API call
|
||||
→ formatDataForTable(response)
|
||||
→ Table display
|
||||
```
|
||||
|
||||
### 2. Domain Details View
|
||||
```
|
||||
User clicks domain
|
||||
→ DomainDetails drawer opens
|
||||
→ Multiple queries:
|
||||
- DomainMetrics (overview cards)
|
||||
- AllEndpoints (endpoint table)
|
||||
- TopErrors (error table)
|
||||
- EndPointDetails (when endpoint selected)
|
||||
```
|
||||
|
||||
### 3. Data Formatting
|
||||
- `formatDataForTable()` - Converts API response to table format
|
||||
- Handles `n/a` values, converts nanoseconds to milliseconds
|
||||
- Maps column names to display fields
|
||||
|
||||
---
|
||||
|
||||
## Query Examples
|
||||
|
||||
### Domain List Query
|
||||
```sql
|
||||
SELECT
|
||||
multiIf(
|
||||
mapContains(attributes_string, 'server.address'),
|
||||
attributes_string['server.address'],
|
||||
mapContains(attributes_string, 'net.peer.name'),
|
||||
attributes_string['net.peer.name'],
|
||||
NULL
|
||||
) AS domain,
|
||||
count_distinct(attributes_string['http.url']) AS endpoints,
|
||||
rate() AS rps,
|
||||
p99(duration_nano) AS p99,
|
||||
max(timestamp) AS lastseen
|
||||
FROM signoz_traces.distributed_signoz_index_v3
|
||||
WHERE
|
||||
(mapContains(attributes_string, 'http.url') = 1
|
||||
OR mapContains(attributes_string, 'url.full') = 1)
|
||||
AND kind_string = 'Client'
|
||||
AND timestamp >= ? AND timestamp < ?
|
||||
GROUP BY domain
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
### Key Test Files
|
||||
- `frontend/src/container/ApiMonitoring/__tests__/AllEndpointsWidgetV5Migration.test.tsx`
|
||||
- `frontend/src/container/ApiMonitoring/__tests__/EndpointDropdownV5Migration.test.tsx`
|
||||
- `pkg/modules/thirdpartyapi/translator_test.go`
|
||||
|
||||
### Test Scenarios
|
||||
1. Domain filtering with both semconv attributes
|
||||
2. URL handling (http.url vs url.full)
|
||||
3. IP address filtering
|
||||
4. Error rate calculation
|
||||
5. Empty state handling
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Empty State
|
||||
**Symptom**: No domains shown despite data existing
|
||||
|
||||
**Causes**:
|
||||
1. Missing `net.peer.name` or `server.address`
|
||||
2. Missing `http.url` or `url.full`
|
||||
3. Spans not marked as `kind_string = 'Client'`
|
||||
4. Bug: Only `url.full` present but query uses `count_distinct(http.url)`
|
||||
|
||||
### Performance
|
||||
- Queries use `ts_bucket_start` for time partitioning
|
||||
- Resource filtering uses separate `distributed_traces_v3_resource` table
|
||||
- Materialized columns improve performance for common attributes
|
||||
|
||||
---
|
||||
|
||||
## Quick Start Checklist
|
||||
|
||||
- [ ] Understand trace table schema (`signoz_index_v3`)
|
||||
- [ ] Review `BuildDomainList()` in `translator.go`
|
||||
- [ ] Check `getDomainList()` handler in `http_handler.go`
|
||||
- [ ] Review frontend `DomainList.tsx` component
|
||||
- [ ] Understand semconv attribute mapping (legacy vs current)
|
||||
- [ ] Test with spans that have required attributes
|
||||
- [ ] Review post-processing functions (merge, filter)
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- **Trace Schema**: `pkg/telemetrytraces/field_mapper.go`
|
||||
- **Query Builder**: `pkg/telemetrytraces/statement_builder.go`
|
||||
- **API Routes**: `pkg/query-service/app/http_handler.go:2157`
|
||||
- **Constants**: `pkg/modules/thirdpartyapi/translator.go:14-20`
|
||||
docs/implementation/QUERY_RANGE_API.md (new file, 980 lines)
@@ -0,0 +1,980 @@
# Query Range API (V5) - Developer Guide
|
||||
|
||||
This document provides a comprehensive guide to the Query Range API (V5), which is the primary query endpoint for traces, logs, and metrics in SigNoz. It covers architecture, request/response models, code flows, and implementation details.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
2. [API Endpoint](#api-endpoint)
|
||||
3. [Request/Response Models](#requestresponse-models)
|
||||
4. [Query Types](#query-types)
|
||||
5. [Request Types](#request-types)
|
||||
6. [Code Flow](#code-flow)
|
||||
7. [Key Components](#key-components)
|
||||
8. [Query Execution](#query-execution)
|
||||
9. [Caching](#caching)
|
||||
10. [Result Processing](#result-processing)
|
||||
11. [Performance Considerations](#performance-considerations)
|
||||
12. [Extending the API](#extending-the-api)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The Query Range API (V5) is the unified query endpoint for all telemetry signals (traces, logs, metrics) in SigNoz. It provides:
|
||||
|
||||
- **Unified Interface**: Single endpoint for all signal types
|
||||
- **Query Builder**: Visual query builder support
|
||||
- **Multiple Query Types**: Builder queries, PromQL, ClickHouse SQL, Formulas, Trace Operators
|
||||
- **Flexible Response Types**: Time series, scalar, raw data, trace-specific
|
||||
- **Advanced Features**: Aggregations, filters, group by, ordering, pagination
|
||||
- **Caching**: Intelligent caching for performance
|
||||
|
||||
### Key Technologies
|
||||
|
||||
- **Backend**: Go (Golang)
|
||||
- **Storage**: ClickHouse (columnar database)
|
||||
- **Query Language**: Custom query builder + PromQL + ClickHouse SQL
|
||||
- **Protocol**: HTTP/REST API
|
||||
|
||||
---
|
||||
|
||||
## API Endpoint
|
||||
|
||||
### Endpoint Details
|
||||
|
||||
**URL**: `POST /api/v5/query_range`
|
||||
|
||||
**Handler**: `QuerierAPI.QueryRange` → `querier.QueryRange`
|
||||
|
||||
**Location**:
|
||||
- Handler: `pkg/querier/querier.go:122`
|
||||
- Route Registration: `pkg/query-service/app/http_handler.go:480`
|
||||
|
||||
**Authentication**: Requires ViewAccess permission
|
||||
|
||||
**Content-Type**: `application/json`
|
||||
|
||||
### Request Flow
|
||||
|
||||
```
|
||||
HTTP Request (POST /api/v5/query_range)
|
||||
↓
|
||||
HTTP Handler (QuerierAPI.QueryRange)
|
||||
↓
|
||||
Querier.QueryRange (pkg/querier/querier.go)
|
||||
↓
|
||||
Query Execution (Statement Builders → ClickHouse)
|
||||
↓
|
||||
Result Processing & Merging
|
||||
↓
|
||||
HTTP Response (QueryRangeResponse)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Request/Response Models
|
||||
|
||||
### Request Model
|
||||
|
||||
**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/req.go`
|
||||
|
||||
```go
|
||||
type QueryRangeRequest struct {
|
||||
Start uint64 // Start timestamp (milliseconds)
|
||||
End uint64 // End timestamp (milliseconds)
|
||||
RequestType RequestType // Response type (TimeSeries, Scalar, Raw, Trace)
|
||||
Variables map[string]VariableItem // Template variables
|
||||
CompositeQuery CompositeQuery // Container for queries
|
||||
NoCache bool // Skip cache flag
|
||||
}
|
||||
```
|
||||
|
||||
### Composite Query
|
||||
|
||||
```go
|
||||
type CompositeQuery struct {
|
||||
Queries []QueryEnvelope // Array of queries to execute
|
||||
}
|
||||
```
|
||||
|
||||
### Query Envelope
|
||||
|
||||
```go
|
||||
type QueryEnvelope struct {
|
||||
Type QueryType // Query type (Builder, PromQL, ClickHouseSQL, Formula, TraceOperator)
|
||||
Spec any // Query specification (type-specific)
|
||||
}
|
||||
```
|
||||
|
||||
### Response Model
|
||||
|
||||
**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/req.go`
|
||||
|
||||
```go
|
||||
type QueryRangeResponse struct {
|
||||
Type RequestType // Response type
|
||||
Data QueryData // Query results
|
||||
Meta ExecStats // Execution statistics
|
||||
Warning *QueryWarnData // Warnings (if any)
|
||||
QBEvent *QBEvent // Query builder event metadata
|
||||
}
|
||||
|
||||
type QueryData struct {
|
||||
Results []any // Array of result objects (type depends on RequestType)
|
||||
}
|
||||
|
||||
type ExecStats struct {
|
||||
RowsScanned uint64 // Total rows scanned
|
||||
BytesScanned uint64 // Total bytes scanned
|
||||
DurationMS uint64 // Query duration in milliseconds
|
||||
StepIntervals map[string]uint64 // Step intervals per query
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Query Types
|
||||
|
||||
The API supports multiple query types, each with its own specification format.
|
||||
|
||||
### 1. Builder Query (`QueryTypeBuilder`)
|
||||
|
||||
Visual query builder queries. Supports traces, logs, and metrics.
|
||||
|
||||
**Spec Type**: `QueryBuilderQuery[T]` where T is:
|
||||
- `TraceAggregation` for traces
|
||||
- `LogAggregation` for logs
|
||||
- `MetricAggregation` for metrics
|
||||
|
||||
**Example**:
|
||||
```go
|
||||
QueryBuilderQuery[TraceAggregation] {
|
||||
Name: "query_name",
|
||||
Signal: SignalTraces,
|
||||
Filter: &Filter {
|
||||
Expression: "service.name = 'api' AND duration_nano > 1000000",
|
||||
},
|
||||
Aggregations: []TraceAggregation {
|
||||
{Expression: "count()", Alias: "total"},
|
||||
{Expression: "avg(duration_nano)", Alias: "avg_duration"},
|
||||
},
|
||||
GroupBy: []GroupByKey {...},
|
||||
Order: []OrderBy {...},
|
||||
Limit: 100,
|
||||
}
|
||||
```
|
||||
|
||||
**Key Files**:
|
||||
- Traces: `pkg/telemetrytraces/statement_builder.go`
|
||||
- Logs: `pkg/telemetrylogs/statement_builder.go`
|
||||
- Metrics: `pkg/telemetrymetrics/statement_builder.go`
|
||||
|
||||
### 2. PromQL Query (`QueryTypePromQL`)
|
||||
|
||||
Prometheus Query Language queries for metrics.
|
||||
|
||||
**Spec Type**: `PromQuery`
|
||||
|
||||
**Example**:
|
||||
```go
|
||||
PromQuery {
|
||||
Query: "rate(http_requests_total[5m])",
|
||||
Step: Step{Duration: time.Minute},
|
||||
}
|
||||
```
|
||||
|
||||
**Key Files**: `pkg/querier/promql_query.go`
|
||||
|
||||
### 3. ClickHouse SQL Query (`QueryTypeClickHouseSQL`)
|
||||
|
||||
Direct ClickHouse SQL queries.
|
||||
|
||||
**Spec Type**: `ClickHouseQuery`
|
||||
|
||||
**Example**:
|
||||
```go
|
||||
ClickHouseQuery {
|
||||
Query: "SELECT count() FROM signoz_traces.distributed_signoz_index_v3 WHERE ...",
|
||||
}
|
||||
```
|
||||
|
||||
**Key Files**: `pkg/querier/ch_sql_query.go`
|
||||
|
||||
### 4. Formula Query (`QueryTypeFormula`)
|
||||
|
||||
Mathematical formulas combining other queries.
|
||||
|
||||
**Spec Type**: `QueryBuilderFormula`
|
||||
|
||||
**Example**:
|
||||
```go
|
||||
QueryBuilderFormula {
|
||||
Expression: "A / B * 100", // A and B are query names
|
||||
}
|
||||
```
|
||||
|
||||
**Key Files**: `pkg/querier/formula_query.go`
|
||||
|
||||
### 5. Trace Operator Query (`QueryTypeTraceOperator`)
|
||||
|
||||
Set operations on trace queries (AND, OR, NOT).
|
||||
|
||||
**Spec Type**: `QueryBuilderTraceOperator`
|
||||
|
||||
**Example**:
|
||||
```go
|
||||
QueryBuilderTraceOperator {
|
||||
Expression: "A AND B", // A and B are query names
|
||||
Filter: &Filter {...},
|
||||
}
|
||||
```
|
||||
|
||||
**Key Files**:
|
||||
- `pkg/telemetrytraces/trace_operator_statement_builder.go`
|
||||
- `pkg/querier/trace_operator_query.go`
|
||||
|
||||
---
|
||||
|
||||
## Request Types
|
||||
|
||||
The `RequestType` determines the format of the response data.
|
||||
|
||||
### 1. `RequestTypeTimeSeries`
|
||||
|
||||
Returns time series data for charts.
|
||||
|
||||
**Response Format**: `TimeSeriesData`
|
||||
|
||||
```go
|
||||
type TimeSeriesData struct {
|
||||
QueryName string
|
||||
Aggregations []AggregationBucket
|
||||
}
|
||||
|
||||
type AggregationBucket struct {
|
||||
Index int
|
||||
Series []TimeSeries
|
||||
Alias string
|
||||
Meta AggregationMeta
|
||||
}
|
||||
|
||||
type TimeSeries struct {
|
||||
Labels map[string]string
|
||||
Values []TimeSeriesValue
|
||||
}
|
||||
|
||||
type TimeSeriesValue struct {
|
||||
Timestamp int64
|
||||
Value float64
|
||||
}
|
||||
```
|
||||
|
||||
**Use Case**: Line charts, bar charts, area charts
|
||||
|
||||
### 2. `RequestTypeScalar`
|
||||
|
||||
Returns a single scalar value.
|
||||
|
||||
**Response Format**: `ScalarData`
|
||||
|
||||
```go
|
||||
type ScalarData struct {
|
||||
QueryName string
|
||||
Data []ScalarValue
|
||||
}
|
||||
|
||||
type ScalarValue struct {
|
||||
Timestamp int64
|
||||
Value float64
|
||||
}
|
||||
```
|
||||
|
||||
**Use Case**: Single value displays, stat panels
|
||||
|
||||
### 3. `RequestTypeRaw`
|
||||
|
||||
Returns raw data rows.
|
||||
|
||||
**Response Format**: `RawData`
|
||||
|
||||
```go
|
||||
type RawData struct {
|
||||
QueryName string
|
||||
Columns []string
|
||||
Rows []RawDataRow
|
||||
}
|
||||
|
||||
type RawDataRow struct {
|
||||
Timestamp time.Time
|
||||
Data map[string]any
|
||||
}
|
||||
```
|
||||
|
||||
**Use Case**: Tables, logs viewer, trace lists
|
||||
|
||||
### 4. `RequestTypeTrace`
|
||||
|
||||
Returns trace-specific data structure.
|
||||
|
||||
**Response Format**: Trace-specific format (see traces documentation)
|
||||
|
||||
**Use Case**: Trace-specific visualizations
|
||||
|
||||
---
|
||||
|
||||
## Code Flow
|
||||
|
||||
### Complete Request Flow
|
||||
|
||||
```
|
||||
1. HTTP Request
|
||||
POST /api/v5/query_range
|
||||
Body: QueryRangeRequest JSON
|
||||
↓
|
||||
2. HTTP Handler
|
||||
QuerierAPI.QueryRange (pkg/querier/querier.go)
|
||||
- Validates request
|
||||
- Extracts organization ID from auth context
|
||||
↓
|
||||
3. Querier.QueryRange (pkg/querier/querier.go:122)
|
||||
- Validates QueryRangeRequest
|
||||
- Processes each query in CompositeQuery.Queries
|
||||
- Identifies dependencies (e.g., trace operators, formulas)
|
||||
- Calculates step intervals
|
||||
- Fetches metric temporality if needed
|
||||
↓
|
||||
4. Query Creation
|
||||
For each QueryEnvelope:
|
||||
|
||||
a. Builder Query:
|
||||
- newBuilderQuery() creates builderQuery instance
|
||||
- Selects appropriate statement builder based on signal:
|
||||
* Traces → traceStmtBuilder
|
||||
* Logs → logStmtBuilder
|
||||
* Metrics → metricStmtBuilder or meterStmtBuilder
|
||||
↓
|
||||
|
||||
b. PromQL Query:
|
||||
- newPromqlQuery() creates promqlQuery instance
|
||||
- Uses Prometheus engine
|
||||
↓
|
||||
|
||||
c. ClickHouse SQL Query:
|
||||
- newchSQLQuery() creates chSQLQuery instance
|
||||
- Direct SQL execution
|
||||
↓
|
||||
|
||||
d. Formula Query:
|
||||
- newFormulaQuery() creates formulaQuery instance
|
||||
- References other queries by name
|
||||
↓
|
||||
|
||||
e. Trace Operator Query:
|
||||
- newTraceOperatorQuery() creates traceOperatorQuery instance
|
||||
- Uses traceOperatorStmtBuilder
|
||||
↓
|
||||
5. Statement Building (for Builder queries)
|
||||
StatementBuilder.Build()
|
||||
- Resolves field keys from metadata store
|
||||
- Builds SQL based on request type:
|
||||
* RequestTypeRaw → buildListQuery()
|
||||
* RequestTypeTimeSeries → buildTimeSeriesQuery()
|
||||
* RequestTypeScalar → buildScalarQuery()
|
||||
* RequestTypeTrace → buildTraceQuery()
|
||||
- Returns SQL statement with arguments
|
||||
↓
|
||||
6. Query Execution
|
||||
Query.Execute()
|
||||
- Executes SQL/query against ClickHouse or Prometheus
|
||||
- Processes results into response format
|
||||
- Returns Result with data and statistics
|
||||
↓
|
||||
7. Caching (if applicable)
|
||||
- Checks bucket cache for time series queries
|
||||
- Executes queries for missing time ranges
|
||||
- Merges cached and fresh results
|
||||
↓
|
||||
8. Result Processing
|
||||
querier.run()
|
||||
- Executes all queries (with dependency resolution)
|
||||
- Collects results and warnings
|
||||
- Merges results from multiple queries
|
||||
↓
|
||||
9. Post-Processing
|
||||
postProcessResults()
|
||||
- Applies formulas if present
|
||||
- Handles variable substitution
|
||||
- Formats results for response
|
||||
↓
|
||||
10. HTTP Response
|
||||
- Returns QueryRangeResponse with results
|
||||
- Includes execution statistics
|
||||
- Includes warnings if any
|
||||
```
|
||||
|
||||
### Key Decision Points
|
||||
|
||||
1. **Query Type Selection**: Based on `QueryEnvelope.Type`
|
||||
2. **Signal Selection**: For builder queries, based on `Signal` field
|
||||
3. **Request Type Handling**: Different SQL generation for different request types
|
||||
4. **Caching Strategy**: Only for time series queries with valid fingerprints
|
||||
5. **Dependency Resolution**: Trace operators and formulas resolve dependencies first
|
||||
|
||||
---
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. Querier
|
||||
|
||||
**Location**: `pkg/querier/querier.go`
|
||||
|
||||
**Purpose**: Orchestrates query execution, caching, and result merging
|
||||
|
||||
**Key Methods**:
|
||||
- `QueryRange()`: Main entry point for query execution
|
||||
- `run()`: Executes queries and merges results
|
||||
- `executeWithCache()`: Handles caching logic
|
||||
- `mergeResults()`: Merges cached and fresh results
|
||||
- `postProcessResults()`: Applies formulas and variable substitution
|
||||
|
||||
**Key Features**:
|
||||
- Query orchestration across multiple query types
|
||||
- Intelligent caching with bucket-based strategy
|
||||
- Result merging from multiple queries
|
||||
- Formula evaluation
|
||||
- Time range optimization
|
||||
- Step interval calculation and validation
|
||||
|
||||
### 2. Statement Builder Interface
|
||||
|
||||
**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/`
|
||||
|
||||
**Purpose**: Converts query builder specifications into executable queries
|
||||
|
||||
**Interface**:
|
||||
```go
|
||||
type StatementBuilder[T any] interface {
|
||||
Build(
|
||||
ctx context.Context,
|
||||
start uint64,
|
||||
end uint64,
|
||||
requestType RequestType,
|
||||
query QueryBuilderQuery[T],
|
||||
variables map[string]VariableItem,
|
||||
) (*Statement, error)
|
||||
}
|
||||
```
|
||||
|
||||
**Implementations**:
|
||||
- `traceQueryStatementBuilder` - Traces (`pkg/telemetrytraces/statement_builder.go`)
|
||||
- `logQueryStatementBuilder` - Logs (`pkg/telemetrylogs/statement_builder.go`)
|
||||
- `metricQueryStatementBuilder` - Metrics (`pkg/telemetrymetrics/statement_builder.go`)
|
||||
|
||||
**Key Features**:
|
||||
- Field resolution via metadata store
|
||||
- SQL generation for different request types
|
||||
- Filter, aggregation, group by, ordering support
|
||||
- Time range optimization
|
||||
|
||||
### 3. Query Interface
|
||||
|
||||
**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/`
|
||||
|
||||
**Purpose**: Represents an executable query
|
||||
|
||||
**Interface**:
|
||||
```go
|
||||
type Query interface {
|
||||
Execute(ctx context.Context) (*Result, error)
|
||||
Fingerprint() string // For caching
|
||||
Window() (uint64, uint64) // Time range
|
||||
}
|
||||
```
|
||||
|
||||
**Implementations**:
|
||||
- `builderQuery[T]` - Builder queries (`pkg/querier/builder_query.go`)
|
||||
- `promqlQuery` - PromQL queries (`pkg/querier/promql_query.go`)
|
||||
- `chSQLQuery` - ClickHouse SQL queries (`pkg/querier/ch_sql_query.go`)
|
||||
- `formulaQuery` - Formula queries (`pkg/querier/formula_query.go`)
|
||||
- `traceOperatorQuery` - Trace operator queries (`pkg/querier/trace_operator_query.go`)
|
||||
|
||||
### 4. Telemetry Store
|
||||
|
||||
**Location**: `pkg/telemetrystore/`
|
||||
|
||||
**Purpose**: Abstraction layer for ClickHouse database access
|
||||
|
||||
**Key Methods**:
|
||||
- `Query()`: Execute SQL query
|
||||
- `QueryRow()`: Execute query returning single row
|
||||
- `Select()`: Execute query returning multiple rows
|
||||
|
||||
**Implementation**: `clickhouseTelemetryStore` (`pkg/telemetrystore/clickhousetelemetrystore/`)
|
||||
|
||||
### 5. Metadata Store
|
||||
|
||||
**Location**: `pkg/types/telemetrytypes/`
|
||||
|
||||
**Purpose**: Provides metadata about available fields, keys, and attributes
|
||||
|
||||
**Key Methods**:
|
||||
- `GetKeysMulti()`: Get field keys for multiple selectors
|
||||
- `FetchTemporalityMulti()`: Get metric temporality information
|
||||
|
||||
**Implementation**: `telemetryMetadataStore` (`pkg/telemetrymetadata/`)
|
||||
|
||||
### 6. Bucket Cache
|
||||
|
||||
**Location**: `pkg/querier/`
|
||||
|
||||
**Purpose**: Caches query results by time buckets for performance
|
||||
|
||||
**Key Methods**:
|
||||
- `GetMissRanges()`: Get time ranges not in cache
|
||||
- `Put()`: Store query result in cache
|
||||
|
||||
**Features**:
|
||||
- Bucket-based caching (aligned to step intervals)
|
||||
- Automatic cache invalidation
|
||||
- Parallel query execution for missing ranges
|
||||
|
||||
---
|
||||
|
||||
## Query Execution
|
||||
|
||||
### Builder Query Execution
|
||||
|
||||
**Location**: `pkg/querier/builder_query.go`
|
||||
|
||||
**Process**:
|
||||
1. Statement builder generates SQL
|
||||
2. SQL executed against ClickHouse via TelemetryStore
|
||||
3. Results processed based on RequestType:
|
||||
- TimeSeries: Grouped by time buckets and labels
|
||||
- Scalar: Single value extraction
|
||||
- Raw: Row-by-row processing
|
||||
4. Statistics collected (rows scanned, bytes scanned, duration)
|
||||
|
||||
### PromQL Query Execution
|
||||
|
||||
**Location**: `pkg/querier/promql_query.go`
|
||||
|
||||
**Process**:
|
||||
1. Query parsed by Prometheus engine
|
||||
2. Executed against Prometheus-compatible data
|
||||
3. Results converted to QueryRangeResponse format
|
||||
|
||||
### ClickHouse SQL Query Execution
|
||||
|
||||
**Location**: `pkg/querier/ch_sql_query.go`
|
||||
|
||||
**Process**:
|
||||
1. SQL query executed directly
|
||||
2. Results processed based on RequestType
|
||||
3. Variable substitution applied
|
||||
|
||||
### Formula Query Execution
|
||||
|
||||
**Location**: `pkg/querier/formula_query.go`
|
||||
|
||||
**Process**:
|
||||
1. Referenced queries executed first
|
||||
2. Formula expression evaluated using govaluate
|
||||
3. Results computed from query results
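To make the govaluate step concrete, a self-contained sketch (the real querier aligns values per label set and timestamp; here the inputs are hard-coded):

```go
package main

import (
	"fmt"

	"github.com/Knetic/govaluate"
)

func main() {
	// "A" and "B" are query names, exactly as referenced in the formula spec.
	expr, err := govaluate.NewEvaluableExpression("A / B * 100")
	if err != nil {
		panic(err)
	}

	// In the querier these values come from the referenced query results;
	// here they are hard-coded for illustration.
	params := map[string]interface{}{
		"A": 12.0,
		"B": 48.0,
	}

	result, err := expr.Evaluate(params)
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // 25
}
```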
|
||||
|
||||
### Trace Operator Query Execution
|
||||
|
||||
**Location**: `pkg/querier/trace_operator_query.go`
|
||||
|
||||
**Process**:
|
||||
1. Expression parsed to find dependencies
|
||||
2. Referenced queries executed
|
||||
3. Set operations applied (INTERSECT, UNION, EXCEPT)
|
||||
4. Results combined
|
||||
|
||||
---
|
||||
|
||||
## Caching
|
||||
|
||||
### Caching Strategy
|
||||
|
||||
**Location**: `pkg/querier/querier.go:642`
|
||||
|
||||
**When Caching Applies**:
|
||||
- Time series queries only
|
||||
- Queries with valid fingerprints
|
||||
- `NoCache` flag not set
|
||||
|
||||
**How It Works**:
|
||||
1. Query fingerprint generated (includes query structure, filters, time range)
|
||||
2. Cache checked for existing results
|
||||
3. Missing time ranges identified
|
||||
4. Queries executed only for missing ranges (parallel execution)
|
||||
5. Fresh results merged with cached results
|
||||
6. Merged result stored in cache
|
||||
|
||||
### Cache Key Generation
|
||||
|
||||
**Location**: `pkg/querier/builder_query.go:52`
|
||||
|
||||
The fingerprint includes:
|
||||
- Signal type
|
||||
- Source type
|
||||
- Step interval
|
||||
- Aggregations
|
||||
- Filters
|
||||
- Group by fields
|
||||
- Time range (for cache key, not fingerprint)
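For intuition, a toy fingerprint assembled from those fields (the real logic in `builder_query.go:52` differs in detail; this only shows the shape):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// fingerprint joins the query-shape fields and hashes them; the time range is
// deliberately excluded so different windows of the same query share cache entries.
func fingerprint(signal, source, step, filter string, aggregations, groupBy []string) string {
	parts := []string{
		"signal=" + signal,
		"source=" + source,
		"step=" + step,
		"aggs=" + strings.Join(aggregations, ","),
		"filter=" + filter,
		"groupby=" + strings.Join(groupBy, ","),
	}
	sum := sha256.Sum256([]byte(strings.Join(parts, "|")))
	return fmt.Sprintf("%x", sum[:8])
}

func main() {
	fmt.Println(fingerprint(
		"traces", "spans", "60s",
		"service.name = 'api'",
		[]string{"count()"},
		[]string{"service.name"},
	))
}
```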
|
||||
|
||||
### Cache Benefits
|
||||
|
||||
- **Performance**: Avoids re-executing identical queries
|
||||
- **Efficiency**: Only queries missing time ranges
|
||||
- **Parallelism**: Multiple missing ranges queried in parallel
|
||||
|
||||
---
|
||||
|
||||
## Result Processing
|
||||
|
||||
### Result Merging
|
||||
|
||||
**Location**: `pkg/querier/querier.go:795`
|
||||
|
||||
**Process**:
|
||||
1. Results from multiple queries collected
|
||||
2. For time series: Series merged by labels
|
||||
3. For raw data: Rows combined
|
||||
4. Statistics aggregated (rows scanned, bytes scanned, duration)
|
||||
|
||||
### Formula Evaluation
|
||||
|
||||
**Location**: `pkg/querier/formula_query.go`
|
||||
|
||||
**Process**:
|
||||
1. Formula expression parsed
|
||||
2. Referenced query results retrieved
|
||||
3. Expression evaluated using govaluate library
|
||||
4. Result computed and formatted
|
||||
|
||||
### Variable Substitution
|
||||
|
||||
**Location**: `pkg/querier/querier.go`
|
||||
|
||||
**Process**:
|
||||
1. Variables extracted from request
|
||||
2. Variable values substituted in queries
|
||||
3. Applied to filters, aggregations, and other query parts
|
||||
|
||||
---
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Query Optimization
|
||||
|
||||
1. **Time Range Optimization**:
|
||||
- For trace queries with `trace_id` filter, query `trace_summary` first to narrow time range
|
||||
- Use appropriate time ranges to limit data scanned
|
||||
|
||||
2. **Step Interval Calculation**:
|
||||
- Automatic step interval calculation based on time range
|
||||
- Minimum step interval enforcement
|
||||
- Warnings for suboptimal intervals
|
||||
|
||||
3. **Index Usage**:
|
||||
- Queries use time bucket columns (`ts_bucket_start`) for efficient filtering
|
||||
- Proper filter placement for index utilization
|
||||
|
||||
4. **Limit Enforcement**:
|
||||
- Raw data queries should include limits
|
||||
- Pagination support via offset/cursor
|
||||
|
||||
### Best Practices
|
||||
|
||||
1. **Use Query Builder**: Prefer query builder over raw SQL for better optimization
|
||||
2. **Limit Time Ranges**: Always specify reasonable time ranges
|
||||
3. **Use Aggregations**: For large datasets, use aggregations instead of raw data
|
||||
4. **Cache Awareness**: Be mindful of cache TTLs when testing
|
||||
5. **Parallel Queries**: Multiple independent queries execute in parallel
|
||||
6. **Step Intervals**: Let system calculate optimal step intervals
|
||||
|
||||
### Monitoring
|
||||
|
||||
Execution statistics are included in response:
|
||||
- `RowsScanned`: Total rows scanned
|
||||
- `BytesScanned`: Total bytes scanned
|
||||
- `DurationMS`: Query execution time
|
||||
- `StepIntervals`: Step intervals per query
|
||||
|
||||
---
|
||||
|
||||
## Extending the API
|
||||
|
||||
### Adding a New Query Type
|
||||
|
||||
1. **Define Query Type** (`pkg/types/querybuildertypes/querybuildertypesv5/query.go`):
|
||||
```go
|
||||
const (
|
||||
QueryTypeMyNewType QueryType = "my_new_type"
|
||||
)
|
||||
```
|
||||
|
||||
2. **Define Query Spec**:
|
||||
```go
|
||||
type MyNewQuerySpec struct {
|
||||
Name string
|
||||
// ... your fields
|
||||
}
|
||||
```
|
||||
|
||||
3. **Update QueryEnvelope Unmarshaling** (`pkg/types/querybuildertypes/querybuildertypesv5/query.go`):
|
||||
```go
|
||||
case QueryTypeMyNewType:
|
||||
var spec MyNewQuerySpec
|
||||
if err := UnmarshalJSONWithContext(shadow.Spec, &spec, "my new query spec"); err != nil {
|
||||
return wrapUnmarshalError(err, "invalid my new query spec: %v", err)
|
||||
}
|
||||
q.Spec = spec
|
||||
```
|
||||
|
||||
4. **Implement Query Interface** (`pkg/querier/my_new_query.go`):
|
||||
```go
|
||||
type myNewQuery struct {
|
||||
spec MyNewQuerySpec
|
||||
// ... other fields
|
||||
}
|
||||
|
||||
func (q *myNewQuery) Execute(ctx context.Context) (*qbtypes.Result, error) {
|
||||
// Implementation
|
||||
}
|
||||
|
||||
func (q *myNewQuery) Fingerprint() string {
|
||||
// Generate fingerprint for caching
|
||||
}
|
||||
|
||||
func (q *myNewQuery) Window() (uint64, uint64) {
|
||||
// Return time range
|
||||
}
|
||||
```
|
||||
|
||||
5. **Update Querier** (`pkg/querier/querier.go`):
|
||||
```go
|
||||
case QueryTypeMyNewType:
|
||||
myQuery, ok := query.Spec.(MyNewQuerySpec)
|
||||
if !ok {
|
||||
return nil, errors.NewInvalidInputf(...)
|
||||
}
|
||||
queries[myQuery.Name] = newMyNewQuery(myQuery, ...)
|
||||
```
|
||||
|
||||
### Adding a New Request Type
|
||||
|
||||
1. **Define Request Type** (`pkg/types/querybuildertypes/querybuildertypesv5/req.go`):
|
||||
```go
|
||||
const (
|
||||
RequestTypeMyNewType RequestType = "my_new_type"
|
||||
)
|
||||
```
|
||||
|
||||
2. **Update Statement Builders**: Add handling in `Build()` method
|
||||
3. **Update Query Execution**: Add result processing for new type
|
||||
4. **Update Response Models**: Add response data structure
|
||||
|
||||
### Adding a New Aggregation Function
|
||||
|
||||
1. **Update Aggregation Rewriter** (`pkg/querybuilder/agg_expr_rewriter.go`):
|
||||
```go
|
||||
func (r *aggExprRewriter) RewriteAggregation(expr string) (string, error) {
|
||||
if strings.HasPrefix(expr, "my_function(") {
|
||||
// Parse arguments
|
||||
// Return ClickHouse SQL expression
|
||||
return "myClickHouseFunction(...)", nil
|
||||
}
|
||||
// ... existing functions
|
||||
}
|
||||
```
|
||||
|
||||
2. **Update Documentation**: Document the new function
|
||||
|
||||
---
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Pattern 1: Simple Time Series Query
|
||||
|
||||
```go
|
||||
req := qbtypes.QueryRangeRequest{
|
||||
Start: startMs,
|
||||
End: endMs,
|
||||
RequestType: qbtypes.RequestTypeTimeSeries,
|
||||
CompositeQuery: qbtypes.CompositeQuery{
|
||||
Queries: []qbtypes.QueryEnvelope{
|
||||
{
|
||||
Type: qbtypes.QueryTypeBuilder,
|
||||
Spec: qbtypes.QueryBuilderQuery[qbtypes.MetricAggregation]{
|
||||
Name: "A",
|
||||
Signal: telemetrytypes.SignalMetrics,
|
||||
Aggregations: []qbtypes.MetricAggregation{
|
||||
{Expression: "sum(rate)", Alias: "total"},
|
||||
},
|
||||
StepInterval: qbtypes.Step{Duration: time.Minute},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 2: Query with Filter and Group By
|
||||
|
||||
```go
|
||||
req := qbtypes.QueryRangeRequest{
|
||||
Start: startMs,
|
||||
End: endMs,
|
||||
RequestType: qbtypes.RequestTypeTimeSeries,
|
||||
CompositeQuery: qbtypes.CompositeQuery{
|
||||
Queries: []qbtypes.QueryEnvelope{
|
||||
{
|
||||
Type: qbtypes.QueryTypeBuilder,
|
||||
Spec: qbtypes.QueryBuilderQuery[qbtypes.TraceAggregation]{
|
||||
Name: "A",
|
||||
Signal: telemetrytypes.SignalTraces,
|
||||
Filter: &qbtypes.Filter{
|
||||
Expression: "service.name = 'api' AND duration_nano > 1000000",
|
||||
},
|
||||
Aggregations: []qbtypes.TraceAggregation{
|
||||
{Expression: "count()", Alias: "total"},
|
||||
},
|
||||
GroupBy: []qbtypes.GroupByKey{
|
||||
{TelemetryFieldKey: telemetrytypes.TelemetryFieldKey{
|
||||
Name: "service.name",
|
||||
FieldContext: telemetrytypes.FieldContextResource,
|
||||
}},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### Pattern 3: Formula Query
|
||||
|
||||
```go
|
||||
req := qbtypes.QueryRangeRequest{
|
||||
Start: startMs,
|
||||
End: endMs,
|
||||
RequestType: qbtypes.RequestTypeTimeSeries,
|
||||
CompositeQuery: qbtypes.CompositeQuery{
|
||||
Queries: []qbtypes.QueryEnvelope{
|
||||
{
|
||||
Type: qbtypes.QueryTypeBuilder,
|
||||
Spec: qbtypes.QueryBuilderQuery[qbtypes.MetricAggregation]{
|
||||
Name: "A",
|
||||
// ... query A definition
|
||||
},
|
||||
},
|
||||
{
|
||||
Type: qbtypes.QueryTypeBuilder,
|
||||
Spec: qbtypes.QueryBuilderQuery[qbtypes.MetricAggregation]{
|
||||
Name: "B",
|
||||
// ... query B definition
|
||||
},
|
||||
},
|
||||
{
|
||||
Type: qbtypes.QueryTypeFormula,
|
||||
Spec: qbtypes.QueryBuilderFormula{
|
||||
Name: "C",
|
||||
Expression: "A / B * 100",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
### Unit Tests
|
||||
|
||||
- `pkg/querier/querier_test.go` - Querier tests
|
||||
- `pkg/querier/builder_query_test.go` - Builder query tests
|
||||
- `pkg/querier/formula_query_test.go` - Formula query tests
|
||||
|
||||
### Integration Tests
|
||||
|
||||
- `tests/integration/` - End-to-end API tests
|
||||
|
||||
### Running Tests
|
||||
|
||||
```bash
|
||||
# Run all querier tests
|
||||
go test ./pkg/querier/...
|
||||
|
||||
# Run with verbose output
|
||||
go test -v ./pkg/querier/...
|
||||
|
||||
# Run specific test
|
||||
go test -v ./pkg/querier/ -run TestQueryRange
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Debugging
|
||||
|
||||
### Enable Debug Logging
|
||||
|
||||
```go
|
||||
// In querier.go
|
||||
q.logger.DebugContext(ctx, "Executing query",
|
||||
"query", queryName,
|
||||
"start", start,
|
||||
"end", end)
|
||||
```
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Query Not Found**: Check query name matches in CompositeQuery
|
||||
2. **SQL Errors**: Check generated SQL in logs, verify ClickHouse syntax
|
||||
3. **Performance**: Check execution statistics, optimize time ranges
|
||||
4. **Cache Issues**: Set `NoCache: true` to bypass cache
|
||||
5. **Formula Errors**: Check formula expression syntax and referenced query names
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
### Key Files
|
||||
|
||||
- `pkg/querier/querier.go` - Main query orchestration
|
||||
- `pkg/querier/builder_query.go` - Builder query execution
|
||||
- `pkg/types/querybuildertypes/querybuildertypesv5/` - Request/response models
|
||||
- `pkg/telemetrystore/` - ClickHouse interface
|
||||
- `pkg/telemetrymetadata/` - Metadata store
|
||||
|
||||
### Signal-Specific Documentation
|
||||
|
||||
- [Traces Module](./TRACES_MODULE.md) - Trace-specific details
|
||||
- Logs module documentation (when available)
|
||||
- Metrics module documentation (when available)
|
||||
|
||||
### Related Documentation
|
||||
|
||||
- [ClickHouse Documentation](https://clickhouse.com/docs)
|
||||
- [PromQL Documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/)
|
||||
|
||||
---
|
||||
|
||||
## Contributing
|
||||
|
||||
When contributing to the Query Range API:
|
||||
|
||||
1. **Follow Existing Patterns**: Match the style of existing query types
|
||||
2. **Add Tests**: Include unit tests for new functionality
|
||||
3. **Update Documentation**: Update this doc for significant changes
|
||||
4. **Consider Performance**: Optimize queries and use caching appropriately
|
||||
5. **Handle Errors**: Provide meaningful error messages
|
||||
|
||||
For questions or help, reach out to the maintainers or open an issue.
|
||||
docs/implementation/SPAN_METRICS_PROCESSOR.md (new file, 185 lines)
@@ -0,0 +1,185 @@
# SigNoz Span Metrics Processor
|
||||
|
||||
The `signozspanmetricsprocessor` is an OpenTelemetry Collector processor that intercepts trace data to generate RED metrics (Rate, Errors, Duration) from spans.
|
||||
|
||||
**Location:** `signoz-otel-collector/processor/signozspanmetricsprocessor/`
|
||||
|
||||
## Trace Interception
|
||||
|
||||
The processor implements `consumer.Traces` interface and sits in the traces pipeline:
|
||||
|
||||
```go
|
||||
func (p *processorImp) ConsumeTraces(ctx context.Context, traces ptrace.Traces) error {
|
||||
p.lock.Lock()
|
||||
p.aggregateMetrics(traces)
|
||||
p.lock.Unlock()
|
||||
|
||||
return p.tracesConsumer.ConsumeTraces(ctx, traces) // forward unchanged
|
||||
}
|
||||
```
|
||||
|
||||
All traces flow through this method. Metrics are aggregated, then traces are forwarded unmodified to the next consumer.
|
||||
|
||||
## Metrics Generated
|
||||
|
||||
| Metric | Type | Description |
|
||||
|--------|------|-------------|
|
||||
| `signoz_latency` | Histogram | Span latency by service/operation/kind/status |
|
||||
| `signoz_calls_total` | Counter | Call count per service/operation/kind/status |
|
||||
| `signoz_db_latency_sum/count` | Counter | DB call latency (spans with `db.system` attribute) |
|
||||
| `signoz_external_call_latency_sum/count` | Counter | External call latency (client spans with remote address) |
|
||||
|
||||
### Dimensions
|
||||
|
||||
All metrics include these base dimensions:
|
||||
- `service.name` - from resource attributes
|
||||
- `operation` - span name
|
||||
- `span.kind` - SPAN_KIND_SERVER, SPAN_KIND_CLIENT, etc.
|
||||
- `status.code` - STATUS_CODE_OK, STATUS_CODE_ERROR, etc.
|
||||
|
||||
Additional dimensions can be configured.
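To make the "build metric key" step in the aggregation flow below concrete, an illustrative sketch (separator and ordering are assumptions, not the processor's actual key format):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildMetricKey concatenates dimension values in a fixed order so spans with
// the same service/operation/kind/status aggregate into the same series.
func buildMetricKey(serviceName, operation, spanKind, statusCode string, extra map[string]string) string {
	var b strings.Builder
	for _, v := range []string{serviceName, operation, spanKind, statusCode} {
		b.WriteString(v)
		b.WriteString("\x00") // separator; the real processor uses its own scheme
	}

	// Extra configured dimensions are appended deterministically (sorted by name).
	names := make([]string, 0, len(extra))
	for name := range extra {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		b.WriteString(name + "=" + extra[name] + "\x00")
	}
	return b.String()
}

func main() {
	key := buildMetricKey("checkout", "HTTP POST /pay", "SPAN_KIND_SERVER", "STATUS_CODE_OK",
		map[string]string{"http.method": "POST"})
	fmt.Printf("%q\n", key)
}
```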
|
||||
|
||||
## Aggregation Flow
|
||||
|
||||
```
|
||||
traces pipeline
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ ConsumeTraces() │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ aggregateMetrics(traces) │
|
||||
│ │ │
|
||||
│ ├── for each ResourceSpan │
|
||||
│ │ extract service.name │
|
||||
│ │ │ │
|
||||
│ │ ├── for each Span │
|
||||
│ │ │ │ │
|
||||
│ │ │ ▼ │
|
||||
│ │ │ aggregateMetricsForSpan() │
|
||||
│ │ │ ├── skip stale spans (>24h) │
|
||||
│ │ │ ├── skip excluded patterns │
|
||||
│ │ │ ├── calculate latency │
|
||||
│ │ │ ├── build metric key │
|
||||
│ │ │ ├── update histograms │
|
||||
│ │ │ └── cache dimensions │
|
||||
│ │ │ │
|
||||
│ ▼ │
|
||||
│ forward traces to next consumer │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Periodic Export
|
||||
|
||||
A background goroutine exports aggregated metrics on a ticker interval:
|
||||
|
||||
```go
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-p.ticker.C:
|
||||
p.exportMetrics(ctx) // build and send to metrics exporter
|
||||
}
|
||||
}
|
||||
}()
|
||||
```
|
||||
|
||||
## Key Design Features
|
||||
|
||||
### 1. Time Bucketing (Delta Temporality)
|
||||
|
||||
For delta temporality, metric keys include a time bucket prefix:
|
||||
|
||||
```go
|
||||
if p.config.GetAggregationTemporality() == pmetric.AggregationTemporalityDelta {
|
||||
p.AddTimeToKeyBuf(span.StartTimestamp().AsTime()) // truncated to interval
|
||||
}
|
||||
```
|
||||
|
||||
- Spans are grouped by time bucket (default: 1 minute)
|
||||
- After export, buckets are reset
|
||||
- Memory-efficient for high-cardinality data
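The truncation itself is plain `time.Time` bucketing; a minimal illustration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	interval := time.Minute // default bucket interval

	spanStart := time.Date(2024, 5, 1, 10, 32, 47, 0, time.UTC)

	// Truncate the span start time to its bucket so all spans in the same
	// minute share a key prefix and aggregate into the same delta point.
	bucket := spanStart.Truncate(interval)
	fmt.Println(bucket) // 2024-05-01 10:32:00 +0000 UTC
}
```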
|
||||
|
||||
### 2. LRU Dimension Caching
|
||||
|
||||
Dimension key-value maps are cached to avoid rebuilding:
|
||||
|
||||
```go
|
||||
if _, has := p.metricKeyToDimensions.Get(k); !has {
|
||||
p.metricKeyToDimensions.Add(k, p.buildDimensionKVs(...))
|
||||
}
|
||||
```
|
||||
|
||||
- Configurable cache size (`DimensionsCacheSize`)
|
||||
- Evicted keys also removed from histograms
|
||||
|
||||
### 3. Cardinality Protection
|
||||
|
||||
Prevents memory explosion from high cardinality:
|
||||
|
||||
```go
|
||||
if len(p.serviceToOperations) > p.maxNumberOfServicesToTrack {
|
||||
serviceName = "overflow_service"
|
||||
}
|
||||
if len(p.serviceToOperations[serviceName]) > p.maxNumberOfOperationsToTrackPerService {
|
||||
spanName = "overflow_operation"
|
||||
}
|
||||
```
|
||||
|
||||
Excess services/operations are aggregated into overflow buckets.
|
||||
|
||||
### 4. Exemplars
|
||||
|
||||
Trace/span IDs attached to histogram samples for metric-to-trace correlation:
|
||||
|
||||
```go
|
||||
histo.exemplarsData = append(histo.exemplarsData, exemplarData{
|
||||
traceID: traceID,
|
||||
spanID: spanID,
|
||||
value: latency,
|
||||
})
|
||||
```
|
||||
|
||||
Enables "show me a trace that caused this latency spike" in UI.
|
||||
|
||||
## Configuration Options
|
||||
|
||||
| Option | Description | Default |
|
||||
|--------|-------------|---------|
|
||||
| `metrics_exporter` | Target exporter for generated metrics | required |
|
||||
| `latency_histogram_buckets` | Custom histogram bucket boundaries | 2,4,6,8,10,50,100,200,400,800,1000,1400,2000,5000,10000,15000 ms |
|
||||
| `dimensions` | Additional span/resource attributes to include | [] |
|
||||
| `dimensions_cache_size` | LRU cache size for dimension maps | 1000 |
|
||||
| `aggregation_temporality` | cumulative or delta | cumulative |
|
||||
| `time_bucket_interval` | Bucket interval for delta temporality | 1m |
|
||||
| `skip_spans_older_than` | Skip stale spans | 24h |
|
||||
| `max_services_to_track` | Cardinality limit for services | - |
|
||||
| `max_operations_to_track_per_service` | Cardinality limit for operations | - |
|
||||
| `exclude_patterns` | Regex patterns to skip spans | [] |
|
||||
|
||||
## Pipeline Configuration Example
|
||||
|
||||
```yaml
|
||||
processors:
|
||||
signozspanmetrics:
|
||||
metrics_exporter: clickhousemetricswrite
|
||||
latency_histogram_buckets: [2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms]
|
||||
dimensions:
|
||||
- name: http.method
|
||||
- name: http.status_code
|
||||
dimensions_cache_size: 10000
|
||||
aggregation_temporality: delta
|
||||
|
||||
pipelines:
|
||||
traces:
|
||||
receivers: [otlp]
|
||||
processors: [signozspanmetrics, batch]
|
||||
exporters: [clickhousetraces]
|
||||
|
||||
metrics:
|
||||
receivers: [otlp]
|
||||
exporters: [clickhousemetricswrite]
|
||||
```
|
||||
|
||||
The processor sits in the traces pipeline but exports to a metrics pipeline exporter.
|
||||
docs/implementation/TRACES_MODULE.md (new file, 832 lines)
@@ -0,0 +1,832 @@
# SigNoz Traces Module - Developer Guide
|
||||
|
||||
This document provides a comprehensive guide to understanding and contributing to the traces module in SigNoz. It covers architecture, APIs, code flows, and implementation details.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
2. [Architecture](#architecture)
|
||||
3. [Data Models](#data-models)
|
||||
4. [API Endpoints](#api-endpoints)
|
||||
5. [Code Flows](#code-flows)
|
||||
6. [Key Components](#key-components)
|
||||
7. [Query Building System](#query-building-system)
|
||||
8. [Storage Schema](#storage-schema)
|
||||
9. [Extending the Traces Module](#extending-the-traces-module)
|
||||
|
||||
---
|
||||
|
||||
## Overview

The traces module in SigNoz handles distributed tracing data from OpenTelemetry. It provides:

- **Ingestion**: Receives traces via OpenTelemetry Collector
- **Storage**: Stores traces in ClickHouse
- **Querying**: Supports complex queries with filters, aggregations, and trace operators
- **Visualization**: Provides waterfall and flamegraph views
- **Trace Funnels**: Advanced analytics for multi-step trace analysis

### Key Technologies

- **Backend**: Go (Golang)
- **Storage**: ClickHouse (columnar database)
- **Protocol**: OpenTelemetry Protocol (OTLP)
- **Query Language**: Custom query builder + ClickHouse SQL

---

## Architecture
### High-Level Flow

```
Application → OpenTelemetry SDK → OTLP Receiver →
    [Processors: signozspanmetrics, batch] →
    ClickHouse Traces Exporter → ClickHouse Database
                      ↓
              Query Service (Go)
                      ↓
          Frontend (React/TypeScript)
```

### Component Architecture

```
Frontend (React)
 ├─ TracesExplorer
 ├─ TraceDetail (Waterfall/Flamegraph)
 └─ Query Builder UI
        │ HTTP/REST API
        ▼
Query Service (Go)
 ├─ HTTP Handlers (http_handler.go)
 │   ├─ QueryRangeV5 (Main query endpoint)
 │   ├─ GetWaterfallSpansForTrace
 │   ├─ GetFlamegraphSpansForTrace
 │   └─ Trace Fields API
 ├─ Querier (querier.go)
 │   ├─ Query orchestration
 │   ├─ Cache management
 │   └─ Result merging
 ├─ Statement Builders
 │   ├─ traceQueryStatementBuilder
 │   ├─ traceOperatorStatementBuilder
 │   └─ Builds ClickHouse SQL from query specs
 └─ ClickHouse Reader (clickhouseReader/)
     ├─ Direct trace retrieval
     └─ Waterfall/Flamegraph data processing
        │ ClickHouse Protocol
        ▼
ClickHouse Database
 ├─ signoz_traces.distributed_signoz_index_v3
 ├─ signoz_traces.distributed_trace_summary
 └─ signoz_traces.distributed_tag_attributes_v2
```

---
## Data Models

### Core Trace Models

**Location**: `pkg/query-service/model/trace.go`

### Query Request Models

**Location**: `pkg/types/querybuildertypes/querybuildertypesv5/`

- `QueryRangeRequest`: Main query request structure
- `QueryBuilderQuery[TraceAggregation]`: Query builder specification for traces
- `QueryBuilderTraceOperator`: Trace operator query specification
- `CompositeQuery`: Container for multiple queries

---

## API Endpoints
### 1. Query Range API (V5) - Primary Query Endpoint

**Endpoint**: `POST /api/v5/query_range`

**Handler**: `QuerierAPI.QueryRange` → `querier.QueryRange`

**Purpose**: Main query endpoint for traces, logs, and metrics. Supports:
- Query builder queries
- Trace operator queries
- Aggregations, filters, group by
- Time series, scalar, and raw data requests

> **Note**: For detailed information about the Query Range API, including request/response models, query types, and common code flows, see the [Query Range API Documentation](./QUERY_RANGE_API.md).

**Trace-Specific Details**:
- Uses `traceQueryStatementBuilder` for SQL generation
- Supports trace-specific aggregations (count, avg, p99, etc. on duration_nano)
- Trace operator queries combine multiple trace queries with set operations
- Time range optimization when `trace_id` filter is present

**Key Files**:
- `pkg/telemetrytraces/statement_builder.go` - Trace SQL generation
- `pkg/telemetrytraces/trace_operator_statement_builder.go` - Trace operator SQL
- `pkg/querier/trace_operator_query.go` - Trace operator execution
### 2. Waterfall View API

**Endpoint**: `POST /api/v2/traces/waterfall/{traceId}`

**Handler**: `GetWaterfallSpansForTraceWithMetadata`

**Purpose**: Retrieves spans for waterfall visualization with metadata

**Request Parameters**:
```go
type GetWaterfallSpansForTraceWithMetadataParams struct {
	SelectedSpanID string // Selected span to focus on
	IsSelectedSpanIDUnCollapsed bool // Whether selected span is expanded
	UncollapsedSpans []string // List of expanded span IDs
}
```

**Response**:
```go
type GetWaterfallSpansForTraceWithMetadataResponse struct {
	StartTimestampMillis uint64 // Trace start time
	EndTimestampMillis uint64 // Trace end time
	DurationNano uint64 // Total duration
	RootServiceName string // Root service
	RootServiceEntryPoint string // Entry point operation
	TotalSpansCount uint64 // Total spans
	TotalErrorSpansCount uint64 // Error spans
	ServiceNameToTotalDurationMap map[string]uint64 // Service durations
	Spans []*Span // Span tree
	HasMissingSpans bool // Missing spans indicator
	UncollapsedSpans []string // Expanded spans
}
```

**Code Flow**:
```
Handler → ClickHouseReader.GetWaterfallSpansForTraceWithMetadata
  → Query trace_summary for time range
  → Query spans from signoz_index_v3
  → Build span tree structure
  → Apply uncollapsed/selected span logic
  → Return filtered spans (500 span limit)
```

**Key Files**:
- `pkg/query-service/app/http_handler.go:1748` - Handler
- `pkg/query-service/app/clickhouseReader/reader.go:873` - Implementation
- `pkg/query-service/app/traces/tracedetail/waterfall.go` - Tree processing
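
The sketch below shows one way to call this endpoint from a client. Only the route and the parameter names come from the handler described above; the base URL, port, authentication, and the exact JSON field names (which depend on the struct tags of `GetWaterfallSpansForTraceWithMetadataParams`) are assumptions and should be verified against the handler code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// JSON field names are illustrative; check the struct tags on
	// GetWaterfallSpansForTraceWithMetadataParams before relying on them.
	params := map[string]any{
		"selectedSpanId":              "abc123",
		"isSelectedSpanIdUnCollapsed": true,
		"uncollapsedSpans":            []string{"abc123"},
	}
	body, _ := json.Marshal(params)

	// Base URL is an assumption for a local query-service; add auth headers
	// if your deployment requires them. Replace {traceId} with a real trace ID.
	url := "http://localhost:8080/api/v2/traces/waterfall/{traceId}"
	req, _ := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(raw))
}
```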
### 3. Flamegraph View API

**Endpoint**: `POST /api/v2/traces/flamegraph/{traceId}`

**Handler**: `GetFlamegraphSpansForTrace`

**Purpose**: Retrieves spans organized by level for flamegraph visualization

**Request Parameters**:
```go
type GetFlamegraphSpansForTraceParams struct {
	SelectedSpanID string // Selected span ID
}
```

**Response**:
```go
type GetFlamegraphSpansForTraceResponse struct {
	StartTimestampMillis uint64 // Trace start
	EndTimestampMillis uint64 // Trace end
	DurationNano uint64 // Total duration
	Spans [][]*FlamegraphSpan // Spans organized by level
}
```

**Code Flow**:
```
Handler → ClickHouseReader.GetFlamegraphSpansForTrace
  → Query trace_summary for time range
  → Query spans from signoz_index_v3
  → Build span tree
  → BFS traversal to organize by level
  → Sample spans (50 levels, 100 spans/level max)
  → Return level-organized spans
```

**Key Files**:
- `pkg/query-service/app/http_handler.go:1781` - Handler
- `pkg/query-service/app/clickhouseReader/reader.go:1091` - Implementation
- `pkg/query-service/app/traces/tracedetail/flamegraph.go` - BFS processing
### 4. Trace Fields API

**Endpoints**:
- `GET /api/v2/traces/fields` - Get available trace fields
- `POST /api/v2/traces/fields` - Update trace field metadata

**Handlers**: `traceFields`, `updateTraceField`

**Purpose**: Manage trace field metadata for the query builder

**Key Files**:
- `pkg/query-service/app/http_handler.go:4912` - Get handler
- `pkg/query-service/app/http_handler.go:4921` - Update handler
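
As a quick illustration, the fields endpoint can be queried directly. This is a minimal sketch assuming a local query-service on port 8080 with no authentication; it prints the raw response instead of decoding it into a specific model.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Base URL is an assumption for a local setup; add authentication
	// headers if your deployment requires them.
	resp, err := http.Get("http://localhost:8080/api/v2/traces/fields")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(raw))
}
```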
### 5. Trace Funnels API

**Endpoint**: `/api/v1/trace-funnels/*`

**Purpose**: Manage trace funnels (multi-step trace analysis)

**Endpoints**:
- `POST /api/v1/trace-funnels/new` - Create funnel
- `GET /api/v1/trace-funnels/list` - List funnels
- `GET /api/v1/trace-funnels/{funnel_id}` - Get funnel
- `PUT /api/v1/trace-funnels/{funnel_id}` - Update funnel
- `DELETE /api/v1/trace-funnels/{funnel_id}` - Delete funnel
- `POST /api/v1/trace-funnels/{funnel_id}/analytics/*` - Analytics endpoints

**Key Files**:
- `pkg/query-service/app/http_handler.go:5084` - Route registration
- `pkg/modules/tracefunnel/` - Funnel implementation

---
## Code Flows

### Flow 1: Query Range Request (V5)

This is the primary query flow for traces. For the complete flow covering all query types, see the [Query Range API Documentation](./QUERY_RANGE_API.md#code-flow).

**Trace-Specific Flow**:

```
1. HTTP Request
   POST /api/v5/query_range
   ↓
2. Querier.QueryRange (common flow - see QUERY_RANGE_API.md)
   ↓
3. Trace Query Processing:
   a. Builder Query (QueryTypeBuilder with SignalTraces):
      - newBuilderQuery() creates builderQuery instance
      - Uses traceStmtBuilder (traceQueryStatementBuilder)
   ↓
   b. Trace Operator Query (QueryTypeTraceOperator):
      - newTraceOperatorQuery() creates traceOperatorQuery
      - Uses traceOperatorStmtBuilder
   ↓
4. Trace Statement Building
   traceQueryStatementBuilder.Build() (pkg/telemetrytraces/statement_builder.go:58)
   - Resolves trace field keys from metadata store
   - Optimizes time range if trace_id filter present (queries trace_summary)
   - Maps fields using traceFieldMapper
   - Builds conditions using traceConditionBuilder
   - Builds SQL based on request type:
     * RequestTypeRaw → buildListQuery()
     * RequestTypeTimeSeries → buildTimeSeriesQuery()
     * RequestTypeScalar → buildScalarQuery()
     * RequestTypeTrace → buildTraceQuery()
   ↓
5. Query Execution
   builderQuery.Execute() (pkg/querier/builder_query.go)
   - Executes SQL against ClickHouse (signoz_traces database)
   - Processes results into response format
   ↓
6. Result Processing (common flow - see QUERY_RANGE_API.md)
   - Merges results from multiple queries
   - Applies formulas if present
   - Handles caching
   ↓
7. HTTP Response
   - Returns QueryRangeResponse with trace results
```

**Trace-Specific Key Components**:
- `pkg/telemetrytraces/statement_builder.go` - Trace SQL generation
- `pkg/telemetrytraces/field_mapper.go` - Trace field mapping
- `pkg/telemetrytraces/condition_builder.go` - Trace filter building
- `pkg/telemetrytraces/trace_operator_statement_builder.go` - Trace operator SQL
### Flow 2: Waterfall View Request

```
1. HTTP Request
   POST /api/v2/traces/waterfall/{traceId}
   ↓
2. GetWaterfallSpansForTraceWithMetadata handler
   - Extracts traceId from URL
   - Parses request body for params
   ↓
3. ClickHouseReader.GetWaterfallSpansForTraceWithMetadata
   - Checks cache first (5 minute TTL)
   ↓
4. If cache miss:
   a. Query trace_summary table
      SELECT * FROM distributed_trace_summary WHERE trace_id = ?
      - Gets time range (start, end, num_spans)
   ↓
   b. Query spans table
      SELECT ... FROM distributed_signoz_index_v3
      WHERE trace_id = ?
        AND ts_bucket_start >= ? AND ts_bucket_start <= ?
      - Retrieves all spans for trace
   ↓
   c. Build span tree
      - Parse references to build parent-child relationships
      - Identify root spans (no parent)
      - Calculate service durations
   ↓
   d. Cache result
   ↓
5. Apply selection logic
   tracedetail.GetSelectedSpans()
   - Traverses tree based on uncollapsed spans
   - Finds path to selected span
   - Returns sliding window (500 spans max)
   ↓
6. HTTP Response
   - Returns spans with metadata
```

**Key Components**:
- `pkg/query-service/app/clickhouseReader/reader.go:873`
- `pkg/query-service/app/traces/tracedetail/waterfall.go`
- `pkg/query-service/model/trace.go`
### Flow 3: Trace Operator Query

Trace operators allow combining multiple trace queries with set operations.

```
1. QueryRangeRequest with QueryTypeTraceOperator
   ↓
2. Querier identifies trace operator queries
   - Parses expression to find dependencies
   - Collects referenced queries
   ↓
3. traceOperatorStatementBuilder.Build()
   - Parses expression (e.g., "A AND B", "A OR B")
   - Builds expression tree
   ↓
4. traceOperatorCTEBuilder.build()
   - Creates CTEs (Common Table Expressions) for each query
   - Builds final query with set operations:
     * AND → INTERSECT
     * OR → UNION
     * NOT → EXCEPT
   ↓
5. Execute combined query
   - Returns traces matching the operator expression
```

**Key Components**:
- `pkg/telemetrytraces/trace_operator_statement_builder.go`
- `pkg/telemetrytraces/trace_operator_cte_builder.go`
- `pkg/querier/trace_operator_query.go`
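
To make the expression concrete, the sketch below defines two named trace queries and an operator query that references them by name. The struct literals follow the style of the Common Patterns section later in this document; how the three queries are packaged into a `CompositeQuery`/`QueryRangeRequest` is covered in the Query Range API documentation and is not repeated here.

```go
// Query "A": spans from the api service.
queryA := qbtypes.QueryBuilderQuery[qbtypes.TraceAggregation]{
	Name: "A",
	Signal: telemetrytypes.SignalTraces,
	Filter: &qbtypes.Filter{Expression: "service.name = 'api'"},
}

// Query "B": slow spans.
queryB := qbtypes.QueryBuilderQuery[qbtypes.TraceAggregation]{
	Name: "B",
	Signal: telemetrytypes.SignalTraces,
	Filter: &qbtypes.Filter{Expression: "duration_nano > 5000000"},
}

// The operator expression refers to the queries above by name. "A AND B"
// selects traces that match both queries; per the flow above, the CTE
// builder translates AND into an INTERSECT over the per-query trace sets.
operatorQuery := qbtypes.QueryBuilderTraceOperator{
	Name: "slow_api_traces",
	Expression: "A AND B",
}
```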
---

## Key Components

> **Note**: For common components used across all signals (Querier, TelemetryStore, MetadataStore, etc.), see the [Query Range API Documentation](./QUERY_RANGE_API.md#key-components).

### 1. Trace Statement Builder

**Location**: `pkg/telemetrytraces/statement_builder.go`

**Purpose**: Converts trace query builder specifications into ClickHouse SQL

**Key Methods**:
- `Build()`: Main entry point, builds SQL statement
- `buildListQuery()`: Builds query for raw/list results
- `buildTimeSeriesQuery()`: Builds query for time series
- `buildScalarQuery()`: Builds query for scalar values
- `buildTraceQuery()`: Builds query for trace-specific results

**Key Features**:
- Trace field resolution via metadata store
- Time range optimization for trace_id filters (queries trace_summary first)
- Support for trace aggregations, filters, group by, ordering
- Calculated field support (http_method, db_name, has_error, etc.)
- Resource filter support via resourceFilterStmtBuilder
### 2. Trace Field Mapper

**Location**: `pkg/telemetrytraces/field_mapper.go`

**Purpose**: Maps trace query field names to ClickHouse column names

**Field Types**:
- **Intrinsic Fields**: Built-in fields (trace_id, span_id, duration_nano, name, kind_string, status_code_string, etc.)
- **Calculated Fields**: Derived fields (http_method, db_name, has_error, response_status_code, etc.)
- **Attribute Fields**: Dynamic span/resource attributes (accessed via attributes_string, attributes_number, attributes_bool, resources_string)

**Example Mapping**:
```
"service.name" → "resource_string_service$$name"
"http.method" → Calculated from attributes_string['http.method']
"duration_nano" → "duration_nano" (intrinsic)
"trace_id" → "trace_id" (intrinsic)
```

**Key Methods**:
- `MapField()`: Maps a field to ClickHouse expression
- `MapAttribute()`: Maps attribute fields
- `MapResource()`: Maps resource fields
### 3. Trace Condition Builder

**Location**: `pkg/telemetrytraces/condition_builder.go`

**Purpose**: Builds WHERE clause conditions from trace filter expressions

**Supported Operators**:
- `=`, `!=`, `IN`, `NOT IN`
- `>`, `>=`, `<`, `<=`
- `LIKE`, `NOT LIKE`, `ILIKE`
- `EXISTS`, `NOT EXISTS`
- `CONTAINS`, `NOT CONTAINS`

**Key Methods**:
- `BuildCondition()`: Builds condition from filter expression
- Handles attribute, resource, and intrinsic field filtering
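
For intuition, here are a few illustrative mappings from filter expressions to the kind of WHERE fragments the condition builder produces (simplified; the exact expressions are determined by the field mapper and may differ):

```
"service.name = 'api'"    → resource_string_service$$name = 'api'
"duration_nano > 1000000" → duration_nano > 1000000
"http.method EXISTS"      → a key-existence check on attributes_string['http.method']
```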
### 4. Trace Operator Statement Builder

**Location**: `pkg/telemetrytraces/trace_operator_statement_builder.go`

**Purpose**: Builds SQL for trace operator queries (AND, OR, NOT operations on trace queries)

**Key Methods**:
- `Build()`: Builds CTE-based SQL for trace operators
- Uses `traceOperatorCTEBuilder` to create Common Table Expressions

**Features**:
- Parses operator expressions (e.g., "A AND B")
- Creates CTEs for each referenced query
- Combines results using INTERSECT, UNION, EXCEPT
### 5. ClickHouse Reader (Trace-Specific Methods)

**Location**: `pkg/query-service/app/clickhouseReader/reader.go`

**Purpose**: Direct trace data retrieval and processing (bypasses the query builder)

**Key Methods**:
- `GetWaterfallSpansForTraceWithMetadata()`: Waterfall view data
- `GetFlamegraphSpansForTrace()`: Flamegraph view data
- `SearchTraces()`: Legacy trace search (still used for some flows)
- `GetMinAndMaxTimestampForTraceID()`: Time range optimization helper

**Caching**: Implements a 5-minute cache for trace detail views

**Note**: These methods are used for trace-specific visualizations. For general trace queries, use the Query Range API.

---
## Query Building System

> **Note**: For general query building concepts and patterns, see the [Query Range API Documentation](./QUERY_RANGE_API.md). This section covers trace-specific aspects.

### Trace Query Builder Structure

A trace query consists of:

```go
QueryBuilderQuery[TraceAggregation]{
	Name: "query_name",
	Signal: SignalTraces,
	Filter: &Filter{
		Expression: "service.name = 'api' AND duration_nano > 1000000",
	},
	Aggregations: []TraceAggregation{
		{Expression: "count()", Alias: "total"},
		{Expression: "avg(duration_nano)", Alias: "avg_duration"},
		{Expression: "p99(duration_nano)", Alias: "p99"},
	},
	GroupBy: []GroupByKey{
		{TelemetryFieldKey: {Name: "service.name", ...}},
	},
	Order: []OrderBy{...},
	Limit: 100,
}
```
### Trace-Specific SQL Generation Process

1. **Field Resolution**:
   - Resolve trace field names using `traceFieldMapper`
   - Handle intrinsic, calculated, and attribute fields
   - Map to ClickHouse columns (e.g., `service.name` → `resource_string_service$$name`)

2. **Time Range Optimization**:
   - If `trace_id` filter present, query `trace_summary` first
   - Narrow time range based on trace start/end times
   - Reduces data scanned significantly

3. **Filter Building**:
   - Convert filter expression using `traceConditionBuilder`
   - Handle attribute filters (attributes_string, attributes_number, attributes_bool)
   - Handle resource filters (resources_string)
   - Handle intrinsic field filters

4. **Aggregation Building**:
   - Build SELECT with trace aggregations
   - Support trace-specific functions (count, avg, p99, etc. on duration_nano)

5. **Group By Building**:
   - Add GROUP BY clause with trace fields
   - Support grouping by service.name, operation name, etc.

6. **Order Building**:
   - Add ORDER BY clause
   - Support ordering by duration, timestamp, etc.

7. **Limit/Offset**:
   - Add pagination
### Example Generated SQL

For query: `count() WHERE service.name = 'api' GROUP BY service.name`

```sql
SELECT
    count() AS total,
    resource_string_service$$name AS service_name
FROM signoz_traces.distributed_signoz_index_v3
WHERE
    timestamp >= toDateTime64(1234567890/1e9, 9)
    AND timestamp <= toDateTime64(1234567899/1e9, 9)
    AND ts_bucket_start >= toDateTime64(1234567890/1e9, 9)
    AND ts_bucket_start <= toDateTime64(1234567899/1e9, 9)
    AND resource_string_service$$name = 'api'
GROUP BY resource_string_service$$name
```

**Note**: The query uses `ts_bucket_start` for efficient time filtering (partitioning column).

---
## Storage Schema

### Main Tables

**Location**: `pkg/telemetrytraces/tables.go`

#### 1. `distributed_signoz_index_v3`

Main span index table. Stores all span data.

**Key Columns**:
- `timestamp`: Span timestamp
- `duration_nano`: Span duration
- `span_id`, `trace_id`: Identifiers
- `has_error`: Error indicator
- `kind`: Span kind
- `name`: Operation name
- `attributes_string`, `attributes_number`, `attributes_bool`: Attributes
- `resources_string`: Resource attributes
- `events`: Span events
- `status_code_string`, `status_message`: Status
- `ts_bucket_start`: Time bucket for partitioning

#### 2. `distributed_trace_summary`

Trace-level summary for quick lookups.

**Columns**:
- `trace_id`: Trace identifier
- `start`: Earliest span timestamp
- `end`: Latest span timestamp
- `num_spans`: Total span count
#### 3. `distributed_tag_attributes_v2`

Metadata table for attribute keys.

**Purpose**: Stores available attribute keys for autocomplete

#### 4. `distributed_span_attributes_keys`

Span attribute keys metadata.

**Purpose**: Tracks which attributes exist in spans

### Database

All trace tables are in the `signoz_traces` database.

---
## Extending the Traces Module

### Adding a New Calculated Field

1. **Define Field in Constants** (`pkg/telemetrytraces/const.go`):

```go
CalculatedFields = map[string]telemetrytypes.TelemetryFieldKey{
	"my_new_field": {
		Name: "my_new_field",
		Description: "Description of the field",
		Signal: telemetrytypes.SignalTraces,
		FieldContext: telemetrytypes.FieldContextSpan,
		FieldDataType: telemetrytypes.FieldDataTypeString,
	},
}
```

2. **Implement Field Mapping** (`pkg/telemetrytraces/field_mapper.go`):

```go
func (fm *fieldMapper) MapField(field telemetrytypes.TelemetryFieldKey) (string, error) {
	if field.Name == "my_new_field" {
		// Return ClickHouse expression
		return "attributes_string['my.attribute.key']", nil
	}
	// ... existing mappings
}
```

3. **Update Condition Builder** (if needed for filtering):

```go
// In condition_builder.go, add support for your field
```
### Adding a New API Endpoint

1. **Add Handler Method** (`pkg/query-service/app/http_handler.go`):

```go
func (aH *APIHandler) MyNewTraceHandler(w http.ResponseWriter, r *http.Request) {
	// Extract parameters
	// Call reader or querier
	// Return response
}
```

2. **Register Route** (in `RegisterRoutes` or separate method):

```go
router.HandleFunc("/api/v2/traces/my-endpoint",
	am.ViewAccess(aH.MyNewTraceHandler)).Methods(http.MethodPost)
```

3. **Implement Logic**:
   - Add to `ClickHouseReader` if direct DB access needed
   - Or use `Querier` for query builder queries
### Adding a New Aggregation Function

1. **Update Aggregation Rewriter** (`pkg/querybuilder/agg_expr_rewriter.go`):

```go
func (r *aggExprRewriter) RewriteAggregation(expr string) (string, error) {
	// Add parsing for your function
	if strings.HasPrefix(expr, "my_function(") {
		// Return ClickHouse SQL expression
		return "myClickHouseFunction(...)", nil
	}
}
```

2. **Update Statement Builder** (if special handling needed):

```go
// In statement_builder.go, add special case if needed
```
### Adding Trace Operator Support

Trace operators are already extensible. To add a new operator:

1. **Update Grammar** (`grammar/TraceOperatorGrammar.g4`):

```antlr
operator: AND | OR | NOT | MY_NEW_OPERATOR;
```

2. **Update CTE Builder** (`pkg/telemetrytraces/trace_operator_cte_builder.go`):

```go
func (b *traceOperatorCTEBuilder) buildOperatorQuery(op TraceOperatorType) string {
	switch op {
	case TraceOperatorTypeMyNewOperator:
		return "MY_CLICKHOUSE_OPERATION"
	}
}
```

---
## Common Patterns

### Pattern 1: Query with Filter

```go
query := qbtypes.QueryBuilderQuery[qbtypes.TraceAggregation]{
	Name: "filtered_traces",
	Signal: telemetrytypes.SignalTraces,
	Filter: &qbtypes.Filter{
		Expression: "service.name = 'api' AND duration_nano > 1000000",
	},
	Aggregations: []qbtypes.TraceAggregation{
		{Expression: "count()", Alias: "total"},
	},
}
```
### Pattern 2: Time Series Query

```go
query := qbtypes.QueryBuilderQuery[qbtypes.TraceAggregation]{
	Name: "time_series",
	Signal: telemetrytypes.SignalTraces,
	Aggregations: []qbtypes.TraceAggregation{
		{Expression: "avg(duration_nano)", Alias: "avg_duration"},
	},
	GroupBy: []qbtypes.GroupByKey{
		{TelemetryFieldKey: telemetrytypes.TelemetryFieldKey{
			Name: "service.name",
			FieldContext: telemetrytypes.FieldContextResource,
		}},
	},
	StepInterval: qbtypes.Step{Duration: time.Minute},
}
```
### Pattern 3: Trace Operator Query

```go
query := qbtypes.QueryBuilderTraceOperator{
	Name: "operator_query",
	Expression: "A AND B", // A and B are query names
	Filter: &qbtypes.Filter{
		Expression: "duration_nano > 5000000",
	},
}
```

---
## Performance Considerations

### Caching

- **Trace Detail Views**: 5-minute cache for waterfall/flamegraph
- **Query Results**: Bucket-based caching in the querier
- **Metadata**: Cached attribute keys and field metadata

### Query Optimization

1. **Time Range Optimization**: When `trace_id` is in the filter, query `trace_summary` first to narrow the time range
2. **Index Usage**: Queries use `ts_bucket_start` for time filtering
3. **Limit Enforcement**: Waterfall/flamegraph enforce span limits (500 spans for waterfall, 50 levels for flamegraph)

### Best Practices

1. **Use Query Builder**: Prefer the query builder over raw SQL for better optimization
2. **Limit Time Ranges**: Always specify reasonable time ranges
3. **Use Aggregations**: For large datasets, use aggregations instead of raw data
4. **Cache Awareness**: Be mindful of cache TTLs when testing

---
## References

### Key Files

- `pkg/telemetrytraces/` - Core trace query building
  - `statement_builder.go` - Trace SQL generation
  - `field_mapper.go` - Trace field mapping
  - `condition_builder.go` - Trace filter building
  - `trace_operator_statement_builder.go` - Trace operator SQL
- `pkg/query-service/app/clickhouseReader/reader.go` - Direct trace access
- `pkg/query-service/app/http_handler.go` - API handlers
- `pkg/query-service/model/trace.go` - Data models

### Related Documentation

- [Query Range API Documentation](./QUERY_RANGE_API.md) - Common query_range API details
- [OpenTelemetry Specification](https://opentelemetry.io/docs/specs/)
- [ClickHouse Documentation](https://clickhouse.com/docs)
- [Query Builder Guide](../contributing/go/query-builder.md)

---
## Contributing

When contributing to the traces module:

1. **Follow Existing Patterns**: Match the style of existing code
2. **Add Tests**: Include unit tests for new functionality
3. **Update Documentation**: Update this doc for significant changes
4. **Consider Performance**: Optimize queries and use caching appropriately
5. **Handle Errors**: Provide meaningful error messages

For questions or help, reach out to the maintainers or open an issue.