20 Commits

Author SHA1 Message Date
Karan Balani
0766ab31c0 feat: meter reporter for new billing infra (#11016)
* feat: meter reporter for new billing infra

* feat(meterreporter): simplify code, add metric meters, dry-run zeus call

* feat(meterreporter): add traces meters

* chore: update interval validation to allow a minimum 5-minute interval for testing

* feat: add telemetry for collect and ship durations & improve comments

* feat(meterreporter): sealed-range catch-up and today-partial ticks

* chore: intermediate commit

* feat: improve retention-period queries based on workspace IDs (logs only for now)

* chore: skip meter checkpoint call temporarily

* feat(meterreporter): bootstrap from data floor, emit sentinel zero-readings

* chore: lower HistoricalBackfillDays

* fix(meterreporter): pin retention type

* refactor(meterreporter): remove unused retry config

* refactor: add retentiontypes

* chore: intermediate commit

* feat(meterreporter): add metric and trace meters

* refactor: cleanup comments

* refactor: remove HistoricalBackfillDays

* refactor: move a few things to ee package

* refactor: simplify some sections of tick

* refactor: push meters in batch for each day

* chore: add tracing and logging

* feat: make retention buckets generic

* feat(metercollector): add MeterCollector interface and split type packages

* feat(metercollector/retention): add narrow retention slice loader and SQL helpers

* refactor(meterreporter): wire http collectors

* chore(meterreporter): trim comments

* test(metercollector): add collector coverage

* chore(meterreporter): increase catchup window

* fix: ci lint and flag default value

* refactor(meters): align retention and zeus

* refactor(retention): move ttl types

* refactor(meters): rename platform fee collector

* refactor(meters): add meter constructor

* refactor(meters): add window constructor

* refactor(meters): consolidate zeus meter types

* refactor(meters): centralize meter metadata

* refactor(retention): add getter module

* refactor(retention): consolidate ttl types

* chore: use int64 instead of float64 as meter value

* chore: int64 conversion in clickhouse query too

* chore: error log - make failed meter collection louder

* chore: start sending data to zeus

* chore: add debug statement for logging meter data

* chore: simplify meter query to use only org ID and retention duration

* chore: remove unused functions from retention module and move sqlbuilder function too

* chore: remove unused code

* chore: switch to info context log for testing

* refactor(meterreporter): consolidate collectors and push origin into source

Replaces six near-duplicate collector packages with two parametrized,
factory-shaped ones: telemetrymetercollector for the ClickHouse-backed
meters (log size, span size, datapoint count) and staticmetercollector
for fixed-value meters (base platform fee). Each meter is now a Config
entry in cmd/enterprise/meter.go, materialized by iterating the factory.

Pushes the catchup floor concept out of the reporter and into each
collector via a new Origin method. Telemetry collectors return per-meter
min(unix_milli) FROM signoz_meter.samples; static collectors return
todayStart. The reporter now computes per-meter next-day-to-report and
only invokes a collector for days at/after its own next, eliminating
the over-emit + dropCheckpointed dance.

Other tightening: typed Meter.MeterName with JSON marshalers; Meter
dimensions built via attribute.Key-based zeustypes.NewDimensions;
license flows into Collect from the reporter (collectors stop fetching
it themselves); providerSettings plumbed into the meterreporter
factory closure for harness-style provider construction.

* refactor(meterreporter): per-collector Origin, simpler tick, semconv metrics

Pushes the catchup-floor concept out of the reporter and into each
collector via MeterCollector.Origin. Telemetry collectors return per-
meter min(unix_milli) FROM signoz_meter.samples; static collectors
return today. The reporter computes per-meter next-reportable-day,
iterates the day-loop globally, and only invokes a collector for days
at/after its own next — eliminating the over-emit + dropCheckpointed
dance entirely.

collectOrg is split into three named helpers: provider.checkpoints
(Zeus call + index), provider.nextDays (per-meter origin + checkpoint
max), and pure backfillRange (start/end clamped to yesterday + cap).
collectOrg itself reads as a five-step recipe.

Provider stores collectors as map[MeterName]MeterCollector keyed by
name; the slice + sort.Slice scaffolding is gone, validation moves
into newProvider. eligibleCollectors and report take the map directly.

Start matches the opaquetokenizer pattern: synchronous select+ticker,
sharder + per-org loop with license check (skipping orgs with no
active license), per-tick span scoped via an IIFE so defer span.End()
fires once per tick. goroutinesWg removed.

Config drops Timeout. CatchupMaxDaysPerTick renamed to MaxBackfillDays.
runPhase renamed to report. telemetryStore injection removed (no
longer used after dataFloor moved into the telemetry collector).

Metrics rebuilt around OTel semconv: signoz.meterreporter.checkpoints,
.reports, .collections, .meters — each bumped on success and failure,
with error.type set on failure via a new errors.TypeAttr helper in
pkg/errors. collections also carries signoz.meter.name.

* refactor(meterreporter): rename base platform fee meter, add metric units

Renames signoz.meter.base.platform.fee to signoz.meter.platform.active.
The new name matches the per-service template signoz.meter.<service>.active
that scales for future per-service billing meters; "active"
fits the billing-eligibility semantic (org's platform subscription
is active for the period) without conflating with operational
liveness conventions like Prometheus's `up`.

Adds UCUM annotated-count units to each reporter counter:
  - signoz.meterreporter.checkpoints  -> {checkpoint}
  - signoz.meterreporter.reports      -> {report}
  - signoz.meterreporter.collections  -> {collection}
  - signoz.meterreporter.meters       -> {meter}

* chore: stop leaking collectors if flag is false and address comments

* fix(meterreporter): correct startup and retention metadata

* fix(meterreporter): recover static meter backfill

* chore: address review comments

* chore: move flag evaluation into reporter

* refactor: fix retention origin for staticmeter collectors

* fix(meterreporter): gate backfill by license day

Replace max_backfill_days with a backfill switch.
Clamp sealed-day catch-up to the license creation day.

Send retention duration dimensions in seconds.

* fix(meterreporter): anchor backfill to license day

* chore: address review comments

* chore: drop unrelated authz schema diff

---------

Co-authored-by: Karan Balani <29383381+balanikaran@users.noreply.github.com>
Co-authored-by: grandwizard28 <vibhupandey28@gmail.com>
2026-05-11 17:47:29 +00:00
Abhishek Kumar Singh
d085a8fd53 chore: custom notifiers in alert manager (#10541)
* chore: custom notifiers in alert manager

* chore: lint fixes

* chore: fix email linter

* chore: added tracing to msteamsv2 notifier

* chore: updated test name + code for timeout errors

* refactor: review comments

* refactor: lint fixes

* chore: updated licenses for notifiers

* chore: updated email notifier from upstream

* chore: updated license header with short notation

---------

Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
2026-04-16 10:33:46 +00:00
Vikrant Gupta
2163e1ce41 chore(lint): enable godot and staticcheck (#10775)
* chore(lint): enable godot and staticcheck

* chore(lint): merge main and fix new lint issues in main
2026-03-31 09:11:49 +00:00
Pandey
b811991f9d feat(middleware): add panic recovery middleware (#10666)
* feat(middleware): add panic recovery middleware with TypeFatal error type

Add a global HTTP recovery middleware that catches panics, logs them
with OTel exception semantic conventions via errors.Attr, and returns
a safe user-facing error response. Introduce TypeFatal/CodeFatal for
unrecoverable failures and WithStacktrace to attach pre-formatted
stack traces to errors. Remove redundant per-handler panic recovery
blocks in querier APIs.

* style(errors): keep WithStacktrace call on same line in test

* fix(middleware): replace fmt.Errorf with errors.New in recovery test

* feat(middleware): add request context to panic recovery logs

Capture request body before handler runs and include method, path, and
body in panic recovery logs using OTel semconv attributes. Improve error
message to direct users to GitHub issues or support.
2026-03-23 06:25:26 +00:00
Pandey
95ed125bd9 feat(instrumentation): add OTel exception semantic convention log handler (#10665)
Some checks failed
build-staging / prepare (push) Has been cancelled
build-staging / js-build (push) Has been cancelled
build-staging / go-build (push) Has been cancelled
build-staging / staging (push) Has been cancelled
Release Drafter / update_release_draft (push) Has been cancelled
* feat(instrumentation): add OTel exception semantic convention log handler

Add a loghandler.Wrapper that enriches error log records with OpenTelemetry
exception semantic convention attributes (exception.type, exception.code,
exception.message, exception.stacktrace).

- Add errors.Attr() helper for standardized error logging under "exception" key
- Add exception log handler that replaces raw error attrs with structured group
- Wire exception handler into the instrumentation SDK logger chain
- Remove LogValue() from errors.base as the handler now owns structuring

* refactor: replace "error", err with errors.Attr(err) across codebase

Migrate all slog error logging from ad-hoc "error", err key-value pairs
to the standardized errors.Attr(err) helper, enabling the exception log
handler to enrich these logs with OTel semantic convention attributes.

* refactor: enforce attr-only slog style across codebase

Change sloglint from kv-only to attr-only, requiring all slog calls to
use typed attributes (slog.String, slog.Any, etc.) instead of key-value
pairs. Convert all existing kv-style slog calls in non-excluded paths.

* refactor: tighten slog.Any to specific types and standardize error attrs

- Replace slog.Any with slog.String for string values (action, key, where_clause)
- Replace slog.Any with slog.Uint64 for uint64 values (start, end, step, etc.)
- Replace slog.Any("err", err) with errors.Attr(err) in dispatcher and segment analytics
- Replace slog.Any("error", ctx.Err()) with errors.Attr in factory registry

* fix(instrumentation): use Unwrapb message for exception.message

Use the explicit error message (m) from Unwrapb instead of
foundErr.Error(), which resolves to the inner cause's message
for wrapped errors.

* feat(errors): capture stacktrace at error creation time

Store program counters ([]uintptr) in base errors at creation time
using runtime.Callers, inspired by thanos-io/thanos/pkg/errors. The
exception log handler reads the stacktrace from the error instead of
capturing at log time, showing where the error originated.

* fix(instrumentation): apply default log wrappers uniformly in NewLogger

Move correlation, filtering, and exception wrappers into NewLogger so
all call sites (including CLI loggers in cmd/) get them automatically.

* refactor(instrumentation): remove variadic wrappers from NewLogger

NewLogger no longer accepts arbitrary wrappers. The core wrappers
(correlation, filtering, exception) are hardcoded, preventing callers
from accidentally duplicating behavior.

* refactor: migrate remaining "error", <var> to errors.Attr across legacy paths

Replace all remaining "error", <variable> key-value pairs with
errors.Attr(<variable>) in pkg/query-service/ and ee/query-service/
paths that were missed in the initial migration due to non-standard
variable names (res.Err, filterErr, apiErrorObj.Err, etc.).

* refactor(instrumentation): use flat exception.* keys instead of nested group

Use flat keys (exception.type, exception.code, exception.message,
exception.stacktrace) instead of a nested slog.Group in the exception
log handler.
2026-03-22 04:06:31 +00:00
Karan Balani
6d137bcdff feat: idp attributes mapping (#9841) 2026-01-19 22:27:21 +05:30
Piyush Singariya
bca761498a chore(JSON): Promote Body Paths API (#9592) 2025-12-23 14:11:52 +05:30
Piyush Singariya
e66bfe5961 feat(JSON): JSON Body Metadata (#9593)
* feat: json Body Keys

* feat: telemetry types

* feat: change ExtractBodyPaths

* chore: minor comment change

* chore: func rename, file rename

* chore: change table names

* chore: reflect changes from the overhaul

* test: fixing test 1

* fix: test TestQueryToKeys

* fix: test TestPrepareLogsQuery

* chore: remove db

* chore: go mod

* chore: changes based on review

* chore: changes based on review

* fix: in LIKE operation

* chore: addressed a few changes

* revert: test file

* fix: comparison fix

* test: add TestBuildListLogsJSONIndexesQuery

* fix: in test TestBuildListLogsJSONIndexesQuery

* fix: pull promoted paths in single db call

* fix: reducing db calls

* test: fix TestBuildListLogsJSONIndexesQuery

* fix: test TestConditionForJSONBodySearch

* fix: lint try 1

* chore: review changes based on cursor

* fix: use enums only

---------

Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
Co-authored-by: Nityananda Gohain <nityanandagohain@gmail.com>
2025-12-09 20:47:26 +07:00
Pranjul Kalsi
bdce97a727 fix: replace fmt.Errorf with signoz/pkg/errors and update golangci-li… (#9373)
This PR fulfills the requirements of #9069 by:

- Enabling the forbidigo linter in golangci-lint to disallow all fmt.Errorf usages.
- Replacing existing fmt.Errorf instances with structured errors from github.com/SigNoz/signoz/pkg/errors for consistent error classification and lint compliance.
- Verifying lint and build integrity.
2025-10-27 16:30:18 +05:30
Vibhu Pandey
c122bc09b4 feat(tokenizer|sso): add tokenizer for session management and oidc sso support (#9183)
## 📄 Summary

- Instead of relying on JWT for session management, we are adding another token system: opaque tokens. This gives us the benefits of expiration and revocation.

- We now ensure that emails are regex-checked throughout the backend.

- Support has been added for the OIDC protocol.
2025-10-16 18:00:38 +05:30
Vibhu Pandey
c83eaf3d50 chore: enable forbidigo and noerrors in depguard (#9047)
* chore: enable forbidigo

* chore: enable forbidigo
2025-09-09 15:44:27 +05:30
Vikrant Gupta
f61e859901 feat(authz): embed openfga server (#8966)
* feat(access-control): embed openfga in signoz

* feat(authz): rename access control to authz

* feat(authz): fix codeowners and go mod tidy

* feat(authz): fix lint

* feat(authz): update go version and move convertor to instrumentation

* feat(authz): some more lint issues

* feat(authz): some more lint issues

* feat(authz): some more lint issues

* feat(authz): fix more lint issues

* feat(authz): make logger converter interface
2025-09-01 17:10:13 +05:30
Piyush Singariya
d6eed8e79d feat: JSON Flattening in logs pipelines (#8227)
* feat: introducing JSON Flattening

* fix: removed bug and tested

* test: removed testing test

* feat: additional severity levels, and some clearing

* chore: minor changes

* test: added tests for processJSONParser

* test: added check for OnError

* fix: review from ellipsis

* fix: make max flattening depth a variable

* Update pkg/query-service/app/logparsingpipeline/pipelineBuilder.go

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Update pkg/errors/errors.go

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* fix: quoted JSON strings fix

* test: updating otel collector for testing

* test: update collector's reference

* chore: change with new error package

* chore: set flattening depth equal to 1

* fix: fallback for depth

* fix: change in errors package

* fix: tests

* fix: test

* chore: update collector version

* fix: go.sum

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Nityananda Gohain <nityanandagohain@gmail.com>
2025-07-14 18:48:01 +05:30
Srikanth Chekuri
c5d5c84a0e chore: add fieldmapper implementation (#7955) 2025-05-16 20:09:57 +05:30
Piyush Singariya
03ab6e704b feat: S3 Sync (AWS Integrations) (#7718) 2025-05-14 05:12:41 +05:30
Vibhu Pandey
5bceffbeaa fix: fix modules and handler (#7737)
* fix: fix modules and handler

* fix: fix sqlmigration package

* fix: fix other fmt issues

* fix: fix tests

* fix: fix tests
2025-04-27 16:38:34 +05:30
Vibhu Pandey
bcf7bf38fc feat(alertmanager): add alertmanagertypes (#7101)
add alertmanagertypes
2025-02-12 17:23:18 +00:00
Vibhu Pandey
001122db2c feat(instrumentation): adopt slog (#6907)
### Summary

feat(instrumentation): adopt slog
2025-01-24 09:23:02 +00:00
Vibhu Pandey
bd7d14b1ca feat(render): add render package (#5751)
### Summary

Add `render` package

#### Related Issues / PR's

https://github.com/SigNoz/signoz/pull/5710
2024-08-23 13:07:10 +05:30
Vibhu Pandey
c322fc72d9 feat(errors): add errors package (#5741)
### Summary

Add errors package

#### Related Issues / PR's

https://github.com/SigNoz/signoz/pull/5710
2024-08-22 15:19:32 +05:30