Nikhil Mantri 446dd4589f
Some checks are pending
build-staging / staging (push) Blocked by required conditions
build-staging / prepare (push) Waiting to run
build-staging / js-build (push) Blocked by required conditions
build-staging / go-build (push) Blocked by required conditions
Release Drafter / update_release_draft (push) Waiting to run
feat(infra-monitoring): v2 volumes integration tests (#11431)
* chore: updated logic and use centralized function in the module

* chore: filter metric groups

* chore: filter metric groups

* chore: formula correction

* chore: added step flooring note

* chore: comment correction

* chore: comment correction

* chore: removed function

* chore: renamed variables

* chore: added happy test

* chore: added test 2 for accuracy and test 3 for missing metrics check

* chore: added filter test 4

* chore: added 5th test for filterByStatus

* chore: added group by tests

* chore: pagination test added

* chore: added validation tests

* chore: added auth test

* chore: added all tests

* chore: fix for surfacing meta for pods custom group by

* chore: added nodes integration test suite

* chore: namespaces integration tests

* ci: register inframonitoring suite + ruff format 01_hosts

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: added integration tests for clusters

* chore: formatting changed

* chore: formatting changed

* chore: formatting changed

* chore: added volumes integration tests

* chore: added order by host.name test

* refactor(infra-monitoring): address review comments on hosts integration tests

- inline _post helper at call sites
- combine filter operator tests into parametrized test_hosts_filter
- combine bad attr/grammar tests into parametrized test_hosts_filter_invalid
- convert orderby total-invariant nested loop to stacked parametrize
- drop redundant test_hosts_auth (auth covered globally)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align pods integration tests with review feedback

- inline _post helper at call sites
- combine filter operator tests into parametrized test_pods_filter
- combine bad attr/grammar tests into parametrized test_pods_filter_invalid
- convert orderby total-invariant nested loop to stacked parametrize
- drop redundant test_pods_auth (auth covered globally)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align nodes integration tests with review feedback

- inline _post helper at call sites
- combine filter operator tests into parametrized test_nodes_filter
- combine bad attr/grammar tests into parametrized test_nodes_filter_invalid
- convert orderby total-invariant nested loop to stacked parametrize
- drop redundant test_nodes_auth (auth covered globally)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align namespaces integration tests with review feedback

- inline _post helper at call sites
- combine filter operator tests into parametrized test_namespaces_filter
- combine bad attr/grammar tests into parametrized test_namespaces_filter_invalid
- convert orderby total-invariant nested loop to stacked parametrize
- drop redundant test_namespaces_auth (auth covered globally)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align clusters integration tests with review feedback

- inline _post helper at call sites
- combine filter operator tests into parametrized test_clusters_filter
- combine bad attr/grammar tests into parametrized test_clusters_filter_invalid
- convert orderby total-invariant nested loop to stacked parametrize
- drop redundant test_clusters_auth (auth covered globally)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align volumes integration tests with review feedback

- inline _post helper at call sites
- combine filter operator tests into parametrized test_volumes_filter
- combine bad attr/grammar tests into parametrized test_volumes_filter_invalid
- convert orderby total-invariant nested loop to stacked parametrize
- drop redundant test_volumes_auth (auth covered globally)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine hosts groupby tests into parametrized test

- merge test_hosts_groupby_hostname + test_hosts_groupby_os_type into
  test_hosts_groupby parametrized on (dataset, group key, expected counts/values)
- preserves all assertions incl hostName populated-vs-empty branch coverage

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine hosts orderby tests into parametrized test

- merge total_invariant_across_orderby + orderby_correctness + orderby_by_host_name
  into test_hosts_orderby parametrized on (column, record_field) x direction
- each case asserts both the total/len invariant and sortedness; sortedness now
  covered for all metric columns, not just cpu
- single dataset (hosts_orderby.jsonl) + CONTAINS 'order-' guard on all cases

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine hosts pagination tests

- fold offset-beyond-total case into the test_hosts_pagination page walk
  (offset K+5 expects 0 records via max(0, min(limit, K-offset)); total
  invariant covers the beyond-total page's total == K)
- single seed instead of two

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): merge hosts happy_path into test_hosts_accuracy

- happy_path and value_accuracy datasets were structurally identical
  (same 4 metrics, same sample counts, 2 hosts); one test now asserts
  shape/contract + exact metric values in a single seed/request
- drop unused hosts_happy_path.jsonl

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine pods integration tests logically

- merge happy_path into test_pods_accuracy (datasets structurally identical);
  drop unused pods_happy_path.jsonl
- merge groupby namespace/deployment into parametrized test_pods_groupby
- merge orderby invariant + correctness + by-pod-name into test_pods_orderby
  ((column, record_field) x direction; sortedness now covered for all columns)
- fold offset-beyond-total into test_pods_pagination page walk

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine nodes integration tests logically

- merge happy_path into test_nodes_accuracy; drop unused nodes_happy_path.jsonl
- merge orderby invariant + correctness + by-node-name into test_nodes_orderby
  ((column, record_field) x direction; sortedness now covered for all columns)
- fold offset-beyond-total into test_nodes_pagination page walk

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine namespaces integration tests logically

- merge happy_path into test_namespaces_accuracy; drop unused namespaces_happy_path.jsonl
- merge orderby invariant + correctness + by-namespace-name into test_namespaces_orderby
- fold offset-beyond-total into test_namespaces_pagination page walk

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine clusters integration tests logically

- merge happy_path into test_clusters_accuracy; drop unused clusters_happy_path.jsonl
- merge orderby invariant + correctness + by-cluster-name into test_clusters_orderby
- fold offset-beyond-total into test_clusters_pagination page walk

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): combine volumes integration tests logically

- merge happy_path into test_volumes_accuracy; drop unused volumes_happy_path.jsonl
- merge orderby invariant + correctness + by-pvc-name into test_volumes_orderby
- fold offset-beyond-total into test_volumes_pagination page walk

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(infra-monitoring): assert metric value accuracy in hosts filter tests

- regenerate hosts_filter_dataset.jsonl so every host mirrors the acc-h1
  sample pattern from hosts_value_accuracy.jsonl (CI-proven expected values)
- test_hosts_filter now asserts cpu/memory/wait/load15/diskUsage per filtered
  record against FILTER_DATASET_EXPECTED (1e-9), proving filters don't
  distort aggregation

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): scope filter expected values inside test_hosts_filter

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(infra-monitoring): assert metric value accuracy in pods filter tests

- regenerate pods_filter_dataset.jsonl so every pod mirrors the acc-p1
  sample pattern from pods_value_accuracy.jsonl (CI-proven expected values)
- test_pods_filter now asserts the 6 CPU/memory fields per filtered record

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(infra-monitoring): assert metric value accuracy in nodes filter tests

- regenerate nodes_filter_dataset.jsonl so every node mirrors the acc-n1
  sample pattern from nodes_value_accuracy.jsonl (CI-proven expected values)
- test_nodes_filter now asserts the 4 CPU/memory fields per filtered record

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(infra-monitoring): assert metric value accuracy in namespaces filter tests

- regenerate namespaces_filter_dataset.jsonl so every namespace mirrors the
  acc-ns-1 sample pattern (2 pods) from namespaces_value_accuracy.jsonl
- test_namespaces_filter now asserts namespaceCPU/namespaceMemory per record

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(infra-monitoring): assert metric value accuracy in clusters filter tests

- regenerate clusters_filter_dataset.jsonl so every cluster mirrors the
  acc-cluster-1 sample pattern (2 nodes) from clusters_value_accuracy.jsonl
- test_clusters_filter now asserts the 4 CPU/memory fields per filtered record

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(infra-monitoring): assert metric value accuracy in volumes filter tests

- regenerate volumes_filter_dataset.jsonl so every PVC mirrors the acc-pvc-1
  sample pattern from volumes_value_accuracy.jsonl (CI-proven expected values)
- test_volumes_filter now asserts all 6 volume fields per filtered record

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align nodes groupby + pod-phase tests with structure

- merge pod_phase_counts_list_mode + _no_pods_on_node into parametrized
  test_nodes_pod_phase_counts ((dataset, node, filter, expected) cases)
- rename groupby_cluster -> test_nodes_groupby, parametrize over
  k8s.node.name + k8s.cluster.name; adds the node-name-in-groupBy branch
  (nodeName populated, condition derived) that hosts/pods both cover

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align namespaces groupby with structure

- rename groupby_cluster -> test_namespaces_groupby, parametrize over
  k8s.namespace.name + k8s.cluster.name; adds the namespace-name-in-groupBy
  branch (namespaceName populated) that hosts/pods/nodes all cover

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align clusters groupby with structure

- rename groupby_cloud_provider -> test_clusters_groupby, parametrize over
  k8s.cluster.name + cloud.provider; adds the cluster-name-in-groupBy branch
  (clusterName populated) that hosts/pods/nodes/namespaces all cover

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(infra-monitoring): align volumes groupby with structure

- rename groupby_namespace -> test_volumes_groupby, parametrize over
  k8s.persistentvolumeclaim.name + k8s.namespace.name; adds the
  pvc-name-in-groupBy branch (persistentVolumeClaimName populated) that the
  other endpoints all cover

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-10 09:30:31 +00:00
2021-01-24 13:09:49 +05:30
2021-01-03 18:15:44 +05:30
2021-01-22 11:50:17 +05:30
2025-07-17 10:38:31 +00:00
2022-05-30 17:22:47 +05:30

SigNoz
SigNoz

All your logs, metrics, and traces in one place. Monitor your application, spot issues before they occur and troubleshoot downtime quickly with rich context. SigNoz is a cost-effective open-source alternative to Datadog and New Relic. Visit signoz.io for the full documentation, tutorials, and guide.

GitHub issues tweet

DocumentationReadMe in ChineseReadMe in GermanReadMe in PortugueseSlack CommunityTwitter

Features

Application Performance Monitoring

Use SigNoz APM to monitor your applications and services. It comes with out-of-box charts for key application metrics like p99 latency, error rate, Apdex and operations per second. You can also monitor the database and external calls made from your application. Read more.

You can instrument your application with OpenTelemetry to get started.

apm-cover

Logs Management

SigNoz can be used as a centralized log management solution. We use ClickHouse (used by likes of Uber & Cloudflare) as a datastore, ⎯ an extremely fast and highly optimized storage for logs data. Instantly search through all your logs using quick filters and a powerful query builder.

You can also create charts on your logs and monitor them with customized dashboards. Read more.

logs-management-cover

Distributed Tracing

Distributed Tracing is essential to troubleshoot issues in microservices applications. Powered by OpenTelemetry, distributed tracing in SigNoz can help you track user requests across services to help you identify performance bottlenecks.

See user requests in a detailed breakdown with the help of Flamegraphs and Gantt Charts. Click on any span to see the entire trace represented beautifully, which will help you make sense of where issues actually occurred in the flow of requests.

Read more.

distributed-tracing-cover

Metrics and Dashboards

Ingest metrics from your infrastructure or applications and create customized dashboards to monitor them. Create visualization that suits your needs with a variety of panel types like pie chart, time-series, bar chart, etc.

Create queries on your metrics data quickly with an easy-to-use metrics query builder. Add multiple queries and combine those queries with formulae to create really complex queries quickly.

Read more.

metrics-n-dashboards-cover

LLM Observability

Monitor and debug your LLM applications with comprehensive observability. Track LLM calls, analyze token usage, monitor performance, and gain insights into your AI application's behavior in production.

SigNoz LLM observability helps you understand how your language models are performing, identify issues with prompts and responses, track token usage and costs, and optimize your AI applications for better performance and reliability.

Get started with LLM Observability →

llm-observability-cover

Alerts

Use alerts in SigNoz to get notified when anything unusual happens in your application. You can set alerts on any type of telemetry signal (logs, metrics, traces), create thresholds and set up a notification channel to get notified. Advanced features like alert history and anomaly detection can help you create smarter alerts.

Alerts in SigNoz help you identify issues proactively so that you can address them before they reach your customers.

Read more.

alerts-cover

Exceptions Monitoring

Monitor exceptions automatically in Python, Java, Ruby, and Javascript. For other languages, just drop in a few lines of code and start monitoring exceptions.

See the detailed stack trace for all exceptions caught in your application. You can also log in custom attributes to add more context to your exceptions. For example, you can add attributes to identify users for which exceptions occurred.

Read more.

exceptions-cover



Why SigNoz?

SigNoz is a single tool for all your monitoring and observability needs. Here are a few reasons why you should choose SigNoz:

  • Single tool for observability(logs, metrics, and traces)

  • Built on top of OpenTelemetry, the open-source standard which frees you from any type of vendor lock-in

  • Correlated logs, metrics and traces for much richer context while debugging

  • Uses ClickHouse (used by likes of Uber & Cloudflare) as datastore - an extremely fast and highly optimized storage for observability data

  • DIY Query builder, PromQL, and ClickHouse queries to fulfill all your use-cases around querying observability data

  • Open-Source - you can use open-source, our cloud service or a mix of both based on your use case

Getting Started

Create a SigNoz Cloud Account

SigNoz cloud is the easiest way to get started with SigNoz. Our cloud service is for those users who want to spend more time in getting insights for their application performance without worrying about maintenance.

Get started for free

Deploy using Docker(self-hosted)

Please follow the steps listed here to install using docker

The troubleshooting instructions may be helpful if you face any issues.

 

Deploy in Kubernetes using Helm(self-hosted)

Please follow the steps listed here to install using helm charts



We also offer managed services in your infra. Check our pricing plans for all details.

Join our Slack community

Come say Hi to us on Slack 👋



Languages supported:

SigNoz supports all major programming languages for monitoring. Any framework and language supported by OpenTelemetry is supported by SigNoz. Find instructions for instrumenting different languages below:

You can find our entire documentation here.



Comparisons to Familiar Tools

SigNoz vs Prometheus

Prometheus is good if you want to do just metrics. But if you want to have a seamless experience between metrics, logs and traces, then current experience of stitching together Prometheus & other tools is not great.

SigNoz is a one-stop solution for metrics and other telemetry signals. And because you will use the same standard(OpenTelemetry) to collect all telemetry signals, you can also correlate these signals to troubleshoot quickly.

For example, if you see that there are issues with infrastructure metrics of your k8s cluster at a timestamp, you can jump to other signals like logs and traces to understand the issue quickly.

 

SigNoz vs Jaeger

Jaeger only does distributed tracing. SigNoz supports metrics, traces and logs - all the 3 pillars of observability.

Moreover, SigNoz has few more advanced features wrt Jaeger:

  • Jaegar UI doesnt show any metrics on traces or on filtered traces
  • Jaeger cant get aggregates on filtered traces. For example, p99 latency of requests which have tag - customer_type='premium'. This can be done easily on SigNoz
  • You can also go from traces to logs easily in SigNoz

 

SigNoz vs Elastic

  • SigNoz Logs management are based on ClickHouse, a columnar OLAP datastore which makes aggregate log analytics queries much more efficient
  • 50% lower resource requirement compared to Elastic during ingestion

We have published benchmarks comparing Elastic with SigNoz. Check it out here

 

SigNoz vs Loki

  • SigNoz supports aggregations on high-cardinality data over a huge volume while loki doesnt.
  • SigNoz supports indexes over high cardinality data and has no limitations on the number of indexes, while Loki reaches max streams with a few indexes added to it.
  • Searching over a huge volume of data is difficult and slow in Loki compared to SigNoz

We have published benchmarks comparing Loki with SigNoz. Check it out here



Contributing

We ❤️ contributions big or small. Please read CONTRIBUTING.md to get started with making contributions to SigNoz.

Not sure how to get started? Just ping us on #contributing in our slack community



Documentation

You can find docs at https://signoz.io/docs/. If you need any clarification or find something missing, feel free to raise a GitHub issue with the label documentation or reach out to us at the community slack channel.



Community

Join the slack community to know more about distributed tracing, observability, or SigNoz and to connect with other users and contributors.

If you have any ideas, questions, or any feedback, please share on our Github Discussions

As always, thanks to our amazing contributors!

Description
No description provided
Readme MIT 427 MiB
Languages
TypeScript 52.6%
Go 38.2%
Python 4.5%
SCSS 4.3%
Shell 0.2%
Other 0.1%