Files
signoz/pkg/querier
Pandey 53b2b2f017 fix(telemetrystore): fix clickhouse connection-pool slot leak (acquire conn timeout) (#11506)
* fix(telemetrystore): upgrade clickhouse-go to v2.44.0 to fix connection-pool slot leak

clickhouse-go v2.43.0 introduced connection-pool slot leaks triggered by context
cancellation: acquire() failed to release the pool slot when idle.Get returned a
cancellation error (ClickHouse/clickhouse-go#1759), and batch.Close() never released
the connection when closeQuery() failed on a cancelled context
(ClickHouse/clickhouse-go#1795). Both leak slots until the pool is exhausted and every
query fails with 'acquire conn timeout'. Both are fixed in v2.44.0.

v2.44.0 adds HasData() to the driver.Rows interface, which the test mock did not
implement. Swap the mock to the SigNoz fork github.com/SigNoz/clickhouse-go-mock
v0.14.0, which implements HasData() and tracks v2.44.0.

* feat(telemetrystore): emit clickhouse connection-pool metrics

Register OTel observable gauges that report the clickhouse connection-pool stats
from driver.Stats() on each collection cycle:
signoz.telemetrystore.connection.{open,idle,max_open,max_idle}. Plotting open against
max_open makes pool saturation (and leaks like the one fixed in the previous commit)
directly observable in Prometheus.
2026-05-30 01:06:16 +05:30
..