Multi-DSN causes duplicate metric errors for built-in stat queries #296

keithf4 · 2019-08-05T15:35:04Z

I've been trying to get multi-DSN working with 0.5.1, without much luck. I first saw these errors when running the exporter as a systemd service on CentOS, but I get exactly the same thing when running it directly, either over a socket

DATA_SOURCE_NAME="postgresql:///keith?host=/tmp/,postgresql:///postgres?host=/tmp/" ./postgres_exporter

or tcp

DATA_SOURCE_NAME="postgresql:///keith?host=localhost&sslmode=disable,postgresql:///postgres?host=localhost&sslmode=disable" ./postgres_exporter

The built-in metrics cause tons of duplicate-metric errors because there's no label to distinguish them between multiple databases:

* collected metric pg_stat_activity_count label:<name:"datname" value:"keith" > label:<name:"server" value:"localhost:5432" > label:<name:"state" value:"disabled" > gauge:<value:0 >  was collected before with the same name and label values
* collected metric pg_stat_activity_max_tx_duration label:<name:"datname" value:"keith" > label:<name:"server" value:"localhost:5432" > label:<name:"state" value:"disabled" > gauge:<value:0 >  was collected before with the same name and label values
* collected metric pg_stat_bgwriter_checkpoints_timed label:<name:"server" value:"localhost:5432" > counter:<value:13676 >  was collected before with the same name and label values
* collected metric pg_stat_bgwriter_checkpoints_req label:<name:"server" value:"localhost:5432" > counter:<value:3 >  was collected before with the same name and label values
* collected metric pg_stat_bgwriter_checkpoint_write_time label:<name:"server" value:"localhost:5432" > counter:<value:5.494712e+06 >  was collected before with the same name and label values
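To illustrate why the errors above appear, here is a minimal Go sketch. It assumes only that a metric's identity is its name plus its label values, as the error text suggests. The helpers `metricKey` and `findDuplicates` are hypothetical names made up for this example and are not part of postgres_exporter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metricKey builds an identity string from a metric name and its labels.
// Labels are sorted so the key does not depend on map iteration order.
func metricKey(name string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, k := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// findDuplicates returns every key that occurs more than once, in first-seen order.
func findDuplicates(keys []string) []string {
	count := map[string]int{}
	var dups []string
	for _, k := range keys {
		count[k]++
		if count[k] == 2 {
			dups = append(dups, k)
		}
	}
	return dups
}

func main() {
	var keys []string
	// Two DSNs (keith and postgres) on the same server each report the
	// server-wide bgwriter counter, labelled only with the server.
	for range []string{"keith", "postgres"} {
		keys = append(keys, metricKey(
			"pg_stat_bgwriter_checkpoints_timed",
			map[string]string{"server": "localhost:5432"},
		))
	}
	fmt.Println(findDuplicates(keys))
}
```

Because neither connection adds anything that tells the two apart, both produce the same key, which matches the "was collected before with the same name and label values" message.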

Disabling the built-in metrics doesn't help either, since the pg_settings metrics, which are always output, fail the same way:

DATA_SOURCE_NAME="postgresql:///keith?host=localhost&sslmode=disable,postgresql:///postgres?host=localhost&sslmode=disable" ./postgres_exporter --disable-default-metrics

* collected metric pg_settings_wal_block_size label:<name:"server" value:"localhost:5432" > gauge:<value:8192 >  was collected before with the same name and label values
* collected metric pg_settings_wal_buffers_bytes label:<name:"server" value:"localhost:5432" > gauge:<value:4.194304e+06 >  was collected before with the same name and label values
* collected metric pg_settings_wal_compression label:<name:"server" value:"localhost:5432" > gauge:<value:0 >  was collected before with the same name and label values
* collected metric pg_settings_wal_keep_segments label:<name:"server" value:"localhost:5432" > gauge:<value:0 >  was collected before with the same name and label values

Am I missing something here?

keithf4 · 2019-08-05T15:40:29Z

Just noticed that there is also a --disable-settings-metrics option. Disabling both the built-in queries and the settings metrics at least lets the exporter run, so custom queries can still be used.

karlseguin · 2019-08-28T03:05:54Z

Same problem. The crux of the issue is that auto-discovery doesn't work with anything that produces metrics for multiple databases, including the built-in metrics, the built-in settings and any custom metrics.

I changed the for loop at https://github.com/wrouesnel/postgres_exporter/blob/238f5c099af62ec32fb7c511361fd616e3998f2f/cmd/postgres_exporter/postgres_exporter.go#L1315 to be:

for i, dsn := range dsns {
  if i == 1 {
    e.disableDefaultMetrics = true
    e.disableSettingsMetrics = true
  }
  ...

So that the built-in stuff is only run for the first discovered database.

However, any multi-DB metrics in the extended queries have the same problem.

I feel there are two real solutions to this problem:
1 - When auto-discovery is enabled, expose a $database variable to the extended query, allowing it to be a per-database file
2 - Create higher-level, first-class support for "postgres metrics" and "database metrics", run in two phases: first collect the postgres metrics (including the possibility of postgres-wide extended queries), then collect the per-database metrics (with a different extended query file)
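
The two-phase idea in option 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: `Target` and `planCollection` are hypothetical names, the DSNs are assumed to be already parsed into a server and a database, and none of this reflects how postgres_exporter is actually structured.

```go
package main

import "fmt"

// Target is a hypothetical, already-parsed DSN: the server it reaches
// and the database it opens on that server.
type Target struct {
	Server   string
	Database string
}

// planCollection splits the work into two phases.
// serverPhase holds one target per distinct server (for server-wide
// metrics such as bgwriter stats or settings); databasePhase holds one
// target per distinct server/database pair (for per-database metrics).
func planCollection(targets []Target) (serverPhase, databasePhase []Target) {
	seenServer := map[string]bool{}
	seenDatabase := map[Target]bool{}
	for _, t := range targets {
		if !seenServer[t.Server] {
			seenServer[t.Server] = true
			serverPhase = append(serverPhase, t)
		}
		if !seenDatabase[t] {
			seenDatabase[t] = true
			databasePhase = append(databasePhase, t)
		}
	}
	return serverPhase, databasePhase
}

func main() {
	targets := []Target{
		{Server: "localhost:5432", Database: "keith"},
		{Server: "localhost:5432", Database: "postgres"},
	}
	serverPhase, databasePhase := planCollection(targets)
	fmt.Println("server-wide queries run via:", serverPhase)
	fmt.Println("per-database queries run via:", databasePhase)
}
```

With this split, server-wide metrics are emitted once per server no matter how many databases are discovered, while per-database queries still run against every database.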

sergeypugachov · 2019-09-20T12:35:52Z

I made a workaround for it. In the extended queries (queries.yaml), add current_database() to the query and use its value as a label.

SELECT relname, current_database() as datname, ... FROM pg_statio_user_tables

So I have one exporter for the default metrics and settings, and another one for the extended queries against the databases. The latter starts with the --disable-default-metrics --disable-settings-metrics --auto-discover-databases flags.

So far so good on 0.5.1

chrisdrew1 · 2019-10-14T16:43:57Z

So, the workaround is to have 2 exporters? Better than a 1:1 ratio of exporters to databases, I guess.

hpurmann · 2019-11-25T12:41:25Z

We're seeing the same issue. Is this commit related? deac1c3

wrouesnel · 2019-11-25T13:25:01Z

This should be resolved now in 0.8.0 since multi-DSN support has had a bunch of work since then.

hpurmann · 2019-11-25T14:58:03Z

Thanks, I can confirm that I don't see errors on the /metrics page about duplicated metric names with 0.8.0 anymore as opposed to 0.7.0.

robbiet480 · 2020-05-22T04:07:55Z

I've just experienced this issue with AWS RDS Aurora PostgreSQL. I have one writer and two readers on my cluster. Pointing to a specific reader endpoint or the cluster read-only endpoint instead of the cluster read/write endpoint killed the errors for me. That's not great though, because it seems I'm only getting pg_stat_statements metrics for reads/that instance.

EDIT: Figured it out: my postgres_exporter user wasn't able to see the queryid on every row of pg_stat_statements and therefore there were duplicates. Fixed it by modifying the non-superuser queries at #398.

pergh · 2020-05-28T11:47:20Z

Just experienced this issue with 2 DSNs. Separately they work fine.

panic: descriptors reported by collector have inconsistent label names or help strings for the same fully-qualified name, offender is Desc{fqName: "pg_settings_archive_timeout_seconds", help: "Forces a switch to the next xlog file if a new file has not been started within N seconds. [Units converted to seconds.]", constLabels: {server="hostname:5432"}, variableLabels: []}

Happens both on "v0.8.0" and "latest".

jdhanani1-zz mentioned this issue Nov 4, 2019

Getting "collected metrics xxx was collected before with the same name and label values" #327

Closed

daxmc99 mentioned this issue Jan 4, 2021

Update wrouesnel/postgres_exporter Docker tag to v0.8.0 sourcegraph/sourcegraph-public-snapshot#13601

Closed

mjnman mentioned this issue Mar 29, 2023

Errors with multi-DSN features iamseth/oracledb_exporter#293

Open
