Multi-DSN causes duplicate metric errors for built-in stat queries #296

Open
keithf4 opened this issue Aug 5, 2019 · 9 comments
@keithf4

keithf4 commented Aug 5, 2019

I've been trying to get multi-DSN working with 0.5.1, without much luck. I first hit these errors when running the exporter as a systemd service on CentOS, but I get exactly the same thing when running it directly, either over a Unix socket

DATA_SOURCE_NAME="postgresql:///keith?host=/tmp/,postgresql:///postgres?host=/tmp/" ./postgres_exporter

or over TCP

DATA_SOURCE_NAME="postgresql:///keith?host=localhost&sslmode=disable,postgresql:///postgres?host=localhost&sslmode=disable" ./postgres_exporter

The built-in metrics cause a flood of duplicate-metric errors because no label distinguishes them between the databases:

* collected metric pg_stat_activity_count label:<name:"datname" value:"keith" > label:<name:"server" value:"localhost:5432" > label:<name:"state" value:"disabled" > gauge:<value:0 >  was collected before with the same name and label values
* collected metric pg_stat_activity_max_tx_duration label:<name:"datname" value:"keith" > label:<name:"server" value:"localhost:5432" > label:<name:"state" value:"disabled" > gauge:<value:0 >  was collected before with the same name and label values
* collected metric pg_stat_bgwriter_checkpoints_timed label:<name:"server" value:"localhost:5432" > counter:<value:13676 >  was collected before with the same name and label values
* collected metric pg_stat_bgwriter_checkpoints_req label:<name:"server" value:"localhost:5432" > counter:<value:3 >  was collected before with the same name and label values
* collected metric pg_stat_bgwriter_checkpoint_write_time label:<name:"server" value:"localhost:5432" > counter:<value:5.494712e+06 >  was collected before with the same name and label values
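
To make the collision above concrete, here is a minimal sketch of why server-wide metrics clash. It assumes the `server` label is simply host:port with defaults of localhost and 5432, as the log lines suggest; `serverLabel` and `dsnsPerServer` are hypothetical helpers written for illustration, not the exporter's own code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// serverLabel is a hypothetical helper. It guesses the "server" label for a
// URL-style DSN as host:port, falling back to localhost and 5432.
func serverLabel(dsn string) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	host := u.Hostname()
	if host == "" {
		host = u.Query().Get("host") // e.g. ?host=localhost or ?host=/tmp/
	}
	if host == "" {
		host = "localhost"
	}
	port := u.Port()
	if port == "" {
		port = u.Query().Get("port")
	}
	if port == "" {
		port = "5432"
	}
	return host + ":" + port, nil
}

// dsnsPerServer counts how many DSNs in a comma-separated list resolve to
// each server label. A count above 1 means every metric that carries only
// the server label would be emitted more than once with identical labels.
func dsnsPerServer(dataSourceName string) (map[string]int, error) {
	counts := make(map[string]int)
	for _, dsn := range strings.Split(dataSourceName, ",") {
		label, err := serverLabel(strings.TrimSpace(dsn))
		if err != nil {
			return nil, err
		}
		counts[label]++
	}
	return counts, nil
}

func main() {
	counts, err := dsnsPerServer("postgresql:///keith?host=localhost&sslmode=disable,postgresql:///postgres?host=localhost&sslmode=disable")
	if err != nil {
		panic(err)
	}
	for label, n := range counts {
		fmt.Printf("server=%q is targeted by %d DSN(s)\n", label, n)
	}
}
```

Both DSNs resolve to the same server label, so a metric such as pg_stat_bgwriter_checkpoints_timed, which has no database label, ends up with the same name and label values twice.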

Disabling the built-in metrics does not help either, because the pg_settings metrics, which are still output, hit the same problem:

DATA_SOURCE_NAME="postgresql:///keith?host=localhost&sslmode=disable,postgresql:///postgres?host=localhost&sslmode=disable" ./postgres_exporter --disable-default-metrics

* collected metric pg_settings_wal_block_size label:<name:"server" value:"localhost:5432" > gauge:<value:8192 >  was collected before with the same name and label values
* collected metric pg_settings_wal_buffers_bytes label:<name:"server" value:"localhost:5432" > gauge:<value:4.194304e+06 >  was collected before with the same name and label values
* collected metric pg_settings_wal_compression label:<name:"server" value:"localhost:5432" > gauge:<value:0 >  was collected before with the same name and label values
* collected metric pg_settings_wal_keep_segments label:<name:"server" value:"localhost:5432" > gauge:<value:0 >  was collected before with the same name and label values

Am I missing something here?

@keithf4
Author

keithf4 commented Aug 5, 2019

Just noticed that there is also a --disable-settings-metrics option. Disabling both the built-in queries and the settings metrics at least allows the exporter to run, so custom queries can then be used.

@karlseguin

Same problem. The crux of the issue is that auto-discovery doesn't work with anything that produces metrics for multiple databases, including the built-in metrics, the built-in settings and any custom metrics.

I changed the for loop at https://github.com/wrouesnel/postgres_exporter/blob/238f5c099af62ec32fb7c511361fd616e3998f2f/cmd/postgres_exporter/postgres_exporter.go#L1315 to be:

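for i, dsn := range dsns {
  // From the second DSN onward, turn off the built-in and settings metrics.
  // The flags are fields on e, so they stay set for later DSNs unless something resets them.
  if i == 1 {
    e.disableDefaultMetrics = true
    e.disableSettingsMetrics = true
  }
  ...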

So that the built-in stuff is only run for the first discovered database.

However, any multi-DB metrics in the extended query file have the same problem.

I feel there are two real solutions to this problem:
1 - When auto-discovery is enabled, expose a $database variable to the extended query, allowing it to be a per-database file
2 - Add higher-level, first-class support for "postgres" metrics versus "database" metrics and collect in two phases: first the postgres metrics (including the possibility of postgres-wide extended queries), then the per-database metrics (with a different extended query file)
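
The second option could be sketched roughly as follows. This is only an illustration of the two-phase idea, not code from the exporter: the `target` and `plan` types and the `buildPlan` function are hypothetical names, and the server value is assumed to be the same host:port string that appears in the `server` label.

```go
package main

import "fmt"

// target is a hypothetical description of one discovered database.
type target struct {
	Server   string // e.g. "localhost:5432"
	Database string // e.g. "keith"
}

// plan says where each class of query should run.
type plan struct {
	ServerWide  []target // one connection per server, for postgres-wide metrics
	PerDatabase []target // every discovered database, for per-database metrics
}

// buildPlan schedules server-wide metrics once per server (using the first
// database seen on that server) and per-database metrics for every target.
func buildPlan(targets []target) plan {
	var p plan
	seen := make(map[string]bool)
	for _, t := range targets {
		if !seen[t.Server] {
			seen[t.Server] = true
			p.ServerWide = append(p.ServerWide, t)
		}
		p.PerDatabase = append(p.PerDatabase, t)
	}
	return p
}

func main() {
	p := buildPlan([]target{
		{Server: "localhost:5432", Database: "keith"},
		{Server: "localhost:5432", Database: "postgres"},
	})
	for _, t := range p.ServerWide {
		fmt.Printf("phase 1 (server-wide): %s via database %s\n", t.Server, t.Database)
	}
	for _, t := range p.PerDatabase {
		fmt.Printf("phase 2 (per-database): %s/%s\n", t.Server, t.Database)
	}
}
```

Compared with flipping the disable flags inside the loop, a plan like this keys the decision on the server rather than on the position of the DSN in the list, so it would also cover DSN lists that span more than one server.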

@sergeypugachov

sergeypugachov commented Sep 20, 2019

I made a workaround for it. In the extended queries (queries.yaml), add current_database() to each query and use its value as a label.

SELECT relname, current_database() as datname, ... FROM pg_statio_user_tables

So, I have one pg_exporter for the default metrics and settings, and another one for the extended queries against the databases. The latter is started with the --disable-default-metrics --disable-settings-metrics --auto-discover-databases flags.

So far so good on 0.5.1
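
Why the extra datname label helps can be shown with a small sketch. It treats a series identity as the metric name plus its sorted label pairs, which is a simplification of how duplicates are detected; `seriesKey` and `collides` are hypothetical helpers, and the metric and table names are only illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds an identity string from a metric name and its labels,
// sorted by label name so that map ordering does not matter.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%q", k, labels[k])
	}
	return b.String()
}

// collides reports whether two label sets yield the same series identity.
func collides(name string, a, b map[string]string) bool {
	return seriesKey(name, a) == seriesKey(name, b)
}

func main() {
	const metric = "pg_statio_user_tables_heap_blks_read" // illustrative name

	// The same table name exists in two databases on one server.
	fromKeith := map[string]string{"server": "localhost:5432", "relname": "orders"}
	fromPostgres := map[string]string{"server": "localhost:5432", "relname": "orders"}
	fmt.Println("without datname, collides:", collides(metric, fromKeith, fromPostgres))

	// Adding the value of current_database() as a label separates the series.
	fromKeith["datname"] = "keith"
	fromPostgres["datname"] = "postgres"
	fmt.Println("with datname, collides:", collides(metric, fromKeith, fromPostgres))
}
```

The first check reports a collision and the second does not, which matches the observation that the workaround only needs the label to be present in every per-database query.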

@chrisdrew1

So, the workaround is to have 2 exporters? Better than a 1:1 ratio of exporters to databases, I guess.

@hpurmann

We're seeing the same issue. Is this commit related? deac1c3

@wrouesnel
Contributor

This should be resolved now in 0.8.0 since multi-DSN support has had a bunch of work since then.

@hpurmann

Thanks. I can confirm that with 0.8.0, unlike 0.7.0, I no longer see errors about duplicated metric names on the /metrics page.

@robbiet480
Contributor

robbiet480 commented May 22, 2020

I've just experienced this issue with AWS RDS Aurora PostgreSQL. I have one writer and two readers on my cluster. Pointing to a specific reader endpoint or the cluster read-only endpoint, instead of the cluster read/write endpoint, made the errors go away for me. That's not great though, because it seems I'm only getting pg_stat_statements metrics for reads/that instance.

EDIT: Figured it out: my postgres_exporter user wasn't able to see the queryid on every row of pg_stat_statements and therefore there were duplicates. Fixed it by modifying the non-superuser queries at #398.

@pergh

pergh commented May 28, 2020

Just experienced this issue with 2 DSNs. Separately, each one works fine.

panic: descriptors reported by collector have inconsistent label names or help strings for the same fully-qualified name, offender is Desc{fqName: "pg_settings_archive_timeout_seconds", help: "Forces a switch to the next xlog file if a new file has not been started within N seconds. [Units converted to seconds.]", constLabels: {server="hostname:5432"}, variableLabels: []}

Happens both on "v0.8.0" and "latest".
