Panic on driver startup when using --http-endpoint in v1.15.3 #1894

Closed
Fricounet opened this issue Dec 17, 2024 · 0 comments · Fixed by #1895
Starting with v1.15.3, when the --http-endpoint flag is specified on the controller (to expose Prometheus metrics), the driver panics on startup with the following logs:

❯ docker run -it gcp-pd-csi-driver:test --http-endpoint=localhost:8080
I1217 18:06:19.372371       1 main.go:114] Operating compute environment set to: production and computeEndpoint is set to: <nil>
I1217 18:06:19.372771       1 main.go:123] Sys info: NumCPU: 20 MAXPROC: 1
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x181bb17]

goroutine 1 [running]:
k8s.io/component-base/metrics.(*CounterVec).Create(0x10?, 0xc00059fc58?)
	<autogenerated>:1 +0x17
k8s.io/component-base/metrics.(*kubeRegistry).MustRegister(0xc0004ce1c0, {0xc000439b70, 0x1, 0x2065332?})
	/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/vendor/k8s.io/component-base/metrics/registry.go:169 +0xbe
sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/pkg/metrics.(*MetricsManager).RegisterPDCSIMetric(...)
	/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/pkg/metrics/metrics.go:83
main.handle()
	/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/cmd/gce-pd-csi-driver/main.go:150 +0x483
main.main()
	/go/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/cmd/gce-pd-csi-driver/main.go:115 +0x18a

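The trace above ends in a nil pointer dereference inside CounterVec.Create, which is the classic Go failure mode of calling a pointer method on a metrics object that was never initialized before being registered. A minimal stdlib-only sketch of that failure mode (the type and function names below are hypothetical stand-ins, not the driver's actual code):

```go
package main

import "fmt"

// counterVec is a hypothetical stand-in for a metrics vector type.
// Calling a pointer method that dereferences fields on a nil receiver
// panics, just like the CounterVec.Create frame in the trace above.
type counterVec struct {
	name string
}

func (c *counterVec) create() bool {
	return c.name != "" // nil receiver -> nil pointer dereference here
}

// mustRegister mimics a registry that eagerly initializes every metric
// handed to it, without checking for nil.
func mustRegister(metrics ...*counterVec) {
	for _, m := range metrics {
		m.create()
	}
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panic:", r)
		}
	}()
	// The metric was never constructed, e.g. because the code path that
	// initializes it only runs under certain flags.
	var uninitialized *counterVec
	mustRegister(uninitialized)
	fmt.Println("unreachable")
}
```

In the driver's case this suggests that RegisterPDCSIMetric is handed a metric (or a registry) that is only initialized on some code paths, and the --http-endpoint path misses that initialization.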
I can reproduce the issue pretty consistently with the following setup:

  • docker buildx build --platform "linux/amd64" --tag gcp-pd-csi-driver:test --build-arg=BUILDPLATFORM=linux --build-arg=STAGINGVERSION=v1.15.3 -f Dockerfile --load .
  • docker run -it gcp-pd-csi-driver:test --http-endpoint=localhost:8080

And after running a git bisect, I found that the first bad commit is this one:

❯ git bisect bad
984b61c6d4e54ba3685d1383ce8ffa5111b2ce57 is the first bad commit
commit 984b61c6d4e54ba3685d1383ce8ffa5111b2ce57 (HEAD)
Author: Peter Schuurman <[email protected]>
Date:   Tue Nov 26 15:17:46 2024 -0800

    Migrate metric defer() statements to gRPC metric interceptor. This allows for more accurate error code reporting if gRPC functionality is refactored

 cmd/gce-pd-csi-driver/main.go               |   4 +++-
 pkg/gce-pd-csi-driver/controller.go         | 102 +++++++++++++----------------------------------------------------------------
 pkg/gce-pd-csi-driver/gce-pd-driver.go      |   5 ++--
 pkg/gce-pd-csi-driver/gce-pd-driver_test.go |  11 ++++++---
 pkg/gce-pd-csi-driver/server.go             |  23 ++++++++++++------
 pkg/gce-pd-csi-driver/server_test.go        | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pkg/metrics/interceptor.go                  |  23 ++++++++++++++++++
 pkg/metrics/metadata.go                     |  55 ++++++++++++++++++++++++++++++++++++++++++
 pkg/metrics/metrics.go                      |  25 +++++++------------
 pkg/metrics/metrics_test.go                 |  17 +++++++------
 pkg/metrics/metrics_test_util.go            |  23 ++++++++++++++++++
 test/sanity/sanity_test.go                  |   2 +-
 12 files changed, 379 insertions(+), 123 deletions(-)
 create mode 100644 pkg/gce-pd-csi-driver/server_test.go
 create mode 100644 pkg/metrics/interceptor.go
 create mode 100644 pkg/metrics/metadata.go
 create mode 100644 pkg/metrics/metrics_test_util.go

I haven't found the root cause within the commit yet, but I figured I would raise the issue so that other folks can investigate too.
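For context, the commit message describes moving per-call defer() metric recording into a gRPC interceptor. A stdlib-only sketch of that before/after pattern (all names here are hypothetical; the real driver records CSI error codes into Prometheus counters and wires the interceptor through google.golang.org/grpc):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// handler is a stand-in for a gRPC method handler signature.
type handler func(ctx context.Context, req any) (any, error)

// recordOperation is a hypothetical metric sink; the real driver records
// the operation's error code into a Prometheus counter vector instead.
func recordOperation(method string, err error) {
	fmt.Printf("method=%s error=%v\n", method, err)
}

// Old style: every handler carries its own defer to record the outcome.
func createVolumeWithDefer(ctx context.Context, req any) (resp any, err error) {
	defer func() { recordOperation("CreateVolume", err) }()
	return nil, errors.New("quota exceeded")
}

// New style: a single interceptor wraps any handler and records the
// outcome once, so refactored handlers cannot forget the metric.
func metricInterceptor(method string, h handler) handler {
	return func(ctx context.Context, req any) (any, error) {
		resp, err := h(ctx, req)
		recordOperation(method, err)
		return resp, err
	}
}

func main() {
	createVolume := func(ctx context.Context, req any) (any, error) {
		return nil, errors.New("quota exceeded")
	}
	wrapped := metricInterceptor("CreateVolume", createVolume)
	wrapped(context.Background(), nil)
	createVolumeWithDefer(context.Background(), nil)
}
```

A refactor like this changes when and how the metrics manager is initialized relative to the gRPC server, which would be consistent with a nil metric being registered on the --http-endpoint startup path.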
