Cache and print devices for debugging future outages #2097
changes and printing the full list. Example:
periodic symlink cache read: /dev/disk/by-id/google-persistent-disk-0 -> /dev/sda; /dev/disk/by-id/google-pvc-f5418f78-dc07-4d69-9487-6c4a7232dd67 -> /dev/sdb; /dev/disk/by-id/scsi-0Google_PersistentDisk_persistent-disk-0 -> /dev/sda; /dev/disk/by-id/scsi-0Google_PersistentDisk_pvc-f5418f78-dc07-4d69-9487-6c4a7232dd67 -> /dev/sdb
easier to unit test.
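For reference, the `periodic symlink cache read:` line in the example above can be reproduced by sorting the cached links and joining `link -> target` pairs with `; `. This is a minimal sketch; `formatSymlinkCache` is a hypothetical helper, not a name from the PR. Sorting by link path is an assumption, chosen because it matches the ordering shown in the example and keeps consecutive reads easy to compare by eye.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSymlinkCache renders a symlink -> real-path map as one log line.
// Hypothetical helper: links are sorted so the output is deterministic and
// consecutive log lines line up.
func formatSymlinkCache(cache map[string]string) string {
	links := make([]string, 0, len(cache))
	for link := range cache {
		links = append(links, link)
	}
	sort.Strings(links)

	entries := make([]string, 0, len(links))
	for _, link := range links {
		entries = append(entries, fmt.Sprintf("%s -> %s", link, cache[link]))
	}
	return "periodic symlink cache read: " + strings.Join(entries, "; ")
}

func main() {
	cache := map[string]string{
		"/dev/disk/by-id/scsi-0Google_PersistentDisk_persistent-disk-0": "/dev/sda",
		"/dev/disk/by-id/google-persistent-disk-0":                      "/dev/sda",
	}
	fmt.Println(formatSymlinkCache(cache))
}
```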
What type of PR is this?
/kind cleanup # Is this right?
What this PR does / why we need it:
This PR adds a cache that periodically (currently every minute) reads the `/dev/disk/by-id/` directory and evaluates the symlinks there. For each symlink, it caches the real path the link points to. This will help with debugging future filesystem issues: in a past OMG, our lack of insight into symlink changes for specific disks hampered our ability to debug. Logging recorded the real path of the disk at mount and unmount, but any change in between went undetected.
This PR prints those links every minute and also logs whenever an entry in the cache changes.
An example:
The cache will also note if a symlink is broken.
NOTE: This currently filters out anything in `by-id/` that ends with `-part[0-9]*$`. This removes partitions, which are noise. Mounting partitions directly isn't well supported in GKE, but we may want to test that in the future.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: