
Redis Lock is only acquired after configured lock timeout although lock is not held by any other instance #3716


Closed
muellerml opened this issue Feb 3, 2022 · 7 comments

Comments

@muellerml

In what version(s) of Spring Integration are you seeing this issue?

Since 5.5.8 (it could also exist in version 5.5.7)

Describe the bug

Since the upgrade to Spring Boot 2.6.3 (which includes Spring Integration 5.5.8), we see some unexpected behaviour in our internal integration tests.
We have two instances of our service deployed, and each acquires locks for short durations. Since the upgrade we sometimes run into a situation where the lock has already been released by the first instance, but the second instance fails to acquire it immediately and instead waits for the lock timeout before actually acquiring it, even though no other instance holds the lock at that point.
From our internal logging it looks like the release of the lock and the attempt to acquire it happen almost at the same time, but we are not able to verify that exactly.
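
For context, a minimal sketch of how such a short-lived lock is typically taken through Spring Integration's `RedisLockRegistry` (the registry key, expiry, and timeout values below are illustrative, not the reporter's actual configuration):

```java
// Minimal sketch (not the reporter's actual code): two instances sharing a
// RedisLockRegistry and taking short-lived locks on the same key.
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.integration.redis.util.RedisLockRegistry;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

public class ShortLivedLockClient {

    private final RedisLockRegistry lockRegistry;

    public ShortLivedLockClient(RedisConnectionFactory connectionFactory) {
        // "my-locks" and the 60s expiry are illustrative values only
        this.lockRegistry = new RedisLockRegistry(connectionFactory, "my-locks", 60_000);
    }

    public void doWork(String key) throws InterruptedException {
        Lock lock = this.lockRegistry.obtain(key);
        // The reported symptom: this call sometimes blocks for the full lock
        // timeout even though the other instance has already unlocked.
        if (lock.tryLock(30, TimeUnit.SECONDS)) {
            try {
                // short computation
            }
            finally {
                lock.unlock();
            }
        }
    }
}
```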

To Reproduce

No consistent workflow to reproduce the bug available.

Expected behavior

Locks are acquired by the second instance immediately.

@muellerml muellerml added status: waiting-for-triage and type: bug labels Feb 3, 2022
@artembilan
Member

@Meteorkor ,

would you mind taking a look here?
Looks like we still have some race condition...

Thanks

@Meteorkor
Contributor

@muellerml
Could you please upload approximate sample code showing how the lock is used in each instance?

@muellerml
Author

I'm really struggling to compile an example. We are doing nothing too fancy here, just acquiring the lock in various service methods to perform some computations.

I'm currently trying to debug deeper into the issue. When I added some logging to the RedisLockRegistry and the RedisUnLockNotifyMessageListener, our builds reliably turned green again. To me it seems like the Future.get() call blocks for the whole timeout although no one else holds the lock. I'm trying to find a sweet spot where I can better trace the issue; I will continue with this after the weekend.
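
A simplified illustration of why a missed unlock notification would make the `Future.get()` call block for the whole timeout (this is not the actual `RedisLockRegistry` internals; all names below are made up): the acquiring thread parks on a future that only completes when the pub/sub listener delivers the unlock message.

```java
// Simplified illustration, NOT the real RedisLockRegistry code: the waiter
// blocks on a future that is completed by the unlock message listener.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class UnlockWaitIllustration {

    private final CompletableFuture<Void> unlockNotification = new CompletableFuture<>();

    // Called by the Redis message listener when the unlock message arrives.
    public void onUnlockMessage() {
        this.unlockNotification.complete(null);
    }

    // Called by the thread trying to acquire the lock.
    public boolean waitForUnlock(long timeout, TimeUnit unit) throws InterruptedException {
        try {
            // If the subscription was not yet active when the unlock message was
            // published, onUnlockMessage() is never called and this get() only
            // returns when the timeout elapses.
            this.unlockNotification.get(timeout, unit);
            return true;
        }
        catch (TimeoutException ex) {
            return false;
        }
        catch (ExecutionException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```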

@artembilan
Member

Thanks for the explanation, @muellerml !

Just to be sure that we are on the same page: are you sure that all your instances are upgraded to the same latest version of spring-integration-redis?

@muellerml
Author

muellerml commented Feb 4, 2022

Pretty sure; nothing (logs, metrics, kubectl) indicates that there is another instance with an older version running, although that would very well explain the behavior we experience. As I said before, I will investigate the behavior further next week.

Just some guesses about what might be happening (a toy sketch of this interleaving follows the list):

  1. The first lock enters the subscribeLock() method and initializes (= starts) the RedisMessageListenerContainer.
  2. In a different thread, a second lock with a different name enters the subscribeLock() method "directly" after the first one and sees that the container is already running (https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/listener/RedisMessageListenerContainer.java#L210), because running is set to true right at the beginning of the RedisMessageListenerContainer initialization.
  3. The second lock subscribes to the messages, but the subscriptions only become active once the MessageListenerContainer has finished starting (https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/listener/RedisMessageListenerContainer.java#L218 and https://github.com/spring-projects/spring-data-redis/blob/4b645ccf85550d96ac67a44ba9abeb0ab45d6e4b/src/main/java/org/springframework/data/redis/listener/RedisMessageListenerContainer.java#L485).
  4. In the meantime, the lock is released by the other running instance, but the unlock message is missed by the instance still waiting for the RedisMessageListenerContainer initialization.
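
A runnable toy model of this interleaving (assumption: this is not Spring Data Redis code, only a minimal reproduction of the pattern; all class, field, and method names are illustrative). A listener registered between "running = true" and the end of start() misses the message published during initialization:

```java
// Toy model of the suspected race, not real Spring Data Redis code.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class EarlyRunningFlagDemo {

    static class ToyListenerContainer {
        volatile boolean running;                 // flipped at the START of start()
        volatile boolean subscriptionsActive;     // only true at the END of start()
        final List<Runnable> listeners = new CopyOnWriteArrayList<>();

        void start() throws InterruptedException {
            this.running = true;                  // step 2: looks "started" already
            Thread.sleep(100);                    // simulate slow initialization
            this.subscriptionsActive = true;      // step 3: subscriptions now live
        }

        void publishUnlock() {                    // step 4: unlock from other instance
            if (this.subscriptionsActive) {
                this.listeners.forEach(Runnable::run);
            }                                     // otherwise the message is dropped
        }

        void addListener(Runnable listener) {
            this.listeners.add(listener);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyListenerContainer container = new ToyListenerContainer();

        Thread firstLock = new Thread(() -> {
            try {
                container.start();
            }
            catch (InterruptedException ignored) {
            }
        });
        firstLock.start();

        while (!container.running) {              // second lock arrives "directly" after
            Thread.onSpinWait();
        }
        // The container looks started, but it is not ready yet.
        container.addListener(() -> System.out.println("unlock notification received"));

        container.publishUnlock();                // unlock arrives during initialization
        firstLock.join();
        // Nothing was printed: the notification was lost, so the second lock
        // would now wait for its full timeout (step 4 above).
    }
}
```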

@Meteorkor
Contributor

@muellerml
Thanks for the explanation.

> The first lock enters the subscribeLock() method and initializes (= starts) the RedisMessageListenerContainer.

If the problem only occurs for locks requested before the first lock has finished initializing the container, it shouldn't happen once initialization is complete.
Does the issue stop occurring after the container is initialized?

The initialization of the RedisMessageListenerContainer doesn't take long, but in the case you describe we can miss the message, so I want to change it.

@muellerml
Author

Yes, it seems like it. I didn't consider the initialization process to be the issue until I added some debug log statements and looked at the code in RedisLockRegistry in more detail. As our tests are executed in parallel and some perform the same steps at nearly the same time, this explains why we ran into the issue so frequently.

I also just found out that there are debug logs for when the RedisMessageListenerContainer is started, and they confirm my assumption:

```
2022-02-07 07:18:03.331 Started RedisMessageListenerContainer
2022-02-07 07:18:03.157 Released lock; RedisLock
2022-02-07 07:18:03.115 Subscribed to lock ...
```

I can trace every issue we faced back to this behaviour, so I don't think there's another issue once the RedisMessageListenerContainer is initialized.

@artembilan artembilan added this to the 6.0 M2 milestone Feb 7, 2022
@artembilan artembilan added backport 5.5.x and removed status: waiting-for-triage labels Feb 7, 2022
artembilan pushed a commit that referenced this issue Feb 8, 2022
Fixes #3716

If the `redisMessageListenerContainer` is starting, wait for it to complete without doing `subscribeUnlock()`.

* Introduce an `isRunningRedisMessageListenerContainer` state, since `running` in the `RedisMessageListenerContainer` is set at the beginning of `start()`, which is misleading for concurrent calls to the `RedisLockRegistry`.

**Cherry-pick to `5.5.x`**
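
A hedged sketch of the idea behind the fix (field and method names here are illustrative, not the actual Spring Integration source): keep a separate flag that only becomes true after `RedisMessageListenerContainer.start()` has fully returned, and make concurrent lock attempts wait on that instead of trusting the container's own early `running` flag.

```java
// Sketch of the fix's idea only; names are illustrative and this is NOT the
// actual RedisLockRegistry source.
import org.springframework.data.redis.listener.RedisMessageListenerContainer;

public class GuardedContainerStarter {

    private final RedisMessageListenerContainer container;

    // Analogous to the isRunningRedisMessageListenerContainer state from the fix:
    // only true once start() has completely returned.
    private volatile boolean containerFullyStarted;

    public GuardedContainerStarter(RedisMessageListenerContainer container) {
        this.container = container;
    }

    public synchronized void ensureStarted() {
        if (!this.containerFullyStarted) {
            // A concurrent caller blocks here until the starting thread is done,
            // instead of observing the container's running flag too early.
            this.container.start();
            this.containerFullyStarted = true;
        }
    }

    public boolean isContainerFullyStarted() {
        return this.containerFullyStarted;
    }
}
```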