
Redis Lock is only acquired after configured lock timeout although lock is not held by any other instance #3716


Closed
muellerml opened this issue Feb 3, 2022 · 7 comments

Comments

@muellerml

In what version(s) of Spring Integration are you seeing this issue?

Since 5.5.8 (it could also exist in version 5.5.7)

Describe the bug

Since the upgrade to Spring Boot 2.6.3 (which includes Spring Integration 5.5.8), we see some unexpected behaviour in our internal integration tests.
We have two instances of our service deployed, and each acquires locks for short durations. Since the upgrade we sometimes run into a situation where the lock has already been released by the first instance, but the second instance fails to acquire it immediately and instead waits for the lock timeout before actually acquiring it, even though no other instance holds the lock at that point.
From our internal logging it looks like the release of the lock and the attempt to acquire it happen almost at the same time, but we are not able to verify that exactly.
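
For context, a minimal sketch of how such a short-lived lock is typically taken through Spring Integration's `RedisLockRegistry` (the registry key, expiry, and timeout values below are illustrative, not the reporter's actual configuration):

```java
// Minimal sketch (not the reporter's actual code): two instances sharing a
// RedisLockRegistry and taking short-lived locks on the same key.
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.integration.redis.util.RedisLockRegistry;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

public class ShortLivedLockClient {

    private final RedisLockRegistry lockRegistry;

    public ShortLivedLockClient(RedisConnectionFactory connectionFactory) {
        // "my-locks" and the 60s expiry are illustrative values only
        this.lockRegistry = new RedisLockRegistry(connectionFactory, "my-locks", 60_000);
    }

    public void doWork(String key) throws InterruptedException {
        Lock lock = this.lockRegistry.obtain(key);
        // The reported symptom: this call sometimes blocks for the full lock
        // timeout even though the other instance has already unlocked.
        if (lock.tryLock(30, TimeUnit.SECONDS)) {
            try {
                // short computation
            }
            finally {
                lock.unlock();
            }
        }
    }
}
```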

To Reproduce

No consistent workflow to reproduce the bug available.

Expected behavior

Locks are acquired by the second instance immediately.

@muellerml muellerml added status: waiting-for-triage and type: bug labels Feb 3, 2022
@artembilan
Member

@Meteorkor ,

would you mind taking a look here?
Looks like we still have some race condition...

Thanks

@Meteorkor
Contributor

@muellerml
Could you please upload approximate sample code showing how the lock is used in each instance?

@muellerml
Author

I'm really struggling to compile an example. We are doing nothing too fancy here, just acquiring the lock in various service methods to perform some computations.

I'm currently trying to debug deeper into the issue. When I added some logging to the RedisLockRegistry and the RedisUnLockNotifyMessageListener, our builds reliably turned green again. To me it seems like the Future.get() call blocks for the whole timeout although no one else holds the lock. I'm trying to find a sweet spot where I can better trace the issue; I will continue with this after the weekend.
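
A simplified illustration of why a missed unlock notification would make the `Future.get()` call block for the whole timeout (this is not the actual `RedisLockRegistry` internals; all names below are made up): the acquiring thread parks on a future that only completes when the pub/sub listener delivers the unlock message.

```java
// Simplified illustration, NOT the real RedisLockRegistry code: the waiter
// blocks on a future that is completed by the unlock message listener.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class UnlockWaitIllustration {

    private final CompletableFuture<Void> unlockNotification = new CompletableFuture<>();

    // Called by the Redis message listener when the unlock message arrives.
    public void onUnlockMessage() {
        this.unlockNotification.complete(null);
    }

    // Called by the thread trying to acquire the lock.
    public boolean waitForUnlock(long timeout, TimeUnit unit) throws InterruptedException {
        try {
            // If the subscription was not yet active when the unlock message was
            // published, onUnlockMessage() is never called and this get() only
            // returns when the timeout elapses.
            this.unlockNotification.get(timeout, unit);
            return true;
        }
        catch (TimeoutException ex) {
            return false;
        }
        catch (ExecutionException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```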

@artembilan
Member

Thanks for the explanation, @muellerml !

Just to be sure that we are on the same page: are you sure that all your instances are upgraded to the same latest version of spring-integration-redis?

@muellerml
Author

muellerml commented Feb 4, 2022

Pretty sure; nothing (logs, metrics, kubectl) indicates that there is another instance with an older version running, although that would very well explain the behavior we experience. As I said before, I will investigate the behavior further next week.

Just some guesses about what might be happening (a toy sketch of this interleaving follows the list):

  1. The first lock enters the subscribeLock() method and initializes (= starts) the RedisMessageListenerContainer.
  2. In a different thread, a second lock with a different name enters the subscribeLock() method "directly" after the first one and sees that the container is already running (https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/listener/RedisMessageListenerContainer.java#L210), because running is set to true right at the beginning of the RedisMessageListenerContainer initialization.
  3. The second lock subscribes to the messages, but the subscriptions only become active once the MessageListenerContainer has finished starting (https://github.com/spring-projects/spring-data-redis/blob/main/src/main/java/org/springframework/data/redis/listener/RedisMessageListenerContainer.java#L218 and https://github.com/spring-projects/spring-data-redis/blob/4b645ccf85550d96ac67a44ba9abeb0ab45d6e4b/src/main/java/org/springframework/data/redis/listener/RedisMessageListenerContainer.java#L485).
  4. In the meantime, the lock is released by the other running instance, but the unlock message is missed by the instance still waiting for the RedisMessageListenerContainer initialization.
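
A runnable toy model of this interleaving (assumption: this is not Spring Data Redis code, only a minimal reproduction of the pattern; all class, field, and method names are illustrative). A listener registered between "running = true" and the end of start() misses the message published during initialization:

```java
// Toy model of the suspected race, not real Spring Data Redis code.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class EarlyRunningFlagDemo {

    static class ToyListenerContainer {
        volatile boolean running;                 // flipped at the START of start()
        volatile boolean subscriptionsActive;     // only true at the END of start()
        final List<Runnable> listeners = new CopyOnWriteArrayList<>();

        void start() throws InterruptedException {
            this.running = true;                  // step 2: looks "started" already
            Thread.sleep(100);                    // simulate slow initialization
            this.subscriptionsActive = true;      // step 3: subscriptions now live
        }

        void publishUnlock() {                    // step 4: unlock from other instance
            if (this.subscriptionsActive) {
                this.listeners.forEach(Runnable::run);
            }                                     // otherwise the message is dropped
        }

        void addListener(Runnable listener) {
            this.listeners.add(listener);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyListenerContainer container = new ToyListenerContainer();

        Thread firstLock = new Thread(() -> {
            try {
                container.start();
            }
            catch (InterruptedException ignored) {
            }
        });
        firstLock.start();

        while (!container.running) {              // second lock arrives "directly" after
            Thread.onSpinWait();
        }
        // The container looks started, but it is not ready yet.
        container.addListener(() -> System.out.println("unlock notification received"));

        container.publishUnlock();                // unlock arrives during initialization
        firstLock.join();
        // Nothing was printed: the notification was lost, so the second lock
        // would now wait for its full timeout (step 4 above).
    }
}
```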

@Meteorkor
Contributor

@muellerml
Thanks for the explanation.

> The first lock enters the subscribeLock() method and initializes (= starts) the RedisMessageListenerContainer.

If the problem only occurs for locks requested before the first lock has finished initializing the container, it shouldn't happen once initialization is complete.
Does the issue stop occurring after the container is initialized?

The initialization of the RedisMessageListenerContainer doesn't take long, but in the case you describe we can miss the message, so I want to change it.

@muellerml
Author

Yes, it seems like it. I didn't consider the initialization process to be the issue until I added some debug log statements and looked at the code in RedisLockRegistry in more detail. As our tests are executed in parallel and some perform the same steps at nearly the same time, this explains why we ran into the issue so frequently.

I also just found out that there are debug logs for when the RedisMessageListenerContainer is started, and they confirm my assumption:

```
2022-02-07 07:18:03.331 Started RedisMessageListenerContainer
2022-02-07 07:18:03.157 Released lock; RedisLock
2022-02-07 07:18:03.115 Subscribed to lock ...
```

I can trace every issue we faced back to this behaviour, so I don't think there's another issue once the RedisMessageListenerContainer is initialized.

@artembilan artembilan added this to the 6.0 M2 milestone Feb 7, 2022
@artembilan artembilan added backport 5.5.x and removed status: waiting-for-triage labels Feb 7, 2022
artembilan pushed a commit that referenced this issue Feb 8, 2022
Fixes #3716

If the `redisMessageListenerContainer` is starting, wait for it to complete without doing `subscribeUnlock()`.

* Introduce an `isRunningRedisMessageListenerContainer` state, since `running` in the `RedisMessageListenerContainer` is set at the beginning of `start()`, which is misleading for concurrent calls to the `RedisLockRegistry`.

**Cherry-pick to `5.5.x`**
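
A hedged sketch of the idea behind the fix (field and method names here are illustrative, not the actual Spring Integration source): keep a separate flag that only becomes true after `RedisMessageListenerContainer.start()` has fully returned, and make concurrent lock attempts wait on that instead of trusting the container's own early `running` flag.

```java
// Sketch of the fix's idea only; names are illustrative and this is NOT the
// actual RedisLockRegistry source.
import org.springframework.data.redis.listener.RedisMessageListenerContainer;

public class GuardedContainerStarter {

    private final RedisMessageListenerContainer container;

    // Analogous to the isRunningRedisMessageListenerContainer state from the fix:
    // only true once start() has completely returned.
    private volatile boolean containerFullyStarted;

    public GuardedContainerStarter(RedisMessageListenerContainer container) {
        this.container = container;
    }

    public synchronized void ensureStarted() {
        if (!this.containerFullyStarted) {
            // A concurrent caller blocks here until the starting thread is done,
            // instead of observing the container's running flag too early.
            this.container.start();
            this.containerFullyStarted = true;
        }
    }

    public boolean isContainerFullyStarted() {
        return this.containerFullyStarted;
    }
}
```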