Auto refresh a list of cluster nodes in TarantoolClusterClient #34
Since we now have TarantoolClusterClient, we could add the following to it:
- The procedure called on the Tarantool side to determine the list of available nodes must be pluggable, since we have several use cases (sharded cluster, replica set, the 'cluster' module, etc.). See the sketch right after this list.
- We may consider round-robin between the available instances.
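For illustration, such a pluggable hook could be as small as a single-method interface; the NodeDiscoverer name below is hypothetical, not an existing connector type:

```java
import java.util.List;

/**
 * Hypothetical sketch of a pluggable node-discovery hook.
 * A concrete implementation could call a stored procedure on a vshard
 * router, ask a replica set, use the 'cluster' module, and so on.
 */
public interface NodeDiscoverer {
    /** Returns the current list of node addresses in "host:port" form. */
    List<String> discoverNodes();
}
```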
Notes from a meeting with Dmitry (@sharonovd) (Jan 31).
I thought about that again and decided that there is not much sense in supporting a list of instances here. There are three possible situations when we need to switch from one instance to another to obtain the list of nodes in a cluster, and all of them are about unavailability of the original node: accidental node downtime, planned maintenance on the node, or moving the node out of the cluster during a cluster rework.

Accidental node downtime will break fetching the list of nodes, but DQL/DML requests will be switched by the cluster client onto another node. Updating the node list is not the final goal; it will be stuck for some time, but DQL/DML will keep working. So this case gives us no reason to expand the 'info node' into a list of such nodes. Planned maintenance, when the node is temporarily down, is much the same. Moving a node out of a cluster is a kind of planned work: expanding the 'info node' into a list would not save us from the situation when all nodes in the list are down (sooner or later), so we should anyway include updating the info node address / list in the plan for such a cluster reconstruction. A list does not add any value here.
OK, the final algorithm will consist of the following steps:
There is the question of how to handle the situation when a new list of nodes does not contain the node we are currently connected to. There are the following ways to handle this situation:
The third way requires storing two sockets in the cluster client and handling both. I'm not sure it was the right thing to inherit TarantoolClusterClient from … I would say that the 2nd way should be less intrusive considering our current …
Refactor the SocketChannelProvider implementations. Now we have two: SingleSocketChannelProviderImpl and RoundRobinSocketProviderImpl, used by the simple and cluster clients respectively. To achieve this, a BaseSocketChannelProvider was extracted.

Add a service discovery implementation based on a Tarantool stored procedure which is called to obtain a new list of cluster nodes.

Integrate service discovery and the current cluster client. The client is now aware of potential node changes via a configurable background job which periodically checks whether the node addresses have changed. If so, the client refreshes the RoundRobinSocketProviderImpl and replaces the old nodes with the new ones. It also requires some additional effort when the current node is missing from the new list: we need to reconnect to another node as soon as possible with a minimal delay between client requests. To achieve this we currently try to catch the moment when the requests in progress have been completed, so that we can finish the reconnection process.

Closes: #34
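A rough sketch of the background refresh described above, using only the JDK scheduler. The NodeDiscoverer type is the hypothetical hook from the earlier sketch, and pushing the new list into RoundRobinSocketProviderImpl is left as a callback, since the exact refresh method is an internal detail:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public final class ClusterNodeRefresher {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private volatile Set<String> knownNodes = new HashSet<>();

    /**
     * Periodically calls the discovery hook and, when the address set has
     * changed, hands the new list to the provider (e.g. a round-robin one).
     */
    public void start(NodeDiscoverer discoverer,
                      Consumer<List<String>> providerRefresher,
                      long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            List<String> fresh = discoverer.discoverNodes();
            Set<String> freshSet = new HashSet<>(fresh);
            if (!freshSet.equals(knownNodes)) {
                knownNodes = freshSet;
                providerRefresher.accept(fresh);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```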
Now I realize that I missed the case when we initially connect a cluster client. At this point it is worth having several addresses to bootstrap the cluster client, because it is possible that one node is down at that moment (see the sketch below). Also, there is not much need to split discovery and DQL/DML nodes when both are lists.
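For illustration, bootstrapping with several addresses could look roughly like this; the exact constructor signature and config fields may differ from the released connector API, so treat it as an assumption-laden sketch:

```java
import org.tarantool.TarantoolClusterClient;
import org.tarantool.TarantoolClusterClientConfig;

public class BootstrapExample {
    public static void main(String[] args) {
        TarantoolClusterClientConfig config = new TarantoolClusterClientConfig();
        config.username = "guest";

        // Several bootstrap addresses: the client can start even if one
        // of the nodes happens to be down at the moment.
        TarantoolClusterClient client = new TarantoolClusterClient(
                config, "host1:3301", "host2:3301", "host3:3301");

        // ... use the client, then close it.
        client.close();
    }
}
```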
- Avoid a possible race between the reading, writing and reconnecting threads when a reconnection process is started. It might have happened that a lagging thread (reading or writing) reset the state to RECONNECT after the reconnecting thread had already started and set the state to 0. As a result, all further attempts to reconnect would never happen. Now the reconnect thread holds on to the state as long as it is required.
- Avoid another possible race between the reading and writing threads when they are started during the reconnection process. It might have happened that one of the threads crashed while it was starting and another, slightly lagging thread set up its flag. This could lead to the reconnecting thread seeing a RECONNECT + R/W state instead of pure RECONNECT. Again, this case broke all further reconnection attempts. Now the reading and writing threads take into account whether the RECONNECT state is already set.
- Replace LockSupport with ReentrantLock.Condition for suspending and waking up a thread. Our cluster tests and a standalone demo app show that LockSupport does not give the visibility guarantee we relied on: the reconnect thread depends on visibility between park-unpark invocations, which sometimes doesn't hold. Also, according to the Javadoc, LockSupport is more of an internal component for building higher-level blocking primitives, and it is not recommended to use this class directly. It was replaced by the ReentrantLock.Condition primitive, which is based on LockSupport but uses it properly internally.

Fixes: #142
Affects: #34, #136
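A condensed sketch of the ReentrantLock/Condition pattern the fix relies on (simplified, not the connector's actual code): a request flag checked under the lock guarantees that a signal sent before the waiter blocks is never lost, unlike a bare park()/unpark() pair.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public final class ReconnectSignal {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition reconnectRequested = lock.newCondition();
    private boolean requested;

    /** Called by a reader/writer thread that hit an IO error. */
    public void request() {
        lock.lock();
        try {
            requested = true;
            reconnectRequested.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Called by the connector thread; blocks until a reconnect is requested. */
    public void awaitRequest() throws InterruptedException {
        lock.lock();
        try {
            while (!requested) {
                reconnectRequested.await();
            }
            requested = false;
        } finally {
            lock.unlock();
        }
    }
}
```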
The state of the client is a set of the following flags: {READING, WRITING, RECONNECT, CLOSED}. Let's name the state when no flags are set UNINITIALIZED.

A reader thread sets READING, performs reading until an error or an interruption, drops READING and tries to trigger reconnection (if the state allows, see below). A writer does quite the same, but with the WRITING flag. The key point here is that a reconnection is triggered from a reader/writer thread and only when certain conditions are met: the prerequisite for setting RECONNECT and signalling (unparking) the connector thread is that the client is in the UNINITIALIZED state.

There are several problems here:
- Say, a reader stalls a bit after dropping READING, then a writer drops WRITING and triggers reconnection. Then the reader wakes up and sets RECONNECT again.
- Calling unpark() N times for a connector thread while it is alive doesn't make it skip the next N park() calls, so the problem above is not just about an extra reconnection; it can leave the connector thread stuck.
- Say, a reader stalls just before setting READING. A writer is hit by an IO error and triggers reconnection (sets RECONNECT, unparks the connector). Then the reader wakes up and sets the READING+RECONNECT state, which disallows the connector thread from proceeding further (it expects pure RECONNECT). Even when the reader drops READING it will not wake up (unpark) the connector thread, because RECONNECT was already set (the state is not UNINITIALIZED).

This commit introduces several changes that eliminate the problems above:
- Use ReentrantLock + Condition instead of park() / unpark() so that signals to reconnect are never missed, no matter whether the connector is parked.
- Ensure that the reader and writer threads of one generation (created on the same reconnection iteration) trigger reconnection only once.
- Hold the RECONNECT state for most of the time the connector works (while acquiring a new socket, connecting and reading the Tarantool greeting) and prevent setting READING/WRITING while RECONNECT is set.
- Ensure a new reconnection iteration starts only after the old reader and writer threads exit (because we cannot receive a reconnection signal until we send it).

Fixes: #142
Affects: #34, #136
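For reference, the flag arithmetic described above can be pictured roughly like this; the constants and method names are illustrative, not the actual implementation:

```java
// Illustrative bit flags for the client state machine described above.
final class ClientState {
    static final int UNINITIALIZED = 0;
    static final int READING   = 1;      // reader thread is running
    static final int WRITING   = 1 << 1; // writer thread is running
    static final int RECONNECT = 1 << 2; // connector thread owns the connection
    static final int CLOSED    = 1 << 3;

    private int state = UNINITIALIZED;

    /** A reader/writer may only schedule reconnection from the UNINITIALIZED state. */
    synchronized boolean tryTriggerReconnect() {
        if (state != UNINITIALIZED) {
            return false;
        }
        state = RECONNECT;
        return true;
    }

    /** READING/WRITING must not be set while RECONNECT (or CLOSED) is held. */
    synchronized boolean trySet(int readingOrWritingFlag) {
        if ((state & (RECONNECT | CLOSED)) != 0) {
            return false;
        }
        state |= readingOrWritingFlag;
        return true;
    }

    synchronized void drop(int flag) {
        state &= ~flag;
    }
}
```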
https://github.com/tarantool/vshard is the current cluster solution for Tarantool. It introduces the concept of "routers": Tarantool instances which route requests to the appropriate "storage" instances where the data is actually stored.
I think we could add TarantoolClusterClient for Tarantool clusters, which would round-robin requests over a group of Tarantool instances. It would also be useful to be able to pass a custom sharding function to it (for custom sharding solutions).
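If a custom sharding function is accepted as a parameter, its shape could be as simple as the following hypothetical interface (not an existing connector API):

```java
import java.util.List;

/**
 * Hypothetical pluggable sharding function: given a request key and the
 * currently known nodes, pick the node the request should be routed to.
 */
@FunctionalInterface
public interface ShardingFunction {
    String selectNode(Object key, List<String> nodes);
}
```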