Auto refresh a list of cluster nodes in TarantoolClusterClient #34

Closed
hypersolid opened this issue Jul 25, 2018 · 6 comments · Fixed by #136

@hypersolid

https://github.com/tarantool/vshard is the current clustering solution for Tarantool. It introduces the concept of "routers": Tarantool instances that route requests to the appropriate "storage" instances where the data is actually stored.

I think we could add a TarantoolClusterClient for Tarantool clusters, which will round-robin requests across a group of Tarantool instances. It would also be useful to be able to pass a custom sharding function to it (for custom sharding solutions).

@sharonovd

Since we now have TarantoolClusterClient, we could add the following to it:
[ ] the ability to extract the list of available nodes from the first connected node and add them to TarantoolClusterClientConfig
[ ] the ability to refresh this list periodically

The procedure called on the Tarantool side to determine the list of available nodes must be pluggable, since we have several use cases (a sharded cluster, a replica set, the 'cluster' module, etc.).

We may consider round-robin between the available instances.

@Totktonada
Member

Notes from a meeting with Dmitry (@sharonovd) (Jan 31).

  • Is it possible that the procedure will return an ID that is not a URI?

    • The procedure should return a list of URIs. It is okay for the Java connector to expect that all URIs are in the inet domain ([user[:pass]@]host:port), because the connector does not support Unix sockets.
  • Should we store arguments in TarantoolClusterClientConfig and pass them to the procedure?

    • It is not really needed, so let's implement it only if it is simple.
  • What should we do if the node we are currently connected to disappears from the list?

    • Raw idea: if the last requests completed without errors, we can let the remaining requests finish on the current node, but close it for new requests. It doesn't look like a simple solution.
    • We can retry all in-flight requests on the new node, but we still need to handle DML requests correctly (ensure we don't execute them twice).
  • Do we want to support a list of instances that can be asked for cluster URLs instead of just one?

    • We need to define our behaviour for the case when a node is dead while we fetch the list for the first time or update it.
    • It seems that we need to implement round-robin across these informational nodes too.
    • Raw idea: the round-robin logic can be separated from the particular actions one wants to perform on these nodes. It can be implemented as a separate class that holds a list of URLs but does not hold a connection to any of them. It provides a call method and connects to a node (trying each across the list) right during a call to this method (see the sketch after this list).
  • NB: look at the Couchbase API.
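To make the last raw idea a bit more concrete, here is a minimal sketch in Java. Everything in it (the RoundRobinCaller name, the call signature, the exception handling) is hypothetical and is not part of the connector API; it only illustrates keeping the round-robin order separate from the action performed on a node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: round-robin over a list of node addresses without
// holding a permanent connection to any of them.
class RoundRobinCaller {

    private final List<String> addresses;                 // "host:port" strings
    private final AtomicInteger position = new AtomicInteger(0);

    RoundRobinCaller(List<String> addresses) {
        this.addresses = new ArrayList<>(addresses);
    }

    // Applies the action to the next node in round-robin order; on failure,
    // tries the remaining nodes before giving up.
    <T> T call(Function<String, T> action) {
        RuntimeException lastError = null;
        for (int i = 0; i < addresses.size(); i++) {
            int index = Math.floorMod(position.getAndIncrement(), addresses.size());
            try {
                return action.apply(addresses.get(index)); // e.g. connect and invoke the procedure
            } catch (RuntimeException e) {
                lastError = e;                             // remember and try the next node
            }
        }
        throw new RuntimeException("all informational nodes failed", lastError);
    }
}
```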

@Totktonada
Member

Do we want to support a list of instances that can be asked for cluster URLs instead of just one?

I thought about that again and decided that there is not much sense in supporting a list of instances here. There are three possible situations when we need to switch from one instance to another to obtain the list of nodes in a cluster, all related to unavailability of the original node: accidental node downtime, planned maintenance on the node, and a cluster update that moves the node out.

Accidental node downtime will break fetching the list of nodes, but DQL/DML requests will be switched by the cluster client to another node. Updating the node list is not a final goal; it will be stuck for some time, but DQL/DML will keep working. So this case gives us no reason to expand the 'info node' into a list of such nodes.

Planned maintenance, when the node is temporarily down, is much the same.

Moving a node out of a cluster is a kind of planned work. Expanding the 'info node' into a list would not save us from the situation when all nodes in the list are down (sooner or later). So we should anyway include updating the info node address / list in the plan for such a cluster reconstruction. A list does not add any value here.

@ponomarevDmitri

ponomarevDmitri commented Feb 19, 2019

OK, the final algorithm will consist of the following steps:

  1. Configuring. A connector client must specify the 'info node' address as a host:port string and the username/password as separate fields in the cluster config (as is done in the current cluster config). The config must also specify a function on the Tarantool instance side that returns a list of strings consisting of 'host:port' addresses of nodes (let's call them 'target nodes').
  2. Fetching the list of nodes. The connector fetches the list of cluster nodes by calling this function on the 'info node'. (This operation can run in the background.)
  3. Updating the target node list.
    If the newly obtained list does not contain the current node, wait in the background for the responses to the requests that have already been sent to this node. Retry failed requests on nodes from the new list. Instantly failing the already sent CUD requests makes less sense: if the node is down, all requests will be retried automatically anyway, and if the node is alive, the requests will be completed. It is also more transparent for a connector client, who will know nothing about the hypothetical node failure.
  4. The client round-robins among the nodes in the list.
  5. Update the node list by timeout (return to step 2). A sketch of this background loop is given after this list.
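For illustration, the background refresh described in steps 2-5 could look roughly like the sketch below. The NodeListRefresher name, the discovery supplier ("call the configured function on the info node") and the applyNewAddresses consumer ("update the target node list inside the cluster client") are all hypothetical stand-ins, not the connector's actual API.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of the periodic node-list refresh loop (steps 2-5).
class NodeListRefresher {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(Supplier<List<String>> discovery,
               Consumer<List<String>> applyNewAddresses,
               long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // step 2: fetch the node list from the 'info node' in the background
                List<String> addresses = discovery.get();
                if (addresses != null && !addresses.isEmpty()) {
                    // step 3: update the target node list (handling the case when
                    // the current node is missing is left to the cluster client)
                    applyNewAddresses.accept(addresses);
                }
            } catch (RuntimeException ignored) {
                // a failed discovery attempt is not fatal; retry on the next tick
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);            // step 5: repeat by timeout
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```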

@Totktonada
Member

There is the question of how to handle the situation when a new list of nodes
does not contain the node we are currently connected to. We need to switch the
cluster client to one of the new nodes. It is possible that we'll have some
number of requests 'in flight': they were sent, but responses have not been
received yet.

There are the following ways to handle this situation:

  1. Pause new requests (don't send them, but add them to a separate queue),
    wait for the 'in flight' requests, then switch the node.
  2. Pause switching to a new node (and send all new requests to the old node)
    until we detect one of two situations: there are no requests 'in flight' or
    a reconnect occurs.
  3. Wait for responses to the 'in flight' requests, but send all new requests
    to the new node.

The third way requires storing two sockets in a cluster client and handling
both. readPacket() and other utility methods from TarantoolBase need to be
factored out to be able to work with an arbitrary socket. That is why the
refactoring in PR #135 was made. Maybe other refactoring is needed as well.

I'm not sure it was the right thing to inherit TarantoolClusterClient from
TarantoolClientImpl (in the cluster client we add a round-robin socket provider
and a retry strategy to a regular client). Maybe it would be more natural to
store several TarantoolClientImpl instances in TarantoolClusterClient?
Not sure, so consider this a raw idea.

I would say that the 2nd way should be the least intrusive considering our
current architecture (a sketch of it is given below). But it is up to the
developer to decide which way is best, considering the amount of work (the
deadline is near), the complexity of the changes (the size of the patch) and
the resulting code readability.
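A very rough sketch of the 2nd way, assuming hypothetical hook points (onRequestSent / onResponseReceived) called by the writer and reader threads; none of these names are the connector's real API, and a real implementation would also need to handle the "a reconnect occurs" branch and coordinate with the writer thread to avoid races.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of way 2: keep sending new requests to the old node and
// postpone the switch until there are no requests 'in flight'.
class DeferredNodeSwitch {

    private final AtomicInteger inFlight = new AtomicInteger(0);
    private final AtomicReference<String> pendingAddress = new AtomicReference<>();

    // Called when a fresh node list no longer contains the current node.
    void requestSwitch(String newAddress) {
        pendingAddress.set(newAddress);
        trySwitch();
    }

    // Called by the writer thread right before a request goes to the socket.
    void onRequestSent() {
        inFlight.incrementAndGet();
    }

    // Called by the reader thread when a response (or an error) is received.
    void onResponseReceived() {
        if (inFlight.decrementAndGet() == 0) {
            trySwitch();   // the last in-flight request finished: safe to switch
        }
    }

    private void trySwitch() {
        String address = pendingAddress.get();
        if (address != null && inFlight.get() == 0
                && pendingAddress.compareAndSet(address, null)) {
            reconnect(address);
        }
    }

    private void reconnect(String address) {
        // placeholder: close the old socket and connect to the new address
        System.out.println("switching to " + address);
    }
}
```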

Totktonada changed the title from "Add support for tarantool clusters (e.g. tarantool/vshard routers)" to "Auto refresh a list of cluster nodes in TarantoolClusterClient" on Mar 21, 2019
nicktorwald added a commit that referenced this issue Mar 22, 2019
Refactor the SocketChannelProvider implementations. Now we have two of them,
SingleSocketChannelProviderImpl and RoundRobinSocketProviderImpl, used by the
simple and cluster clients respectively. To achieve this, a
BaseSocketChannelProvider was extracted.

Add a service discovery implementation based on a Tarantool stored
procedure which is called to obtain a new list of cluster nodes.

Integrate service discovery with the current cluster client. The client is
now aware of potential node changes via a configurable background job
which periodically checks whether the node addresses have changed. If so,
the client refreshes the RoundRobinSocketProviderImpl and replaces the old
nodes with the new ones. It also requires some additional effort when
the current node is missing from the list: we need to reconnect to another
node as soon as possible with a minimal delay between client requests.
To achieve this we currently try to catch the moment when the requests in
progress have completed and we can finish the reconnection process.

Closes: #34
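As an aside, the addresses returned by such a discovery procedure are plain '[user[:pass]@]host:port' strings (see the meeting notes above), so turning them into socket addresses could look roughly like the hypothetical helper below; it is an illustration, not the connector's actual parsing code. For example, parsing ["guest@127.0.0.1:3301", "127.0.0.2:3302"] yields two socket addresses.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: convert "[user[:pass]@]host:port" strings returned by a
// discovery procedure into socket addresses.
final class DiscoveredAddresses {

    static List<InetSocketAddress> parse(List<String> uris) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String uri : uris) {
            // drop the optional user[:pass]@ prefix
            String hostPort = uri.contains("@") ? uri.substring(uri.indexOf('@') + 1) : uri;
            int colon = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            result.add(new InetSocketAddress(host, port));
        }
        return result;
    }
}
```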
@Totktonada
Member

Do we want to support a list of instances that can be asked for cluster URLs instead of just one?

Now I realize that I missed the case when we initially connect a cluster client. At this point it is worth having several addresses to bootstrap the cluster client, because it is possible that one node is down at that moment.

Also, there is not much need to split discovery and DQL/DML nodes when both are lists.

nicktorwald added a commit that referenced this issue Apr 1, 2019
- Avoid a possible race between the reading, writing and reconnecting
threads when a reconnection process is started.
It might have happened that a lagged thread (reading or writing) could
reset the state to RECONNECT after the reconnecting thread had already
started and set the state to 0. As a result, all further attempts to
reconnect would never happen. Now the reconnect thread holds the state
for as long as it is required.

- Avoid another possible race between the reading and writing threads when
they are started during the reconnection process.
It might have happened that one of the threads crashed while it was
starting and the other, slightly lagged, thread set its flag. This could
lead to the reconnecting thread seeing a RECONNECT + R/W state instead
of pure RECONNECT. Again, this case broke all subsequent reconnection
attempts. Now the reading and writing threads take into account whether
the RECONNECT state is already set or not.

- Replace LockSupport with ReentrantLock.Condition for suspending and waking
up a thread. Our cluster tests and the standalone demo app show
that LockSupport is not as safe a memory barrier as it could be. The
reconnect thread relies on a visibility guarantee between park-unpark
invocations which, actually, sometimes doesn't work. Also, according to
the Javadocs, LockSupport is more of an internal component for building
high-level blocking primitives; it is not recommended to use this class
directly. It was replaced by the ReentrantLock.Condition primitive, which
is based on LockSupport but uses it properly inside.

Fixes: #142
Affects: #34, #136
nicktorwald added a commit that referenced this issue Apr 15, 2019
The state of a client is a set of the following flags: {READING, WRITING,
RECONNECT, CLOSED}. Let's call the state where no flags are set
UNINITIALIZED.

A reader thread sets READING, performs reading until an error or an
interruption occurs, drops READING, and tries to trigger reconnection
(if the state allows it, see below). A writer thread does quite the same
things, but with the WRITING flag. The key point here is that
reconnection is triggered from a reader/writer thread and only when
certain conditions are met.

The prerequisite for setting RECONNECT and signalling (unparking) the
connector thread is that the client is in the UNINITIALIZED state.

There are several problems here:

- Say a reader stalls for a bit after dropping READING, then a writer
drops WRITING and triggers reconnection. Then the reader wakes up and
sets RECONNECT again.

- Calling unpark() N times for a connector thread while it is alive does
not make it skip the next N park() calls, so the problem above is not
just about an extra reconnection; it also leaves the connector thread
stuck.

- Say a reader stalls just before setting READING. A writer is hit by an
I/O error and triggers reconnection (sets RECONNECT, unparks the
connector). Then the reader wakes up and sets the READING+RECONNECT
state, which does not allow the connector thread to proceed further (it
expects pure RECONNECT). Even when the reader drops READING, it will not
wake up (unpark) the connector thread, because RECONNECT was already set
(the state is not UNINITIALIZED).

This commit introduces several changes that eliminate the problems
above:

- Use ReentrantLock + Condition instead of park() / unpark() so that
reconnect signals are never missed, no matter whether the connector is
parked.

- Ensure that the reader and writer threads of one generation (created
on the same reconnection iteration) trigger reconnection only once.

- Hold the RECONNECT state for most of the time the connector works
(while acquiring a new socket, connecting, and reading the Tarantool
greeting) and prevent setting READING/WRITING while RECONNECT is set.

Fixes: #142
Affects: #34, #136
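
(A minimal sketch of how such a flag set might be represented, assuming
bit flags packed into an AtomicInteger. The ClientState class and its
helpers are hypothetical and only illustrate the state model described
above, not the connector's actual implementation.)

import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the flag-based client state described above.
public class ClientState {
    public static final int UNINITIALIZED = 0;
    public static final int READING   = 1;       // reader thread is active
    public static final int WRITING   = 1 << 1;  // writer thread is active
    public static final int RECONNECT = 1 << 2;  // reconnection is in progress
    public static final int CLOSED    = 1 << 3;  // client is closed for good

    private final AtomicInteger state = new AtomicInteger(UNINITIALIZED);

    // Atomically sets a flag; returns false if the client is already closed.
    public boolean acquire(int flag) {
        for (;;) {
            int current = state.get();
            if ((current & CLOSED) != 0) {
                return false;
            }
            if (state.compareAndSet(current, current | flag)) {
                return true;
            }
        }
    }

    // Drops a flag (e.g. READING when the reader thread exits).
    public void release(int flag) {
        for (;;) {
            int current = state.get();
            if (state.compareAndSet(current, current & ~flag)) {
                return;
            }
        }
    }

    // Reconnection may only be triggered from the UNINITIALIZED state.
    public boolean tryTriggerReconnect() {
        return state.compareAndSet(UNINITIALIZED, RECONNECT);
    }
}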
nicktorwald added a commit that referenced this issue Apr 15, 2019
The state of a client is a set of the following flags: {READING, WRITING,
RECONNECT, CLOSED}. Let's call the state where no flags are set
UNINITIALIZED.

A reader thread sets READING, performs reading until an error or an
interruption occurs, drops READING, and tries to trigger reconnection
(if the state allows it, see below). A writer thread does quite the same
things, but with the WRITING flag. The key point here is that
reconnection is triggered from a reader/writer thread and only when
certain conditions are met.

The prerequisite for setting RECONNECT and signalling (unparking) the
connector thread is that the client is in the UNINITIALIZED state.

There are several problems here:

- Say a reader stalls for a bit after dropping READING, then a writer
drops WRITING and triggers reconnection. Then the reader wakes up and
sets RECONNECT again.

- Calling unpark() N times for a connector thread while it is alive does
not make it skip the next N park() calls, so the problem above is not
just about an extra reconnection; it also leaves the connector thread
stuck.

- Say a reader stalls just before setting READING. A writer is hit by an
I/O error and triggers reconnection (sets RECONNECT, unparks the
connector). Then the reader wakes up and sets the READING+RECONNECT
state, which does not allow the connector thread to proceed further (it
expects pure RECONNECT). Even when the reader drops READING, it will not
wake up (unpark) the connector thread, because RECONNECT was already set
(the state is not UNINITIALIZED).

This commit introduces several changes that eliminate the problems
above:

- Use ReentrantLock + Condition instead of park() / unpark() so that
reconnect signals are never missed, no matter whether the connector is
parked.

- Ensure that the reader and writer threads of one generation (created
on the same reconnection iteration) trigger reconnection only once.

- Hold the RECONNECT state for most of the time the connector works
(while acquiring a new socket, connecting, and reading the Tarantool
greeting) and prevent setting READING/WRITING while RECONNECT is set.

- Ensure that a new reconnection iteration starts only after the old
reader and writer threads have exited (because we cannot receive a
reconnection signal until we send it).

Fixes: #142
Affects: #34, #136
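
(A minimal sketch of the ReentrantLock + Condition signalling mentioned
in the first bullet above. The ReconnectSignal class and its field names
are hypothetical; the point is that, unlike park()/unpark(), a request
raised before the connector starts waiting is remembered in a flag and
therefore cannot be lost.)

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of lossless reconnect signalling.
public class ReconnectSignal {
    private final Lock lock = new ReentrantLock();
    private final Condition needReconnect = lock.newCondition();
    private boolean reconnectRequested;

    // Called by a reader/writer thread that hit an error.
    public void requestReconnect() {
        lock.lock();
        try {
            reconnectRequested = true;
            needReconnect.signal();
        } finally {
            lock.unlock();
        }
    }

    // Called by the connector thread; returns once a reconnect was requested,
    // even if the request happened before this method was entered.
    public void awaitReconnectRequest() throws InterruptedException {
        lock.lock();
        try {
            while (!reconnectRequested) {
                needReconnect.await();
            }
            reconnectRequested = false;
        } finally {
            lock.unlock();
        }
    }
}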