Skip to content

Fix a file descriptor leak with wrong user:pass #133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 13, 2019

Conversation

Totktonada
Copy link
Member

@Totktonada Totktonada commented Mar 12, 2019

Set SO_LINGER to 0 in tests that produce many outgoing connections,
because so many sockets in the TIME_WAIT state can lead to
'java.net.NoRouteToHostException: Cannot assign requested address
(Address not available)' exceptions on Travis-CI.

Fixes #132.

@Totktonada Totktonada added the bug Something isn't working label Mar 12, 2019
@Totktonada Totktonada self-assigned this Mar 12, 2019
@Totktonada Totktonada mentioned this pull request Mar 12, 2019
@Totktonada Totktonada force-pushed the Totktonada/gh-132-fix-fd-leak branch from ae48631 to 10febcb Compare March 12, 2019 06:50
@Totktonada
Copy link
Member Author

Caused by: java.net.NoRouteToHostException: Cannot assign requested address (Address not available)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
	at java.base/java.net.Socket.connect(Socket.java:591)
	at java.base/java.net.Socket.connect(Socket.java:540)
	at org.tarantool.AbstractTarantoolConnectorIT.openConnection(AbstractTarantoolConnectorIT.java:168)

It seems that we cannot do so much attemps to connect on travis-ci. We should either set SO_REUSEADDR or disable these test cases on travis-ci.

@Totktonada Totktonada changed the title Fix file desc. leak with wrong credentials connect WIP: Fix file desc. leak with wrong credentials connect Mar 12, 2019
@Totktonada
Copy link
Member Author

SO_REUSEADDR does not help. I'll fix and then add reviewers.

@Totktonada Totktonada force-pushed the Totktonada/gh-132-fix-fd-leak branch from d14693b to e466400 Compare March 12, 2019 15:13
@Totktonada Totktonada changed the title WIP: Fix file desc. leak with wrong credentials connect WIP: Fix a file descriptor leak with wrong user:pass Mar 12, 2019
@Totktonada Totktonada force-pushed the Totktonada/gh-132-fix-fd-leak branch from e466400 to 3f3a004 Compare March 12, 2019 15:44
@Totktonada
Copy link
Member Author

For the record: It is possible to fix CI with SO_REUSEADDR, but we need either lean on a test order or set it everywhere. I decided to set SO_LINGER to 0 for outgoing connections in tests where outgoing connections are created intensively.

@Totktonada Totktonada force-pushed the Totktonada/gh-132-fix-fd-leak branch from 3f3a004 to 0246cca Compare March 12, 2019 16:03
@Totktonada Totktonada changed the title WIP: Fix a file descriptor leak with wrong user:pass Fix a file descriptor leak with wrong user:pass Mar 12, 2019
@Totktonada
Copy link
Member Author

NB: PR #135 fixes the problem too.

@Totktonada
Copy link
Member Author

I'm going to remove testReconnectNonExist(), because it does not make several attempts within one for-loop iteration and so cannot exceed 1024 (Linux) / 256 (Mac OS) default soft limit of file descriptors.

Set SO_LINGER to 0 in tests that produce many outgoing connections,
because so many sockets in the TIME_WAIT state can lead to
'java.net.NoRouteToHostException: Cannot assign requested address
(Address not available)' exceptions on Travis-CI.

Fixes #132.
@Totktonada Totktonada force-pushed the Totktonada/gh-132-fix-fd-leak branch from 0246cca to e407078 Compare March 12, 2019 23:34
@Totktonada
Copy link
Member Author

Totktonada commented Mar 13, 2019

@ponomarevDmitri complains about several things (thanks for pointers!):

Hm. A lifetime of these streams (is and cis) mostly the same as a tarantool client object lifetime. When the reconnection situation is accurately handled with proper closing of old streams and closing newly created streams in case of an error in a middle of (re)connection (due to greeting read fail or auth write/read/verify fail). That is what I made in this PR.

Things become a way different if we'll need to handle several sockets (and so several streams) within one tarantool client. It may be needed in the scope of #34 (see 'way 3' here). So preliminary refactoring (at least factor out functions for writing and reading packages from using saved streams) seems to be right way to proceed if we'll choose an implementation variant of #34, which use several sockets simultaneously inside tarantool client. Maybe it also improves readability of the code even if we'll not going to implement #34 in this way. I cannot say something about that right now.

So I prefer to fix the bug w/o such refactoring here.

About the test case that lean on an OS file descriptor limit. I investigated several other ways to verify we don't lose some fd. In short:

  • Mocking close methods. The strength of this way is that it is native for Java and does not depend on environment. However such test will able to catch only these forgotten close() that we mock. If a leak will appear in some other object in the future (say, in some new class) the test will not catch it.
  • Lean on the fact that fd are increased by one. It is not requirement of a standard AFAIK, but it should mostly work on Linux and Mac OS. However it is not so for dup2() and dup3() AFAIU. I'm doubt there is a reason to use dup2() for anything except 'rebinding' stdin, stdout ot stderr, but dup3() allows to atomically set FD_CLOEXEC, so maybe it is used somewhere in JVM implementation.
  • Track open / connect / close via some mocking package (it seems that File Leak Detector operates in that way). Don't sure whether this method is reliable. It is possible that we'll use a class that is not covered by this tool. We need to understand what is going on inside this tool to lean on it.
  • Use UnixOperatingSystemMXBean#getOpenFileDescriptorCount. It seems that it does not work on Mac OS within all JVMs. It is also unclear how it doing counting.

So I prefer to stick with the current approach: create many connections and verify we don't reach a fds limit.

@Totktonada Totktonada merged commit d0a83eb into master Mar 13, 2019
@Totktonada Totktonada deleted the Totktonada/gh-132-fix-fd-leak branch March 13, 2019 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant