-
Notifications
You must be signed in to change notification settings - Fork 103
Reuse HTTP connections with a connection pool #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
here: https://issues.apache.org/jira/browse/THRIFT-2033 This patch modifies thrift's transport/THttpClient.py PySQL already subclasses this in our sql/auth/thrift_http_client.py file So I should be able to apply the same patch there. Signed-off-by: Jesse Whitehouse <[email protected]>
To make this easier I downloaded the patch from Thrift's bug tracker and ran this command to get it's diff:
|
Signed-off-by: Jesse Whitehouse <[email protected]>
I ran test_create_insert_drop_table_core to prove that the e2e tests still function. Meaning that this step didn't break anything. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
I was only using this for nice code-completion during refactoring Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
This reverts commit 3261137. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Connection pools now work for our entire e2e test suite. But I don't have a way to verify yet that one TCP connection is being reused. I would expect to see the e2e test suite execute more quickly if that were the case. But it's actually slower with the connection pool. In Also need to implement |
This improves the e2e test performance by 50% Signed-off-by: Jesse Whitehouse <[email protected]>
Since The remaining step here is to implement the proxy handling. |
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Proxies are now provisionally implemented. We are not testing the proxy support for now. |
Signed-off-by: Jesse Whitehouse <[email protected]>
issues with urllib3 Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. We just need to add comments to the important lines that were updated so other readers know what is going on
This reverts commit 259702d. Signed-off-by: Jesse Whitehouse <[email protected]>
This reverts commit aa4fc0a. Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
…#131) Signed-off-by: Jesse Whitehouse <[email protected]>
Background
Right now, PySQL creates a new HTTP connection for every thrift RPC. This is wildly inefficient.
It happens because HTTP connections are handled by the
thrift
package. This is discussed in the Apache Thrift bug tracker here: https://issues.apache.org/jira/browse/THRIFT-2033Quoting from that issue:
The linked issue includes a patch that modifies thrift's
transport/THttpClient.py
PySQL already subclasses this in
sql/auth/thrift_http_client.py
file. The goal of this PR is to apply a similar patch to our subclass ofTHttpClient
.Implementation
I selectively copied over the methods from Thrift's
THttpClient
and implemented them using urllib3'sConnectionPool
classes. After doing so the e2e test suite execution time dipped by 50%.We don't have a good way to test proxy support / behaviour in the connector at this time. I've made a best-effort to implement proxy support using
ProxyManager
. But this could break for some users. I verified that this patch at least attempts to connect through a proxy. Beyond that it's anyone's guess.We would be more careful about changing this behaviour except that PySQL has not historically supported proxies. We only made them work provisionally in #81 which was incorporated into the
v2.5.0
release.This means that the majority of users are not using proxies today and will not be affected by a regression in proxy support. But everyone benefits from an upgrade to HTTP 1.1 support. We'll rely on users to report if this doesn't work for them.
The mitigation if proxies break in the short-term is to pin to a previous release of PySQL and report any issues with proxies to Databricks.
Related Tickets & Documents
PECO-715