-
Notifications
You must be signed in to change notification settings - Fork 144
httplib support #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
httplib support #19
Conversation
oo, should get the unittests setup via Travis/etc |
one question for this PR is it has three segments, I'm not sure I can coalesce it to two given redirects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the PR and we really appreciate your time for adding extra support. I have left a few comments. No major issue so I expect this can be merged soon.
aws_xray_sdk/core/patcher.py
Outdated
@@ -10,6 +10,7 @@ | |||
'requests', | |||
'sqlite3', | |||
'mysql', | |||
'httplib', # TODO: perhaps requests should map to this below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep the requests
patch for now. If a user's intention is to only patch requests
but has other libraries depend on httplib
, then patching at a lower level may result in unexpected behavior. IMO the SDK should only patch httplib
when users explicit do so.
A small paragraph on https://github.com/aws/aws-xray-sdk-python/blob/master/docs/thirdparty.rst explaining httplib/httplib2/requests dependency and which to patch could be very helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed comment, and added some docs
aws_xray_sdk/ext/httplib/patch.py
Outdated
|
||
|
||
# ? is not a valid entity, and we don't want things after the ? for the segment name | ||
def _strip_url(url: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this helper method to aws_xray_sdk.ext.util
? You use it on requests patch as well. It can be further used to strip any url string on any library patcher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW as a helper method a None
check could be useful in case the caller forgets to do the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved, didn't put the None check as that's a fairly low-level API and you'll get an exception of operating on a 'None' already...thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest make it no-op if input is None
. This becomes a general helper under a utility module so it could be used on logging, other patching code etc by other developers. It is bad if the SDK throws a None type error from this part.
Also a nitpick could you remove the prefix _
since its usage is not private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
aws_xray_sdk/ext/httplib/patch.py
Outdated
subsegment.put_http_meta(http.METHOD, xray_data.method) | ||
subsegment.put_http_meta(http.URL, xray_data.url) | ||
subsegment.put_http_meta(http.STATUS, instance.status) | ||
subsegment.apply_status_code(instance.status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to manually call apply_status_code
. put_http_meta
has logic to adjust fault/error/throttle flags if a status code is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, probably copied from an old version of the AWS SDK, removed
aws_xray_sdk/ext/httplib/patch.py
Outdated
xray_data = _XRay_Data(method, instance.host, xray_url) | ||
setattr(instance, _XRAY_PROP, xray_data) | ||
|
||
return xray_recorder.record_subsegment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to record this local computation as a subsegment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC if the connect fails, getresponse will not be called.
aws_xray_sdk/ext/httplib/patch.py
Outdated
|
||
wrapt.wrap_function_wrapper( | ||
'http.client', | ||
'HTTPConnection.getresponse', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are patching both HTTPConnection.getresponse
and HTTPResponse.read
. Is the latter some duplicate work since the former already set url
, method
and status
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I follow, on the read you're reading the body, which you may not do on 204s, and you want all the information from the requests to be marked on the subsegment of the read in case the response segment is sampled out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood.
tests/ext/httplib/test_httplib.py
Outdated
url = 'http://{}/status/{}?foo=bar&baz=foo'.format(BASE_URL, status_code) | ||
requests.get(url) | ||
subsegment = xray_recorder.current_segment().subsegments[1] | ||
assert subsegment.name == url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this assertion pass when query string is stripped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your patching code you use the string before "?" as the subsegment name.
So to verify the http subsegment is named correctly, should the test asserts name == http://{}/status/{}
and pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
tests/ext/httplib/test_httplib.py
Outdated
xray_recorder.clear_trace_entities() | ||
|
||
|
||
def test_ok(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure your intention on using requests
syntax to test httplib
patch. This unit test is valid given the knowledge of requests
depends on httplib
. But I feel like this is a little bit anti-pattern since the scope is beyond httplib
and involves requests
implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrote to use httplib
tests/ext/requests/test_requests.py
Outdated
@@ -28,7 +28,7 @@ def construct_ctx(): | |||
|
|||
def test_ok(): | |||
status_code = 200 | |||
url = 'http://{}/status/{}'.format(BASE_URL, status_code) | |||
url = 'http://{}/status/{}?foo=bar'.format(BASE_URL, status_code) | |||
requests.get(url) | |||
subsegment = xray_recorder.current_segment().subsegments[0] | |||
assert subsegment.name == url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about the segment name assertion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
is it possible to quickly add travis or circle ci support to this project? It's pretty simple and free and I think will help a lot, here's an example: https://github.com/aio-libs/aiobotocore/blob/master/.travis.yml |
The PR looks good on most part. But I would like to point out some concerns on the structure of the subsegments this patching code is generating. In a steady state one http outbound call is broken into two subsegments. One is for Ideally in a successfully case there should be only one subsegment representing one complete round trip. In a failure case there is one subsegment representing the failed connection. I understand the ideal case could be a challenge given this is a low level library and a logical operation is broken into several functions. But on the X-Ray service graph level one endpoint should be abstracted to one node with correct statistics. Another question is that you are saying one http call will generate three subsegments? From the PR I can only see two. |
ya I agree, I'm not sure the best way to proceed. Perhaps the the third segment will come in if you call |
I agree on only keep the subsegment for actual internet round trip on a successful case. Also |
ok cool, will update ASAP. Read should read the body of the response, normally the headers come first, followed by optionally reading the body (remember the body can be megabytes in length). So 204s will have no body and no subsegment, whereas something like downloading a 2MB file will. |
another reason is that you may perform logic between getting the headers and the read and don't want to reflect that processing time in your read segment. |
furthermore you may have multiple reads (it can get complicated quickly) :) If you have a fixed-size connection pool now imagine you may want even more segments:
|
Agreed. It really depends on the use cases and having the SDK to be able to provide more insights on a single logical operation is always better. But this could be a good first step. The main thing to keep in mind is the granularity on the segments structure. You can take a look at X-Ray Go SDK as an example: https://github.com/aws/aws-xray-sdk-go. In the screenshot shown at landing page you can see the DynamoDB call as a top level parent and its children subsegments provides insights on the timing on DNS lookup and SSL handshake and retries etc. But you only drill down to this level of details when you see a "slow DynamoDB operation". As long as the top level segment capture provides necessary information so service back-end can aggregate them correctly, more children subsegments are always useful. |
cool, will get to the changes soon hopefully, working with two broken arms at the moment due to VB accident :) |
@haotianw465 so I take it back, after investigating this a bit more, I realized that each subsegment is a separate operation, note the following in the test: conn.request(method, path) # sends headers
resp = conn.getresponse() # waits for response this could then further have a resp.read() so my current subsegment layout I believe is correct, where you have one subsegment per operation, which can be separated by user code. |
OK cool. In that case user will need to use |
ya I just realized I can't create a segment because I won't know when it should end beyond when the instance gets GC'd ;) |
You will have to do the following per @xray_recorder.capture('example.com')
def myfunc():
conn.request(method, path) # sends headers
resp = conn.getresponse()
resp.read() Then you will have one subsegment named "example.com" which has three children representing those three operations, if this is what you expect. |
cool, so anything left? |
No it looks good. I will approve and merge this PR. |
The |
will try now |
fixes in #23, let me know if you want in a clean branch, or squashing is ok on your side |
No worries. Already merged. |
@haotianw465 looks like I forgot to update changelist.rst and add this as a new unrelease feature |
You can have a PR adding this feature to "unreleased". The other thing I noticed is that since The fix is to have some unpatch logic on httplib patcher so that upon The broken tests also show us how you can deliberately patch These are some useful information I think I could share with you since you mentioned on patching |
ya, this module is missing unpatch logic from all the modules, will open a PR with fixes |
ok opened #24, however I wasn't able to reproduce your second issue in python3, is this a python2 only issue, or specific requests issue? |
I saw the following on
This might be a py2 specific issue and I don't have time to fully look at I'm not sure if this is the one you was unable to reproduce. |
seems to be fixed with 5816eab in my new PR |
For unit tests unpatch should solve all the problems. My point is that the following patch code def patch():
wrapt.wrap_function_wrapper(
httplib_client_module,
'HTTPConnection._send_request',
_send_request
)
wrapt.wrap_function_wrapper(
httplib_client_module,
'HTTPConnection.getresponse',
_xray_traced_http_getresponse
)
wrapt.wrap_function_wrapper(
httplib_client_module,
'HTTPResponse.read',
_xray_traced_http_client_read
) In order to capture the 2nd and 3rd operation correctly, 1st must be called, as 2nd and 3rd operation assumes an attribute |
ya I also tested this with requests and reproduced the issue you saw, and fixed with the above commit. From the impl of HTTPConnection I think it'll always get called or else you'd be rewriting major portions of the underlying class |
This is what we're going to use in production. It's important for tracing things like the old google API, which uses httplib2 which uses httplib.
NOTE: perhaps this should replace the requests patch as requests uses httplib
fixes: #20