
Considering the usage of strip_url to name a node as a non-perfect solution #64





Closed
pfreixes opened this issue May 30, 2018 · 14 comments

@pfreixes
Contributor

Right now most of the extensions that implement patches or machinery to instrument HTTP clients such as aiohttp, requests or httplib use strip_url by default, either as a fallback when no name is given [1] or even as the primary way to name a subsegment [2][3].

By relying on the output of this function when there is no alternative name, as many nodes are created in the AWS X-Ray console as different URLs have been generated. So the following URLs will each end up creating a different node, while common sense says that all of them refer to the same service:

>>> strip_url("http://localhost/resource/1?debug=true")
'http://localhost/resource/1'
>>> strip_url("http://localhost/resource/2")
'http://localhost/resource/2'
>>> strip_url("http://localhost/resource/1,2,3")
'http://localhost/resource/1,2,3'

So, IMO we must address this situation, or at least give the user a way of inferring the segment name from the URL host, or any other strategy that allows the user to group the calls that belong to the same node. Taking the previous set of URLs as an example, all of them would be named http://localhost.
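
Just to illustrate, a minimal sketch of the grouping I have in mind, using urllib.parse; nothing like this exists in the SDK and the helper name is made up:

from urllib.parse import urlparse

def host_only_name(url):
    # Keep only the scheme and network location, dropping path and query string.
    parsed = urlparse(url)
    return '{}://{}'.format(parsed.scheme, parsed.netloc)

>>> host_only_name("http://localhost/resource/1?debug=true")
'http://localhost'
>>> host_only_name("http://localhost/resource/2")
'http://localhost'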

I guess there is no silver bullet for this, but before I start digging into a proper solution I would like to gather your feedback on the issue itself, and of course if you have any proposals I would like to hear them.

[1] https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/ext/aiohttp/client.py#L25
[2] https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/ext/requests/patch.py#L29
[3] https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/ext/httplib/patch.py#L54

@haotianw465
Contributor

Hi,

Thank you for taking the time to write this up. I agree with you. With the presence of path-based routing, the SDK doesn't have enough information to do the grouping correctly. In your example it is very likely that resource/* is served by a single service. But there are also counterexamples: data/1 is routed to a control-plane fleet while data/2 is routed to a data-plane fleet.

I think it really depends on who owns those services. If you own the services that serve resource/* and they are also integrated with X-Ray, they will actively emit segments themselves. The X-Ray service graph will respect the segment emitted from the server side rather than the caller. So if resource/1 and resource/2 are both served by a Django app which emits a segment named my_service, then in the service graph you will only see a node called my_service.

If you don't own the service serving those requests, then the question comes down to whether you want to aggregate them or not, and up to which level of the URL schema. Generally I would recommend not aggregating anything, because you don't know what is behind the scenes and it can be difficult to identify issues if all URLs are aggregated. But I would definitely like to know more about your use case here. The SDK will certainly provide a better customer experience by offering more flexibility in user configuration.

Please let me know your thoughts.

@pfreixes
Contributor Author

pfreixes commented Jun 4, 2018

Thanks for your answer @haotianw465, a few comments on your points:

I think it really depends on who owns those services. If you own the services that serve resource/* and they are also integrated with X-Ray, they will actively emit segments themselves. The X-Ray service graph will respect the segment emitted from the server side rather than the caller. So if resource/1 and resource/2 are both served by a Django app which emits a segment named my_service, then in the service graph you will only see a node called my_service.

Well, in an organization that is starting to adopt tracing, where only a few services have it in place, this is exactly the scenario where you can't trust the caller to name the node with the proper service name. True, at least for the services that you own, once tracing is fully adopted all of the nodes will eventually be named properly. But it will take some time.

In any case, there is still the case where you are calling an external service over which you don't have any kind of control. There, the name of the node that persists will always be the URL, with all of the issues that I've mentioned.

Going back to your example of two different URLs that share the same hostname but can end up at different services, data/1 and data/2: this scenario is quite similar to one of the use cases in our infrastructure. There is an HTTP middleware that implements some commonalities such as auth and rate limiting and routes the traffic to downstream services, using part of the URI to identify each one univocally. In that case, I still think the best option is to use the hostname to identify the node: a single node in the AWS X-Ray console that identifies all of the calls to this intermediate layer, while the downstream services behind each URL path still get the chance to be identified in the AWS X-Ray console with their proper names.

And last but not least, take into account that the full URL is always recorded as a tag value.

I still think that using the first part of the URL, the hostname, is the least bad option.

@haotianw465
Contributor

haotianw465 commented Jun 4, 2018

Thank you for the explanation. I totally understand your concerns, and this use case is very reasonable and common. I'm open to discussing a way of configuring custom URL naming. Please let me know if you have any suggestions on how you would like to conveniently configure URL renaming.

We have a dynamic naming configuration for the middleware that names the segment based on the host header of the incoming request. See https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-python-middleware.html#xray-sdk-python-middleware-naming. Please let me know if you would like to see something similar for subsegment naming on outgoing HTTP requests.
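
For reference, the middleware-side dynamic naming looks roughly like this; the pattern value below is only an example:

from aws_xray_sdk.core import xray_recorder

# Name the segment after the incoming Host header when it matches the pattern,
# otherwise fall back to the default service name.
xray_recorder.configure(service='my_app', dynamic_naming='*.example.com')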

@TheSkorm

TheSkorm commented May 9, 2019

[Screenshot of the resulting service graph]

I just recently ran into this issue with a backend that uses url parameters for calling an external service.

What I was hoping for was either a config where I could specify something like

http://hostname/path/*/blah/*

where the parts matched by * get replaced with a placeholder in the uploaded segments.

Or alternatively, being able to specify a custom name for that segment prior to making the requests call.
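
Something along the lines of this rough sketch is what I mean; nothing like it exists in the SDK and the helper is purely hypothetical:

import re

def collapse(url, pattern):
    # Each '*' in the pattern matches a single path segment and stays as a
    # literal '*' placeholder in the aggregated name.
    regex = '^' + re.escape(pattern).replace(r'\*', '[^/]+') + '$'
    return pattern if re.match(regex, url) else url

>>> collapse("http://hostname/path/123/blah/456", "http://hostname/path/*/blah/*")
'http://hostname/path/*/blah/*'
>>> collapse("http://hostname/other/123", "http://hostname/path/*/blah/*")
'http://hostname/other/123'

Making each * span only a single path segment is what would keep URLs with different path depths apart.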

@chanchiem
Contributor

Hey, thanks for the feedback! We'll definitely take this into consideration for the UX of this feature request. Do you have any recommendations for how custom names for downstream requests should be specified in the SDK? Would something similar to what was mentioned above by @haotianw465 be a good experience for naming these traces? https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-python-middleware.html#xray-sdk-python-middleware-naming

@TheSkorm

It's pretty close to what I desire and would certainly make a good first pass at the feature. The only enhancement I would consider is making it path aware, as http://test/0123/blah and http://test/0123/foo/blah would both be classified as http://test/*/blah under a basic wildcard approach, when really they should be two entries.

@metaskills

metaskills commented Sep 6, 2019

We've seen this issue in some of our Python work at Custom Ink. Subscribed to the issue to see where we land on a solution. Thanks ahead of time!

In the meantime, we have used wrapt to freedom-patch record_subsegment, and that seems to have been working well.
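
Roughly along these lines; this is heavily simplified and assumes the extensions pass name and namespace as keyword arguments to record_subsegment, which may not hold for every patcher or SDK version:

from urllib.parse import urlparse

import wrapt
from aws_xray_sdk.core import xray_recorder

def _rename_by_host(wrapped, instance, args, kwargs):
    # For remote subsegments, rewrite the name down to scheme://host.
    name = kwargs.get('name')
    if name and kwargs.get('namespace') == 'remote':
        parsed = urlparse(name)
        if parsed.netloc:
            kwargs['name'] = '{}://{}'.format(parsed.scheme, parsed.netloc)
    return wrapped(*args, **kwargs)

wrapt.wrap_function_wrapper(xray_recorder, 'record_subsegment', _rename_by_host)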

@sreid

sreid commented Nov 13, 2019

I have also ended up with a graph much like @TheSkorm's. As an intermediate solution, would it be possible to add a configuration option to use only the domain name? (a boolean config vs. custom path matching)

@chanchiem
Contributor

Can you elaborate on what you mean?
As an example, let's say we have a key AggregateDomainName and it's a boolean flag.

Setting it to true would mean that any subsegments sharing the same domain name would be aggregated into a single node, by giving them all the same name consisting of just the domain. For example, https://amazon.com/get/food and https://amazon.com/get/some/books would both aggregate as https://amazon.com/.

Is this an accurate depiction of what you're requesting?

@sreid

sreid commented Nov 22, 2019

Yes, that's along the lines of what I was thinking. It's not as configurable as the wildcard style matching mentioned previously but perhaps easier to implement?

Right now the requests tracing is not helpful in my particular situation, as there is no aggregation of response times/codes for HTTP requests (my graph looks similar to @TheSkorm's).

@chanchiem
Contributor

Thanks for the response. We will consider both cases and mark this accordingly as a feature request.

The first case would be to allow customers to use wildcards in URL names and aggregate them based on the expression.
For example, http://hostname/path/*/blah/* would aggregate the following as the same node:
http://hostname/path/somepath/blah/unknown
http://hostname/path/diffpath/blah/somestring

The second case would be to aggregate whole nodes based on their host names:
This would mean that https://amazon.com/get/food and https://amazon.com/get/some/books would both aggregate as https://amazon.com/.

In either case, we would need a centralized system to keep track of downstream calls. It would probably be better to utilize the existing centralized sampling rules to aggregate known domain names together, and have the SDKs name the downstream calls accordingly.

@chanchiem
Contributor

It turns out that the subsegment name should only contain the hostname of the endpoint being targeted, and this behavior has been consistent across all our other SDKs except this one.

With PR #192, does this fix the issue you were having? All the nodes were intended to be aggregated rather than differentiated into separate nodes for each unique path.

@sreid

sreid commented Dec 20, 2019

Can confirm that the 2.4.3 release that includes #192 fixes the issue enough for me to make requests tracing usable. Thanks!

@srprash
Contributor

srprash commented Dec 30, 2019

Hi @sreid
That's great! Hope this fixes the issue for everyone. In case the issue still persists, feel free to reopen this or create a new one.

@srprash srprash closed this as completed Dec 30, 2019