Skip to content

EmbedAPI: log success requests #8689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 30, 2021
Merged

EmbedAPI: log success requests #8689

merged 10 commits into from
Nov 30, 2021

Conversation

humitos
Copy link
Member

@humitos humitos commented Nov 17, 2021

Log success requests using django-structlog and send them New Relic to know
what projects are using EmbedAPI, if they are using sphinx-hoverxref and what
version of it.

This is the minimum configuration required to start using structlog + New Relic.
If we are happy with the results we can migrate the other logs to structlog,
which should be simple, but a little tedious to do:

  1. replace logging by structlog
# from
import logging
log = logging.getLogger(__name__)

# to
import structlog
log = structlog.get_logger(__name__)
  1. move key=value from log messages to keyword attributes
# from
log.info('Domain not allowed. domain=%s url=%s', domain, url)

# to
log.info('Domain not allowed.',  domain=domain, url=url)
  1. make positional arguments to be keyword attributes (similar to the previous point, but the %s is mixed with the message itself)
# from
log.info('Sending team invite for %s to %s', invite.team, invite.email)

# to
log.info('Sending team invite to user.', team=invite.team, email=invite.email)
  1. migrate to Canonical Log Lines (https://www.structlog.org/en/21.3.0/logging-best-practices.html) by using log.bind and context.

In the future, we can also set up django-structlog to log Celery tasks as well (https://django-structlog.readthedocs.io/en/latest/celery.html) which will give us traceability since it logs the task_id automatically and other extra features.

References

@humitos humitos requested a review from a team as a code owner November 17, 2021 11:03
@humitos humitos force-pushed the humitos/embed-api-log branch 3 times, most recently from 446bbe5 to bcbe3b3 Compare November 17, 2021 17:05
@humitos humitos marked this pull request as draft November 17, 2021 17:10
@astrojuanlu
Copy link
Contributor

How it looks in the New Relic dashboard:

Query view Log detail
Screenshot 2021-11-17 at 19-05-03 Logs New Relic One Screenshot 2021-11-17 at 19-05-18 logger Logs New Relic One

How to query it from the CLI:

$ newrelic-cli.newrelic nrql query -a $ACCOUNT --query "select * from Log where hoverxref_version = '0.8b1' "
[
  {
    "absolute_url": "http://community.dev.readthedocs.io/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/api/v3.html",
    "event": "EmbedAPI successful response.",
    "external": true,
    "hoverxref_version": "0.8b1",
    "ip": "10.10.0.100",
    "level": "info",
    "logger": "readthedocs.embed.v3.views",
    "message": "info - EmbedAPI successful response.",
    "messageId": "ef9f8c10-478b-425f-b2f9-84d0c1e3b3ee",
    "newrelic.logs.batchIndex": 1,
    "newrelic.source": "api.logs",
    "project": "[docs.readthedocs.io](http://docs.readthedocs.io)",
    "request_id": "45e17bce-898f-44ce-9b3f-106c8aa05d09",
    "timestamp": 1637167481565,
    "url": "https://docs.readthedocs.io/en/latest/api/v3.html",
    "user_id": 1,
    "user_username": "admin"
  },
  {
    "event": "EmbedAPI successful response.",
    "external": true,
    "hoverxref_version": "0.8b1",
    "ip": "10.10.0.100",
    "level": "info",
    "logger": "readthedocs.embed.v3.views",
    "message": "info - EmbedAPI successful response.",
    "messageId": "1332d1ce-438a-4fd5-a63a-a2f4d07b6aa5",
    "newrelic.logs.batchIndex": 1,
    "newrelic.source": "api.logs",
    "project": "[docs.readthedocs.io](http://docs.readthedocs.io)",
    "request_id": "11ae36ac-079b-47d0-90b2-0f1979414672",
    "timestamp": 1637167425340,
    "url": "https://docs.readthedocs.io/en/latest/api/v3.html",
    "user_id": 1,
    "user_username": "admin"
  },
  {
    "event": "EmbedAPI successful response.",
    "external": true,
    "hoverxref_version": "0.8b1",
    "ip": "10.10.0.100",
    "level": "info",
    "logger": "readthedocs.embed.v3.views",
    "message": "info - EmbedAPI successful response.",
    "messageId": "1e72abe7-5613-4b3b-b266-4877ee82aee7",
    "newrelic.logs.batchIndex": 1,
    "newrelic.source": "api.logs",
    "project": "[docs.readthedocs.io](http://docs.readthedocs.io)",
    "request_id": "b7821d7b-7f5e-46eb-955f-b488f4a802d3",
    "timestamp": 1637166367670,
    "url": "https://docs.readthedocs.io/en/latest/api/v3.html",
    "user_id": 1,
    "user_username": "admin"
  }
]

Log success requests using `django-structlog` and send them New Relic to know
what projects are using EmbedAPI, if they are using sphinx-hoverxref and what
version of it.

This is the minimum configuration required to start using structlog + New Relic.
If we are happy with the results we can migrate the other logs to structlog,
which should be simple, but a little tedious to do search&replace.

In the future, we can also set up django-structlog to log Celery tasks as well.
@humitos humitos force-pushed the humitos/embed-api-log branch from bcbe3b3 to d65241d Compare November 17, 2021 18:38
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an interesting approach. I haven't played around too much with the NR interface, but it looks like a good first attempt at capturing this data in a way we can aggregate. I believe AWS also has a similar product that does log aggregation/querying, so this feels like a good integration point that isn't locking us in to a specific backend too deeply.

I'm a little worried about swapping out our entire logging infra as part of this change. It makes it a higher stakes change if we want to try to get it into production as a test. It seems like structlog would be useful in other ways in our code though, so if y'all think it would be work adding I'm 👍 on taking the work on now as part of this.

@humitos
Copy link
Member Author

humitos commented Nov 18, 2021

@ericholscher

I'm a little worried about swapping out our entire logging infra as part of this change. It makes it a higher stakes change if we want to try to get it into production as a test. It seems like structlog would be useful in other ways in our code though, so if y'all think it would be work adding I'm +1 on taking the work on now as part of this.

Changing(*) our entire logging infra to structlog + NewRelic has other benefits as well and it's something I've been wanting for years now (did I mention Loggly already? 😅 ). For example,

  • logs in the console (under development) are colorized and way easier to follow
  • log entries have more data that gives us traceability: request_id for HTTP and task_id for Celery tasks
  • we can add as many extra fields as we want globally
  • avoid grepping .log files together with other bash tools to answer a question
  • answer questions using New Relic's UI with the data we are already logging

So, I'd be 👍🏼 on working towards this change as a whole by doing the steps mentioned in the PR's description and keep using (r)syslog together with New Relic.

One last thing that I'd like to know if it's possible is to "send logs in chunk" since I would like to avoid hitting an API endpoint for each log.info entry. I didn't see anything in the structlog's documentation about this even in the section where they recommend exactly what we are doing here, but I assume that accumulating the logs and sending them in chunks will be required for this to work properly. In any case, even if this is not possible, I'd move forward with the plan, because we can easily filter "what to send" through the API with one line (eg. if logger == 'readthedocs.embed.v3.views': send()) and still have all the other benefits.

@humitos humitos force-pushed the humitos/embed-api-log branch 6 times, most recently from c45eab5 to 9c1f625 Compare November 22, 2021 14:10
@agjohnson
Copy link
Contributor

Pushing this a release out, this still looks to be in a draft state.

@humitos humitos force-pushed the humitos/embed-api-log branch from 9c1f625 to 4962494 Compare November 23, 2021 09:23
@humitos
Copy link
Member Author

humitos commented Nov 23, 2021

I did the minimal implementation using structlog + New Relic formatter + rsyslog to send data to NR. As we are using a mixture of structlog (for EmbedAPI) and logging (for the rest) we cannot use all the features from structlog. However, since this is an initial test I didn't migrate all the code as proposed in the description of the PR yet. I'd like to do it once we have some data being sent to NR and we are happy with it.

Note that all the logging keeps working in the same way, but the log entries coming from structlog (those from EmbedAPI) are rendered as JSON in the console for now. Once we migrate our logging to structlog we can have a lot nicer rendering. Example,

Screenshot_2021-11-23_10-27-39

@humitos humitos force-pushed the humitos/embed-api-log branch 2 times, most recently from 8622310 to 3444dda Compare November 23, 2021 09:44
Use structlog _only_ for EmbedAPI v2/v3 for now. We will do a test in production
to this and decide the migration path for the whole codebase.
@humitos humitos force-pushed the humitos/embed-api-log branch from 3444dda to 2dc1c94 Compare November 23, 2021 09:45
@humitos humitos requested a review from ericholscher November 23, 2021 09:46
@humitos humitos marked this pull request as ready for review November 23, 2021 09:47
Initial work to migrate our application to structlog for better logging, making
them compatible with New Relic's integration.

There are more work that can be done here to make it better. For example, more
usage of `log.bind` to avoid repetition. Besides, rewriting some log message and
reduce them to have more Canonical Log
Lines (https://www.structlog.org/en/stable/logging-best-practices.html#canonical-log-lines)
instead.
Logs coming from `logging.getLogger` (from other applications) have the same
structure that the ones logged by us via `structlog`.
@agjohnson
Copy link
Contributor

This seems like a good middle ground. I am finding the newrelic interface is harder to watch logs at, but really helpful if you know what you are looking for. It doesn't seem like the new relic UI or the new relic cli could really replace the simplicity of parsing logs with basic unix tools. So maintaining local logging for debugging and operations, alongside new relic logs for inspecting the data inside the logs, is probably about where we want to be.

@humitos
Copy link
Member Author

humitos commented Nov 23, 2021

@agjohnson

It doesn't seem like the new relic UI or the new relic cli could really replace the simplicity of parsing logs with basic unix tools. So maintaining local logging for debugging and operations, alongside new relic logs for inspecting the data inside the logs, is probably about where we want to be.

We can output a .log file with any format we want to be parsed with basic unix tools. That's not a problem. We could output JSON (https://www.structlog.org/en/stable/api.html#structlog.processors.JSONRenderer) and use jq to query it or just something similar to our key=value structure that we are currently using (https://www.structlog.org/en/stable/api.html#structlog.processors.KeyValueRenderer).

On the other hand, having these colors in the console's output is super handy while developing locally.

Let me know if you have specific requests about the output formatting, and we can work together to output a .log file with that particular format.

I'd move forward by merging all the PRs that make usage of structlog all around our codebase and improve any particular thing that we found can be improved.

@agjohnson
Copy link
Contributor

I think the format you posted looks great. It's a bit easier to grep than our current format as the module name has clear delineation and it splits up the log text and parameters. I would have an easier time processing the log format you suggested compared to what we have now.

I'd be curious how this looks after going through rsyslog to the central logging server.

@humitos
Copy link
Member Author

humitos commented Nov 23, 2021

@agjohnson

I'd be curious how this looks after going through rsyslog to the central logging server.

I already deployed an instance of web-extra as a test. You can see the logs by doing:

readthedocs@util01(org):~$ tail -f /var/log/readthedocs/celery.log | grep web-extra

Note that they are colorized currently, which is something we may not want, but that's as simple as passing colors=False to the processor 😉

@agjohnson
Copy link
Contributor

agjohnson commented Nov 23, 2021

I just see a lot of lines like:

Nov 23 23:53:01 web-extra-i-03d4571ac924022a3 readthedocs/readthedocsext.monit INFO {'task_id': 'fa24bf25-4209-446f-b4a5-e8f1e0305dd0', 'metric_slug': 'celery-active-tasks', 'event': 'Excecuting metric.', 'timestamp': '2021-11-23T23:53:01.096193Z', 'logger': 'readthedocsext.monitoring.metrics.tasks', 'level': 'info'} [readthedocsext.monitoring.metrics.tasks:107]

JSON is very handy for scanning if you are a robot, but alas, I am human and this is very hard to read

@humitos
Copy link
Member Author

humitos commented Nov 24, 2021

@agjohnson I don't know where you got those lines. The file that I mentioned does not contain those lines. Which file on what instance contains those lines?

@humitos
Copy link
Member Author

humitos commented Nov 24, 2021

OK! I found those lines in /var/log/readthedocs/messages.log.

I updated the syslog formatter to use structlog.dev.ConsoleRenderer(colors=False) so it gets formatted nicely. Besides, I updated the rsyslog filter rules to redirect the logs to the proper files. This work is already tested in web-extra and done in the -ops PR: https://github.com/readthedocs/readthedocs-ops/pull/1097

Now, the logs in /var/logs/readthedocs/message.log look like:

Nov 24 10:13:36 web-extra-i-03d4571ac924022a3 readthedocs: 2021-11-24T10:13:36.195455Z [info     ] Main node found. selector=[role=main] [readthedocs.embed.v3.views] ip=10.32.30.182 request_id=8f3d9f35-e0b1-4da7-b2c0-3bd34c05b8cf user_id=xxx
Nov 24 10:13:36 web-extra-i-03d4571ac924022a3 readthedocs: 2021-11-24T10:13:36.197713Z [info     ] EmbedAPI successful response.  [readthedocs.embed.v3.views] domain=docs.sympy.org external=True hoverxref_version=humitos-test ip=10.32.30.182 project_slug=None referer=None request_id=8f3d9f35-e0b1-4da7-b2c0-3bd34c05b8cf url=https://docs.sympy.org/latest/tutorial/gotchas.html user_id=xxx

@humitos
Copy link
Member Author

humitos commented Nov 24, 2021

I made the last adjustments to use colors only locally and plain text in production. IMO, this is ready to be merged together with the other PRs that migrate our app logging to structlog.

@agjohnson
Copy link
Contributor

agjohnson commented Nov 25, 2021

I don't know where you got those lines.

Yup, I was looking at messages.log, for all messages sent through to the rsyslog server.

Now, the logs in /var/logs/readthedocs/message.log look like

Looks good. If there is a way to remove the timestamp, it's duplicated in the message.

image

Also, the current logs have:

  • process id, which is incredibly helpful in both debugging stray processes, but also in aligning log lines from a single process, such as determining what a worker process is doing and it's timing.
  • module line number for the log line. Something that can be guessed in most cases, but we do have modules with similar log lines and a line number is explicit

@humitos
Copy link
Member Author

humitos commented Nov 25, 2021

@agjohnson

If there is a way to remove the timestamp, it's duplicated in the message.

Yes, we are adding our own timestamp. Ideally, it would be good to remove the timestamp added by rsyslog and control everything from structlog instead of having a hybrid.

Also, the current logs have:
* process id,
* module line number

We have this information in New Relic (line.number and process.id), together with a request_id and a task_id as well --that allows us to check all the logs for a particular request or task.

We can add or modify the log lines as we want, it's possible by adding more structlog processors. However, I want to move forward with all of these PRs instead of blocking them on these small differences. We can tweak these lines as we go once we have the big parts connected and working all together.

Use a custom renderer to match the structure we want:

```
%programname[%processid] [%level] %message [%logger:%linenumber] %structureddata
```

By default, rsyslog will prepend

```
%timestamp %hostname
```

and it will look exactly as we want.
@humitos
Copy link
Member Author

humitos commented Nov 25, 2021

@agjohnson I wrote a custom structlog processor and renderer to add the process id and the line number in a similar way as it was. I'm leaving the timestamp and hostname to be added by rsyslog as it was. The result is:

Nov 25 18:16:46 web-extra-i-03d4571ac924022a3 readthedocs[22130]: [info     ] request_started                [django_structlog.middlewares.request:56] ip=10.32.30.182 request=http://10.32.29.10/_/health_check/ request_id=58a12996-6d36-4a0f-ae4e-d487566447c8 user_agent=ELB-HealthChecker/2.0 user_id=None
Nov 25 18:30:39 web-extra-i-03d4571ac924022a3 readthedocs[22429]: [info     ] System Redirect.               [readthedocs.proxito.views.mixins:210] filename= from_url= host=test-builds.readthedocs.io ip=10.32.30.182 lang_slug=None project_slug=test-builds request_id=ed529f8f-4eab-4aaa-9734-ef4cbe723731 subproject_slug=None to_url=https://test-builds.readthedocs.io/en/4.0/ user_id=None version_slug=None
Nov 25 18:18:00 web-extra-i-03d4571ac924022a3 readthedocs[22242]: [info     ] Excecuting metric.             [readthedocsext.monitoring.metrics.tasks:107] metric_slug=queue-length task_id=4785d0c8-4fdd-4186-9adb-c9325112eb5e

Note that we have readthedocs[22242] where 22242 is the process id and [readthedocsext.monitoring.metrics.tasks:107] where 107 is the line number.

All these changes are done in the other PR that is based on this one #8705 and these examples are taken from the web-extra instance already deployed.

@agjohnson
Copy link
Contributor

Yes, we are adding our own timestamp. Ideally, it would be good to remove the timestamp added by rsyslog and control everything from structlog instead of having a hybrid

Well, the duplicated timestamp is mostly an annoyance. If it's for some reason required from structlog, it's fine to keep it. It wasn't clear if this was the case though.

It might be possible for rsyslog to control the log output per file, but if the only option is to adjust the rsyslog format at the server level, we'd lose important debugging information from all of the host services not using structlog.

We have this information in New Relic

We shouldn't be removing functionality from our logging though. I'm happy to include a secondary service to make log analysis easier for folks. But, there is a big cost to learning new tooling and shifting our processes. Personally speaking, log parsing was a solved problem and I'd much sooner use standard tools than buy into New Relic tooling. I don't want others to have to struggle with manual log parsing however, so the secondary tooling is a great optional addition.

However, I want to move forward with all of these PRs instead of blocking them on these small differences

Agreed, I'd like to see this merged. The best way forward is going to be affecting our current logging as little as possible. It sounded like @ericholscher echoed my concerns here. It seems like there was maybe some work here that could have been considered separate work from the project for metric collection, so there is a bit more friction to merging the work now.

I wrote a custom structlog processor and renderer to add the process id and the line number

Sorry going through everything sequentially here

The format you landed on looks good 👍 I agree we can tweak later, and preserving existing functionality allows a secondary service to exist without interrupting our current processes.

@agjohnson agjohnson force-pushed the humitos/embed-api-log branch from 18a8d71 to e39b3bb Compare November 29, 2021 21:04
@agjohnson
Copy link
Contributor

agjohnson commented Nov 29, 2021

I rebased to make history happy with #8705 merged.

Edit: it is not happier.

@agjohnson agjohnson changed the base branch from master to agj/please-stop-it-github November 29, 2021 21:16
@agjohnson agjohnson changed the base branch from agj/please-stop-it-github to master November 29, 2021 21:16
@agjohnson agjohnson changed the base branch from master to agj/redo-merge-structlog November 29, 2021 21:30
@agjohnson
Copy link
Contributor

agjohnson commented Nov 29, 2021

It wasn't clear that #8705 was merging here and the additional commits made review hard here. Rebasing made things worse.

I opened #8729 to review just the commits that were originally in this PR.

Edit: review looked good there. I returned this to the original commit before I rebased. This includes the #8705 merge into this branch. Everything now looks good to merge, pending tests are happy.

@agjohnson agjohnson changed the base branch from agj/redo-merge-structlog to master November 29, 2021 22:05
@agjohnson agjohnson force-pushed the humitos/embed-api-log branch from e39b3bb to 18a8d71 Compare November 29, 2021 22:07
Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved in #8729 👍

@agjohnson agjohnson merged commit c3b4bef into master Nov 30, 2021
@agjohnson agjohnson deleted the humitos/embed-api-log branch November 30, 2021 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants