EmbedAPI: log success requests #8689

humitos · 2021-11-17T11:03:54Z

Log success requests using django-structlog and send them New Relic to know
what projects are using EmbedAPI, if they are using sphinx-hoverxref and what
version of it.

This is the minimum configuration required to start using structlog + New Relic.
If we are happy with the results we can migrate the other logs to structlog,
which should be simple, but a little tedious to do:

replace logging by structlog

# from
import logging
log = logging.getLogger(__name__)

# to
import structlog
log = structlog.get_logger(__name__)

move key=value from log messages to keyword attributes

# from
log.info('Domain not allowed. domain=%s url=%s', domain, url)

# to
log.info('Domain not allowed.',  domain=domain, url=url)

make positional arguments to be keyword attributes (similar to the previous point, but the %s is mixed with the message itself)

# from
log.info('Sending team invite for %s to %s', invite.team, invite.email)

# to
log.info('Sending team invite to user.', team=invite.team, email=invite.email)

migrate to Canonical Log Lines (https://www.structlog.org/en/21.3.0/logging-best-practices.html) by using log.bind and context.

In the future, we can also set up django-structlog to log Celery tasks as well (https://django-structlog.readthedocs.io/en/latest/celery.html) which will give us traceability since it logs the task_id automatically and other extra features.

References

New Relic Log's documentation: https://newrelic.com/blog/how-to-relic/python-structured-logging
django-structlog: https://django-structlog.readthedocs.io/en/latest/
structlog: https://www.structlog.org/en/stable/index.html

astrojuanlu · 2021-11-17T18:08:09Z

How it looks in the New Relic dashboard:

Query view	Log detail

How to query it from the CLI:

$ newrelic-cli.newrelic nrql query -a $ACCOUNT --query "select * from Log where hoverxref_version = '0.8b1' "
[
  {
    "absolute_url": "http://community.dev.readthedocs.io/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/api/v3.html",
    "event": "EmbedAPI successful response.",
    "external": true,
    "hoverxref_version": "0.8b1",
    "ip": "10.10.0.100",
    "level": "info",
    "logger": "readthedocs.embed.v3.views",
    "message": "info - EmbedAPI successful response.",
    "messageId": "ef9f8c10-478b-425f-b2f9-84d0c1e3b3ee",
    "newrelic.logs.batchIndex": 1,
    "newrelic.source": "api.logs",
    "project": "[docs.readthedocs.io](http://docs.readthedocs.io)",
    "request_id": "45e17bce-898f-44ce-9b3f-106c8aa05d09",
    "timestamp": 1637167481565,
    "url": "https://docs.readthedocs.io/en/latest/api/v3.html",
    "user_id": 1,
    "user_username": "admin"
  },
  {
    "event": "EmbedAPI successful response.",
    "external": true,
    "hoverxref_version": "0.8b1",
    "ip": "10.10.0.100",
    "level": "info",
    "logger": "readthedocs.embed.v3.views",
    "message": "info - EmbedAPI successful response.",
    "messageId": "1332d1ce-438a-4fd5-a63a-a2f4d07b6aa5",
    "newrelic.logs.batchIndex": 1,
    "newrelic.source": "api.logs",
    "project": "[docs.readthedocs.io](http://docs.readthedocs.io)",
    "request_id": "11ae36ac-079b-47d0-90b2-0f1979414672",
    "timestamp": 1637167425340,
    "url": "https://docs.readthedocs.io/en/latest/api/v3.html",
    "user_id": 1,
    "user_username": "admin"
  },
  {
    "event": "EmbedAPI successful response.",
    "external": true,
    "hoverxref_version": "0.8b1",
    "ip": "10.10.0.100",
    "level": "info",
    "logger": "readthedocs.embed.v3.views",
    "message": "info - EmbedAPI successful response.",
    "messageId": "1e72abe7-5613-4b3b-b266-4877ee82aee7",
    "newrelic.logs.batchIndex": 1,
    "newrelic.source": "api.logs",
    "project": "[docs.readthedocs.io](http://docs.readthedocs.io)",
    "request_id": "b7821d7b-7f5e-46eb-955f-b488f4a802d3",
    "timestamp": 1637166367670,
    "url": "https://docs.readthedocs.io/en/latest/api/v3.html",
    "user_id": 1,
    "user_username": "admin"
  }
]

Log success requests using `django-structlog` and send them New Relic to know what projects are using EmbedAPI, if they are using sphinx-hoverxref and what version of it. This is the minimum configuration required to start using structlog + New Relic. If we are happy with the results we can migrate the other logs to structlog, which should be simple, but a little tedious to do search&replace. In the future, we can also set up django-structlog to log Celery tasks as well.

ericholscher

This looks like an interesting approach. I haven't played around too much with the NR interface, but it looks like a good first attempt at capturing this data in a way we can aggregate. I believe AWS also has a similar product that does log aggregation/querying, so this feels like a good integration point that isn't locking us in to a specific backend too deeply.

I'm a little worried about swapping out our entire logging infra as part of this change. It makes it a higher stakes change if we want to try to get it into production as a test. It seems like structlog would be useful in other ways in our code though, so if y'all think it would be work adding I'm 👍 on taking the work on now as part of this.

readthedocs/core/signals.py

readthedocs/settings/base.py

readthedocs/settings/dev.py

humitos · 2021-11-18T09:52:17Z

@ericholscher

I'm a little worried about swapping out our entire logging infra as part of this change. It makes it a higher stakes change if we want to try to get it into production as a test. It seems like structlog would be useful in other ways in our code though, so if y'all think it would be work adding I'm +1 on taking the work on now as part of this.

Changing(*) our entire logging infra to structlog + NewRelic has other benefits as well and it's something I've been wanting for years now (did I mention Loggly already? 😅 ). For example,

logs in the console (under development) are colorized and way easier to follow
log entries have more data that gives us traceability: request_id for HTTP and task_id for Celery tasks
we can add as many extra fields as we want globally
avoid grepping .log files together with other bash tools to answer a question
answer questions using New Relic's UI with the data we are already logging

So, I'd be 👍🏼 on working towards this change as a whole by doing the steps mentioned in the PR's description and keep using (r)syslog together with New Relic.

One last thing that I'd like to know if it's possible is to "send logs in chunk" since I would like to avoid hitting an API endpoint for each log.info entry. I didn't see anything in the structlog's documentation about this even in the section where they recommend exactly what we are doing here, but I assume that accumulating the logs and sending them in chunks will be required for this to work properly. In any case, even if this is not possible, I'd move forward with the plan, because we can easily filter "what to send" through the API with one line (eg. if logger == 'readthedocs.embed.v3.views': send()) and still have all the other benefits.

agjohnson · 2021-11-22T20:59:16Z

Pushing this a release out, this still looks to be in a draft state.

humitos · 2021-11-23T09:28:16Z

I did the minimal implementation using structlog + New Relic formatter + rsyslog to send data to NR. As we are using a mixture of structlog (for EmbedAPI) and logging (for the rest) we cannot use all the features from structlog. However, since this is an initial test I didn't migrate all the code as proposed in the description of the PR yet. I'd like to do it once we have some data being sent to NR and we are happy with it.

Note that all the logging keeps working in the same way, but the log entries coming from structlog (those from EmbedAPI) are rendered as JSON in the console for now. Once we migrate our logging to structlog we can have a lot nicer rendering. Example,

Use structlog _only_ for EmbedAPI v2/v3 for now. We will do a test in production to this and decide the migration path for the whole codebase.

Initial work to migrate our application to structlog for better logging, making them compatible with New Relic's integration. There are more work that can be done here to make it better. For example, more usage of `log.bind` to avoid repetition. Besides, rewriting some log message and reduce them to have more Canonical Log Lines (https://www.structlog.org/en/stable/logging-best-practices.html#canonical-log-lines) instead.

Logs coming from `logging.getLogger` (from other applications) have the same structure that the ones logged by us via `structlog`.

agjohnson · 2021-11-23T19:39:05Z

This seems like a good middle ground. I am finding the newrelic interface is harder to watch logs at, but really helpful if you know what you are looking for. It doesn't seem like the new relic UI or the new relic cli could really replace the simplicity of parsing logs with basic unix tools. So maintaining local logging for debugging and operations, alongside new relic logs for inspecting the data inside the logs, is probably about where we want to be.

humitos · 2021-11-23T19:56:55Z

@agjohnson

It doesn't seem like the new relic UI or the new relic cli could really replace the simplicity of parsing logs with basic unix tools. So maintaining local logging for debugging and operations, alongside new relic logs for inspecting the data inside the logs, is probably about where we want to be.

We can output a .log file with any format we want to be parsed with basic unix tools. That's not a problem. We could output JSON (https://www.structlog.org/en/stable/api.html#structlog.processors.JSONRenderer) and use jq to query it or just something similar to our key=value structure that we are currently using (https://www.structlog.org/en/stable/api.html#structlog.processors.KeyValueRenderer).

On the other hand, having these colors in the console's output is super handy while developing locally.

Let me know if you have specific requests about the output formatting, and we can work together to output a .log file with that particular format.

I'd move forward by merging all the PRs that make usage of structlog all around our codebase and improve any particular thing that we found can be improved.

agjohnson · 2021-11-23T21:10:36Z

I think the format you posted looks great. It's a bit easier to grep than our current format as the module name has clear delineation and it splits up the log text and parameters. I would have an easier time processing the log format you suggested compared to what we have now.

I'd be curious how this looks after going through rsyslog to the central logging server.

humitos · 2021-11-23T21:16:48Z

@agjohnson

I'd be curious how this looks after going through rsyslog to the central logging server.

I already deployed an instance of web-extra as a test. You can see the logs by doing:

readthedocs@util01(org):~$ tail -f /var/log/readthedocs/celery.log | grep web-extra

Note that they are colorized currently, which is something we may not want, but that's as simple as passing colors=False to the processor 😉

agjohnson · 2021-11-23T23:56:01Z

I just see a lot of lines like:

Nov 23 23:53:01 web-extra-i-03d4571ac924022a3 readthedocs/readthedocsext.monit INFO {'task_id': 'fa24bf25-4209-446f-b4a5-e8f1e0305dd0', 'metric_slug': 'celery-active-tasks', 'event': 'Excecuting metric.', 'timestamp': '2021-11-23T23:53:01.096193Z', 'logger': 'readthedocsext.monitoring.metrics.tasks', 'level': 'info'} [readthedocsext.monitoring.metrics.tasks:107]

JSON is very handy for scanning if you are a robot, but alas, I am human and this is very hard to read

humitos · 2021-11-24T09:00:49Z

@agjohnson I don't know where you got those lines. The file that I mentioned does not contain those lines. Which file on what instance contains those lines?

humitos · 2021-11-24T10:17:44Z

OK! I found those lines in /var/log/readthedocs/messages.log.

I updated the syslog formatter to use structlog.dev.ConsoleRenderer(colors=False) so it gets formatted nicely. Besides, I updated the rsyslog filter rules to redirect the logs to the proper files. This work is already tested in web-extra and done in the -ops PR: https://github.com/readthedocs/readthedocs-ops/pull/1097

Now, the logs in /var/logs/readthedocs/message.log look like:

Nov 24 10:13:36 web-extra-i-03d4571ac924022a3 readthedocs: 2021-11-24T10:13:36.195455Z [info     ] Main node found. selector=[role=main] [readthedocs.embed.v3.views] ip=10.32.30.182 request_id=8f3d9f35-e0b1-4da7-b2c0-3bd34c05b8cf user_id=xxx
Nov 24 10:13:36 web-extra-i-03d4571ac924022a3 readthedocs: 2021-11-24T10:13:36.197713Z [info     ] EmbedAPI successful response.  [readthedocs.embed.v3.views] domain=docs.sympy.org external=True hoverxref_version=humitos-test ip=10.32.30.182 project_slug=None referer=None request_id=8f3d9f35-e0b1-4da7-b2c0-3bd34c05b8cf url=https://docs.sympy.org/latest/tutorial/gotchas.html user_id=xxx

humitos · 2021-11-24T10:22:53Z

I made the last adjustments to use colors only locally and plain text in production. IMO, this is ready to be merged together with the other PRs that migrate our app logging to structlog.

agjohnson · 2021-11-25T00:19:21Z

I don't know where you got those lines.

Yup, I was looking at messages.log, for all messages sent through to the rsyslog server.

Now, the logs in /var/logs/readthedocs/message.log look like

Looks good. If there is a way to remove the timestamp, it's duplicated in the message.

Also, the current logs have:

process id, which is incredibly helpful in both debugging stray processes, but also in aligning log lines from a single process, such as determining what a worker process is doing and it's timing.
module line number for the log line. Something that can be guessed in most cases, but we do have modules with similar log lines and a line number is explicit

humitos · 2021-11-25T09:52:48Z

@agjohnson

If there is a way to remove the timestamp, it's duplicated in the message.

Yes, we are adding our own timestamp. Ideally, it would be good to remove the timestamp added by rsyslog and control everything from structlog instead of having a hybrid.

Also, the current logs have:
* process id,
* module line number

We have this information in New Relic (line.number and process.id), together with a request_id and a task_id as well --that allows us to check all the logs for a particular request or task.

We can add or modify the log lines as we want, it's possible by adding more structlog processors. However, I want to move forward with all of these PRs instead of blocking them on these small differences. We can tweak these lines as we go once we have the big parts connected and working all together.

Use a custom renderer to match the structure we want: ``` %programname[%processid] [%level] %message [%logger:%linenumber] %structureddata ``` By default, rsyslog will prepend ``` %timestamp %hostname ``` and it will look exactly as we want.

humitos · 2021-11-25T18:19:53Z

@agjohnson I wrote a custom structlog processor and renderer to add the process id and the line number in a similar way as it was. I'm leaving the timestamp and hostname to be added by rsyslog as it was. The result is:

Nov 25 18:16:46 web-extra-i-03d4571ac924022a3 readthedocs[22130]: [info     ] request_started                [django_structlog.middlewares.request:56] ip=10.32.30.182 request=http://10.32.29.10/_/health_check/ request_id=58a12996-6d36-4a0f-ae4e-d487566447c8 user_agent=ELB-HealthChecker/2.0 user_id=None

Nov 25 18:30:39 web-extra-i-03d4571ac924022a3 readthedocs[22429]: [info     ] System Redirect.               [readthedocs.proxito.views.mixins:210] filename= from_url= host=test-builds.readthedocs.io ip=10.32.30.182 lang_slug=None project_slug=test-builds request_id=ed529f8f-4eab-4aaa-9734-ef4cbe723731 subproject_slug=None to_url=https://test-builds.readthedocs.io/en/4.0/ user_id=None version_slug=None

Nov 25 18:18:00 web-extra-i-03d4571ac924022a3 readthedocs[22242]: [info     ] Excecuting metric.             [readthedocsext.monitoring.metrics.tasks:107] metric_slug=queue-length task_id=4785d0c8-4fdd-4186-9adb-c9325112eb5e

Note that we have readthedocs[22242] where 22242 is the process id and [readthedocsext.monitoring.metrics.tasks:107] where 107 is the line number.

All these changes are done in the other PR that is based on this one #8705 and these examples are taken from the web-extra instance already deployed.

structlog: migrate application code to better logging

agjohnson · 2021-11-29T20:44:59Z

Yes, we are adding our own timestamp. Ideally, it would be good to remove the timestamp added by rsyslog and control everything from structlog instead of having a hybrid

Well, the duplicated timestamp is mostly an annoyance. If it's for some reason required from structlog, it's fine to keep it. It wasn't clear if this was the case though.

It might be possible for rsyslog to control the log output per file, but if the only option is to adjust the rsyslog format at the server level, we'd lose important debugging information from all of the host services not using structlog.

We have this information in New Relic

We shouldn't be removing functionality from our logging though. I'm happy to include a secondary service to make log analysis easier for folks. But, there is a big cost to learning new tooling and shifting our processes. Personally speaking, log parsing was a solved problem and I'd much sooner use standard tools than buy into New Relic tooling. I don't want others to have to struggle with manual log parsing however, so the secondary tooling is a great optional addition.

However, I want to move forward with all of these PRs instead of blocking them on these small differences

Agreed, I'd like to see this merged. The best way forward is going to be affecting our current logging as little as possible. It sounded like @ericholscher echoed my concerns here. It seems like there was maybe some work here that could have been considered separate work from the project for metric collection, so there is a bit more friction to merging the work now.

I wrote a custom structlog processor and renderer to add the process id and the line number

Sorry going through everything sequentially here

The format you landed on looks good 👍 I agree we can tweak later, and preserving existing functionality allows a secondary service to exist without interrupting our current processes.

agjohnson · 2021-11-29T21:04:29Z

I rebased to make history happy with #8705 merged.

Edit: it is not happier.

agjohnson · 2021-11-29T22:00:36Z

It wasn't clear that #8705 was merging here and the additional commits made review hard here. Rebasing made things worse.

I opened #8729 to review just the commits that were originally in this PR.

Edit: review looked good there. I returned this to the original commit before I rebased. This includes the #8705 merge into this branch. Everything now looks good to merge, pending tests are happy.

agjohnson

Approved in #8729 👍

humitos requested review from ericholscher and astrojuanlu November 17, 2021 11:03

humitos requested a review from a team as a code owner November 17, 2021 11:03

humitos force-pushed the humitos/embed-api-log branch 3 times, most recently from 446bbe5 to bcbe3b3 Compare November 17, 2021 17:05

humitos marked this pull request as draft November 17, 2021 17:10

humitos force-pushed the humitos/embed-api-log branch from bcbe3b3 to d65241d Compare November 17, 2021 18:38

ericholscher reviewed Nov 17, 2021

View reviewed changes

humitos force-pushed the humitos/embed-api-log branch 6 times, most recently from c45eab5 to 9c1f625 Compare November 22, 2021 14:10

humitos force-pushed the humitos/embed-api-log branch from 9c1f625 to 4962494 Compare November 23, 2021 09:23

humitos force-pushed the humitos/embed-api-log branch 2 times, most recently from 8622310 to 3444dda Compare November 23, 2021 09:44

Minimal implementation of structlog for EmbedAPI

2dc1c94

Use structlog _only_ for EmbedAPI v2/v3 for now. We will do a test in production to this and decide the migration path for the whole codebase.

humitos force-pushed the humitos/embed-api-log branch from 3444dda to 2dc1c94 Compare November 23, 2021 09:45

humitos requested a review from ericholscher November 23, 2021 09:46

humitos marked this pull request as ready for review November 23, 2021 09:47

humitos mentioned this pull request Nov 23, 2021

structlog: migrate application code to better logging #8705

Merged

humitos added 2 commits November 23, 2021 17:47

Make logs from logging.getLogger to look better

9b261ab

Logs coming from `logging.getLogger` (from other applications) have the same structure that the ones logged by us via `structlog`.

Colored console locally and plain console in production

e04aa5e

astrojuanlu mentioned this pull request Nov 24, 2021

Design doc: collect data about builds #8124

Merged

humitos and others added 3 commits November 25, 2021 19:21

Missing imports

a465871

Ignore request_started and request_finished events

41b325e

Merge pull request #8705 from readthedocs/humitos/migrate-to-structlog

18a8d71

structlog: migrate application code to better logging

agjohnson force-pushed the humitos/embed-api-log branch from 18a8d71 to e39b3bb Compare November 29, 2021 21:04

agjohnson changed the base branch from master to agj/please-stop-it-github November 29, 2021 21:16

agjohnson changed the base branch from agj/please-stop-it-github to master November 29, 2021 21:16

agjohnson changed the base branch from master to agj/redo-merge-structlog November 29, 2021 21:30

agjohnson mentioned this pull request Nov 29, 2021

Break out commits for review. #8729

Closed

agjohnson changed the base branch from agj/redo-merge-structlog to master November 29, 2021 22:05

agjohnson force-pushed the humitos/embed-api-log branch from e39b3bb to 18a8d71 Compare November 29, 2021 22:07

agjohnson approved these changes Nov 29, 2021

View reviewed changes

agjohnson merged commit c3b4bef into master Nov 30, 2021

agjohnson deleted the humitos/embed-api-log branch November 30, 2021 01:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EmbedAPI: log success requests #8689

EmbedAPI: log success requests #8689

humitos commented Nov 17, 2021 •

edited

Loading

astrojuanlu commented Nov 17, 2021

ericholscher left a comment •

edited

Loading

humitos commented Nov 18, 2021

agjohnson commented Nov 22, 2021

humitos commented Nov 23, 2021 •

edited

Loading

agjohnson commented Nov 23, 2021

humitos commented Nov 23, 2021

agjohnson commented Nov 23, 2021

humitos commented Nov 23, 2021

agjohnson commented Nov 23, 2021 •

edited

Loading

humitos commented Nov 24, 2021

humitos commented Nov 24, 2021

humitos commented Nov 24, 2021 •

edited

Loading

agjohnson commented Nov 25, 2021 •

edited

Loading

humitos commented Nov 25, 2021 •

edited

Loading

humitos commented Nov 25, 2021 •

edited

Loading

agjohnson commented Nov 29, 2021

agjohnson commented Nov 29, 2021 •

edited

Loading

agjohnson commented Nov 29, 2021 •

edited

Loading

agjohnson left a comment

EmbedAPI: log success requests #8689

EmbedAPI: log success requests #8689

Conversation

humitos commented Nov 17, 2021 • edited Loading

References

astrojuanlu commented Nov 17, 2021

ericholscher left a comment • edited Loading

Choose a reason for hiding this comment

humitos commented Nov 18, 2021

agjohnson commented Nov 22, 2021

humitos commented Nov 23, 2021 • edited Loading

agjohnson commented Nov 23, 2021

humitos commented Nov 23, 2021

agjohnson commented Nov 23, 2021

humitos commented Nov 23, 2021

agjohnson commented Nov 23, 2021 • edited Loading

humitos commented Nov 24, 2021

humitos commented Nov 24, 2021

humitos commented Nov 24, 2021 • edited Loading

agjohnson commented Nov 25, 2021 • edited Loading

humitos commented Nov 25, 2021 • edited Loading

humitos commented Nov 25, 2021 • edited Loading

agjohnson commented Nov 29, 2021

agjohnson commented Nov 29, 2021 • edited Loading

agjohnson commented Nov 29, 2021 • edited Loading

agjohnson left a comment

Choose a reason for hiding this comment

humitos commented Nov 17, 2021 •

edited

Loading

ericholscher left a comment •

edited

Loading

humitos commented Nov 23, 2021 •

edited

Loading

agjohnson commented Nov 23, 2021 •

edited

Loading

humitos commented Nov 24, 2021 •

edited

Loading

agjohnson commented Nov 25, 2021 •

edited

Loading

humitos commented Nov 25, 2021 •

edited

Loading

humitos commented Nov 25, 2021 •

edited

Loading

agjohnson commented Nov 29, 2021 •

edited

Loading

agjohnson commented Nov 29, 2021 •

edited

Loading