Skip to content

Implement API key scrubbing and structured logging #806

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Feb 23, 2022

Conversation

lee14257
Copy link

@lee14257 lee14257 commented Jan 13, 2022

Summary

  • Adds structured logging module to parse server logs in JSON format
  • Applies API key scrubbing / masking for Flask logs according to User authentication and no-track status
  • Adds additional getter functions for User class: is_authenticated() and is_tracking()
  • Elevates SQLAlchemy logging level to "ERROR" to prevent API keys to show up in query logs

Notes

This PR implements API key scrubbing for logs in the application layer (Flask + gunicorn). If the user is authorized and tracked, the API key will be logged whenever the user makes an API request. If the user is not tracked, the API key parameter will be masked / scrubbed out.

For logs in the web server layer (NGINX), the API key will be shown as a query string during client requests. This will need to be filtered by creating a custom filter or ingestion pipeline on Filebeat before ingesting to Elasticsearch.

TODO in separate PR:

  • Create an ingestion pipeline / filter in Filebeat to scrub the "api_key" query string from the incoming NGINX logs.
  • Make Flask logs more useful (analytics-friendly) for Kibana by adding more parameters such as list of COVIDcast data sources, etc.

@lee14257 lee14257 marked this pull request as ready for review February 8, 2022 21:03
@lee14257 lee14257 requested a review from krivard February 8, 2022 21:03
Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good start -- but structlog takes kwargs instead of a dict

@lee14257
Copy link
Author

lee14257 commented Feb 20, 2022

Thanks for the feedback, I've made these further changes:

  • Changed all struct log calls to kwargs form.
  • Further applied API key scrubbing for logs containing URL or request path - typically during the resolve_user() call that happens before query runs.

@lee14257 lee14257 requested a review from krivard February 20, 2022 22:20
Comment on lines 72 to 73
sqlalchemy_logger.setLevel(logging.ERROR)
#sqlalchemy_logger.setLevel(gunicorn_logger.level)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If switching to ERROR is intended to be permanent, drop the commented-out line and add a comment explaining why we can't use the gunicorn setting.

If this was temporary for testing, revert.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLAlchemy logs at gunicorn settings show all queries, and changing this to ERROR would prevent it from showing API keys when the server is querying the api_user table (regardless of the no-track option).

The structured logger already stores queries that access data source tables (covidcast, fluview, etc.), so having SQLAlchemy logs to do that seems redundant.

Unless we specifically need all queries to be logged for analytics / Kibana, this setting seems to be more appropriate to hide API key details.

@lee14257 lee14257 requested a review from krivard February 22, 2022 03:09
Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice work! do you have merge permissions or should I click the big button?

@lee14257
Copy link
Author

👍 nice work! do you have merge permissions or should I click the big button?

yup! I do have merge permissions

@lee14257 lee14257 merged commit c0efc6c into cmu-delphi:sgratzl/api_key Feb 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants