Skip to content

feat(event_sources): Add __str__ to Data Classes base DictWrapper #2129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

neilramsay
Copy link
Contributor

Issue number: #2128

Summary

Allow Data Classes to be printed with str()

Changes

Add in a __str__ method to the base class used by all Data Classes.
This provides a dictionary string primarily for debugging purposes.

Include example usage for CodePipeline Data Classes

User experience

Prior to change:

>>> event = CodePipelineJobEvent(load_event("codePipelineEventData.json"))
>>> str(event)
'<aws_lambda_powertools.utilities.data_classes.code_pipeline_job_event.CodePipelineJobEvent object at 0x7f9a248342e0>'

Post change

>>> event = CodePipelineJobEvent(load_event("codePipelineEventData.json"))
>>> str(event)
'{\'account_id\': \'123456789012\', \'data\': {\'action_configuration\': {\'configuration\': {\'decoded_user_parameters\': "{\'KEY\': \'VALUE\'}", \'function_name\': \'my-function\', \'raw_event\': \'[SENSITIVE]\', \'user_parameters\': \'{"KEY": "VALUE"}\'}, \'raw_event\': \'[SENSITIVE]\'}, \'artifact_credentials\': {\'access_key_id\': \'AKIAIOSFODNN7EXAMPLE\', \'expiration_time\': \'1575493418000\', \'raw_event\': \'[SENSITIVE]\', \'secret_access_key\': \'[SENSITIVE]\', \'session_token\': \'[SENSITIVE]\'}, \'continuation_token\': \'None\', \'encryption_key\': \'None\', \'input_artifacts\': [{\'location\': {\'get_type\': \'S3\', \'raw_event\': \'[SENSITIVE]\', \'s3_location\': {\'bucket_name\': \'us-west-2-123456789012-my-pipeline\', \'key\': \'my-pipeline/test-api-2/TdOSFRV\', \'object_key\': \'my-pipeline/test-api-2/TdOSFRV\', \'raw_event\': \'[SENSITIVE]\'}}, \'name\': \'my-pipeline-SourceArtifact\', \'raw_event\': \'[SENSITIVE]\', \'revision\': \'e0c7xmpl2308ca3071aa7bab414de234ab52eea\'}], \'output_artifacts\': [{\'location\': {\'get_type\': \'S3\', \'raw_event\': \'[SENSITIVE]\', \'s3_location\': {\'bucket_name\': \'us-west-2-123456789012-my-pipeline\', \'key\': \'my-pipeline/invokeOutp/D0YHsJn\', \'object_key\': \'my-pipeline/invokeOutp/D0YHsJn\', \'raw_event\': \'[SENSITIVE]\'}}, \'name\': \'invokeOutput\', \'raw_event\': \'[SENSITIVE]\', \'revision\': \'None\'}], \'raw_event\': \'[SENSITIVE]\'}, \'decoded_user_parameters\': "{\'KEY\': \'VALUE\'}", \'get_id\': \'c0d76431-b0e7-xmpl-97e3-e8ee786eb6f6\', \'input_bucket_name\': \'us-west-2-123456789012-my-pipeline\', \'input_object_key\': \'my-pipeline/test-api-2/TdOSFRV\', \'raw_event\': \'[SENSITIVE]\', \'user_parameters\': \'{"KEY": "VALUE"}\'}'

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

__str__ isn't implemented elsewhere for Data Classes in the library, so shouldn't impact the library.
I assume most people wouldn't deliberately print the class name, and memory location for debugging/troubleshooting, so shouldn't impact them.

One area where I need assistance is noting what properties throughout the Data Classes should be blacklisted.

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@neilramsay neilramsay requested a review from a team as a code owner April 17, 2023 00:52
@neilramsay neilramsay requested review from rubenfonseca and removed request for a team April 17, 2023 00:52
@boring-cyborg boring-cyborg bot added documentation Improvements or additions to documentation tests labels Apr 17, 2023
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 17, 2023
@neilramsay
Copy link
Contributor Author

One area where I need assistance is noting what properties throughout the Data Classes should be blacklisted.

@github-actions github-actions bot added feature New feature or functionality labels Apr 17, 2023
@codecov-commenter
Copy link

codecov-commenter commented Apr 17, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01 ⚠️

Comparison is base (3e313bb) 97.47% compared to head (659e97b) 97.46%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2129      +/-   ##
===========================================
- Coverage    97.47%   97.46%   -0.01%     
===========================================
  Files          147      147              
  Lines         6809     6872      +63     
  Branches       483      505      +22     
===========================================
+ Hits          6637     6698      +61     
- Misses         136      137       +1     
- Partials        36       37       +1     
Impacted Files Coverage Δ
.../utilities/data_classes/code_pipeline_job_event.py 100.00% <100.00%> (ø)
...lambda_powertools/utilities/data_classes/common.py 100.00% <100.00%> (ø)

... and 10 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@neilramsay neilramsay force-pushed the feat_data_class_str branch 3 times, most recently from 0626b6a to 89484f2 Compare April 18, 2023 00:35
@neilramsay
Copy link
Contributor Author

neilramsay commented Apr 18, 2023

Moving back to draft.
Attempted to print an API Event and it failed.
event = APIGatewayProxyEvent(load_event("apiGatewayProxyV2Event.json"))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 32, in __str__
    return str(self._str_helper())
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 56, in _str_helper
    property_value = getattr(self, property_key)
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 141, in http_method
    return self["httpMethod"]
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 17, in __getitem__
    return self._data[key]
KeyError: 'httpMethod'

This broke due to the wrong input data going to the wrong wrapper.
This is not strictly a fault in my code, but I should handle it.

@neilramsay neilramsay changed the title feat: Add __str__ to Data Classes base DictWrapper draft: feat: Add __str__ to Data Classes base DictWrapper Apr 18, 2023
@neilramsay neilramsay force-pushed the feat_data_class_str branch 2 times, most recently from d7c47cc to 9a0acbe Compare April 19, 2023 00:49
@neilramsay neilramsay changed the title draft: feat: Add __str__ to Data Classes base DictWrapper feat: Add __str__ to Data Classes base DictWrapper Apr 19, 2023
@neilramsay
Copy link
Contributor Author

Moving back to draft. Attempted to print an API Event and it failed. event = APIGatewayProxyEvent(load_event("apiGatewayProxyV2Event.json"))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 32, in __str__
    return str(self._str_helper())
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 56, in _str_helper
    property_value = getattr(self, property_key)
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 141, in http_method
    return self["httpMethod"]
  File "/home/nramsay/aws-lambda-powertools-python/aws_lambda_powertools/utilities/data_classes/common.py", line 17, in __getitem__
    return self._data[key]
KeyError: 'httpMethod'

This broke due to the wrong input data going to the wrong wrapper. This is not strictly a fault in my code, but I should handle it.

Have updated to show [EXCEPTION <class 'Exception>] or equivalent when calling a property raises an exception.

>>> event = APIGatewayProxyEvent(load_event("apiGatewayProxyV2Event.json"))
>>> print(event)
{'body': '{"message": "hello world", "username": "tom"}', 'decoded_body': '{"message": "hello world", "username": "tom"}', 'headers': {'Header1': 'value1', 'Header2': 'value1,value2'}, 'http_method': "[EXCEPTION <class 'KeyError'>]", 'is_base64_encoded': False, 'json_body': {'message': 'hello world', 'username': 'tom'}, 'multi_value_headers': "[EXCEPTION <class 'KeyError'>]", 'multi_value_query_string_parameters': None, 'path': "[EXCEPTION <class 'KeyError'>]", 'path_parameters': {'parameter1': 'value1'}, 'query_string_parameters': {'parameter1': 'value1,value2', 'parameter2': 'value'}, 'raw_event': '[SENSITIVE]', 'request_context': {'account_id': '123456789012', 'api_id': 'api-id', 'authorizer': {'claims': None, 'integration_latency': None, 'principal_id': None, 'raw_event': '[SENSITIVE]', 'scopes': None}, 'connected_at': None, 'connection_id': None, 'domain_name': 'id.execute-api.us-east-1.amazonaws.com', 'domain_prefix': 'id', 'event_type': None, 'extended_request_id': None, 'http_method': "[EXCEPTION <class 'KeyError'>]", 'identity': {'access_key': "[EXCEPTION <class 'KeyError'>]", 'account_id': "[EXCEPTION <class 'KeyError'>]", 'api_key': "[EXCEPTION <class 'KeyError'>]", 'api_key_id': "[EXCEPTION <class 'KeyError'>]", 'caller': "[EXCEPTION <class 'KeyError'>]", 'client_cert': "[EXCEPTION <class 'KeyError'>]", 'cognito_authentication_provider': "[EXCEPTION <class 'KeyError'>]", 'cognito_authentication_type': "[EXCEPTION <class 'KeyError'>]", 'cognito_identity_id': "[EXCEPTION <class 'KeyError'>]", 'cognito_identity_pool_id': "[EXCEPTION <class 'KeyError'>]", 'principal_org_id': "[EXCEPTION <class 'KeyError'>]", 'raw_event': '[SENSITIVE]', 'source_ip': "[EXCEPTION <class 'KeyError'>]", 'user': "[EXCEPTION <class 'KeyError'>]", 'user_agent': "[EXCEPTION <class 'KeyError'>]", 'user_arn': "[EXCEPTION <class 'KeyError'>]"}, 'message_direction': None, 'message_id': None, 'operation_name': None, 'path': "[EXCEPTION <class 'KeyError'>]", 'protocol': "[EXCEPTION <class 'KeyError'>]", 'raw_event': '[SENSITIVE]', 'request_id': 'id', 'request_time': None, 'request_time_epoch': "[EXCEPTION <class 'KeyError'>]", 'resource_id': "[EXCEPTION <class 'KeyError'>]", 'resource_path': "[EXCEPTION <class 'KeyError'>]", 'route_key': '$default', 'stage': '$default'}, 'resource': "[EXCEPTION <class 'KeyError'>]", 'stage_variables': {'stageVariable1': 'value1', 'stageVariable2': 'value2'}, 'version': '2.0'}

Copy link
Contributor

@leandrodamascena leandrodamascena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @neilramsay, thank you very much for sending this PR, it is very useful when debugging an event and having the values of all fields with just a print. And protecting sensitive fields that shouldn't be saved in the log makes perfect sense, very useful indeed!

@leandrodamascena leandrodamascena linked an issue Apr 19, 2023 that may be closed by this pull request
2 tasks
@neilramsay neilramsay force-pushed the feat_data_class_str branch 2 times, most recently from 32c9001 to 5d003c7 Compare April 20, 2023 03:37
Add in a __str__ method to the base class used by all Data Classes.
This provides a dictionary string primarily for debugging purposes.

To construct the dictionary a helper method _str_helper recursively
works through `properties` in each instance of a Data Class.

Some Data Classes have sensitive properties, such as credentials,
which can be excluded by setting the `_sensitive_properties` attribute
(list of strings).
These properties will have their value replaced with "[SENSITIVE]",
which allows a developer to see a value arrived, but was hidden
as it is sensitive.
Update Data Classes documentation to mention Data Classes can be
inspected with str(), and include example.
@neilramsay neilramsay force-pushed the feat_data_class_str branch from 5d003c7 to 0e724c8 Compare April 20, 2023 23:10
Add tests for DictWrapper str() functionality, including Data Classes
with:
- no properties
- single property
- sensitive property
- recursive properties (Data Classes inside Data Classes)
- exceptions reading properties
- exceptions when transforming a list with DictWrapper subclasses inside
Add _sensitive_properties for CodePipelineArtifactCredentials to prevent
logging out the secret_access_key and session_token properties.
@neilramsay neilramsay force-pushed the feat_data_class_str branch from 0e724c8 to 6362732 Compare April 20, 2023 23:51
@neilramsay
Copy link
Contributor Author

One area where I need assistance is noting what properties throughout the Data Classes should be blacklisted.

@leandrodamascena is this an area we need to discuss further, or is this something for separate PRs?

@leandrodamascena
Copy link
Contributor

One area where I need assistance is noting what properties throughout the Data Classes should be blacklisted.

@leandrodamascena is this an area we need to discuss further, or is this something for separate PRs?

I looked quickly through all the classes and didn't see any fields that demand be sensitive, like secret_access_key or session_token. I mean, maybe there is a DynamoDB record field, for example, that maybe contains sensitive information, but we don't have much to do, because we will never know the data that comes in the field.

But yes, it makes a lot of sense for us to discuss this in other PRs as the need arises.

@leandrodamascena leandrodamascena changed the title feat: Add __str__ to Data Classes base DictWrapper feat(event_sources): Add __str__ to Data Classes base DictWrapper Apr 25, 2023
Copy link
Contributor

@leandrodamascena leandrodamascena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neilramsay I don't see anything else we need to change and we're ready to merge this PR! If you have no other considerations, I'll merge this!

image

@neilramsay
Copy link
Contributor Author

Thank you @leandrodamascena for your patience, and guidance.
Your feedback has definitely improved the quality of this PR.

Let's merge this in :-)

@oieduardorabelo
Copy link

one may say this was laser-focus development! 🎉

laser-kiwi-flag

@heitorlessa
Copy link
Contributor

My suggestion:

Some fields contain user supplied data, which could be either plain text, or JSON.
These fields often have a related conveinence field, which converts plain text JSON in to a Python dictionary.
If this conversion (deserialization) fails, the conveinence field will appear as [Cannot be deserialzed].

Hey @neilramsay, I owe you an additional explanation about straight comments I've made. I'd also like to explain our reasoning about our obsession with docs and shorter message, as perhaps that might be useful to learn our techniques too :)

About minimal comments

We have a daily standup to review our activities in the public board. Yesterday, we were reviewing the last pending items to merge it and I commented live as we spoke on suggestions to @leandrodamascena - hence why this short suggestion.

Our take on documentation

We approach documentation like a product. We're not yet where we'd like to be, but we take each PR as an opportunity to make our messaging accessible, shorter, and more easily readable in different formats (e.g., Mobile).

We see documentation as a large reason for customers to use Powertools for the long-term, besides features, bug fixing, and how we work with the community. We also have an incredibly plurality of personas - from Developers, IT hobbyists, Data Engineers, Ops ("DevOps"), SRE, Data Scientists, Mobile engineers, Embedded engineers, etc - and from various spoken languages, where Global English style is a must.

Wait, what does it all mean?

This means we always strike a balance to deliver the message as straight as possible, then provide more info for anyone wanting to know the nitty gritty details. All carefully thinking about mobile readers where text may be wrapped. This, sometimes counterintuitive, pushes us to use shorter but more paragraphs that typically would be merged under different circumstances.

Now that we're on the same page

This is the process we would then follow by working backwards from your second suggestion:

Some fields contain user supplied data, which could be either plain text, or JSON.
These fields often have a related conveinence field, which converts plain text JSON in to a Python dictionary.
If this conversion (deserialization) fails, the conveinence field will appear as [Cannot be deserialzed].

Questions we asked ourselves:

  • What are we trying to educate or warn people here?
  • Are we trying to convey that any data we fail to deserialize will have a default value?
  • What types of data are more prone to this issue?

We then craft a rather verbose message and then ask two more questions:

  • How do we make it shorter and more succinct without losing its intent?
  • If this is of interest for readers, what information do we split so we can provide more information and keep it lean?

For the first one we use a technique called Line Editing. For the second one, it's a combination of us having English as a Second Language (ESL) and using that to our advantage to see how much of a cognitive load it takes to parse the text before we can grasp it.

Combine this scrutiny, we landed at:

If we fail to deserialize a field value (e.g., JSON), they will appear as [Cannot be deserialized]

Which meant:

  • We are saying that Fields that can't be processed will have the default value of [Cannot be deserialized]
  • We are giving examples but not limiting the types of fields this could happen (e.g., known unknown)
  • We reduced to a single type (JSON) to keep it short for mobile readers (UX writing)

Closing

That's why it's a great deal for us to take however as much time we need to give feedback to anyone contributing it, so they get something back too! We hugely appreciate the effort you've made to improve debugging - it's not everyone who takes hours off their time to improve many hours of many many people out there.

Hope this provides more than enough context on terseness at times, and explain our mental model when dealing with docs ;)

@leandrodamascena leandrodamascena merged commit 4663745 into aws-powertools:develop Apr 26, 2023
@neilramsay
Copy link
Contributor Author

@heitorlessa thank you very much for the documentation explanation.
I've been spending a few days to digest, and process it.

I believe others could benefit from this guidance, so will raise a new Issue suggesting some sort of "advanced meta-documentation", and linking it here.

@neilramsay neilramsay mentioned this pull request Apr 30, 2023
2 tasks
@neilramsay neilramsay deleted the feat_data_class_str branch May 1, 2023 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation feature New feature or functionality size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Debug printing of Data Classes
6 participants