feat(data_masking): add new sensitive data masking utility #2197

Merged
merged 72 commits into from
Sep 27, 2023
Commits
2d5bfcc
Added logic for sensitive data masking and unit tests
seshubaws May 2, 2023
b2e4d10
Merge branch 'develop' into develop
leandrodamascena May 5, 2023
7d65c7d
Merge branch 'develop' into develop
leandrodamascena May 8, 2023
4b0d0c0
Restructured into smaller files, fixed linting errors
seshubaws May 17, 2023
b34a1ca
Fix linting errors
seshubaws May 17, 2023
092c165
Merge branch 'awslabs:develop' into develop
seshubaws May 17, 2023
4ec6603
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws May 17, 2023
7b13c6f
Merge branch 'awslabs:develop' into develop
seshubaws May 18, 2023
c6ec149
Lint tests
seshubaws May 18, 2023
8bc8c02
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws May 18, 2023
21759b5
Fix mypy errors
seshubaws May 19, 2023
6a2e98a
Fixing tests
seshubaws May 22, 2023
d1b6690
Merge branch 'develop' into develop
leandrodamascena May 23, 2023
d39d956
mypy fixes
seshubaws May 23, 2023
2157815
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws May 23, 2023
97c5b85
Fixed passing in context for aws encryption sdk provider
seshubaws May 24, 2023
f722e70
Use d pytest library for unit testing
seshubaws May 26, 2023
d5f014b
Raise error for unimplemented dm provider
seshubaws May 26, 2023
bef87e0
Fix context for encryption sdk provider
seshubaws May 26, 2023
65eb7e3
Add type annotation to context
seshubaws May 27, 2023
fb3fbc6
Fix context
seshubaws May 27, 2023
ec9f49f
Fixing tests
seshubaws Jun 8, 2023
98ba4d9
Added markdown-lint to pre-commit yaml
seshubaws Jun 9, 2023
f48c2f5
Merging from develop + creating extra dependencies
leandrodamascena Jun 14, 2023
3ad5046
Merging from develop + creating extra dependencies
leandrodamascena Jun 16, 2023
5b7e256
Merging from develop + creating extra dependencies
leandrodamascena Jun 16, 2023
b9053d9
Revisions per comments
seshubaws Jun 22, 2023
0193ee6
Added performance benchmarking tests
seshubaws Jun 29, 2023
22f0b46
Update aws_lambda_powertools/utilities/data_masking/providers/aws_enc…
seshubaws Jul 11, 2023
ece4643
Update aws_lambda_powertools/utilities/data_masking/providers/aws_enc…
seshubaws Jul 11, 2023
8299039
Removed args and ItsDangerous and commented on tests
seshubaws Aug 2, 2023
c36deb5
Merge branch 'develop' of https://github.com/seshubaws/aws-lambda-pow…
seshubaws Aug 2, 2023
5423f7f
Merge branch 'develop' of https://github.com/aws-powertools/powertool…
seshubaws Aug 8, 2023
27eca17
Added functional tests and put input data in separate file
seshubaws Aug 9, 2023
876f4f7
Merge branch 'develop' into develop
heitorlessa Aug 10, 2023
fe37c50
Applied patch to update lock to latest range deps
seshubaws Aug 10, 2023
2eab50b
Made unit tests more legible, removed parameterization
seshubaws Aug 11, 2023
57a5a3a
Adding E2E tests (wip)
seshubaws Aug 21, 2023
8aabc7f
Added data_masking constants, made into BaseProvider and added types
seshubaws Aug 21, 2023
bbeaa4e
Add check for encryption_context in Encryption SDK
seshubaws Aug 22, 2023
5b794f7
Fixing enc_context e2e tests
seshubaws Aug 22, 2023
2955c9c
Added test to encrypt&decrypt from logs in e2e tests
seshubaws Aug 22, 2023
b15b866
Added custom exception for enc_context mismatch, used pytest fixtures…
seshubaws Sep 3, 2023
ee3dddc
Added some docstrings and typing
seshubaws Sep 4, 2023
a79f3df
Added test for using DataMasking in a lambda handler, wip due to inco…
seshubaws Sep 5, 2023
7483d46
Merge remote-tracking branch 'upstream/develop' into develop
seshubaws Sep 6, 2023
7883a48
Revised singleton class to allow for one instance per different confi…
seshubaws Sep 8, 2023
7127c9c
Removed itsdangerous dependencies
seshubaws Sep 8, 2023
01885a5
Added serializer for aws enc sdk
seshubaws Sep 8, 2023
5b83b66
chore: fix merge conflict, remove itsdangerous leftovers (#2)
heitorlessa Sep 8, 2023
371ea05
Building data within func tests instead of using setup.py
seshubaws Sep 9, 2023
b3d123d
Updated json serializer for aws encrypt sdk to return original data type
seshubaws Sep 11, 2023
c0c3f2f
Added ability for user input custom json de/serializer in base class
seshubaws Sep 12, 2023
c5233af
Apply patch for use latest manylinux
seshubaws Sep 12, 2023
bcc735a
Added KMS permissions to lambda handler for e2e tests
seshubaws Sep 13, 2023
eee4c86
Clarified variable names and documented logic (wip still need to disc…
seshubaws Sep 14, 2023
ab15acd
Polished var names, error strings, documentation, etc
seshubaws Sep 19, 2023
73ae382
Added a stack for load testing data masking and added artillery confi…
seshubaws Sep 22, 2023
39a835e
Added 1024MB funcs and load tested with them
seshubaws Sep 22, 2023
da24bcf
Removed orchestrator function and test since same test in E2E
seshubaws Sep 26, 2023
970df5c
Removed singleton class from code and load and e2e tests
seshubaws Sep 26, 2023
487dc0e
Merge from upstream
seshubaws Sep 26, 2023
069aa94
Fix linting errors
seshubaws Sep 26, 2023
ee325f4
Fix mypy errors
seshubaws Sep 26, 2023
49afeed
Modified data masking test names
seshubaws Sep 26, 2023
73df808
Fix dummy KMS key for correct parsing
seshubaws Sep 26, 2023
1ea59f0
Bumping cryptography library
leandrodamascena Sep 26, 2023
ba534ed
Setting default region to avoid HTTP connection
leandrodamascena Sep 26, 2023
ceb6131
Removing user agent tracking
leandrodamascena Sep 26, 2023
bf0e4ed
Reverting
leandrodamascena Sep 26, 2023
6a064b1
Creating a specific provider instead a client to avoid any http call …
leandrodamascena Sep 27, 2023
c01ea35
Merge branch 'develop' into develop
leandrodamascena Sep 27, 2023
4 changes: 2 additions & 2 deletions Makefile
@@ -7,13 +7,13 @@ target:
 dev:
 	pip install --upgrade pip pre-commit poetry
 	@$(MAKE) dev-version-plugin
-	poetry install --extras "all"
+	poetry install --extras "all datamasking-aws-sdk"
 	pre-commit install

 dev-gitpod:
 	pip install --upgrade pip poetry
 	@$(MAKE) dev-version-plugin
-	poetry install --extras "all"
+	poetry install --extras "all datamasking-aws-sdk"
 	pre-commit install

 format:
5 changes: 5 additions & 0 deletions aws_lambda_powertools/utilities/data_masking/__init__.py
@@ -0,0 +1,5 @@
from aws_lambda_powertools.utilities.data_masking.base import DataMasking

__all__ = [
    "DataMasking",
]
170 changes: 170 additions & 0 deletions aws_lambda_powertools/utilities/data_masking/base.py
@@ -0,0 +1,170 @@
import json
from typing import Optional, Union

from aws_lambda_powertools.utilities.data_masking.provider import BaseProvider


class DataMasking:
    """
    A utility class for masking sensitive data within various data types.

    This class provides methods for masking sensitive information, such as personal
    identifiers or confidential data, within different data types such as strings,
    dictionaries, lists, and more. It helps protect sensitive information while
    preserving the structure of the original data.

    Usage:
    Instantiate an object of this class and use its methods to mask sensitive data
    based on the data type. Supported data types include strings, dictionaries,
    and more.

    Example:
    ```
    from aws_lambda_powertools.utilities.data_masking.base import DataMasking

    def lambda_handler(event, context):
        masker = DataMasking()

        data = {
            "project": "powertools",
            "sensitive": "xxxxxxxxxx"
        }

        masked = masker.mask(data, fields=["sensitive"])

        return masked

    ```
    """

    def __init__(self, provider: Optional[BaseProvider] = None):
        self.provider = provider or BaseProvider()

    def encrypt(self, data, fields=None, **provider_options):
        return self._apply_action(data, fields, self.provider.encrypt, **provider_options)

    def decrypt(self, data, fields=None, **provider_options):
        return self._apply_action(data, fields, self.provider.decrypt, **provider_options)

    def mask(self, data, fields=None, **provider_options):
        return self._apply_action(data, fields, self.provider.mask, **provider_options)

    def _apply_action(self, data, fields, action, **provider_options):
        """
        Helper method to determine whether to apply a given action to the entire input data
        or to specific fields if the 'fields' argument is specified.

        Parameters
        ----------
        data : any
            The input data to process.
        fields : Optional[List[any]] = None
            A list of fields to apply the action to. If 'None', the action is applied to the entire 'data'.
        action : Callable
            The action to apply to the data. It should be a callable that performs an operation on the data
            and returns the modified value.

        Returns
        -------
        any
            The modified data after applying the action.
        """

        if fields is not None:
            return self._apply_action_to_fields(data, fields, action, **provider_options)
        else:
            return action(data, **provider_options)

    def _apply_action_to_fields(
        self,
        data: Union[dict, str],
        fields: list,
        action,
        **provider_options,
    ) -> Union[dict, str]:
        """
        This method takes the input data, which can be either a dictionary or a JSON string,
        and applies a mask, an encryption, or a decryption to the specified fields.

        Parameters
        ----------
        data : Union[dict, str]
            The input data to process. It can be either a dictionary or a JSON string.
        fields : List
            A list of fields to apply the action to. Each field is a dot-separated string
            of nested keys in the dictionary; non-string fields are converted with json.dumps.
        action : Callable
            The action to apply to the fields. It should be a callable that takes the current
            value of the field as the first argument and any additional arguments that might be required
            for the action. It performs an operation on the current value using the provided arguments and
            returns the modified value.
        **provider_options:
            Additional keyword arguments to pass to the 'action' function.

        Returns
        -------
        dict
            The modified dictionary after applying the action to the
            specified fields.

        Raises
        ------
        ValueError
            If 'fields' parameter is None.
        TypeError
            If the 'data' parameter is not a traversable type

        Example
        -------
        ```python
        >>> data = {'a': {'b': {'c': 1}}, 'x': {'y': 2}}
        >>> fields = ['a.b.c', 'x.y']
        # The function will transform the value at 'a.b.c' (1) and 'x.y' (2)
        # and store the result as:
        new_dict = {'a': {'b': {'c': 'transformed_value'}}, 'x': {'y': 'transformed_value'}}
        ```
        """

        if fields is None:
            raise ValueError("No fields specified.")

        if isinstance(data, str):
            # Parse JSON string as dictionary
            my_dict_parsed = json.loads(data)
        elif isinstance(data, dict):
            # In case their data has keys that are not strings (i.e. ints), convert it all into a JSON string
            my_dict_parsed = json.dumps(data)
            # Turn back into dict so can parse it
            my_dict_parsed = json.loads(my_dict_parsed)
        else:
            raise TypeError(
                f"Unsupported data type for 'data' parameter. Expected a traversable type, but got {type(data)}.",
            )

        # For example: 'a.b.c' in ['a.b.c', 'x.y']
        for nested_key in fields:
            # Prevent overriding loop variable
            curr_nested_key = nested_key

            # If the nested_key is not a string, convert it to a string representation
            if not isinstance(curr_nested_key, str):
                curr_nested_key = json.dumps(curr_nested_key)

            # Split the nested key string into a list of nested keys
            # 'a.b.c' -> ['a', 'b', 'c']
            keys = curr_nested_key.split(".")

            # Initialize a current dictionary to the root dictionary
            curr_dict = my_dict_parsed

            # Traverse the dictionary hierarchy by iterating through the list of nested keys
            for key in keys[:-1]:
                curr_dict = curr_dict[key]

            # Retrieve the final value of the nested field
            valtochange = curr_dict[keys[-1]]

            # Apply the specified 'action' to the target value
            curr_dict[keys[-1]] = action(valtochange, **provider_options)

        return my_dict_parsed
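The dotted-path traversal in `_apply_action_to_fields` can be tried in isolation. What follows is a minimal standalone sketch, not the code from this PR: `apply_to_fields` is a hypothetical helper name, and the lambda stands in for a provider action such as `mask`. It assumes every path exists in the input; a missing key raises `KeyError`.

```python
import json
from typing import Any, Callable, List, Union


def apply_to_fields(data: Union[dict, str], fields: List[str], action: Callable[[Any], Any]) -> dict:
    """Apply `action` to each dot-separated field path (hypothetical standalone helper)."""
    # Normalise to a plain dict with string keys via a JSON round trip, as the diff does
    parsed = json.loads(data) if isinstance(data, str) else json.loads(json.dumps(data))

    for path in fields:
        # 'a.b.c' -> parents ['a', 'b'] and leaf 'c'
        *parents, leaf = path.split(".")
        node = parsed
        for key in parents:
            node = node[key]  # raises KeyError if the path does not exist
        node[leaf] = action(node[leaf])
    return parsed


data = {"a": {"b": {"c": 1}}, "x": {"y": 2}, 3: "int key"}
masked = apply_to_fields(data, ["a.b.c", "x.y", "3"], lambda _value: "*****")
print(masked)
```

Two side effects of the JSON round trip are visible here: the non-string key `3` comes back as the string `"3"`, and the caller's original dictionary is left unmodified because the round trip produces a copy.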
5 changes: 5 additions & 0 deletions aws_lambda_powertools/utilities/data_masking/constants.py
@@ -0,0 +1,5 @@
DATA_MASKING_STRING: str = "*****"
CACHE_CAPACITY: int = 100
MAX_CACHE_AGE_SECONDS: float = 300.0
MAX_MESSAGES_ENCRYPTED: int = 200
# NOTE: You can also set max messages/bytes per data key
5 changes: 5 additions & 0 deletions aws_lambda_powertools/utilities/data_masking/provider/__init__.py
@@ -0,0 +1,5 @@
from aws_lambda_powertools.utilities.data_masking.provider.base import BaseProvider

__all__ = [
    "BaseProvider",
]
34 changes: 34 additions & 0 deletions aws_lambda_powertools/utilities/data_masking/provider/base.py
@@ -0,0 +1,34 @@
import json
from typing import Any

from aws_lambda_powertools.utilities.data_masking.constants import DATA_MASKING_STRING


class BaseProvider:
    """
    Base class for data masking providers. Calling encrypt() or decrypt() on a provider
    that does not override them raises NotImplementedError; mask() has a default implementation.
    """

    def __init__(self, json_serializer=None, json_deserializer=None) -> None:
        self.json_serializer = json_serializer or self.default_json_serializer
        self.json_deserializer = json_deserializer or self.default_json_deserializer

    def default_json_serializer(self, data):
        return json.dumps(data).encode("utf-8")

    def default_json_deserializer(self, data):
        return json.loads(data.decode("utf-8"))

    def encrypt(self, data) -> str:
        raise NotImplementedError("Subclasses must implement encrypt()")

    def decrypt(self, data) -> Any:
        raise NotImplementedError("Subclasses must implement decrypt()")

    def mask(self, data) -> Any:
        if isinstance(data, (str, dict, bytes)):
            return DATA_MASKING_STRING
        elif isinstance(data, (list, tuple, set)):
            return type(data)([DATA_MASKING_STRING] * len(data))
        return DATA_MASKING_STRING
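To make the provider contract concrete, here is a hedged sketch of how a subclass might fill in `encrypt()` and `decrypt()`. It is self-contained so it runs without Powertools installed: `SketchBaseProvider` is a stand-in that mirrors only the relevant parts of `BaseProvider`, and `Base64Provider` is a hypothetical subclass. Base64 is an encoding, not encryption, and is used here only to show the override points.

```python
import base64
import json
from typing import Any

MASK = "*****"  # same value as DATA_MASKING_STRING in constants.py


class SketchBaseProvider:
    """Stand-in for BaseProvider (assumption: mirrors its encrypt/decrypt/mask behaviour)."""

    def encrypt(self, data: Any) -> str:
        raise NotImplementedError("Subclasses must implement encrypt()")

    def decrypt(self, data: Any) -> Any:
        raise NotImplementedError("Subclasses must implement decrypt()")

    def mask(self, data: Any) -> Any:
        # Lists, tuples and sets keep their type; everything else becomes one mask string
        if isinstance(data, (list, tuple, set)):
            return type(data)([MASK] * len(data))
        return MASK


class Base64Provider(SketchBaseProvider):
    """Hypothetical provider. NOT secure: base64 is reversible by anyone."""

    def encrypt(self, data: Any) -> str:
        return base64.b64encode(json.dumps(data).encode("utf-8")).decode("ascii")

    def decrypt(self, data: str) -> Any:
        return json.loads(base64.b64decode(data).decode("utf-8"))


provider = Base64Provider()
token = provider.encrypt({"card": "4111"})
print(provider.decrypt(token))         # round-trips to the original dict
print(provider.mask(["a", "b", "c"]))  # list of three mask strings
print(provider.mask({"a", "b", "c"}))  # set collapses to a single mask string
```

One consequence of the default `mask()` worth noticing: a list or tuple keeps its length, but a set of any size collapses to a single element because every masked value is identical.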
5 changes: 5 additions & 0 deletions aws_lambda_powertools/utilities/data_masking/provider/kms/__init__.py
@@ -0,0 +1,5 @@
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AwsEncryptionSdkProvider

__all__ = [
    "AwsEncryptionSdkProvider",
]