Skip to content

Introduction of core functionality and tests #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Mar 23, 2018
Merged

Introduction of core functionality and tests #1

merged 29 commits into from
Mar 23, 2018

Conversation

mattsb42-aws
Copy link
Member

@mattsb42-aws mattsb42-aws commented Mar 7, 2018

This is the core functionality required for the client to work, including all necessary configuration structures, serialization/deserialization, and cryptographic components.

The highest level interface currently provided in this codebase is found in dynamodb_encryption_sdk.encrypted.item, which contains found functions that will encrypt DynamoDB items and Python dictionaries.

All operations are performed on DynamoDB JSON items. The Python dictionary functions simply use the transformers provided in boto3.dynamodb.types to convert back and forth.

Notes

  • The base namespace is up for discussion. I don't love dynamodb_encryption_sdk, but it seemed as good a starting point as any.
  • The test vectors used in the acceptance tests were extracted from TransformerHolisticTests
  • I'm working on building more tests to hit the edge cases not touched yet by the current tests (currently at 87% coverage).
  • I'm also working on the higher level interfaces that will provide drop-in replacements for boto3 DynamoDB client, resources, and Table classes.

Places wanting specific attention

  • @SalusaSecondus : Please pay special attention to the cryptographic materials providers (material_providers), cryptographic materials (materials), and delegated keys (delegated_keys). I'm pretty sure these are aligned with the intended concepts now, but I'd like your eyes on those.
    • @david-koenig : It would be helpful to have someone with little or no background with the existing DDB encryption client look over these as well, to get an outside perspective on them.
  • @bdonlan : The JCE translation layer (internal.crypto.jce_bridge and delegated_keys.jce) should provide close analogues to the JCE interfaces because we have to work with JCE StandardNames and transformations as the core primitives of configuration in the material description. Having someone who is much more familiar with the JCE structure than I would be very helpful to validate that the way the structures are built makes sense.
    • @SalusaSecondus : As our other resident JCE expert, it would be good to have you look at this part as well.
  • @david-koenig : Other areas of interest are the non-JCE crypto (internal.crypto) and formatting (internal.formatting) components.

Copy link
Contributor

@bdonlan bdonlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level review of API structure around the JCE emulation only.


@attr.s(hash=False)
class JceNameLocalDelegatedKey(DelegatedKey):
"""Delegated key that JCE StandardName algorithm values to determine behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't parse this sentence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be "Delegated key that uses JCE StandardName algorithm values to determine behavior."

fixed

:returns: Encrypted ciphertext
:rtype: bytes
"""
encryptor = encryption.JavaCipher.from_transformation(algorithm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be cached?

Copy link
Member Author

@mattsb42-aws mattsb42-aws Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformation won't necessarily be consistent for all calls to this method across the life of the delegated key instance, so it should not be retained permanently. As for caching JavaCipher instances based on the transformation, I haven't run performance tests, but from past experience I don't think that the JavaCipher class constructor will affect performance enough to justify the additional complexity that caching would add.


@property
def allowed_for_raw_materials(self):
"""Only ``JceNameLocalDelegatedKey`` backed by ``JavaSymmetricEncryptionAlgorithm``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the first class you're looking at, this is a lot of gibberish. Why not just say "Returns true if this key can be used with raw cryptographic materials (i.e. random bytes), rather than a structured key (e.g. an EC or RSA key)." or similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add reference links to the mentioned classes, but it's more complex than that . RawCryptographicMaterials is a specific class in this client[1] that encrypts attributes directly with a delegated key. This is a pattern that we want to heavily discourage, but need to support for legacy compatibility, so the default behavior of all delegated keys is to not allow this.

[1] https://github.com/awslabs/aws-dynamodb-encryption-python/pull/1/files#diff-f4f15b02a6ad66eb2f84473ea538d731R1

# TODO: should we support these?
# HmacMD5
# HmacSHA1
HMAC_SHA224 = ('HmacSHA224', hmac.HMAC, hashes.SHA224)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... does this somehow instantiate the containing class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.java_name = algorithm_name
self.algorithm_type = algorithm_type
self.hash_type = hash_type
self.register()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having globally visible side-effects in a constructor seems a bit surprising and dangerous.

Why not just have a list or dict of supported algorithms in jce_bridge/__init__.py? This is analogous to how JCE does algorithm registration, and means all supported algorithms are in one place for easy inspection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum constructors are run on first module import, so this will only run once, and will run before any of these object can be accessed. The registration adds them to the lookup table of the first class up in the class hierarchy that defines the class attribute __rlookup__. In this case (JavaMac), this is JavaAuthenticator.

This is the same as saying self.__rlookup__[self.java_name] = self, just less to screw up each time it is written and clearer about what it is doing. Having a static list of values would (IMO) just introduce another set of values to maintain and another opportunity for drift.

https://github.com/awslabs/aws-dynamodb-encryption-python/pull/1/files#diff-e39db2a6f8774aaa1f85411c09cdfde1R25

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more concerned about the possibility that something, somewhere else will construct an instance of e.g. JavaMac and have it silently be registered systemwide. Why can't we just put it all in one place as:

macAlgos = {
  'HmacSHA256': JavaMac(hmac.HMAC, hashes.SHA256),
  # ...
}

)


class JavaMode(JavaBridge):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all of these components seem to be registered in the same namespace - what happens if I start asking for strange things like "AES/PKCS5Padding/GCM" or "AES/NoPadding/PKCS5Padding"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; they are registered to the lowest class in the hierarchy that defines __rlookup__. So in the case of JavaMode, all of its members will register with JavaMode and not affect any other JavaBridge classes.

  • JavaMode only knows about JCE Cipher Algorithm Mode names
  • JavaPadding only knows about JCE Cipher Algorithm Padding names
  • JavaEncryptionAlgorithm only knows about JCE Cipher Algorithm names

If you want to load an encryption transformation, you want JavaCipher.

https://github.com/awslabs/aws-dynamodb-encryption-python/pull/1/files#diff-9a2c578c4e4362f654cfa62698eda1a0R22

# First try for encryption ciphers
# https://docs.oracle.com/javase/8/docs/api/javax/crypto/Cipher.html
try:
key_transformer = primitives.JavaEncryptionAlgorithm.from_name(self.algorithm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this handle compound algorithms (e.g. "AES/GCM/NoPadding") or just raw algorithms ("AES")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JceNameLocalDelegatedKey does not know anything about mode or padding, only algorithm. It only accepts algorithm names. This is intentional, because the mode and padding can potentially change for operations over the life of a delegated key instance.

"""
return self.__key_handler_class is primitives.JavaSymmetricEncryptionAlgorithm

def _encrypt(self, algorithm, plaintext, additional_associated_data=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of supporting all possible operations on the same class makes me a bit nervous about attacks like this: https://auth0.com/blog/critical-vulnerabilities-in-json-web-token-libraries/

Where is the algorithm string and key coming from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree, and this is one of the things I would like to change with v2.0 of the spec. FWIW, this specific delegated key class explicitly only enables those APIs that the loaded key is likely to support[1][2] (encrypt/decrypt/wrap/unwrap for encryption keys, sign/verify for mac/signature keys).

The algorithm identifier comes from the material description[3].

The key could be anything or come from anywhere: only the cryptographic materials provider knows.

[1] https://github.com/awslabs/aws-dynamodb-encryption-python/pull/1/files#diff-b899d421fd3daa12d0fb4551745d310dR117
[2] https://github.com/awslabs/aws-dynamodb-encryption-python/pull/1/files#diff-b899d421fd3daa12d0fb4551745d310dR130
[3] https://github.com/awslabs/aws-dynamodb-encryption-python/pull/1/files#diff-04101d472c1b1ffa6f7f73dbb6e16bb7R53

Copy link
Contributor

@david-koenig david-koenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed authentication code and serialization so far.

data='TABLE>{}<TABLE'.format(table_name).encode('utf-8')
))
for key in sorted(item.keys()):
action = attribute_actions.action(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that attribute_actions.action will always have the same full list of keys as item, or do we have to handle the case of a KeyError here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled by the AttributeActions class[1]. AttributeActions know two things: the default action and any override actions that have been requested for a specific attribute name. The AttributeActions().action(name) method returns the override action if defined for the requested attribute and otherwise returns the default action.

[1] https://github.com/awslabs/aws-dynamodb-encryption-python/pull/1/files#diff-5f2e39fbf77e7ae30c8e7113f38d6cf2R73

)


def _string_to_sign(item, table_name, attribute_actions):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears to me that the string to sign is in accordance with AwsCryptoToolsSpecs section on item signing

@@ -0,0 +1,251 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable.

@@ -0,0 +1,220 @@
[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, recommend using exact same full set of test vectors for both Java and Python implementations.

Copy link
Contributor

@david-koenig david-koenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments on material_providers/aws_kms.py only

.. note::

This cryptographic materials provider makes one AWS KMS API call each time encryption
or decryption materials are requested. This means that one request will be made for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: remove "for"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way that you fixed this is fine too.

validator=attr.validators.instance_of(botocore.session.Session),
default=attr.Factory(botocore.session.Session)
)
_grant_tokens = attr.ib(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For attributes that are tuples or dicts, does it make sense to do a deeper validation that the elements of the tuple or dict have the right format? It looks like from the attrs documentation it is easy enough to write your own validator functions. ( https://attrs.readthedocs.io/en/stable/examples.html#decorator )

Not blocking on this comment. I consider it going above and beyond. But the more we can do to help users catch errors at class creation time, the more user-friendly we make the library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I forget when they added that functionality, but we're already requiring 17.4.0 (...which, checking, it looks like I forgot to update the requirements.txt, so I'll do that), which I'm fairly confident is newer than the decorators.

)
return hkdf.derive(initial_material)

def _encryption_key(self, initial_material, key_info):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am reading correctly, _encryption_key and _mac_key methods of this class are identical except for the final argument given to the self._hkdf() method call. I would consider making a single underlying method to avoid the repetition in the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Consolidating.

Copy link
Contributor

@david-koenig david-koenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up review on material_providers/aws_kms.py

This file passes my review now. Optional extra changes are mentioned in the comments.

.. note::

This cryptographic materials provider makes one AWS KMS API call each time encryption
or decryption materials are requested. This means that one request will be made for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way that you fixed this is fine too.

)
_grant_tokens = attr.ib(default=attr.Factory(tuple))

@_grant_tokens.validator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only place we need to do such validations, this is fine, but it strikes me that checking whether something is a tuple of strings or a string->string dictionary is a common enough kind of validation that I might consider moving the validator functions that do that to their own module. Then you can simply list them as callables with the validator keyword on the attr.ib command, per https://attrs.readthedocs.io/en/stable/examples.html#callables . (This is closer to how the code was written before, but you don't need to use the attr.validators pre-made callables; you can just use your own.) The regional clients validator is very specific (a string->botocore client dict) so I would probably leave that inline with the decorator as you've done so here.

Again, not blocking on this comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See, this is why it's good to get new eyes on things. That has significantly changed since the last time I looked at it, and is much simpler to use now. I had actually just finished chopping this logic off into a stand-alone module (in a commit currently stacked on top of the high-level client work I've been doing), but tweaking it to work as a straightforward callable would definitely be worthwhile.

Copy link
Contributor

@david-koenig david-koenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at the materials, material providers, and delegated keys sections of the code. Code is very good and I have no suggestions for improvement but I tried to put comments at the top of each file to indicate which ones I read. My only questions are for my own understanding only, not for any needed code changes.

@@ -0,0 +1,114 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,150 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,202 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,147 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,281 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,147 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,281 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,114 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,150 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -0,0 +1,202 @@
# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

"""
self._raise_not_implemented('encrypt')

def decrypt(self, algorithm, ciphertext, additional_associated_data=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both decrypt and encrypt, please also take in the fieldName as a parameter. (It can be ignored by existing implementations.) This is something I deeply regret not having in Java and will need to add to support some current feature requests.

"""
try:
signature_attribute = item.pop(ReservedAttributes.SIGNATURE.value)
except KeyError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no signature is found, then there is no valid signature and it must be rejected just as if a signature were present but incorrect. Otherwise, someone can silently inject plaintext data (without a signature) into a datastore and the application will never notice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here was to support partially migrated data in a table and/or use with an unencrypted/unsigned table. I thought I remembered seeing the Java client have similar behavior somewhere. Not providing some way to support this behavior seems to me like an extremely sharp edge...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior's existence is also extremely sharp. It is purposefully not exposed in Java due to the risk it poses to users. I think that a migration story may need to be in a different (higher-level) object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah; I see what I was remembering. The Java client skips the item if no attribute flags are set[1]. I must have crossed wires at some point and read that as "if the item wasn't encrypted/signed" rather than "if we are configured to ignore this item".

[1] https://github.com/awslabs/aws-dynamodb-encryption-java/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/datamodeling/encryption/DynamoDBEncryptor.java#L225-L227

:rtype: dict
"""
serialized_attribute = serialize_attribute(attribute)
encrypted_attribute = encryption_key.encrypt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that here we need to also pass in the fieldName.

:rtype: dict
"""
encrypted_attribute = attribute[Tag.BINARY.dynamodb_tag]
decrypted_attribute = decryption_key.decrypt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that here we need to also pass in the fieldName.

"""
# TODO: I'm pretty sure these are sane defaults, but verify with someone more familiar with JCE.
if cipher_transformation == 'AESWrap':
return cls.from_transformation('AES/GCM/NoPadding')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AESWrap is the actual name and it doesn't correspond with GCM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a weird sharp edge that I'm not really comfortable with. Java defines AESWrap as a Cipher transformation, but it doesn't actually make a Cipher, and I was trying to map AESWrap to something that would provide similar characteristics without having to define a custom mode that doesn't actually exist in JCE names to try and emulate the behavior of AESWrap. Maybe AES/ECB/NoPadding would be a better mapping?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk in person to straighten this out. I don't understand the issue here.

JAVA_PADDING = {
'NoPadding': SimplePadding('NoPadding', _NoPadding),
'PKCS1Padding': SimplePadding('PKCS1Padding', asymmetric_padding.PKCS1v15),
'PKCS5Padding': BlockSizePadding('PKCS5Padding', symmetric_padding.PKCS7),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why "PKCS5" when it really is PKCS7

'NoPadding': SimplePadding('NoPadding', _NoPadding),
'PKCS1Padding': SimplePadding('PKCS1Padding', asymmetric_padding.PKCS1v15),
'PKCS5Padding': BlockSizePadding('PKCS5Padding', symmetric_padding.PKCS7),
'OAEPWithSHA-1AndMGF1Padding': OaepPadding(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add another comment here warning about Java's idiosyncrasy with the MGF hash function.

:returns: Decoded tag
:rtype: bytes
"""
(_, tag) = unpack_value('>cc', stream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember, we've defined the first byte to be RESERVED and not part of the tag. So decode this as two 1 byte values and ensure the first is zero.

attribute_type, attribute_value = list(attribute.items())[0]
if attribute_type == 'B':
return base64.b64encode(attribute_value.value)
return attribute_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if attribute_type isn't 'S'? I think you can reject it.

@mattsb42-aws mattsb42-aws merged commit 131b251 into aws:master Mar 23, 2018
@mattsb42-aws mattsb42-aws deleted the core-functionality branch March 23, 2018 20:54
@mattsb42-aws mattsb42-aws mentioned this pull request Mar 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants