Skip to content

Updated ObjectField to allow for unmapped keys #302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 15, 2021
Merged

Updated ObjectField to allow for unmapped keys #302

merged 6 commits into from
Feb 15, 2021

Conversation

oehrlein
Copy link
Contributor

I adjusted the ObjectField class to allow for data with unmapped keys. Previously, if an ObjectField (or NestedField) was created without mapping, it'd show up empty in the index.

For example:

# Source data
{"make": "BMW", "model": "M5"}

# documents.py
class Car(Document):
    data = fields.ObjectField()

# Index before fix
"_source" : {
    "data": [
        {}
    ]
}

# Index after fix
"_source" : {
    "data": {
        "make": "BMW",
        "model": "M5"
    }
}

This prevents the need for the workaround mentioned in #36, which uses a prepare_ function such as this:

def prepare_data(self, instance):
    return instance.data

This fixes issues highlighted in #13, #36, and #235.

@safwanrahman
Copy link
Collaborator

LGTM. @oehrlein Is it possible for you to add some tests? if not possible, we can merge as it is.

@safwanrahman
Copy link
Collaborator

@oehrlein BTW, Thanks for your PR. I was just trying to figure out what was happening here!

@oehrlein
Copy link
Contributor Author

@safwanrahman Re: tests, I can take a crack at it, but not sure when I'd be able to get it done. I could potentially try to start working on it later today, but I can't make any guarantees. I think it'd be best to merge now and get tests added in later, if that's fine with you.

And no problem! Looks like the main issue at hand was dict data objects being treated as iterables, so when no mapping was provided, then it'd just return a list of empty dicts for each key in the dictionary, e.g. [{"key_1": "value_1"}, {"key_2": "value_2"}, {"key_3": "value_3"}] would return [{}, {}, {}]

@@ -125,7 +128,7 @@ def get_value_from_instance(self, instance, field_value_to_ignore=None):
if objs is None:
return {}
try:
is_iterable = bool(iter(objs))
is_iterable = bool(iter(objs)) and not isinstance(objs, dict)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rather than changing is_iterable, change to if is_iterable and not isinstance(objs, dict): below.

@@ -125,7 +128,7 @@ def get_value_from_instance(self, instance, field_value_to_ignore=None):
if objs is None:
return {}
try:
is_iterable = bool(iter(objs))
is_iterable = bool(iter(objs)) and not isinstance(objs, dict)
except TypeError:
is_iterable = False

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this.

Suggested change
if is_iterable and not isinstance(objs, dict)::
....

@safwanrahman
Copy link
Collaborator

I am concerned that if anyone use JsonField in DB and add ObjectField with properties, will it filter out the other values?

@oehrlein
Copy link
Contributor Author

@safwanrahman Good call on moving the iterable dict check, it makes more sense with what you suggested. I committed the change.

@oehrlein
Copy link
Contributor Author

I am concerned that if anyone use JsonField in DB and add ObjectField with properties, will it filter out the other values?

@safwanrahman If there are properties in the ObjectField and a field isn't defined, it will not show up. Although, if no properties are defined, then all fields will show up. Similarly, if there is an ObjectField in a NestedField (which is my personal use case), it works the same.

For example, using an ObjectField and this source data:

# Source data
{"make": "BMW", "model": "M5"}

Without defined properties:

# documents.py
class Car(Document):
    data = fields.ObjectField()

# Index
"_source" : {
    "data": {
        "make": "BMW",
        "model": "M5"
    }
}

With partially defined properties:

# documents.py
class Car(Document):
    data = fields.ObjectField(
        properties={
            'make': fields.TextField()
        }
    )

# Index
"_source" : {
    "data": {
        "make": "BMW"
    }
}

With all defined properties:

# documents.py
class Car(Document):
    data = fields.ObjectField(
        properties={
            'make': fields.TextField(),
            'model': fields.TextField()
        }
    )

# Index
"_source" : {
    "data": {
        "make": "BMW",
        "model": "M5"
    }
}

For example, using a NestedField and this source data:

# Source data
[{"make": "BMW", "model": "M3"}, {"make": "BMW", "model": "M5"}]

Without defined properties:

# documents.py
class Car(Document):
    data = fields.NestedField()

# Index
"_source" : {
    "data": [
        {
            "make": "BMW",
            "model": "M3"
        },
        {
            "make": "BMW",
            "model": "M5"
        }
    ]
}

With partially defined properties:

# documents.py
class Car(Document):
    data = fields.NestedField(
        properties={
            'make': fields.TextField()
        }
    )

# Index
"_source" : {
    "data": [
        {
            "make": "BMW"
        },
        {
            "make": "BMW"
        }
    ]
}

With all defined properties:

# documents.py
class Car(Document):
    data = fields.NestedField(
        properties={
            'make': fields.TextField(),
            'model': fields.TextField()
        }
    )

# Index
"_source" : {
    "data": [
        {
            "make": "BMW",
            "model": "M3"
        },
        {
            "make": "BMW",
            "model": "M5"
        }
    ]
}

@oehrlein
Copy link
Contributor Author

oehrlein commented Nov 1, 2020

@safwanrahman Following up on tests, I was able to add some. See commit 31c8857.

@safwanrahman
Copy link
Collaborator

@oehrlein That looks excellent! 🤩🚀
I will test it locally before merging it. Thanks a lot for adding the tests!

@@ -128,8 +131,8 @@ def get_value_from_instance(self, instance, field_value_to_ignore=None):
is_iterable = bool(iter(objs))
except TypeError:
is_iterable = False

if is_iterable:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here why we are excluding the dictionary?
Something like

Suggested change
# Need to exclude dictionary, as it is iterable, but the full data need to be indexed.

@oehrlein
Copy link
Contributor Author

oehrlein commented Nov 1, 2020

@safwanrahman No problem!

I also just added in comments for the two changes!

@oehrlein
Copy link
Contributor Author

@safwanrahman just wanted to follow up on this - is there anything else you need or can it be merged?

@safwanrahman safwanrahman merged commit 230607b into django-es:master Feb 15, 2021
@safwanrahman
Copy link
Collaborator

@oehrlein I am so sorry that I could merge it earlier! Was quite busy dealing with life things! 😓

@safwanrahman
Copy link
Collaborator

@oehrlein I have rebased and merged it to master. Thanks a lot! 👍 💯

@oehrlein
Copy link
Contributor Author

@safwanrahman No worries, I appreciate you taking the time to do this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants