Skip to content

[Fix #1091] Index.document() does not set index on Document class #1099

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

safwanrahman
Copy link
Contributor

This will fix #1091

In the previous versions, the Index.document() method added the Index to the Document class. But it was removed in 69f5ca2 in order to implement the inheritance document.

This patch will add Index to the Document class moreover also set None to Document._index property if Index Innesclass is not in set.

@honzakral r?

@safwanrahman safwanrahman changed the title [Fix #1091] document method of Index does not set index on Document class [Fix #1091] Index.document() does not set index on Document class Dec 30, 2018
@honzakral
Copy link
Contributor

Thank you for working on this, I was hoping to get to it next week but this is much better.

I think the ideal solution here would be to detect somehow that the Index instance stored in ._index was just the default one and have .document() replace it only if so. A check for name == '*' might do it or maybe tag it with some attr? What do you think?

I would like to avoid the situation where _index is ever None as that would lead to too many exception in both our and our users' code.

Does that make sense?

Thank you again for picking this up!

@safwanrahman
Copy link
Contributor Author

safwanrahman commented Dec 30, 2018

Thanks a lot @honzakral for taking it in the vacation time.
I think, if the user does not set a index, we should not set that implicitly. Either they should set it in the inner Index class or they should set it with the Index.document.

The get_index method also validates it. If we set name=='*' or something like that, it should also raise error when anyone want to run some query or anything.

From my prespective, I think we should handle the document inheritance like django handle model inheritance. We can add abstract=True in the Meta inner class. By doing that, we can avoid having a user declaring a document class without any index.

What do you think?

I would like to avoid the situation where _index is ever None as that would lead to too many exception in both our and our users' code.

Thats true. But I think we need to fix this from our end as well as from user end.

@honzakral
Copy link
Contributor

Thank you for the discussion, very useful!

The get_index method also validates it. If we set name=='*' or something like that, it should also raise error when anyone want to run some query or anything.

There is no problem with running a query on *, it works by querying on everything which seems to be what the user asked for. Many times, in simple use cases, people would only have essentially a single Document type in their cluster and then just using * is perfectly fine assuming they create their index explicitly.

You can always have a Document class with a wildcard index name as that is a very common pattern (time- or project- based indices for example) so we have to be able to deal with that situation anyway and it adds no extra complexity to be able to handle * properly.

I am also a huge fan of the simplicity of defining a Document class and then using it on any index, even if the index is not managed by elasticsearch-dsl, for example I often do it for existing indices just to get the deserialization for datetimes or add a custom method to my results, it seems like specifying class Index should not be required there either.

In other cases I like to just be able to access the _index property to get access to the index level operations without having to check for None.

Overall I really think that having _index always be an instance of Index is a good thing.

That leaves us with several options:

  1. Use some detection (name == '*' or explicit tagging with cls._index.is_default) and only override it in that case
  2. give up on Index.document overriding the ._index of a Document and just have it copy the Document mappings onto itself
  3. always have Index.document override the ._index of a Document to self
  4. a hybrid where we introduce Index.document(override=True) to switch between options 2 and 3

I am leaning towards 1 but am certainly open to any of these.

What do you think?

@safwanrahman
Copy link
Contributor Author

I understand your point.
But if you want to query on all the indexes, you can just create a document class with Index innerclass having name=*.
Like

class AllDocument(Document):
    class Index:
        name = '*'

Do you think having it implicit is better than explicit?

If you dont want user to add the wildcard index name explicitly, I also think going with option 1 is better idea. But also having the option 4 makes sense as it keeps the method flexible and more explicit!

@safwanrahman
Copy link
Contributor Author

Overall I really think that having _index always be an instance of Index is a good thing.

I also agree with this. In that case we can have a index with name None?
what do you think?
Index(name=None)

@safwanrahman
Copy link
Contributor Author

@honzakral I think having Index(name=None) will be better if user does not set anything.
If we set Index(name=None), it will set _all while making the query.

By doing this, we can differenciate in the Index.document() method as well as the user experience will not break!

What do you think?

@safwanrahman
Copy link
Contributor Author

@honzakral possible to have a review again?

@lost-person
Copy link

Hi! @safwanrahman I met the similar problem when i use Index.document like this:

test_index1 = Index('test_index1')
test_index2 = Index('test_index2')

ascii_fold = analyzer(
    'ascii_fold',
    # we don't want to split O'Brian or Toulouse-Lautrec
    tokenizer='whitespace',
    filter=[
        'lowercase',
        token_filter('ascii_fold', 'asciifolding')
    ]
)

@test_index1.document
@test_index2.document
class testDoc(Document):
      user_name = Completion(analyzer = ascii_fold)
          
      def save(self, ** kwargs):
             self.update_time = int(time.time() * 1000)
             kwargs['index'] = 'test_index1'
             return super().save(** kwargs)

......

if __name__ == '__main__':
      testDoc.init()

Could you tell me how to resolve it? Thanks! And my elasticsearch-dsl version is 6.3.1

@honzakral
Copy link
Contributor

Sorry for the delays, finally pushed as 29863b3

Thank you for working on this, much appreciated!

@honzakral honzakral closed this Apr 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index.document() doesn't set index on Document on elasticsearch-dsl-py>=6.3.0
3 participants