Skip to content

Add Async I/O Support #1435

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed

Add Async I/O Support #1435

wants to merge 11 commits into from

Conversation

jamesbrewerdev
Copy link

@jamesbrewerdev jamesbrewerdev commented Oct 5, 2020

NOTE: This commit message is outdated and will be updated before merging.

This commit adds support for async I/O that mirrors the support added to
elasticsearch-py in elastic/elasticsearch-py#1203.

Changes:

  • A new client argument was added to elasticsearch_dsl.connections.create_connection to allow users to provide the elasticsearch._async.AsyncElasticsearch as their preferred client class. Passing AsyncElasticsearch will enable asynchronous behavior in elasticsearch_dsl.
  • Async versions of the FacetedSearch, Index, Mapping, Search, and UpdateByQuery classes have been added to elasticsearch_dsl._async. The paths for these classes mirror the paths for their sync versions. These classes defer to their respective sync classes for all methods that don't perform I/O.
  • Async versions of Document.delete, .get, .init, .mget, .save, and .update have been added to the AsyncDocument class:
    • Document.delete -> AsyncDocument.delete_async
    • Document.get -> AsyncDocument.get_async
    • Document.init -> AsyncDocument.init_async
    • Document.mget -> AsyncDocument.mget_async
    • Document.save -> AsyncDocument.save_async
    • Document.update -> AsyncDocument.update_async
    • NOTE: Why did I choose delete_async over async def delete? Because I felt that async calls should be optional, even when using AsyncDocument. Ideally, these functions would exist directly on the Document class, but the async/await syntax was introduced in Python 3.6 and causes problems for lower versions. Putting the async/await features in a separate file and including that file conditionally based on the Python version solves these problems.
  • Where possible, the existing methods have been refactored to re-use their existing implementation instead of creating duplication.

Closes #1355.

This commit adds support for async I/O that mirrors the support added to
elasticsearch-py in elastic/elasticsearch-py#1203.

Changes:

* A new `client` argument was added to
  `elasticsearch_dsl.connections.create_connection` to allow users to
  provide the `elasticsearch._async.AsyncElasticsearch` as their
  preferred client class. Passing `AsyncElasticsearch` will enable
  asynchronous behavior in `elasticsearch_dsl`.
* Async versions of the `FacetedSearch`, `Index`, `Mapping`, `Search`,
  and `UpdateByQuery` classes have been added to
  elasticsearch_dsl._async. The paths for these classes mirror the paths
  for their sync versions. These classes defer to their respective sync
  classes for all methods that don't perform I/O.
* Async versions of `Document.get`, `Document.init`, `Document.mget`,
  `Document.delete`, `Document.save`, and `Document.update` have been
  added to the `Document` class with the following names:
    * `Document.delete` -> `Document.delete_async`
    * `Document.get` -> `Document.get_async`
    * `Document.init` -> `Document.init_async`
    * `Document.mget` -> `Document.mget_async`
    * `Document.save` -> `Document.save_async`
    * `Document.update` -> `Document.update_async`
* Where possible, the existing methods have been refactored to re-use
  their existing implementation instead of creating duplication.

Closes #1355.
@cla-checker-service
Copy link

cla-checker-service bot commented Oct 5, 2020

💚 CLA has been signed

@jamesbrewerdev
Copy link
Author

@sethmlarson This commit is missing tests and maybe some other stuff too, which I'm happy to add, but first I wanted to get a 👍 or 👎 on whether this approach seems like it's going in the right direction and see if you have any early feedback.

@sethmlarson
Copy link
Contributor

Just an FYI I've seen this PR and have reviewing it on my todo list. Thanks for getting this started.

@jamesbrewerdev
Copy link
Author

@sethmlarson Would it help if I broke this PR into a series of smaller PRs?

@zikphil
Copy link

zikphil commented Oct 22, 2020

Hello, I have read this PR and I approve it.... :D

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! Going to prefix this review with letting you know I'll have a lot more time to allocate to Elasticsearch-DSL after Elastic Stack 7.10 has been released. For now I have the following comments:

  • I'd like to integrate unasync along with clever unasync.Rule.additional_replacements usage somehow into this PR if possible so that supporting sync and async together only requires changing code in one place.
  • If we're doing AsyncDocument, etc then we don't need the async_ prefix on functions. The class name namespaces all functions within the class to potentially be async.

More detailed review of this PR will come eventually.

@jamesbrewerdev
Copy link
Author

Sounds good. Just to clarify a few things:

  1. By suggesting unasync, are you also suggesting that elasticsearch-dsl-py should be async-by-default? E.g. elasticsearch_dsl.document.Document becomes async and unasync generates something like elasticsearch_dsl._sync.document.Document?
  2. If so, do you have any concerns about backwards compatibility? If elasticsearch-dsl-py is async-by-default when this is merged then everyone using it will have to update their imports until they can move to support the async versions.
  3. Any thoughts on how to test the generated sync code or whether that is necessary? What about generating documentation? The docs for the sync class docs can essentially copy the async version with minor automated changes, but of course the sync classes should still appear in the docs.

Other thoughts:

  1. Agree that changing code in one place is obviously better.
  2. Agree that async_ prefixes are unnecessary. I added those before I had to create things like AsyncDocument to deal with the syntax issues in 2.7 and forgot to remove them.

@sethmlarson
Copy link
Contributor

@lolripgg I actually want to use unasync closer to how elasticsearch-py uses it, not as a part of the dist build in setup.py but instead a tool that is run as a part of linting that makes sure that the sync and async code are in line before merges.

We keep the existing sync code where it is, make a copy as you've done here with everything under _async/ as you've done, add unasync to generate the elasticsearch_dsl.* code from elasticsearch_dsl._async.* code, and in the PR we'll be able to see that the synchronous code shouldn't change too much.

For testing I'd like to test each class the same way we do now, maybe we can do something with unasync there too if it becomes too unruly?

@jamesbrewerdev
Copy link
Author

I actually want to use unasync closer to how elasticsearch-py uses it, not as a part of the dist build in setup.py but instead a tool that is run as a part of linting

Can you provide an example of how generate_api.py is run in elasticsearch-py? It's the only place I see unasync used and it seems to be run manually. I'm trying to understand how it fits into the linting process.

@zikphil
Copy link

zikphil commented Oct 23, 2020

@lolripgg When I try to add ur repo in requirements.txt using git+https://github.com/lolripgg/elasticsearch-dsl-py.git@lolripgg/async#egg=elasticsearch-dsl, it seems the _async folder does not follow and the import paths are failing.

Am I doing this wrong?

@jamesbrewerdev
Copy link
Author

@zikphil Try running from elasticsearch import _async in a console. This branch is set up to only import _async if we can import the _async package from elasticsearch. If that throws an error then my guess is you're using Python < 3.6 (which is when the async syntax was introduced) or you haven't installed aiohttp. If that's the case you can try installing elasticsearch-dsl[async] or installing aiohttp directly.

@zikphil
Copy link

zikphil commented Oct 23, 2020

@lolripgg The _async package for ES is here. When I explore site-packages and I go to elasticsearch I see the _async folder there. After installing ur repo, I see all the changes you have made to init.py of elasticsearch-dsl but the _async folder is not in the main folder. https://imgur.com/a/ceF0kZq

@jamesbrewerdev
Copy link
Author

Oh. That's probably because setup.py isn't finding the elasticsearch_dsl._async package because it doesn't have an __init__.py. I ran into that too. I'll push some changes shortly.

@jamesbrewerdev
Copy link
Author

jamesbrewerdev commented Oct 23, 2020

  • Update commit message before merging

@jamesbrewerdev
Copy link
Author

@zikphil Give it a try now and let me know how it goes.

@zikphil
Copy link

zikphil commented Oct 23, 2020

@lolripgg Is it normal that there are no more AsyncDocument class?

from elasticsearch_dsl import Date, Integer, Keyword, Text, Document


class SMDataPoint(Document):
    test = Keyword()

    class Index:
        name = 'test'
es = AsyncElasticsearch(hosts=[config.ES_HOST + ':' + str(config.ES_PORT)])
await SMDataPoint.init(using=es)

Running this gives me:

  File "************.py", line 25, in test
    await SMDataPoint.init(using=es)
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.8/site-packages/elasticsearch_dsl/document.py", line 158, in init
    i.save(using=using)
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.8/site-packages/elasticsearch_dsl/index.py", line 307, in save
    if not self.exists(using=using):
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.8/site-packages/elasticsearch_dsl/index.py", line 445, in exists
    ensure_sync_connection(es, "Index.exists")
  File "/Users/phil/.pyenv/versions/nvp/lib/python3.8/site-packages/elasticsearch_dsl/utils.py", line 554, in ensure_sync_connection
    raise TypeError(
TypeError: Index.exists can only be used with the elasticsearch.Elasticsearch client

@jamesbrewerdev
Copy link
Author

jamesbrewerdev commented Oct 23, 2020

@zikphil Yeah. That was intentional, but I'm on the fence about the decision.

They're still available as elasticsearch_dsl._async.document.Document, etc. I could be convinced to rename them to AsyncDocument, etc. The downside to using the name Document is that it conflicts with elasticsearch_dsl.document.Document so it can't be exposed through __all__.

@sethmlarson
Copy link
Contributor

sethmlarson commented Oct 23, 2020

Imo having the Async prefix on the classes is best so they can be imported in __all__.

@jamesbrewerdev
Copy link
Author

jamesbrewerdev commented Oct 23, 2020 via email

@zikphil
Copy link

zikphil commented Oct 26, 2020

I believe we should stay as close as possible to the method used by elasticsearch-py for implementing async support.
It will avoid a lot of confusion with people working with both libraries.

@sethmlarson
Copy link
Contributor

Closing this in favor of #1480

@sla-te sla-te mentioned this pull request Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for async I/O
4 participants