DOC: Remove Dask and Modin sections in scale.rst in favor of linking to ecosystem docs. #57843
environment.yml
@@ -62,7 +62,6 @@ dependencies:
# downstream packages
- dask-core<=2024.2.1
- dask-core<=2024.2.1
- dask-core
@mroeschke Thanks for your instructions. This reveals that I didn't understand version pinning, but I learned something.
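For readers following the dependency change, here is a minimal sketch of what the `<=2024.2.1` upper bound removed in this PR did. It uses plain Python, not the actual pip or conda resolver. It assumes simple dotted numeric versions; real tools follow PEP 440, for example via the `packaging` library. The helper names and the release list are hypothetical and only serve as an illustration.

```python
# Sketch only: models how an upper-bound pin filters candidate releases.
# Assumes purely numeric dotted versions (real tools implement full PEP 440).


def parse_version(version):
    """Turn '2024.2.1' into (2024, 2, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def allowed_versions(candidates, upper_bound=None):
    """Return the candidates satisfying '<= upper_bound', or all of them when unpinned."""
    if upper_bound is None:
        return list(candidates)
    limit = parse_version(upper_bound)
    return [v for v in candidates if parse_version(v) <= limit]


# Hypothetical release list, for illustration only.
releases = ["2024.1.0", "2024.2.1", "2024.3.0"]

print(allowed_versions(releases, "2024.2.1"))  # like "dask-core<=2024.2.1": newer releases excluded
print(allowed_versions(releases))              # like "dask-core": no upper bound
```

Comparing integer tuples rather than raw strings matters here: compared as strings, "2024.10.0" would sort before "2024.9.0".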
requirements-dev.txt
@@ -49,7 +49,6 @@ xlsxwriter>=3.0.5
zstandard>=0.19.0
dask<=2024.2.1
dask<=2024.2.1
dask
A couple of details, but the changes look good. Thanks @yukikitayama for taking care of this.
@@ -217,190 +217,10 @@ require too sophisticated of operations. Some operations, like :meth:`pandas.Dat
much harder to do chunkwise. In these cases, you may be better switching to a
different library that implements these out-of-core algorithms for you.

.. _scale.other_libraries:
Can you leave this label please, so we can link to this section if needed.
I see that we get an undefined label warning because I removed it. Makes sense. Thanks for letting me know.
doc/source/user_guide/scale.rst
Use Dask
--------
Use Other Libraries
-------------------

pandas is just one library offering a DataFrame API. Because of its popularity,
This paragraph was probably OK when Dask was discussed in detail here, but I think it now does a very poor job of pointing users to the ecosystem.
A bit of context: this is the documentation for scaling pandas (using pandas with data too big to fit in memory, or too big to process on a single computer). Besides what's explained earlier in this section, we want users to know that there is a set of libraries, such as PySpark, Dask and Modin, that implement an API almost identical to the pandas one but run on clusters, and that they can find more information on the ecosystem page.
Do you mind trying to rephrase this section in a way that helps users understand this, @yukikitayama?
Thank you very much for the work here.
Thanks for the update and the work on this @yukikitayama.
I added a couple of minor suggestions, but it looks good to me.
doc/source/user_guide/scale.rst
.. _`MPI through unidist`: https://github.com/modin-project/unidist
.. _HDK: https://github.com/intel-ai/hdk
.. _dask.dataframe: https://docs.dask.org/en/latest/dataframe.html
There are many other libraries which provide similar APIs to pandas and work nicely with pandas DataFrame,
There are many other libraries which provide similar APIs to pandas and work nicely with pandas DataFrame,
There are other libraries which provide similar APIs to pandas and work nicely with pandas DataFrame,
Thank you for reviewing and for the suggestions, @datapythonista. I saw that the ASAN/UBSAN unit tests failed; is that okay?
Yes, our CI seems to be failing now, you can ignore that failure.
doc/source/user_guide/scale.rst
.. _HDK: https://github.com/intel-ai/hdk
.. _dask.dataframe: https://docs.dask.org/en/latest/dataframe.html
There are many other libraries which provide similar APIs to pandas and work nicely with pandas DataFrame,
but can give you the ability to scale your large dataset processing and analytics
but can give you the ability to scale your large dataset processing and analytics
and can give you the ability to scale your large dataset processing and analytics
Owee, I'm MrMeeseeks, look at me. There seems to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the If these instructions are inaccurate, feel free to suggest an improvement.
Thanks @yukikitayama (backporting since the dask dependency changes were backported too)
…scale.rst in favor of linking to ecosystem docs.
Thank you for reviewing, @mroeschke!
…in favor of linking to ecosystem docs. (#57861) Co-authored-by: Yuki Kitayama <[email protected]>
…to ecosystem docs. (pandas-dev#57843)
* remove Use Dask and Use Modin sections
* add a new section: Use Other Libraries and link to Out-of-core section in Ecosystem web page
* remove dask-expr
* remove version pinning from dask and dask-core
* put other libraries label back in
* update use other libraries description to have a better transfer to ecosystem page
* change minor sentences for suggestions
* remove unnecessary characters
scale.rst in favor of linking to ecosystem docs. #57831
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.