Commit c9f876c

YarShev, pre-commit-ci[bot], and datapythonista authored
DOC-#57585: Add Use Modin section on Scaling to large datasets page (#57586)
* DOC-#57585: Add `Use Modin` section on `Scaling to large datasets` page
  Signed-off-by: Igoshev, Iaroslav <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Address comments
  Signed-off-by: Igoshev, Iaroslav <[email protected]>
* Address comments
  Signed-off-by: Igoshev, Iaroslav <[email protected]>
* Revert some changes
  Signed-off-by: Igoshev, Iaroslav <[email protected]>
* Address comments
  Signed-off-by: Igoshev, Iaroslav <[email protected]>

---------

Signed-off-by: Igoshev, Iaroslav <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Marc Garcia <[email protected]>
1 parent 7977a37 commit c9f876c

File tree

1 file changed: +28 -0 lines changed


doc/source/user_guide/scale.rst (+28)
@@ -374,5 +374,33 @@ datasets.
 
 You see more dask examples at https://examples.dask.org.
 
+Use Modin
+---------
+
+Modin_ is a scalable dataframe library that aims to be a drop-in replacement for the pandas API
+and provides the ability to scale pandas workflows across the available nodes and CPUs. It can
+also work with larger-than-memory datasets. To start working with Modin you only need to
+replace a single line of code, namely, the import statement.
+
+.. code-block:: ipython
+
+   # import pandas as pd
+   import modin.pandas as pd
+
+After you have changed the import statement, you can proceed using the well-known pandas API
+to scale computation. Modin distributes computation across the available nodes and CPUs by
+means of the execution engine it runs on. As of Modin 0.27.0, the following execution engines
+are supported: Ray_, Dask_, `MPI through unidist`_, and HDK_. A Modin DataFrame is partitioned
+along both columns and rows, which gives Modin flexibility and scalability in both the number
+of columns and the number of rows.
+
+For more information, refer to `Modin's documentation`_ or `Modin's tutorials`_.
+
+.. _Modin: https://github.com/modin-project/modin
+.. _`Modin's documentation`: https://modin.readthedocs.io/en/latest
+.. _`Modin's tutorials`: https://github.com/modin-project/modin/tree/master/examples/tutorial/jupyter/execution
+.. _Ray: https://github.com/ray-project/ray
 .. _Dask: https://dask.org
+.. _`MPI through unidist`: https://github.com/modin-project/unidist
+.. _HDK: https://github.com/intel-ai/hdk
 .. _dask.dataframe: https://docs.dask.org/en/latest/dataframe.html
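
A minimal sketch of the workflow the added section describes: swap the import, optionally
select one of the supported execution engines through Modin's config module, and then run
ordinary pandas-style operations. The file name and column names below are placeholders for
illustration, and the Dask engine is assumed to be installed.

.. code-block:: python

   import modin.config as modin_cfg
   import modin.pandas as pd

   # Optionally choose which supported execution engine Modin runs on
   # (assumes the Dask engine is installed); otherwise Modin picks a
   # default engine based on what is available in the environment.
   modin_cfg.Engine.put("dask")

   # The familiar pandas API is used unchanged; Modin partitions the
   # DataFrame along both rows and columns and distributes the work
   # across the available CPUs and nodes.
   df = pd.read_csv("large_dataset.csv")  # placeholder path
   result = df.groupby("category")["value"].mean()  # placeholder column names
   print(result)

Setting the ``MODIN_ENGINE`` environment variable is an alternative way to select the engine.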
