DOC-pandas-dev#57585: Add Use Modin section on Scaling to large datasets page (pandas-dev#57586)

YarShev · pre-commit-ci[bot] · datapythonista · pmhatre1 · commit 07c9c34d12eb · 2024-05-06T23:13:14.000-07:00
* DOC-pandas-dev#57585: Add `Use Modin` section on `Scaling to large datasets` page

  Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Address comments

  Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>

* Address comments

  Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>

* Revert some changes

  Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>

* Address comments

  Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>

---------

Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Marc Garcia <garcia.marc@gmail.com>
diff --git a/doc/source/user_guide/scale.rst b/doc/source/user_guide/scale.rst
@@ -374,5 +374,33 @@ datasets.
 
 You see more dask examples at https://examples.dask.org.
 
+Use Modin
+---------
+
+Modin_ is a scalable dataframe library that aims to be a drop-in replacement for the pandas API
+and scales pandas workflows across the available nodes and CPUs. It can also work with
+larger-than-memory datasets. To start working with Modin, you only need to change
+a single line of code: the import statement.
+
+.. code-block:: ipython
+
+   # import pandas as pd
+   import modin.pandas as pd
+
+After changing the import statement, you can keep using the familiar pandas API
+to scale computation. Modin distributes computation across the available nodes and CPUs
+through the execution engine it runs on. As of Modin 0.27.0, the supported execution
+engines are Ray_, Dask_, `MPI through unidist`_, and HDK_. Modin partitions a DataFrame
+along both columns and rows, which lets it scale with both the number of columns and
+the number of rows.
+
+For more information, refer to `Modin's documentation`_ or `Modin's tutorials`_.
+
+.. _Modin: https://github.com/modin-project/modin
+.. _`Modin's documentation`: https://modin.readthedocs.io/en/latest
+.. _`Modin's tutorials`: https://github.com/modin-project/modin/tree/master/examples/tutorial/jupyter/execution
+.. _Ray: https://github.com/ray-project/ray
 .. _Dask: https://dask.org
+.. _`MPI through unidist`: https://github.com/modin-project/unidist
+.. _HDK: https://github.com/intel-ai/hdk
 .. _dask.dataframe: https://docs.dask.org/en/latest/dataframe.html
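
The import swap described in the new ``Use Modin`` section can be illustrated with a small script. This is a minimal sketch, not part of the patch: the ``try``/``except`` fallback to plain pandas, the helper name ``total_by_key``, and the sample data are assumptions added here so that the example also runs where Modin is not installed.

```python
# Prefer Modin when it is installed; otherwise fall back to plain pandas.
# The fallback is an assumption made for this sketch, not part of the patch.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd


def total_by_key(frame):
    """Sum the "value" column per "key" and return a plain dict.

    Hypothetical helper: it only uses the regular pandas API, which is
    the point of the drop-in import swap.
    """
    grouped = frame.groupby("key")["value"].sum()
    return {key: int(grouped[key]) for key in grouped.index}


# Small illustrative dataset; real workloads would be far larger.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = total_by_key(df)
print(totals)
```

Everything after the import is ordinary pandas code, so the same script should behave identically whichever library ends up bound to ``pd``.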