Skip to content

DOC: Replace pandas on Ray in ecosystem.rst with Modin #37249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 20, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 17 additions & 10 deletions doc/source/ecosystem.rst
Original file line number Diff line number Diff line change
Expand Up @@ -376,6 +376,23 @@ Dask-ML enables parallel and distributed machine learning using Dask alongside e

Koalas provides a familiar pandas DataFrame interface on top of Apache Spark. It enables users to leverage multi-cores on one machine or a cluster of machines to speed up or scale their DataFrame code.

`Modin <https://github.com/modin-project/modin>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``modin.pandas`` DataFrame is a parallel and distributed drop-in replacement
for pandas. This means that you can use Modin with existing pandas code or write
new code with the existing pandas API. Modin can leverage your entire machine or
cluster to speed up and scale your pandas workloads, including traditionally
time-consuming tasks like ingesting data (``read_csv``, ``read_excel``,
``read_parquet``, etc.).

.. code:: python

# import pandas as pd
import modin.pandas as pd

df = pd.read_csv("big.csv") # use all your cores!

`Odo <http://odo.pydata.org>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand All @@ -400,16 +417,6 @@ If also displays progress bars.
# df.apply(func)
df.parallel_apply(func)

`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pandas on Ray is an early stage DataFrame library that wraps pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous pandas notebooks while experiencing a considerable speedup from pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use pandas on Ray just like you would pandas.

.. code:: python

# import pandas as pd
import ray.dataframe as pd


`Vaex <https://docs.vaex.io/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down