From 2ad6f4d374aad331271fb32c66158fc5dd88d15e Mon Sep 17 00:00:00 2001
From: Chathura Widanage <7312649+chathurawidanage@users.noreply.github.com>
Date: Sun, 9 May 2021 12:48:23 -0400
Subject: [PATCH 1/7] Adding Cylon under Out-of-core
---
doc/source/ecosystem.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index d53d0556dca04..03cec831169e2 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -405,6 +405,11 @@ Blaze provides a standard API for doing computations with various
in-memory and on-disk backends: NumPy, pandas, SQLAlchemy, MongoDB, PyTables,
PySpark.
+`Cylon `__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Cylon is a fast, scalable, distributed memory parallel runtime with a Pandas like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache Arrow format to represent the data in-memory. Cylon DataFrame API implements most of the core operators of Pandas such as merge, filter, join, concat, group-by, drop_duplicates, etc. These operators are designed to work across thousands of cores to scale applications. It can interoperate with Pandas DataFrame by reading data from Pandas or convert data to Pandas so users can selectively scale parts of their Pandas DataFrame applications.
+
`Dask `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From df2ed43ae3823fcf2610666167bc8ab6b620727b Mon Sep 17 00:00:00 2001
From: Chathura Widanage <7312649+chathurawidanage@users.noreply.github.com>
Date: Sun, 9 May 2021 13:17:17 -0400
Subject: [PATCH 2/7] DOC: Fixed a typo in Cylon description
---
doc/source/ecosystem.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index 03cec831169e2..19676a5177a60 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -408,7 +408,7 @@ PySpark.
`Cylon `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Cylon is a fast, scalable, distributed memory parallel runtime with a Pandas like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache Arrow format to represent the data in-memory. Cylon DataFrame API implements most of the core operators of Pandas such as merge, filter, join, concat, group-by, drop_duplicates, etc. These operators are designed to work across thousands of cores to scale applications. It can interoperate with Pandas DataFrame by reading data from Pandas or convert data to Pandas so users can selectively scale parts of their Pandas DataFrame applications.
+Cylon is a fast, scalable, distributed memory parallel runtime with a Pandas like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache Arrow format to represent the data in-memory. Cylon DataFrame API implements most of the core operators of Pandas such as merge, filter, join, concat, group-by, drop_duplicates, etc. These operators are designed to work across thousands of cores to scale applications. It can interoperate with Pandas DataFrame by reading data from Pandas or converting data to Pandas so users can selectively scale parts of their Pandas DataFrame applications.
`Dask `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From d75e3b09169ba988e611cf0665d884dc0c2213eb Mon Sep 17 00:00:00 2001
From: Chathura Widanage <7312649+chathurawidanage@users.noreply.github.com>
Date: Tue, 11 May 2021 20:02:14 -0400
Subject: [PATCH 3/7] DOC: Adding a Cylon example and Style fixes
---
doc/source/ecosystem.rst | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index 19676a5177a60..a2dc05d1dd660 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -406,9 +406,34 @@ in-memory and on-disk backends: NumPy, pandas, SQLAlchemy, MongoDB, PyTables,
PySpark.
`Cylon `__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Cylon is a fast, scalable, distributed memory parallel runtime with a pandas
+like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache
+Arrow format to represent the data in-memory. Cylon DataFrame API implements
+most of the core operators of pandas such as merge, filter, join, concat,
+group-by, drop_duplicates, etc. These operators are designed to work across
+thousands of cores to scale applications. It can interoperate with pandas
+DataFrame by reading data from pandas or converting data to pandas so users
+can selectively scale parts of their pandas DataFrame applications.
+
+.. code:: python
+
+ from pycylon import read_csv, DataFrame, CylonEnv
+ from pycylon.net import MPIConfig
+
+ # Initialize Cylon distributed environment
+ config: MPIConfig = MPIConfig()
+ env: CylonEnv = CylonEnv(config=config, distributed=True)
+
+ df1: DataFrame = read_csv('/tmp/csv1.csv')
+ df2: DataFrame = read_csv('/tmp/csv2.csv')
+
+ # Using 1000s of cores across the cluster to compute the join
+ df3: Table = df1.join(other=df2, on=[0], algorithm="hash", env=env)
+
+ print(df3)
-Cylon is a fast, scalable, distributed memory parallel runtime with a Pandas like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache Arrow format to represent the data in-memory. Cylon DataFrame API implements most of the core operators of Pandas such as merge, filter, join, concat, group-by, drop_duplicates, etc. These operators are designed to work across thousands of cores to scale applications. It can interoperate with Pandas DataFrame by reading data from Pandas or converting data to Pandas so users can selectively scale parts of their Pandas DataFrame applications.
`Dask `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From ba959d5f9e38a06a7a467358676bc3abd4ea8d3c Mon Sep 17 00:00:00 2001
From: Chathura Widanage <7312649+chathurawidanage@users.noreply.github.com>
Date: Tue, 11 May 2021 20:09:19 -0400
Subject: [PATCH 4/7] DOC: Removed extra line break
---
doc/source/ecosystem.rst | 1 -
1 file changed, 1 deletion(-)
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index a2dc05d1dd660..c439260f20602 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -434,7 +434,6 @@ can selectively scale parts of their pandas DataFrame applications.
print(df3)
-
`Dask `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From de0e0c7010011cd50d3fd8a46e9084695cedccec Mon Sep 17 00:00:00 2001
From: Chathura Widanage <7312649+chathurawidanage@users.noreply.github.com>
Date: Tue, 11 May 2021 20:37:19 -0400
Subject: [PATCH 5/7] DOC: Removed spaces in blank lines
---
doc/source/ecosystem.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index c439260f20602..e33ecf7dd0236 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -428,10 +428,10 @@ can selectively scale parts of their pandas DataFrame applications.
df1: DataFrame = read_csv('/tmp/csv1.csv')
df2: DataFrame = read_csv('/tmp/csv2.csv')
-
+
# Using 1000s of cores across the cluster to compute the join
df3: Table = df1.join(other=df2, on=[0], algorithm="hash", env=env)
-
+
print(df3)
`Dask `__
From a6b3db79fdaf4aab0d40a21714d231dda52043e6 Mon Sep 17 00:00:00 2001
From: Chathura Widanage <7312649+chathurawidanage@users.noreply.github.com>
Date: Tue, 11 May 2021 22:53:04 -0400
Subject: [PATCH 6/7] DOC: Removed a whitespace
---
doc/source/ecosystem.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index e33ecf7dd0236..b5248935d7514 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -422,7 +422,7 @@ can selectively scale parts of their pandas DataFrame applications.
from pycylon import read_csv, DataFrame, CylonEnv
from pycylon.net import MPIConfig
- # Initialize Cylon distributed environment
+ # Initialize Cylon distributed environment
config: MPIConfig = MPIConfig()
env: CylonEnv = CylonEnv(config=config, distributed=True)
From b4c9ba377ee5164bc7ed2ecf39638fb4e554da14 Mon Sep 17 00:00:00 2001
From: Chathura Widanage
Date: Wed, 12 May 2021 00:18:01 -0400
Subject: [PATCH 7/7] remove trailing spaces
---
doc/source/ecosystem.rst | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst
index b5248935d7514..bc2325f15852c 100644
--- a/doc/source/ecosystem.rst
+++ b/doc/source/ecosystem.rst
@@ -408,13 +408,13 @@ PySpark.
`Cylon `__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Cylon is a fast, scalable, distributed memory parallel runtime with a pandas
-like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache
-Arrow format to represent the data in-memory. Cylon DataFrame API implements
-most of the core operators of pandas such as merge, filter, join, concat,
-group-by, drop_duplicates, etc. These operators are designed to work across
-thousands of cores to scale applications. It can interoperate with pandas
-DataFrame by reading data from pandas or converting data to pandas so users
+Cylon is a fast, scalable, distributed memory parallel runtime with a pandas
+like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache
+Arrow format to represent the data in-memory. Cylon DataFrame API implements
+most of the core operators of pandas such as merge, filter, join, concat,
+group-by, drop_duplicates, etc. These operators are designed to work across
+thousands of cores to scale applications. It can interoperate with pandas
+DataFrame by reading data from pandas or converting data to pandas so users
can selectively scale parts of their pandas DataFrame applications.
.. code:: python