
Commit f90aa44

westurner authored and jorisvandenbossche committed
DOC: ecosystem: Vaex, Pandas on Ray, alphabetization, pandas-datareader (pandas-dev#20345)
1 parent 44c08d4 commit f90aa44


1 file changed: +85 -46 lines changed


doc/source/ecosystem.rst (+85 -46)
@@ -12,10 +12,13 @@ build powerful and more focused data tools.
 The creation of libraries that complement pandas' functionality also allows pandas
 development to remain focused around its original requirements.
 
-This is an in-exhaustive list of projects that build on pandas in order to provide
-tools in the PyData space.
+This is an inexhaustive list of projects that build on pandas in order to provide
+tools in the PyData space. For a list of projects that depend on pandas,
+see the
+`libraries.io usage page for pandas <https://libraries.io/pypi/pandas/usage>`_
+or `search PyPI for pandas <https://pypi.org/search/?q=pandas>`_.
 
-We'd like to make it easier for users to find these project, if you know of other
+We'd like to make it easier for users to find these projects. If you know of other
 substantial projects that you feel should be on this list, please let us know.
 
 
@@ -48,6 +51,17 @@ Featuretools is a Python library for automated feature engineering built on top
 Visualization
 -------------
 
+`Altair <https://altair-viz.github.io/>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Altair is a declarative statistical visualization library for Python.
+With Altair, you can spend more time understanding your data and its
+meaning. Altair's API is simple, friendly and consistent and built on
+top of the powerful Vega-Lite JSON specification. This elegant
+simplicity produces beautiful and effective visualizations with a
+minimal amount of code. Altair works with Pandas DataFrames.
+
+
 `Bokeh <http://bokeh.pydata.org>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -68,31 +82,22 @@ also goes beyond matplotlib and pandas with the option to perform statistical
 estimation while plotting, aggregating across observations and visualizing the
 fit of statistical models to emphasize patterns in a dataset.
 
-`yhat/ggplot <https://github.com/yhat/ggplot>`__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`yhat/ggpy <https://github.com/yhat/ggpy>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Hadley Wickham's `ggplot2 <http://ggplot2.org/>`__ is a foundational exploratory visualization package for the R language.
 Based on `"The Grammar of Graphics" <http://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html>`__ it
 provides a powerful, declarative and extremely general way to generate bespoke plots of any kind of data.
 It's really quite incredible. Various implementations to other languages are available,
 but a faithful implementation for Python users has long been missing. Although still young
-(as of Jan-2014), the `yhat/ggplot <https://github.com/yhat/ggplot>`__ project has been
+(as of Jan-2014), the `yhat/ggpy <https://github.com/yhat/ggpy>`__ project has been
 progressing quickly in that direction.
 
-`Vincent <https://github.com/wrobstory/vincent>`__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The `Vincent <https://github.com/wrobstory/vincent>`__ project leverages `Vega <https://github.com/trifacta/vega>`__
-(that in turn, leverages `d3 <http://d3js.org/>`__) to create
-plots. Although functional, as of Summer 2016 the Vincent project has not been updated
-in over two years and is `unlikely to receive further updates <https://github.com/wrobstory/vincent#2015-08-12-update>`__.
-
 `IPython Vega <https://github.com/vega/ipyvega>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Like Vincent, the `IPython Vega <https://github.com/vega/ipyvega>`__ project leverages `Vega
-<https://github.com/trifacta/vega>`__ to create plots, but primarily
-targets the IPython Notebook environment.
+`IPython Vega <https://github.com/vega/ipyvega>`__ leverages `Vega
+<https://github.com/trifacta/vega>`__ to create plots within Jupyter Notebook.
 
 `Plotly <https://plot.ly/python>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -115,20 +120,28 @@ IDE
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 IPython is an interactive command shell and distributed computing
-environment.
-IPython Notebook is a web application for creating IPython notebooks.
-An IPython notebook is a JSON document containing an ordered list
+environment. IPython tab completion works with Pandas methods and also
+with attributes such as DataFrame columns.
+
+`Jupyter Notebook / Jupyter Lab <https://jupyter.org>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Jupyter Notebook is a web application for creating Jupyter notebooks.
+A Jupyter notebook is a JSON document containing an ordered list
 of input/output cells which can contain code, text, mathematics, plots
 and rich media.
-IPython notebooks can be converted to a number of open standard output formats
+Jupyter notebooks can be converted to a number of open standard output formats
 (HTML, HTML presentation slides, LaTeX, PDF, ReStructuredText, Markdown,
-Python) through 'Download As' in the web interface and ``ipython nbconvert``
+Python) through 'Download As' in the web interface and ``jupyter nbconvert``
 in a shell.
 
-Pandas DataFrames implement ``_repr_html_`` methods
-which are utilized by IPython Notebook for displaying
-(abbreviated) HTML tables. (Note: HTML tables may or may not be
-compatible with non-HTML IPython output formats.)
+Pandas DataFrames implement ``_repr_html_`` and ``_repr_latex_`` methods
+which are utilized by Jupyter Notebook for displaying
+(abbreviated) HTML or LaTeX tables. LaTeX output is properly escaped.
+(Note: HTML tables may or may not be
+compatible with non-HTML Jupyter output formats.)
+
+See :ref:`Options and Settings <options>` and :ref:`Available Options <options.available>`
+for pandas ``display.`` settings.
 
 `quantopian/qgrid <https://github.com/quantopian/qgrid>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
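The _repr_html_ and _repr_latex_ methods mentioned in the hunk above are hooks from IPython's rich display protocol: when an object is displayed in a notebook, IPython looks for these methods and passes the strings they return to the front end. The sketch below illustrates the protocol with the standard library only. TinyTable and its LaTeX escaping table are hypothetical and are not pandas' implementation.

```python
import html


class TinyTable:
    """Hypothetical minimal table; illustrates the display hooks only (not pandas code)."""

    def __init__(self, columns, rows):
        self.columns = [str(c) for c in columns]
        self.rows = [[str(v) for v in row] for row in rows]

    def _repr_html_(self):
        # Jupyter renders this string as HTML; escape cell text first.
        head = "".join(f"<th>{html.escape(c)}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(v)}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

    def _repr_latex_(self):
        # Escape LaTeX special characters one character at a time,
        # so a backslash produced by one rule is never escaped again.
        special = {
            "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
            "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}",
            "^": r"\textasciicircum{}", "\\": r"\textbackslash{}",
        }

        def esc(text):
            return "".join(special.get(ch, ch) for ch in text)

        lines = [r"\begin{tabular}{" + "l" * len(self.columns) + "}"]
        lines.append(" & ".join(esc(c) for c in self.columns) + r" \\")
        lines.append(r"\hline")
        for row in self.rows:
            lines.append(" & ".join(esc(v) for v in row) + r" \\")
        lines.append(r"\end{tabular}")
        return "\n".join(lines)


table = TinyTable(["name", "share_%"], [["a&b", 0.5], ["c", 1]])
print(table._repr_html_())
print(table._repr_latex_())
```

A hook may also return None, in which case IPython skips that representation and falls back to the others.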
@@ -144,11 +157,10 @@ editing, testing, debugging, and introspection features.
 Spyder can now introspect and display Pandas DataFrames and show
 both "column wise min/max and global min/max coloring."
 
-
 .. _ecosystem.api:
 
 API
------
+---
 
 `pandas-datareader <https://github.com/pydata/pandas-datareader>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -159,14 +171,22 @@ See more in the `pandas-datareader docs <https://pandas-datareader.readthedocs.
 
 The following data feeds are available:
 
-* Yahoo! Finance
-* Google Finance
-* FRED
-* Fama/French
-* World Bank
-* OECD
-* Eurostat
-* EDGAR Index
+* Google Finance
+* Tiingo
+* Morningstar
+* IEX
+* Robinhood
+* Enigma
+* Quandl
+* FRED
+* Fama/French
+* World Bank
+* OECD
+* Eurostat
+* TSP Fund Data
+* Nasdaq Trader Symbol Definitions
+* Stooq Index Data
+* MOEX Data
 
 `quandl/Python <https://github.com/quandl/Python>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -227,25 +247,24 @@ dimensional arrays, rather than the tabular data for which pandas excels.
 Out-of-core
 -------------
 
+`Blaze <http://blaze.pydata.org/>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Blaze provides a standard API for doing computations with various
+in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
+PySpark.
+
 `Dask <https://dask.readthedocs.io/en/latest/>`__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Dask is a flexible parallel computing library for analytics. Dask
 provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing.
 
 `Dask-ML <https://dask-ml.readthedocs.io/en/latest/>`__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn, XGBoost, and TensorFlow.
 
-
-`Blaze <http://blaze.pydata.org/>`__
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Blaze provides a standard API for doing computations with various
-in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
-PySpark.
-
 `Odo <http://odo.pydata.org>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
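A Dask DataFrame is made of many smaller pandas DataFrames (partitions), and results are computed per partition and then combined. The plain-pandas sketch below is not Dask's API. It only illustrates that split-apply-combine idea on a CSV read in chunks with pd.read_csv(chunksize=...). The toy data and the helper name chunked_group_mean are hypothetical.

```python
import io

import pandas as pd

# Ten rows of toy data: key "a" for even values, "b" for odd values.
CSV_TEXT = "key,value\n" + "\n".join(f"{'ab'[i % 2]},{i}" for i in range(10))


def chunked_group_mean(source, chunksize):
    """Mean of `value` per `key`, holding only `chunksize` rows in memory at a time."""
    sums = None
    counts = None
    for chunk in pd.read_csv(source, chunksize=chunksize):
        grouped = chunk.groupby("key")["value"]
        # Partial results for this chunk; keys missing from a chunk count as 0.
        part_sum, part_count = grouped.sum(), grouped.count()
        sums = part_sum if sums is None else sums.add(part_sum, fill_value=0)
        counts = part_count if counts is None else counts.add(part_count, fill_value=0)
    # Combine step: divide the accumulated totals.
    return sums / counts


result = chunked_group_mean(io.StringIO(CSV_TEXT), chunksize=3)
print(result)
```

The sketch carries sums and counts rather than per-chunk means, because averaging the means of unevenly sized chunks would give the wrong answer.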
@@ -255,6 +274,26 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph
 based approach is also extensible by end users for custom formats that may be
 too specific for the core of odo.
 
+`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Pandas on Ray is an early-stage DataFrame library that wraps Pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous Pandas notebooks while experiencing a considerable speedup from Pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use Pandas on Ray just like you would Pandas.
+
+.. code:: python
+
+    # import pandas as pd
+    import ray.dataframe as pd
+
+
+`Vaex <https://docs.vaex.io/>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a Python library for out-of-core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation, etc., on an N-dimensional grid at up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero memory copy policy and lazy computations for best performance (no memory wasted). It converts to and from pandas DataFrames with:
+
+* vaex.from_pandas
+* vaex.to_pandas_df
+
+
 .. _ecosystem.data_validation:
 
 Data validation
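The Vaex entry mentions statistics such as count, sum and mean computed on an N-dimensional grid. Vaex's own API is not shown here. As a hedged illustration of what a grid statistic means, this NumPy sketch bins 2-D points with numpy.histogram2d, uses the values as weights to get per-cell sums, and divides by per-cell counts. The function name grid_stats_2d and the sample data are hypothetical.

```python
import numpy as np


def grid_stats_2d(x, y, values, bins, ranges):
    """Per-cell count, sum and mean of `values` on a regular 2-D grid (illustrative, not Vaex)."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins, range=ranges)
    # Same binning, but each point contributes its value instead of 1.
    sums, _, _ = np.histogram2d(x, y, bins=bins, range=ranges, weights=values)
    # Empty cells give 0 / 0; report them as NaN without emitting a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return counts, sums, means, x_edges, y_edges


x = np.array([0.1, 0.2, 0.9])
y = np.array([0.1, 0.1, 0.9])
v = np.array([1.0, 3.0, 5.0])
counts, sums, means, _, _ = grid_stats_2d(x, y, v, bins=2, ranges=[[0, 1], [0, 1]])
print(counts)
print(means)
```

A per-cell standard deviation can be built the same way by also binning the squared values and using var = mean(v²) - mean(v)².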
