@@ -12,10 +12,13 @@ build powerful and more focused data tools.
12
12
The creation of libraries that complement pandas' functionality also allows pandas
13
13
development to remain focused around its original requirements.
14
14
15
- This is an in-exhaustive list of projects that build on pandas in order to provide
16
- tools in the PyData space.
15
+ This is a non-exhaustive list of projects that build on pandas in order to provide
16
+ tools in the PyData space. For a list of projects that depend on pandas,
17
+ see the
18
+ `libraries.io usage page for pandas <https://libraries.io/pypi/pandas/usage>`_
19
+ or `search PyPI for pandas <https://pypi.org/search/?q=pandas>`_.
17
20
18
- We'd like to make it easier for users to find these project , if you know of other
21
+ We'd like to make it easier for users to find these projects. If you know of other
19
22
substantial projects that you feel should be on this list, please let us know.
20
23
21
24
@@ -48,6 +51,17 @@ Featuretools is a Python library for automated feature engineering built on top
48
51
Visualization
49
52
-------------
50
53
54
+ `Altair <https://altair-viz.github.io/>`__
55
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56
+
57
+ Altair is a declarative statistical visualization library for Python.
58
+ With Altair, you can spend more time understanding your data and its
59
+ meaning. Altair's API is simple, friendly and consistent and built on
60
+ top of the powerful Vega-Lite JSON specification. This elegant
61
+ simplicity produces beautiful and effective visualizations with a
62
+ minimal amount of code. Altair works with Pandas DataFrames.
63
+
64
+
51
65
`Bokeh <http://bokeh.pydata.org>`__
52
66
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
53
67
@@ -68,31 +82,22 @@ also goes beyond matplotlib and pandas with the option to perform statistical
68
82
estimation while plotting, aggregating across observations and visualizing the
69
83
fit of statistical models to emphasize patterns in a dataset.
70
84
71
- `yhat/ggplot <https://github.com/yhat/ggplot >`__
72
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
85
+ `yhat/ggpy <https://github.com/yhat/ggpy>`__
86
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
73
87
74
88
Hadley Wickham's `ggplot2 <http://ggplot2.org/>`__ is a foundational exploratory visualization package for the R language.
75
89
Based on `"The Grammar of Graphics" <http://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html>`__ it
76
90
provides a powerful, declarative and extremely general way to generate bespoke plots of any kind of data.
77
91
It's really quite incredible. Various implementations in other languages are available,
78
92
but a faithful implementation for Python users has long been missing. Although still young
79
- (as of Jan-2014), the `yhat/ggplot <https://github.com/yhat/ggplot >`__ project has been
93
+ (as of Jan-2014), the `yhat/ggpy <https://github.com/yhat/ggpy>`__ project has been
80
94
progressing quickly in that direction.
81
95
82
- `Vincent <https://github.com/wrobstory/vincent >`__
83
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
84
-
85
- The `Vincent <https://github.com/wrobstory/vincent >`__ project leverages `Vega <https://github.com/trifacta/vega >`__
86
- (that in turn, leverages `d3 <http://d3js.org/ >`__) to create
87
- plots. Although functional, as of Summer 2016 the Vincent project has not been updated
88
- in over two years and is `unlikely to receive further updates <https://github.com/wrobstory/vincent#2015-08-12-update >`__.
89
-
90
96
`IPython Vega <https://github.com/vega/ipyvega>`__
91
97
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
92
98
93
- Like Vincent, the `IPython Vega <https://github.com/vega/ipyvega >`__ project leverages `Vega
94
- <https://github.com/trifacta/vega> `__ to create plots, but primarily
95
- targets the IPython Notebook environment.
99
+ `IPython Vega <https://github.com/vega/ipyvega>`__ leverages `Vega
100
+ <https://github.com/trifacta/vega>`__ to create plots within Jupyter Notebook.
96
101
97
102
`Plotly <https://plot.ly/python>`__
98
103
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -115,20 +120,28 @@ IDE
115
120
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
116
121
117
122
IPython is an interactive command shell and distributed computing
118
- environment.
119
- IPython Notebook is a web application for creating IPython notebooks.
120
- An IPython notebook is a JSON document containing an ordered list
123
+ environment. IPython tab completion works with Pandas methods and also
124
+ attributes like DataFrame columns.
125
+
126
+ `Jupyter Notebook / Jupyter Lab <https://jupyter.org>`__
127
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
128
+ Jupyter Notebook is a web application for creating Jupyter notebooks.
129
+ A Jupyter notebook is a JSON document containing an ordered list
121
130
of input/output cells which can contain code, text, mathematics, plots
122
131
and rich media.
123
- IPython notebooks can be converted to a number of open standard output formats
132
+ Jupyter notebooks can be converted to a number of open standard output formats
124
133
(HTML, HTML presentation slides, LaTeX, PDF, ReStructuredText, Markdown,
125
- Python) through 'Download As' in the web interface and ``ipython nbconvert ``
134
+ Python) through 'Download As' in the web interface and ``jupyter nbconvert``
126
135
in a shell.
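As a rough illustration of the "JSON document containing an ordered list of cells" description above, the sketch below builds a minimal notebook document using only the standard library. The field names are assumed to follow the nbformat 4 schema; the cell contents and the ``notebook`` variable are invented for the example.

```python
import json

# A minimal notebook document: an ordered list of cells plus metadata.
# Field names assume the nbformat 4 schema; cell contents are illustrative.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Demo"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["import pandas as pd"],
        },
    ],
}

# Serialize to JSON text (the on-disk form of an .ipynb file) and read it back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because the document is plain JSON, the cell order survives a round trip unchanged, which is what conversion tools rely on when exporting to other formats.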
127
136
128
- Pandas DataFrames implement ``_repr_html_ `` methods
129
- which are utilized by IPython Notebook for displaying
130
- (abbreviated) HTML tables. (Note: HTML tables may or may not be
131
- compatible with non-HTML IPython output formats.)
137
+ Pandas DataFrames implement ``_repr_html_`` and ``_repr_latex_`` methods
138
+ which are utilized by Jupyter Notebook for displaying
139
+ (abbreviated) HTML or LaTeX tables. LaTeX output is properly escaped.
140
+ (Note: HTML tables may or may not be
141
+ compatible with non-HTML Jupyter output formats.)
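The ``_repr_html_`` hook is an ordinary method that a notebook front end calls when it displays an object. The sketch below shows the shape of that protocol with a hypothetical ``TinyTable`` class. It is not pandas code, and the escaping shown is only an analogy for what a real implementation has to do with cell values.

```python
from html import escape


class TinyTable:
    """Hypothetical stand-in for a DataFrame; not part of pandas."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def _repr_html_(self):
        # A notebook front end looks for this method and renders the returned HTML.
        head = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = TinyTable(["name", "value"], [["a<b", 1], ["c", 2]])
html = table._repr_html_()
print(html)
```

In a notebook, leaving ``table`` as the last expression in a cell would render the HTML table instead of the plain ``repr``.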
142
+
143
+ See :ref:`Options and Settings <options>` and :ref:`options.available`
144
+ for pandas ``display.`` settings.
132
145
133
146
`quantopian/qgrid <https://github.com/quantopian/qgrid>`__
134
147
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -144,11 +157,10 @@ editing, testing, debugging, and introspection features.
144
157
Spyder can now introspect and display Pandas DataFrames and show
145
158
both "column wise min/max and global min/max coloring."
146
159
147
-
148
160
.. _ecosystem.api:
149
161
150
162
API
151
- -----
163
+ ---
152
164
153
165
`pandas-datareader <https://github.com/pydata/pandas-datareader>`__
154
166
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -159,14 +171,22 @@ See more in the `pandas-datareader docs <https://pandas-datareader.readthedocs.
159
171
160
172
The following data feeds are available:
161
173
162
- * Yahoo! Finance
163
- * Google Finance
164
- * FRED
165
- * Fama/French
166
- * World Bank
167
- * OECD
168
- * Eurostat
169
- * EDGAR Index
174
+ * Google Finance
175
+ * Tiingo
176
+ * Morningstar
177
+ * IEX
178
+ * Robinhood
179
+ * Enigma
180
+ * Quandl
181
+ * FRED
182
+ * Fama/French
183
+ * World Bank
184
+ * OECD
185
+ * Eurostat
186
+ * TSP Fund Data
187
+ * Nasdaq Trader Symbol Definitions
188
+ * Stooq Index Data
189
+ * MOEX Data
170
190
171
191
`quandl/Python <https://github.com/quandl/Python>`__
172
192
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -227,25 +247,24 @@ dimensional arrays, rather than the tabular data for which pandas excels.
227
247
Out-of-core
228
248
-------------
229
249
250
+ `Blaze <http://blaze.pydata.org/>`__
251
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
252
+
253
+ Blaze provides a standard API for doing computations with various
254
+ in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
255
+ PySpark.
256
+
230
257
`Dask <https://dask.readthedocs.io/en/latest/>`__
231
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
258
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
232
259
233
260
Dask is a flexible parallel computing library for analytics. Dask
234
261
provides a familiar ``DataFrame`` interface for out-of-core, parallel and distributed computing.
235
262
236
263
`Dask-ML <https://dask-ml.readthedocs.io/en/latest/>`__
237
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
264
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
238
265
239
266
Dask-ML enables parallel and distributed machine learning using Dask alongside existing machine learning libraries like Scikit-Learn, XGBoost, and TensorFlow.
240
267
241
-
242
- `Blaze <http://blaze.pydata.org/ >`__
243
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
244
-
245
- Blaze provides a standard API for doing computations with various
246
- in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB, PyTables,
247
- PySpark.
248
-
249
268
`Odo <http://odo.pydata.org>`__
250
269
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
251
270
@@ -255,6 +274,26 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph
255
274
based approach is also extensible by end users for custom formats that may be
256
275
too specific for the core of odo.
257
276
277
+ `Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`__
278
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
279
+
280
+ Pandas on Ray is an early-stage DataFrame library that wraps Pandas and transparently distributes the data and computation. Users do not need to know how many cores their system has, nor do they need to specify how to distribute the data. Existing Pandas notebooks can keep working and may see a considerable speedup from Pandas on Ray, even on a single machine. Only the import statement needs to change, as demonstrated below; after that, Pandas on Ray is used just like Pandas.
281
+
282
+ .. code:: python
283
+
284
+     # import pandas as pd
285
+     import ray.dataframe as pd
286
+
287
+
288
+ `Vaex <https://docs.vaex.io/>`__
289
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
290
+
291
+ Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. Vaex is a Python library for out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count and standard deviation on an N-dimensional grid at up to a billion (10\ :sup:`9`) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).
292
+
293
+ * vaex.from_pandas
294
+ * vaex.to_pandas_df
295
+
296
+
258
297
.. _ecosystem.data_validation:
259
298
260
299
Data validation