@@ -21,10 +21,6 @@ please let us know.
 
 ## Statistics and machine learning
 
-### [pandas-tfrecords](https://pypi.org/project/pandas-tfrecords/)
-
-Easy saving pandas dataframe to tensorflow tfrecords format and reading tfrecords to pandas.
-
 ### [Statsmodels](https://www.statsmodels.org/)
 
 Statsmodels is the prominent Python "statistics and econometrics
@@ -34,11 +30,6 @@ modeling functionality that is out of pandas' scope. Statsmodels
 leverages pandas objects as the underlying data container for
 computation.
 
-### [sklearn-pandas](https://github.com/scikit-learn-contrib/sklearn-pandas)
-
-Use pandas DataFrames in your [scikit-learn](https://scikit-learn.org/)
-ML pipeline.
-
 ### [Featuretools](https://github.com/alteryx/featuretools/)
 
 Featuretools is a Python library for automated feature engineering built
@@ -150,13 +141,6 @@ df # discover interesting insights!
 
 By printing out a dataframe, Lux automatically [recommends a set of visualizations](https://raw.githubusercontent.com/lux-org/lux-resources/master/readme_img/demohighlight.gif) that highlights interesting trends and patterns in the dataframe. Users can leverage any existing pandas commands without modifying their code, while being able to visualize their pandas data structures (e.g., DataFrame, Series, Index) at the same time. Lux also offers a [powerful, intuitive language](https://lux-api.readthedocs.io/en/latest/source/guide/vis.html>) that allow users to create Altair, matplotlib, or Vega-Lite visualizations without having to think at the level of code.
 
-### [QtPandas](https://github.com/draperjames/qtpandas)
-
-Spun off from the main pandas library, the
-[qtpandas](https://github.com/draperjames/qtpandas) library enables
-DataFrame visualization and manipulation in PyQt4 and PySide
-applications.
-
 ### [D-Tale](https://github.com/man-group/dtale)
 
 D-Tale is a lightweight web client for visualizing pandas data structures. It
@@ -210,12 +194,6 @@ or may not be compatible with non-HTML Jupyter output formats.)
 See [Options and Settings](https://pandas.pydata.org/docs/user_guide/options.html)
 for pandas `display.` settings.
 
-### [modin-project/modin-spreadsheet](https://github.com/modin-project/modin-spreadsheet)
-
-modin-spreadsheet is an interactive grid for sorting and filtering DataFrames in IPython Notebook.
-It is a fork of qgrid and is actively maintained by the modin project.
-modin-spreadsheet provides similar functionality to qgrid and allows for easy data exploration and manipulation in a tabular format.
-
 ### [Spyder](https://www.spyder-ide.org/)
 
 Spyder is a cross-platform PyQt-based IDE combining the editing,
@@ -271,18 +249,6 @@ The following data feeds are available:
 - Stooq Index Data
 - MOEX Data
 
-### [quandl/Python](https://github.com/quandl/Python)
-
-Quandl API for Python wraps the Quandl REST API to return Pandas
-DataFrames with timeseries indexes.
-
-### [pydatastream](https://github.com/vfilimonov/pydatastream)
-
-PyDatastream is a Python interface to the [Thomson Dataworks Enterprise
-(DWE/Datastream)](http://dataworks.thomson.com/Dataworks/Enterprise/1.0/)
-SOAP API to return indexed Pandas DataFrames with financial data. This
-package requires valid credentials for this API (non free).
-
 ### [pandaSDMX](https://pandasdmx.readthedocs.io)
 
 pandaSDMX is a library to retrieve and acquire statistical data and
@@ -305,13 +271,6 @@ point-in-time data from ALFRED. fredapi makes use of pandas and returns
 data in a Series or DataFrame. This module requires a FRED API key that
 you can obtain for free on the FRED website.
 
-### [dataframe_sql](https://github.com/zbrookle/dataframe_sql)
-
-``dataframe_sql`` is a Python package that translates SQL syntax directly into
-operations on pandas DataFrames. This is useful when migrating from a database to
-using pandas or for users more comfortable with SQL looking for a way to interface
-with pandas.
-
 ## Domain specific
 
 ### [Geopandas](https://github.com/geopandas/geopandas)
@@ -384,12 +343,6 @@ any Delta table into Pandas dataframe.
 
 ## Out-of-core
 
-### [Blaze](https://blaze.pydata.org/)
-
-Blaze provides a standard API for doing computations with various
-in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB,
-PyTables, PySpark.
-
 ### [Cylon](https://cylondata.org/)
 
 Cylon is a fast, scalable, distributed memory parallel runtime with a pandas
@@ -457,14 +410,6 @@ import modin.pandas as pd
 df = pd.read_csv("big.csv")  # use all your cores!
 ```
 
-### [Odo](http://odo.pydata.org)
-
-Odo provides a uniform API for moving data between different formats. It
-uses pandas own `read_csv` for CSV IO and leverages many existing
-packages such as PyTables, h5py, and pymongo to move data between non
-pandas formats. Its graph based approach is also extensible by end users
-for custom formats that may be too specific for the core of odo.
-
 ### [Pandarallel](https://github.com/nalepae/pandarallel)
 
 Pandarallel provides a simple way to parallelize your pandas operations on all your CPUs by changing only one line of code.
@@ -479,23 +424,6 @@ pandarallel.initialize(progress_bar=True)
 df.parallel_apply(func)
 ```
 
-### [Ray](https://docs.ray.io/en/latest/data/modin/index.html)
-
-Pandas on Ray is an early stage DataFrame library that wraps Pandas and
-transparently distributes the data and computation. The user does not
-need to know how many cores their system has, nor do they need to
-specify how to distribute the data. In fact, users can continue using
-their previous Pandas notebooks while experiencing a considerable
-speedup from Pandas on Ray, even on a single machine. Only a
-modification of the import statement is needed, as we demonstrate below.
-Once you've changed your import statement, you're ready to use Pandas on
-Ray just like you would Pandas.
-
-```
-# import pandas as pd
-import ray.dataframe as pd
-```
-
 ### [Vaex](https://vaex.io/docs/)
 
 Increasingly, packages are being built on top of pandas to address
@@ -540,11 +468,6 @@ to make data processing pipelines more readable and robust.
 Dataframes contain information that pandera explicitly validates at runtime. This is useful in
 production-critical data pipelines or reproducible research settings.
 
-### [Engarde](https://engarde.readthedocs.io/en/latest/)
-
-Engarde is a lightweight library used to explicitly state your
-assumptions about your datasets and check that they're *actually* true.
-
 ## Extension data types
 
 Pandas provides an interface for defining
@@ -559,12 +482,6 @@ Arrays](https://awkward-array.org/) inside pandas' Series and
 DataFrame. It also provides an accessor for using awkward functions
 on Series that are of awkward type.
 
-### [cyberpandas](https://cyberpandas.readthedocs.io/en/latest)
-
-Cyberpandas provides an extension type for storing arrays of IP
-Addresses. These arrays can be stored inside pandas' Series and
-DataFrame.
-
 ### [Pandas-Genomics](https://pandas-genomics.readthedocs.io/en/latest/)
 
 Pandas-Genomics provides an extension type and extension array for working
@@ -599,15 +516,11 @@ authors to coordinate on the namespace.
 | Library | Accessor | Classes |
 | -------------------------------------------------------------------- | ---------- | --------------------- |
 | [awkward-pandas](https://awkward-pandas.readthedocs.io/en/latest/) | `ak` | `Series` |
-| [cyberpandas](https://cyberpandas.readthedocs.io/en/latest) | `ip` | `Series` |
 | [pdvega](https://altair-viz.github.io/pdvega/) | `vgplot` | `Series`, `DataFrame` |
 | [pandas-genomics](https://pandas-genomics.readthedocs.io/en/latest/) | `genomics` | `Series`, `DataFrame` |
-| [pandas_path](https://github.com/drivendataorg/pandas-path/) | `path` | `Index`, `Series` |
 | [pint-pandas](https://github.com/hgrecco/pint-pandas) | `pint` | `Series`, `DataFrame` |
 | [physipandas](https://github.com/mocquin/physipandas) | `physipy` | `Series`, `DataFrame` |
 | [composeml](https://github.com/alteryx/compose) | `slice` | `DataFrame` |
-| [datatest](https://datatest.readthedocs.io/en/stable/) | `validate` | `Series`, `DataFrame` |
-| [composeml](https://github.com/alteryx/compose) | `slice` | `DataFrame` |
 | [gurobipy-pandas](https://github.com/Gurobi/gurobipy-pandas) | `gppd` | `Series`, `DataFrame` |
 | [staircase](https://www.staircase.dev/) | `sc` | `Series`, `DataFrame` |
 | [woodwork](https://github.com/alteryx/woodwork) | `slice` | `Series`, `DataFrame` |