WEB: Remove unmaintained projects from Ecosystem #57675

Merged: 2 commits, merged on Mar 5, 2024
web/pandas/community/ecosystem.md: 87 changes (0 additions, 87 deletions)
Expand Up @@ -21,10 +21,6 @@ please let us know.

## Statistics and machine learning

### [pandas-tfrecords](https://pypi.org/project/pandas-tfrecords/)

Easily save a pandas DataFrame to the TensorFlow TFRecord format and read TFRecords back into pandas.

### [Statsmodels](https://www.statsmodels.org/)

Statsmodels is the prominent Python "statistics and econometrics
@@ -34,11 +30,6 @@ modeling functionality that is out of pandas' scope. Statsmodels
leverages pandas objects as the underlying data container for
computation.

### [sklearn-pandas](https://github.com/scikit-learn-contrib/sklearn-pandas)

Use pandas DataFrames in your [scikit-learn](https://scikit-learn.org/)
ML pipeline.

### [Featuretools](https://github.com/alteryx/featuretools/)

Featuretools is a Python library for automated feature engineering built
@@ -150,13 +141,6 @@ df # discover interesting insights!

By printing out a dataframe, Lux automatically [recommends a set of visualizations](https://raw.githubusercontent.com/lux-org/lux-resources/master/readme_img/demohighlight.gif) that highlight interesting trends and patterns in the dataframe. Users can leverage any existing pandas commands without modifying their code, while being able to visualize their pandas data structures (e.g., DataFrame, Series, Index) at the same time. Lux also offers a [powerful, intuitive language](https://lux-api.readthedocs.io/en/latest/source/guide/vis.html) that allows users to create Altair, matplotlib, or Vega-Lite visualizations without having to think at the level of code.

### [QtPandas](https://github.com/draperjames/qtpandas)

Spun off from the main pandas library, the
[qtpandas](https://github.com/draperjames/qtpandas) library enables
DataFrame visualization and manipulation in PyQt4 and PySide
applications.

### [D-Tale](https://github.com/man-group/dtale)

D-Tale is a lightweight web client for visualizing pandas data structures. It
@@ -210,12 +194,6 @@ or may not be compatible with non-HTML Jupyter output formats.)
See [Options and Settings](https://pandas.pydata.org/docs/user_guide/options.html)
for pandas `display.` settings.

### [modin-project/modin-spreadsheet](https://github.com/modin-project/modin-spreadsheet)

modin-spreadsheet is an interactive grid for sorting and filtering DataFrames in IPython Notebook.
It is a fork of qgrid and is actively maintained by the modin project.
modin-spreadsheet provides similar functionality to qgrid and allows for easy data exploration and manipulation in a tabular format.

### [Spyder](https://www.spyder-ide.org/)

Spyder is a cross-platform PyQt-based IDE combining the editing,
@@ -271,18 +249,6 @@ The following data feeds are available:
- Stooq Index Data
- MOEX Data

### [quandl/Python](https://github.com/quandl/Python)

Quandl API for Python wraps the Quandl REST API to return Pandas
DataFrames with timeseries indexes.
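
A wrapper of this kind ends by turning a JSON response into a DataFrame indexed by date. A minimal sketch of that last step, assuming a hypothetical payload with `column_names` and `data` keys (the field names are an assumption, not taken from the Quandl API documentation) and making no network call:

```python
import pandas as pd

# Hypothetical REST-style payload; the field names are assumptions.
payload = {
    "column_names": ["Date", "Open", "Close"],
    "data": [
        ["2024-01-03", 101.0, 102.5],
        ["2024-01-02", 100.0, 101.0],
    ],
}

# Build the frame, parse the date column, and use it as a sorted time index.
prices = pd.DataFrame(payload["data"], columns=payload["column_names"])
prices["Date"] = pd.to_datetime(prices["Date"])
prices = prices.set_index("Date").sort_index()

print(prices)
```

With a `DatetimeIndex` in place, date-based slicing and resampling work directly on the result.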

### [pydatastream](https://github.com/vfilimonov/pydatastream)

PyDatastream is a Python interface to the [Thomson Dataworks Enterprise
(DWE/Datastream)](http://dataworks.thomson.com/Dataworks/Enterprise/1.0/)
SOAP API to return indexed Pandas DataFrames with financial data. This
package requires valid credentials for this API (non-free).

### [pandaSDMX](https://pandasdmx.readthedocs.io)

pandaSDMX is a library to retrieve and acquire statistical data and
@@ -305,13 +271,6 @@ point-in-time data from ALFRED. fredapi makes use of pandas and returns
data in a Series or DataFrame. This module requires a FRED API key that
you can obtain for free on the FRED website.

### [dataframe_sql](https://github.com/zbrookle/dataframe_sql)

``dataframe_sql`` is a Python package that translates SQL syntax directly into
operations on pandas DataFrames. This is useful when migrating from a database to
using pandas or for users more comfortable with SQL looking for a way to interface
with pandas.
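
To make the SQL-to-pandas idea concrete, here is a hand-written translation of one query into pandas operations, cross-checked against SQLite from the standard library. It does not use or reflect `dataframe_sql`'s own API, and the data are made up for illustration:

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Made-up example data.
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west", "east"],
        "amount": [5, 20, 15, 30, 40],
    }
)

query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 10
    GROUP BY region
    ORDER BY region
"""

# Hand-written pandas equivalent, one step per SQL clause.
pandas_result = (
    sales[sales["amount"] > 10]                     # WHERE amount > 10
    .groupby("region", as_index=False)["amount"]    # GROUP BY region
    .sum()                                          # SUM(amount)
    .rename(columns={"amount": "total"})            # AS total
    .sort_values("region", ignore_index=True)       # ORDER BY region
)

# Cross-check by running the literal SQL in an in-memory SQLite database.
with closing(sqlite3.connect(":memory:")) as conn:
    sales.to_sql("sales", conn, index=False)
    sql_result = pd.read_sql_query(query, conn)

print(pandas_result)
```

Both results hold `east: 55` and `west: 50`; a translator like `dataframe_sql` automates exactly this clause-by-clause mapping.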

## Domain specific

### [Geopandas](https://github.com/geopandas/geopandas)
@@ -384,12 +343,6 @@ any Delta table into Pandas dataframe.

## Out-of-core

### [Blaze](https://blaze.pydata.org/)

Blaze provides a standard API for doing computations with various
in-memory and on-disk backends: NumPy, Pandas, SQLAlchemy, MongoDB,
PyTables, PySpark.

### [Cylon](https://cylondata.org/)

Cylon is a fast, scalable, distributed memory parallel runtime with a pandas
@@ -457,14 +410,6 @@ import modin.pandas as pd
df = pd.read_csv("big.csv") # use all your cores!
```

### [Odo](http://odo.pydata.org)

Odo provides a uniform API for moving data between different formats. It
uses pandas' own `read_csv` for CSV IO and leverages many existing
packages such as PyTables, h5py, and pymongo to move data between non-pandas
formats. Its graph-based approach is also extensible by end users
for custom formats that may be too specific for the core of odo.
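
The graph-based idea can be sketched with the standard library: formats are nodes, converter functions are directed edges, and a conversion follows the shortest chain of edges found by breadth-first search. The format names and converters below are hypothetical; this is not odo's API, and odo's real implementation is more elaborate:

```python
import csv
import io
import json
from collections import deque
from typing import Any, Callable

# Hypothetical converter registry: (source, target) -> function.
# Each key is a directed edge between two format names.
CONVERTERS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("csv_text", "rows"): lambda text: list(csv.reader(io.StringIO(text))),
    ("csv_text", "dicts"): lambda text: list(csv.DictReader(io.StringIO(text))),
    ("rows", "dicts"): lambda rows: [dict(zip(rows[0], r)) for r in rows[1:]],
    ("dicts", "json_text"): json.dumps,
}


def find_path(source: str, target: str) -> list[str]:
    """Breadth-first search for the shortest chain of formats."""
    graph: dict[str, list[str]] = {}
    for src, dst in CONVERTERS:
        graph.setdefault(src, []).append(dst)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    raise ValueError(f"no conversion path from {source!r} to {target!r}")


def convert(data: Any, source: str, target: str) -> Any:
    """Apply each converter along the path returned by find_path."""
    path = find_path(source, target)
    for src, dst in zip(path, path[1:]):
        data = CONVERTERS[(src, dst)](data)
    return data


print(find_path("csv_text", "json_text"))  # ['csv_text', 'dicts', 'json_text']
print(convert("a,b\n1,2\n", "csv_text", "json_text"))  # [{"a": "1", "b": "2"}]
```

Supporting a custom format only means registering new edges; every existing path through the graph then becomes available to it, which is the kind of end-user extensibility the paragraph describes.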

### [Pandarallel](https://github.com/nalepae/pandarallel)

Pandarallel provides a simple way to parallelize your pandas operations on all your CPUs by changing only one line of code.
@@ -479,23 +424,6 @@ pandarallel.initialize(progress_bar=True)
df.parallel_apply(func)
```

### [Ray](https://docs.ray.io/en/latest/data/modin/index.html)

Pandas on Ray is an early-stage DataFrame library that wraps Pandas and
transparently distributes the data and computation. The user does not
need to know how many cores their system has, nor do they need to
specify how to distribute the data. In fact, users can continue using
their previous Pandas notebooks while experiencing a considerable
speedup from Pandas on Ray, even on a single machine. Only a
modification of the import statement is needed, as we demonstrate below.
Once you've changed your import statement, you're ready to use Pandas on
Ray just like you would Pandas.

```
# import pandas as pd
import ray.dataframe as pd
```

### [Vaex](https://vaex.io/docs/)

Increasingly, packages are being built on top of pandas to address
@@ -540,11 +468,6 @@ to make data processing pipelines more readable and robust.
Dataframes contain information that pandera explicitly validates at runtime. This is useful in
production-critical data pipelines or reproducible research settings.

### [Engarde](https://engarde.readthedocs.io/en/latest/)

Engarde is a lightweight library used to explicitly state your
assumptions about your datasets and check that they're *actually* true.
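
The general pattern of stating an assumption and checking it at runtime can be sketched in plain pandas with a decorator. `require_no_missing` and `load_prices` are hypothetical names for illustration; this is not Engarde's API:

```python
import functools
from typing import Any, Callable

import pandas as pd


def require_no_missing(func: Callable[..., pd.DataFrame]) -> Callable[..., pd.DataFrame]:
    """Hypothetical decorator: raise if the returned DataFrame has any NaN."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> pd.DataFrame:
        df = func(*args, **kwargs)
        missing = df.isna().sum()  # NaN count per column
        bad_columns = list(missing[missing > 0].index)
        if bad_columns:
            raise ValueError(f"missing values in columns: {bad_columns}")
        return df

    return wrapper


@require_no_missing
def load_prices(raw: dict) -> pd.DataFrame:
    # Stand-in for a real loading step (reading a file, querying a DB, ...).
    return pd.DataFrame(raw)


print(load_prices({"price": [1.0, 2.0]}))
```

Calling `load_prices({"price": [1.0, None]})` raises `ValueError` naming the `price` column, so a broken assumption fails at the function boundary instead of propagating downstream.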

## Extension data types

Pandas provides an interface for defining
@@ -559,12 +482,6 @@ Arrays](https://awkward-array.org/) inside pandas' Series and
DataFrame. It also provides an accessor for using awkward functions
on Series that are of awkward type.

### [cyberpandas](https://cyberpandas.readthedocs.io/en/latest)
Review thread on the cyberpandas entry:

Contributor: would leave this one

Member Author: Sure. I'll have a look to see if it still works with the latest pandas. I checked, and not only has it not had a commit in many years, it also has no bug reports, so I guess it's not used much.

Member (@MarcoGorelli, Feb 29, 2024): I don't think it's used at all: https://pypistats.org/packages/cyberpandas

I always thought it was just a proof of concept (POC) to showcase extension arrays.

Member Author: @jreback I tried. Installing cyberpandas from conda-forge gives an installation error (I guess the recipe is too old), and installing with pip works but fails on import:

```
>>> import pandas
>>> from cyberpandas import IPArray
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/.mambaforge/envs/cyberpandas/lib/python3.12/site-packages/cyberpandas/__init__.py", line 10, in <module>
    from .mac_array import MACType, MACArray
  File "/home/mgarcia/.mambaforge/envs/cyberpandas/lib/python3.12/site-packages/cyberpandas/mac_array.py", line 1, in <module>
    from collections import Iterable
ImportError: cannot import name 'Iterable' from 'collections' (/home/mgarcia/.mambaforge/envs/cyberpandas/lib/python3.12/collections/__init__.py)
```

I think it's worth removing from the Ecosystem.

CC: @TomAugspurger
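
The ImportError in the traceback above is not specific to pandas: Python 3.10 removed the deprecated aliases such as `collections.Iterable`, which have lived in `collections.abc` since Python 3.3. A small check of the portable import (this only illustrates the cause; it is not a patch to cyberpandas):

```python
import collections
import sys
from collections.abc import Iterable  # portable spelling, Python 3.3+

# `from collections import Iterable` (as in cyberpandas' mac_array.py) relied
# on a deprecated alias that Python 3.10 removed from the top-level module.
alias_removed = not hasattr(collections, "Iterable")

print(sys.version_info >= (3, 10), alias_removed)
print(isinstance([1, 2], Iterable), isinstance(42, Iterable))  # True False
```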

Contributor: OK to remove IMO. I don't think it's maintained these days.


Cyberpandas provides an extension type for storing arrays of IP
Addresses. These arrays can be stored inside pandas' Series and
DataFrame.

### [Pandas-Genomics](https://pandas-genomics.readthedocs.io/en/latest/)

Pandas-Genomics provides an extension type and extension array for working
@@ -599,15 +516,11 @@ authors to coordinate on the namespace.
| Library | Accessor | Classes |
| -------------------------------------------------------------------- | ---------- | --------------------- |
| [awkward-pandas](https://awkward-pandas.readthedocs.io/en/latest/) | `ak` | `Series` |
| [cyberpandas](https://cyberpandas.readthedocs.io/en/latest) | `ip` | `Series` |
| [pdvega](https://altair-viz.github.io/pdvega/) | `vgplot` | `Series`, `DataFrame` |
| [pandas-genomics](https://pandas-genomics.readthedocs.io/en/latest/) | `genomics` | `Series`, `DataFrame` |
| [pandas_path](https://github.com/drivendataorg/pandas-path/) | `path` | `Index`, `Series` |
| [pint-pandas](https://github.com/hgrecco/pint-pandas) | `pint` | `Series`, `DataFrame` |
| [physipandas](https://github.com/mocquin/physipandas) | `physipy` | `Series`, `DataFrame` |
| [composeml](https://github.com/alteryx/compose) | `slice` | `DataFrame` |
| [datatest](https://datatest.readthedocs.io/en/stable/) | `validate` | `Series`, `DataFrame` |
| [composeml](https://github.com/alteryx/compose) | `slice` | `DataFrame` |
| [gurobipy-pandas](https://github.com/Gurobi/gurobipy-pandas) | `gppd` | `Series`, `DataFrame` |
| [staircase](https://www.staircase.dev/) | `sc` | `Series`, `DataFrame` |
| [woodwork](https://github.com/alteryx/woodwork) | `slice` | `Series`, `DataFrame` |
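
The accessors in this table are registered through pandas' extension API. A minimal sketch using `pd.api.extensions.register_series_accessor`, with a hypothetical `ipv4` namespace built on the standard-library `ipaddress` module; it is not cyberpandas' `ip` accessor, and it stores plain Python objects rather than an extension array:

```python
import ipaddress

import pandas as pd


# Hypothetical accessor name "ipv4"; pick a namespace no other library uses.
@pd.api.extensions.register_series_accessor("ipv4")
class IPv4Accessor:
    def __init__(self, series: pd.Series) -> None:
        # Parse eagerly so an invalid address fails when the accessor is used.
        self._addrs = [ipaddress.IPv4Address(v) for v in series]
        self._index = series.index

    @property
    def is_private(self) -> pd.Series:
        """Boolean Series: True where the address is in a private range."""
        return pd.Series([a.is_private for a in self._addrs], index=self._index)


s = pd.Series(["192.168.0.1", "8.8.8.8"])
private = s.ipv4.is_private
print(private.tolist())  # [True, False]
```

Because every library registers into the same attribute namespace on `Series` and `DataFrame`, the coordination the table supports matters: two packages registering `ipv4` would overwrite each other, and pandas only emits a warning.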