POC of PDEP-9 (I/O plugins) #53005

Closed
wants to merge 9 commits into from
pandas/__init__.py (11 additions, 0 deletions)
@@ -182,6 +182,17 @@
del get_versions, v


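# load I/O plugins
from importlib.metadata import entry_points

# entry_points(group=...) requires Python 3.10+; the dict-style
# entry_points().get(...) interface was removed in Python 3.12
for dataframe_io_entry_point in entry_points(group="dataframe.io"):
    io_plugin = dataframe_io_entry_point.load()
    # a plugin exposing `read` becomes pd.read_<name>
    if hasattr(io_plugin, "read"):
        globals()[f"read_{dataframe_io_entry_point.name}"] = io_plugin.read
    # a plugin exposing `write` becomes DataFrame.to_<name>
    if hasattr(io_plugin, "write"):
        setattr(DataFrame, f"to_{dataframe_io_entry_point.name}", io_plugin.write)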

Member

check that dataframe_io_entry_point.name isn't overwriting something?

Member Author

Yes, good point. Some other checks would probably be useful too. So far my goal was mainly to show what PDEP-9 would imply in terms of code, as I don't fully understand the opposition to what, in my opinion, is a small change with huge benefits. So I think an MVP implementation can help clarify what the implications of the PDEP are. But I fully agree we should raise an error if two installed packages use the same entry point name; iirc it's mentioned in the PDEP.
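The two checks discussed here (refuse a name claimed by two installed packages, and refuse to overwrite an existing top-level function or DataFrame method) could look roughly like the following. This is a minimal sketch, not the PR's code: `register_io_plugins`, the `toy_io_plugin` module and the `DataFrame` stand-in class are hypothetical, and raising `RuntimeError` is an assumed choice of error type.

```python
import sys
import types
from importlib.metadata import EntryPoint


def register_io_plugins(eps, namespace, frame_cls):
    """Register read_<name> / to_<name> for each entry point, refusing clashes."""
    seen = {}
    for ep in eps:
        # Two installed packages claiming the same entry-point name: raise
        # instead of letting whichever loads last silently win.
        if ep.name in seen:
            raise RuntimeError(
                f"I/O plugin name {ep.name!r} is claimed by both "
                f"{seen[ep.name]!r} and {ep.value!r}"
            )
        seen[ep.name] = ep.value
        plugin = ep.load()
        reader, writer = f"read_{ep.name}", f"to_{ep.name}"
        # Refuse to shadow an existing top-level function or DataFrame method.
        if hasattr(plugin, "read"):
            if reader in namespace:
                raise RuntimeError(f"{reader} already exists")
            namespace[reader] = plugin.read
        if hasattr(plugin, "write"):
            if hasattr(frame_cls, writer):
                raise RuntimeError(f"DataFrame.{writer} already exists")
            setattr(frame_cls, writer, plugin.write)


# --- Demo: everything below is hypothetical scaffolding, not pandas code ---
class DataFrame:
    """Stand-in for pandas.DataFrame so the sketch runs without pandas."""


toy = types.ModuleType("toy_io_plugin")  # hypothetical plugin module
toy.read = lambda path: f"read {path}"
toy.write = lambda df, path: f"wrote {path}"
sys.modules["toy_io_plugin"] = toy  # lets EntryPoint.load() import it

demo_namespace = {"read_csv": None}  # pretend read_csv already exists
register_io_plugins(
    [EntryPoint(name="toy", value="toy_io_plugin", group="dataframe.io")],
    demo_namespace,
    DataFrame,
)
```

After the demo runs, `demo_namespace["read_toy"]` and `DataFrame().to_toy(...)` exist, while a plugin named `csv` would be rejected because `read_csv` is already taken. Passing the namespace and class explicitly (rather than using `globals()` and `DataFrame` directly) is only a convenience for testing the checks in isolation.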

Member

nice!

to be clear, my only opposition to pdep9 was to renaming the hugely established pd.read_csv (which is also the most visited page in the docs, according to the analytics you sent me)

adding an optional plugin system like this which allows third-party authors to develop readers/writers sounds like a net positive
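For a sense of what a third-party reader/writer could look like under this POC, here is a minimal sketch of a plugin module. Everything in it is hypothetical: the package name `pandas_tsv_plugin`, the `tsv` entry-point name and the tab-separated format are illustrative choices, not part of the PR. It assumes the plugin is a module exposing plain `read` and `write` functions, so that after the POC's `setattr`, `df.to_tsv(path)` passes the DataFrame as `write`'s first argument.

```python
# pandas_tsv_plugin.py -- hypothetical third-party I/O plugin module.
#
# Registered (hypothetically) in the plugin package's pyproject.toml:
#
#   [project.entry-points."dataframe.io"]
#   tsv = "pandas_tsv_plugin"
#
# With the POC loop, `read` would surface as pd.read_tsv and `write` as
# DataFrame.to_tsv (the DataFrame instance arrives as the first argument).
import pandas as pd


def read(path, **kwargs):
    """Read a tab-separated file into a DataFrame."""
    return pd.read_csv(path, sep="\t", **kwargs)


def write(df, path, **kwargs):
    """Write `df` as a tab-separated file, omitting the index by default."""
    kwargs.setdefault("index", False)
    df.to_csv(path, sep="\t", **kwargs)
```

Once such a package is installed, `pd.read_tsv("data.tsv")` and `df.to_tsv("out.tsv")` would become available without any change to pandas itself.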

Member Author

Thanks for clarifying your position @MarcoGorelli.

I think that was great feedback. While it'd be nice to have a better I/O API IMHO, it's probably not worth the change. In any case, that can be discussed separately, since it's independent of the plugin system and only adds noise to the discussion about plugins.

I was concerned about adding too much to pandas' already huge namespaces, but on second thought, if we eventually move connectors like SAS or BigQuery to third-party projects, most users will probably end up with fewer connectors than now, not more.

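del entry_points
# the loop variables are only bound if at least one plugin was found
globals().pop("dataframe_io_entry_point", None)
globals().pop("io_plugin", None)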


# module level doc-string
__doc__ = """
pandas - a powerful data analysis and manipulation library for Python