ENH: update feather IO for pyarrow 0.17 / Feather V2 #33422
@@ -102,8 +104,8 @@ def test_read_columns(self):

    def test_unsupported_other(self):

        # period
        df = pd.DataFrame({"a": pd.period_range("2013", freq="M", periods=3)})
Since Feather now maps exactly to the Arrow memory format, periods are now supported (Period is supported in the pandas -> pyarrow.Table conversion).
that answers my question above, never mind
Needs a whatsnew note, and a small comment on the docs in io.rst. LGTM otherwise.
doc/source/user_guide/io.rst
  * The format will NOT write an ``Index``, or ``MultiIndex`` for the
    ``DataFrame`` and will raise an error if a non-default one is provided. You
    can ``.reset_index()`` to store the index or ``.reset_index(drop=True)`` to
    ignore it.
  * Duplicate column names and non-string columns names are not supported
- * Non supported types include ``Period`` and actual Python object types. These will raise a helpful error message
+ * Non supported types actual Python object types. These will raise a helpful error message
Missing a word here. Maybe
- * Non supported types actual Python object types. These will raise a helpful error message
+ * object-dtype columns are not supported. This will raise with a helpful error message
for my edification, does this mean that PeriodIndex or Series[Period] is supported? If so, is that a change from the older version?
@@ -2058,18 +2058,24 @@ def to_stata(
          writer.write_file()

      @deprecate_kwarg(old_arg_name="fname", new_arg_name="path")
-     def to_feather(self, path) -> None:
+     def to_feather(self, path, **kwargs) -> None:
any reason not to make these explicit?
> any reason not to make these explicit?
The set of keywords might change with the pyarrow version, so we would need to update the signature each time other keywords get added. Passing through kwargs makes this more future-proof.
But I could make the keywords that exist now explicit. However, that also means we would need to check the pyarrow version to give a nice error message saying which keyword is not yet supported with older pyarrow versions (which is of course not that difficult).
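To make the trade-off in this reply concrete, here is a minimal sketch of the two designs. Everything in it is hypothetical: `fake_backend_write` stands in for the backend writer, and the keyword names and version numbers in `MIN_BACKEND_VERSION` are illustrative assumptions, not pandas or pyarrow code.

```python
from typing import Any, Dict, List, Optional


def fake_backend_write(
    rows: List[Any],
    dest: str,
    compression: Optional[str] = None,
    chunksize: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for a backend writer; it only echoes its arguments."""
    return {"rows": rows, "dest": dest, "compression": compression, "chunksize": chunksize}


def write_passthrough(rows: List[Any], dest: str, **kwargs: Any) -> Dict[str, Any]:
    # Option 1: forward everything. New backend keywords work without changes
    # here, and unknown ones fail inside the backend (a TypeError in this sketch).
    return fake_backend_write(rows, dest, **kwargs)


# Hypothetical table: keyword -> first backend (major, minor) that accepts it.
MIN_BACKEND_VERSION = {"compression": (0, 17), "chunksize": (0, 17)}


def write_explicit(
    rows: List[Any],
    dest: str,
    compression: Optional[str] = None,
    chunksize: Optional[int] = None,
    backend_version: str = "0.17.0",
) -> Dict[str, Any]:
    # Option 2: explicit keywords plus a version check for a clearer error.
    major, minor = (int(part) for part in backend_version.split(".")[:2])
    requested = {"compression": compression, "chunksize": chunksize}
    supported = {}
    for name, value in requested.items():
        if value is None:
            continue  # keyword not used by the caller, nothing to check
        if (major, minor) < MIN_BACKEND_VERSION[name]:
            raise ValueError(f"'{name}' requires a newer backend than {backend_version}")
        supported[name] = value
    return fake_backend_write(rows, dest, **supported)
```

The passthrough version needs no maintenance when the backend grows keywords, while the explicit version documents the keywords in the signature at the cost of keeping the version table up to date.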
lgtm. maybe a whatsnew note, otherwise merge when ready.
Hi, is Feather V2 now supported by pandas? This seems to be tagged 1.0.4, but I cannot find it in the release notes, thanks! The current to_feather() in pandas 1.0.5 also does not seem to support the compression that was introduced in Feather V2. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_feather.html
This is in 1.1.
@noklam Feather V2 is already supported by pandas 1.0.4, as long as you have pyarrow>=0.17 installed.
Thanks! Got it. Looking forward to release 1.1 😀
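The `pyarrow>=0.17` condition mentioned in the thread above can be checked before relying on Feather V2 features. The following is a rough sketch using only the standard library; the helper names (`major_minor`, `supports_feather_v2`, `installed_pyarrow_supports_v2`) are made up for illustration and are not part of pandas or pyarrow.

```python
import re
from importlib import metadata

# Minimum pyarrow (major, minor) for Feather V2, taken from the thread above.
FEATHER_V2_MIN = (0, 17)


def major_minor(version_text: str) -> tuple:
    """Extract (major, minor) from a version string such as '0.17.1'."""
    match = re.match(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_text!r}")
    return int(match.group(1)), int(match.group(2))


def supports_feather_v2(version_text: str) -> bool:
    """Return True if the given pyarrow version is at least 0.17."""
    return major_minor(version_text) >= FEATHER_V2_MIN


def installed_pyarrow_supports_v2() -> bool:
    """Check the installed pyarrow, treating a missing install as unsupported."""
    try:
        return supports_feather_v2(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return False
```

A caller could use `installed_pyarrow_supports_v2()` to decide whether to pass options such as compression at all, instead of discovering the limitation from an error at write time.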
The upcoming pyarrow 0.17 release will include an upgraded Feather format.
This PR updates pandas for that: more specifically, it ensures the new keywords can be passed through (the basics should keep working out of the box, since the public API did not change) and makes a small update to the tests.