-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DEPR: remove use of nan_as_null
from callers of __dataframe__
#54846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -893,8 +893,8 @@ def __dataframe__( | |
Parameters | ||
---------- | ||
nan_as_null : bool, default False | ||
Whether to tell the DataFrame to overwrite null values in the data | ||
with ``NaN`` (or ``NaT``). | ||
`nan_as_null` is DEPRECATED and has no effect. Please avoid using | ||
it; it will be removed in a future release (after Aug 2024). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The date is wrong, this will be removed in 3.0 which will happen before that. Also: This needs a FutureWarning if a value is passed by the user and we need tests for that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The intent here was to keep it around for a year in all dataframe libraries, to avoid unnecessary breakage. This works best if we remove usage of the keyword from all callers of Is there a reason to not do this? Removing it sooner seems to not have any upsides, only more disruption. I can remove the word "deprecated" from the PR if it bothers you?
This is not a good idea. The end user can do nothing about this. This is a between-dataframes communication mechanism, so emitting user-facing warnings has no purpose. It's also not a behavior change but a removal of the keyword without any observable changes to how There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't do API breaking changes in minor versions, so we can remove it in 3.0 or 4.0 (I don't have a preference), but not in a minor version in between (the date and lack of a warning bothers me, deprecated itself is fine). You can start with a It is part of our API docs (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.__dataframe__.html), so there is no guarantee that it is only used by library developers and we have to consider this a public method. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hey all I've looked at this a bit more - it's true that technically it may be meant to only be used between dataframe libraries in
So, I'd suggest treating this as any other public-facing pandas method, and going through the usual deprecation process In any case, I've had a look at https://github.com/search?q=__dataframe__%28&type=code&p=1, and it doesn't look like there's many users of
generic communication comment - could we not make such statements about maintainers' reviews please? thanks 🙏 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Apologies for the too brief and too certain statement, I should have written "I don't think it is a good idea to emit ...". I will note that the whole premise of the upstream change and the proposal by @jorisvandenbossche at data-apis/dataframe-api#125 was that it can be removed because no one There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks Yeah I think that might be best to be honest - remove it from implementations and the official spec, and then do a proper deprecation cycle in pandas. There's a general desire in pandas to be stricter about the process around breaking changes now that there's a yearly release cycle, and I don't think that making an exception for
I don't think it needs closing - I think we can already note that the argument is deprecated and will be removed, and then, once the above has been completed, we follow the pandas deprecation process. So as this lives in pandas, then for better or for worse, that is what we're bound by There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @phofl gentle ping - OK to get this docs change in now and then do the usual deprecation once other implementations have been updated? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FWIW, we do deviate from our standard backwards compatibility / deprecation policies for experimental features. While we didn't explicitly label this one as such (unfortunately), I think we could decide to treat this method as experimental, though. (we also already noted in our public docstring that this keyword has no effect) To be clear, I also don't mind doing a longer deprecation cycle if we really want to, on our side. In the end it's just keeping some code that does nothing around for a bit longer, so it's not that this takes much effort to do. But I think we can also just remove it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking at the actual diff of the PR now: I think this is indeed fine to merge as is (maybe just remove the explicit date when it will be removed, for now). All it does is update the documentation to be clearer that this keyword doesn't do anything and is deprecated, and clean up the internals to already remove it internally (which has no user visible impact).
MarcoGorelli marked this conversation as resolved.
Show resolved
Hide resolved
|
||
allow_copy : bool, default True | ||
Whether to allow memory copying when exporting. If set to False | ||
it would cause non-zero-copy exports to fail. | ||
|
@@ -909,9 +909,6 @@ def __dataframe__( | |
Details on the interchange protocol: | ||
https://data-apis.org/dataframe-protocol/latest/index.html | ||
|
||
`nan_as_null` currently has no effect; once support for nullable extension | ||
dtypes is added, this value should be propagated to columns. | ||
|
||
Examples | ||
-------- | ||
>>> df_not_necessarily_pandas = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}) | ||
|
@@ -931,7 +928,7 @@ def __dataframe__( | |
|
||
from pandas.core.interchange.dataframe import PandasDataFrameXchg | ||
|
||
return PandasDataFrameXchg(self, nan_as_null, allow_copy) | ||
return PandasDataFrameXchg(self, allow_copy=allow_copy) | ||
|
||
def __dataframe_consortium_standard__( | ||
self, *, api_version: str | None = None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.