Skip to content

BUG: during interchanging from non-pandas tz-aware data #54287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

natmokval
Copy link
Contributor

@natmokval natmokval commented Jul 28, 2023

xref #54246

Added a test for interchanging from non-pandas tz-aware.

EDIT: fixed bug which was raising during interchanging from non-pandas tz-aware data.


arr = pa.array([datetime(2020, 1, 1), None, datetime(2020, 1, 2)])
arr = pc.assume_timezone(arr, "Asia/Kathmandu")
result = pa.table({"arr": arr}).to_pandas()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait we need to go via the interchange protocol - could you use result = from_dataframe(pa.table(...))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, my mistake. I'll do that via the interchange protocol.


arr = pa.array([datetime(2020, 1, 1), None, datetime(2020, 1, 2)])
arr = pc.assume_timezone(arr, "Asia/Kathmandu")
result = from_dataframe(pa.table({"arr": arr}).to_pandas())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interchange protocol means we can avoid to_pandas calls - so long as __dataframe__ is available (which it is for pyarrow tables) we can just do:

Suggested change
result = from_dataframe(pa.table({"arr": arr}).to_pandas())
result = from_dataframe(pa.table({"arr": arr}))

Copy link
Contributor Author

@natmokval natmokval Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange, it doesn't work for me. If I chage the line 307 to result = from_dataframe(pa.table({"arr": arr})), I get an error

pandas.errors.SettingWithCopyError: modifications to a method of a datetimelike object are not supported and are discarded. Change values on the original.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks - that looks like a warning which should be suppressed somewhere

could you please:

  • in this test, just add with tm.assert_produces_warning to silence it and get the test to pass
  • open a new issue about how the snippet above produces a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I'll do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems I can’t silence it, because it’s an error. Looks like it comes from the method _delegate_method. I corrected the test, now it checks this error.


def test_interchange_from_non_pandas_tz_aware():
# GH 54239
import pyarrow as pa
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import pyarrow as pa
pa = pytest.importorskip("pyarrow")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I tried it, but still have the same error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will help CI builds that don't have pyarrow installed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, that is the reason why many checks failed

exchange_df = table.__dataframe__()

msg = "modifications to a method of a datetimelike object are not supported."
with pytest.raises(SettingWithCopyError, match=msg):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be raising an error though

In

        try:
            data[null_pos] = None
        except TypeError:
            # TypeError happens if the `data` dtype appears to be non-nullable
            # in numpy notation (bool, int, uint). If this happens,
            # cast the `data` to nullable float dtype.
            data = data.astype(float)
            data[null_pos] = None

, could we add an extra except, which catches SettingWithCopyError, and in that case makes a copy of data (data = data.copy()) and then sets the nulls (data[null_pos] = None)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I added the extra except, it works very well. I updated my test and added a line to whatsnew/v2.1.0.rst as well.

@natmokval natmokval changed the title TST: add a test for interchanging from non-pandas tz-aware BUG: during interchanging from non-pandas tz-aware data Jul 31, 2023

def test_interchange_from_non_pandas_tz_aware():
# GH 54239, 54287
pa = pytest.importorskip("pyarrow")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may need to set a minimum version here

Suggested change
pa = pytest.importorskip("pyarrow")
pa = pytest.importorskip("pyarrow", "11.0.0")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I added the minimum version here

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments, but the rest looks good to me!

@@ -682,6 +682,7 @@ Other
- Bug in :class:`DataFrame` and :class:`Series` raising for data of complex dtype when ``NaN`` values are present (:issue:`53627`)
- Bug in :class:`DatetimeIndex` where ``repr`` of index passed with time does not print time is midnight and non-day based freq(:issue:`53470`)
- Bug in :class:`FloatingArray.__contains__` with ``NaN`` item incorrectly returning ``False`` when ``NaN`` values are present (:issue:`52840`)
- Bug in :func:`api.interchange.from_dataframe` was raising during interchanging from non-pandas tz-aware data containing ``NaN`` values (:issue:`54287`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we write

non-pandas tz-aware data containing null values

?
technically nan is different, and specifically refers to invalid mathematical operations like 0/0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, my mistake. I corrected that.

Comment on lines 518 to 519
# SettingWithCopyError happens if the `data` appears
# to have 'NaT'. If this happens, copy the `data`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"appears to" sounds a bit mysterious :) how about

`SettingWithCopyError` may happen for datetime-like with missing values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I changed the wording as you suggested.

@natmokval natmokval marked this pull request as ready for review July 31, 2023 18:45
@MarcoGorelli MarcoGorelli added this to the 2.1 milestone Jul 31, 2023
@MarcoGorelli MarcoGorelli added the Interchange Dataframe Interchange Protocol label Jul 31, 2023
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! looks good to me on green

@mroeschke mroeschke merged commit 329fe8f into pandas-dev:main Jul 31, 2023
@mroeschke
Copy link
Member

Thanks @natmokval

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Interchange Dataframe Interchange Protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants