BUG: during interchanging from non-pandas tz-aware data #54287

Merged
Changes from 6 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -682,6 +682,7 @@ Other
- Bug in :class:`DataFrame` and :class:`Series` raising for data of complex dtype when ``NaN`` values are present (:issue:`53627`)
- Bug in :class:`DatetimeIndex` where ``repr`` of index passed with time does not print time is midnight and non-day based freq(:issue:`53470`)
- Bug in :class:`FloatingArray.__contains__` with ``NaN`` item incorrectly returning ``False`` when ``NaN`` values are present (:issue:`52840`)
- Bug in :func:`api.interchange.from_dataframe` was raising during interchanging from non-pandas tz-aware data containing ``NaN`` values (:issue:`54287`)
Member

can we write

non-pandas tz-aware data containing null values

?
technically NaN is different: it specifically refers to the result of invalid mathematical operations like 0/0

Contributor Author

oops, my mistake. I corrected that.
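
The distinction raised in this thread can be illustrated with a minimal, standard-library-only sketch. It is not part of the PR; the variable names are hypothetical, and `None` is used here only as a stand-in for a missing (null) value.

```python
import math

# NaN is an actual floating-point value, produced by an undefined operation.
# (Python raises ZeroDivisionError for 0.0 / 0.0, so inf - inf is used instead.)
nan = float("inf") - float("inf")

# A null is the absence of a value; None stands in for it in this sketch.
null = None

print(math.isnan(nan))  # True: NaN is a float with a special bit pattern
print(nan == nan)       # False: NaN never compares equal, even to itself
print(null is None)     # True: a null is simply "no value here"
```

A timestamp column with a missing entry holds a null in this sense, which is why the release note wording was changed from ``NaN`` to null values.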

- Bug in :func:`api.interchange.from_dataframe` when converting an empty DataFrame object (:issue:`53155`)
- Bug in :func:`assert_almost_equal` now throwing assertion error for two unequal sets (:issue:`51727`)
- Bug in :func:`assert_frame_equal` checks category dtypes even when asked not to check index type (:issue:`52126`)
6 changes: 6 additions & 0 deletions pandas/core/interchange/from_dataframe.py
@@ -7,6 +7,7 @@
import numpy as np

from pandas.compat._optional import import_optional_dependency
from pandas.errors import SettingWithCopyError

import pandas as pd
from pandas.core.interchange.dataframe_protocol import (
@@ -513,5 +514,10 @@ def set_nulls(
            # cast the `data` to nullable float dtype.
            data = data.astype(float)
            data[null_pos] = None
        except SettingWithCopyError:
            # SettingWithCopyError happens if the `data` appears
            # to have 'NaT'. If this happens, copy the `data`.
Member

"appears to" sounds a bit mysterious :) how about

`SettingWithCopyError` may happen for datetime-like with missing values

Contributor Author

thank you, I changed the wording as you suggested.

            data = data.copy()
            data[null_pos] = None

    return data
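
The new `except` branch follows a general pattern: try the in-place write, and if the container refuses it, copy the data and retry on the copy. The sketch below shows only that pattern and is not pandas code. It assumes a read-only `memoryview` as a stand-in for the view that triggers `SettingWithCopyError`, and the helper name `set_null_positions` is hypothetical.

```python
def set_null_positions(data, null_positions, null_byte=0):
    """Write null_byte at each index, copying first if data is read-only."""
    try:
        for i in null_positions:
            data[i] = null_byte
    except TypeError:
        # A read-only memoryview raises TypeError on item assignment.
        # Fall back to a writable copy and retry, leaving the source untouched.
        data = bytearray(data)
        for i in null_positions:
            data[i] = null_byte
    return data


read_only = memoryview(b"\x01\x02\x03")
result = set_null_positions(read_only, [1])
print(bytes(result))     # b'\x01\x00\x03'
print(bytes(read_only))  # b'\x01\x02\x03' (original buffer unchanged)
```

As in the patch, the copy is made only on the failure path, so writable inputs are still modified in place without an extra allocation.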
19 changes: 19 additions & 0 deletions pandas/tests/interchange/test_impl.py
@@ -295,3 +295,22 @@ def test_datetimetzdtype(tz, unit):
    )
    df = pd.DataFrame({"ts_tz": tz_data})
    tm.assert_frame_equal(df, from_dataframe(df.__dataframe__()))


def test_interchange_from_non_pandas_tz_aware():
    # GH 54239, 54287
    pa = pytest.importorskip("pyarrow")
Member

we may need to set a minimum version here

Suggested change
-    pa = pytest.importorskip("pyarrow")
+    pa = pytest.importorskip("pyarrow", "11.0.0")

Contributor Author

thank you, I added the minimum version here

    import pyarrow.compute as pc

    arr = pa.array([datetime(2020, 1, 1), None, datetime(2020, 1, 2)])
    arr = pc.assume_timezone(arr, "Asia/Kathmandu")
    table = pa.table({"arr": arr})
    exchange_df = table.__dataframe__()
    result = from_dataframe(exchange_df)

    expected = pd.DataFrame(
        ["2020-01-01 00:00:00+05:45", "NaT", "2020-01-02 00:00:00+05:45"],
        columns=["arr"],
        dtype="datetime64[us, Asia/Kathmandu]",
    )
    tm.assert_frame_equal(expected, result)
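
The `+05:45` offsets in the expected frame come from reading the naive timestamps as local wall-clock time in Asia/Kathmandu. The sketch below reproduces those strings with the standard library only. It is an illustration, not the test itself: it assumes a fixed UTC+05:45 offset in place of the named zone, and it renders the missing entry as the string "NaT" by hand.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +05:45 offset stands in for "Asia/Kathmandu".
kathmandu = timezone(timedelta(hours=5, minutes=45))

naive = [datetime(2020, 1, 1), None, datetime(2020, 1, 2)]

# Attach the zone without shifting the wall-clock time; keep nulls as None.
aware = [None if ts is None else ts.replace(tzinfo=kathmandu) for ts in naive]

# Render the way the expected frame spells the values, with "NaT" for the null.
rendered = ["NaT" if ts is None else ts.isoformat(sep=" ") for ts in aware]
print(rendered)
```

The null in the middle of the column is the case this PR fixes: before the change, `from_dataframe` raised when writing that missing value into the converted tz-aware data.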