REGR: roundtrip pandas<->pyarrow conversion for Period dtype broken for older pyarrow versions #45470

Closed
jorisvandenbossche opened this issue Jan 19, 2022 · 2 comments · Fixed by #45524
Labels
Regression Functionality that used to work in a prior pandas version
Comments

@jorisvandenbossche
Member

To have an issue tracking #44681 (comment)

Reproducible snippet:

import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"col": pd.period_range("2012", periods=3)})
pa.table(df).to_pandas()

No longer works with pyarrow < 4
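
When reporting which version combinations are affected, it can help to wrap the snippet above in a small check that records the installed versions and whether the Period dtype survives the round trip. The following is only a sketch: the function name `check_period_roundtrip` and the shape of the returned dict are illustrative and not part of pandas or pyarrow.

```python
def check_period_roundtrip():
    """Describe whether a Period column survives pandas -> pyarrow -> pandas.

    Hypothetical helper for bug reports; not part of pandas or pyarrow.
    """
    try:
        import pandas as pd
        import pyarrow as pa
    except ImportError as exc:
        return {"ok": False, "error": f"missing dependency: {exc}"}

    report = {"pandas": pd.__version__, "pyarrow": pa.__version__}
    df = pd.DataFrame({"col": pd.period_range("2012", periods=3)})
    try:
        result = pa.table(df).to_pandas()
    except Exception as exc:
        # Any failure during the conversion is captured instead of raised,
        # so the report can still be printed on affected installations.
        report.update(ok=False, error=f"{type(exc).__name__}: {exc}")
        return report

    # The round trip only counts as successful if the dtype is still a PeriodDtype.
    report["ok"] = isinstance(result["col"].dtype, pd.PeriodDtype)
    return report


print(check_period_roundtrip())
```

The returned dict can be pasted directly into a comment alongside the pandas and pyarrow versions it lists.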

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Jan 19, 2022
@jorisvandenbossche jorisvandenbossche added this to the 1.4 milestone Jan 19, 2022
@mroeschke
Member

Looks like we might have related xfailed tests already

@pytest.mark.xfail(
    pa_version_under4p0, reason="pyarrow incorrectly uses pandas internals API"
)
def test_arrow_table_roundtrip():
    from pandas.core.arrays._arrow_utils import ArrowPeriodType

    arr = PeriodArray([1, 2, 3], freq="D")
    arr[1] = pd.NaT
    df = pd.DataFrame({"a": arr})

    table = pa.table(df)
    assert isinstance(table.field("a").type, ArrowPeriodType)
    result = table.to_pandas()
    assert isinstance(result["a"].dtype, PeriodDtype)
    tm.assert_frame_equal(result, df)

    table2 = pa.concat_tables([table, table])
    result = table2.to_pandas()
    expected = pd.concat([df, df], ignore_index=True)
    tm.assert_frame_equal(result, expected)


@pytest.mark.xfail(
    pa_version_under4p0, reason="pyarrow incorrectly uses pandas internals API"
)
def test_arrow_load_from_zero_chunks():
    # GH-41040
    from pandas.core.arrays._arrow_utils import ArrowPeriodType

    arr = PeriodArray([], freq="D")
    df = pd.DataFrame({"a": arr})

    table = pa.table(df)
    assert isinstance(table.field("a").type, ArrowPeriodType)
    table = pa.table(
        [pa.chunked_array([], type=table.column(0).type)], schema=table.schema
    )
    result = table.to_pandas()
    assert isinstance(result["a"].dtype, PeriodDtype)
    tm.assert_frame_equal(result, df)


@pytest.mark.xfail(
    pa_version_under4p0, reason="pyarrow incorrectly uses pandas internals API"
)
def test_arrow_table_roundtrip_without_metadata():
    arr = PeriodArray([1, 2, 3], freq="H")
    arr[1] = pd.NaT
    df = pd.DataFrame({"a": arr})

    table = pa.table(df)
    # remove the metadata
    table = table.replace_schema_metadata()
    assert table.schema.metadata is None

    result = table.to_pandas()
    assert isinstance(result["a"].dtype, PeriodDtype)
    tm.assert_frame_equal(result, df)
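
For readers unfamiliar with the `pa_version_under4p0` flag used in the xfail markers above, here is a stand-alone approximation of how such a flag could be derived. It is a sketch under assumptions: the helper `version_under` and the variable `pa_under_4` are hypothetical names, the parsing only looks at the major version, and treating a missing pyarrow as "too old" is a choice made for this example. pandas computes its own flag with a full version parser.

```python
import re


def version_under(version: str, major: int) -> bool:
    """Return True if the leading major component of `version` is below `major`.

    Hypothetical stand-in for a flag such as pa_version_under4p0.
    """
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    return int(match.group()) < major


try:
    import pyarrow as pa

    pa_under_4 = version_under(pa.__version__, 4)
except ImportError:
    # Assumption of this sketch: a missing pyarrow is treated as "too old".
    pa_under_4 = True
```

A flag like this is evaluated once at import time, which is why it can be passed directly as the condition of `pytest.mark.xfail`.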

@jorisvandenbossche
Member Author

Yes, but that only silences the tests; we should still fix the regression itself :)

The fix should be relatively straightforward I think, see #44681 (comment)


2 participants