TST: Test non-nanosecond datetimes in PyArrow Parquet dataframes #59393
Conversation
@natmokval @WillAyd Please review when you have time. Thank you!
All tests pass in my Docker container, but the CI build fails. I'll look into it.
Thanks for the PR, though I'm wondering if this is that much different than the test_read_dtype_backend_pyarrow_config that already exists?
@jbrockmendel does the most with datetimes so may want to chime in here as well
Also, this does potentially solve the example from the OP in #49236, although we may not want to mark this as closing that. Not sure if there is a need for the requested keyword beyond this one example.
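For reference, here is a minimal sketch of the pyarrow-level workaround discussed in #49236: passing timestamp_as_object=True to Table.to_pandas(), which read_parquet does not currently expose. The file name and timestamp value are illustrative, not taken from the issue.

```python
import datetime

import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative sketch of the workaround from #49236: a timestamp far outside the
# datetime64[ns] range, stored at millisecond resolution.
table = pa.table(
    {"ts": pa.array([datetime.datetime(2500, 1, 1)], type=pa.timestamp("ms"))}
)
pq.write_table(table, "example.parquet")

# read_parquet offers no way to forward this keyword to to_pandas(), hence the
# request for a to_pandas_kwargs argument in #49236.
df = pq.read_table("example.parquet").to_pandas(timestamp_as_object=True)
print(df["ts"].dtype)  # object column holding datetime.datetime instances
```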
@WillAyd Thank you very much for your review!
While
Sounds good and thanks for explaining. I think it's ok to add - generally more tests are good.
Whoops sorry for the bad guidance on ensure_clean()
@mroeschke thanks for the suggestions!
Co-authored-by: Matthew Roeschke <[email protected]>
The failing environment uses pandas 1.5.3, which fails with the same error as in the original issue ticket.
@mroeschke could you please help with my question above? Thanks!
You'll probably need to specify the
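The suggestion above is truncated; assuming it refers to gating the test on a minimum dependency version, one common pytest pattern is sketched below. The library and version number are placeholders, not values taken from this conversation.

```python
import pytest

# Hypothetical sketch only: skip the test module unless the installed pyarrow is
# at least the given version. Both the module name and "13.0.0" are placeholders.
pa = pytest.importorskip("pyarrow", minversion="13.0.0")
```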
Thanks @EduardAkhmetshin |
This PR tests that the current version of pandas supports non-nanosecond PyArrow Parquet data without using the timestamp_as_object parameter.
- Related issue: ENH: to_pandas_kwargs in read_parquet for pyarrow engine #49236
- Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
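As a rough illustration of the behaviour this PR exercises (not the PR's actual test code), the sketch below round-trips millisecond-resolution timestamps through Parquet without timestamp_as_object; it assumes pandas >= 2.0 and a pyarrow version that preserves non-nanosecond resolutions in to_pandas().

```python
import pyarrow as pa
import pyarrow.parquet as pq

import pandas as pd


def test_non_nano_parquet_roundtrip(tmp_path):
    # Write a pyarrow table holding millisecond-resolution timestamps to Parquet.
    table = pa.table(
        {"ts": pa.array([1_000, 2_000, 3_000], type=pa.timestamp("ms"))}
    )
    path = tmp_path / "non_nano.parquet"
    pq.write_table(table, path)

    # Read it back with the pyarrow engine, without timestamp_as_object.
    result = pd.read_parquet(path, engine="pyarrow")

    # With pandas >= 2.0 and a recent pyarrow, the millisecond resolution is
    # expected to survive instead of being coerced to datetime64[ns].
    assert result["ts"].dtype == "datetime64[ms]"
```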