BUG: avoid specifying default coerce_timestamps in to_parquet #31652

jorisvandenbossche · 2020-02-04T15:09:16Z

Looking into the usage question of #31572, I noticed that specifying the version to allow writing nanoseconds to parquet worked in plain pyarrow code, but not with pandas' to_parquet.
This is because we hardcode coerce_timestamps="ms", while the pyarrow default is None, which has version-dependent behaviour (e.g. with version="2.0", the nanosecond data is actually written).
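
To make the mechanism concrete, here is a toy model of the problem, not the actual pandas or pyarrow code. `engine_write`, `to_parquet_hardcoded` and `to_parquet_passthrough` are hypothetical names invented for this sketch. The stand-in writer only reports which timestamp unit it would store, assuming that an unset `coerce_timestamps` means "ns" for version="2.0" and "us" otherwise.

```python
from typing import Optional


def engine_write(version: str = "1.0", coerce_timestamps: Optional[str] = None) -> str:
    """Hypothetical stand-in for the Parquet writer.

    Returns the timestamp unit that would end up in the file.
    """
    if coerce_timestamps is not None:
        # An explicit setting always wins over the version-dependent default.
        return coerce_timestamps
    # Default (None): the unit depends on the requested format version.
    return "ns" if version == "2.0" else "us"


def to_parquet_hardcoded(**kwargs) -> str:
    # Models the old wrapper: "ms" is forced unless the caller overrides it,
    # so the writer never gets to apply its own version-dependent default.
    kwargs.setdefault("coerce_timestamps", "ms")
    return engine_write(**kwargs)


def to_parquet_passthrough(**kwargs) -> str:
    # Models the fix: only forward what the caller actually specified.
    return engine_write(**kwargs)


print(to_parquet_hardcoded(version="2.0"))    # ms
print(to_parquet_passthrough(version="2.0"))  # ns
```

In the hardcoded variant, asking for version="2.0" has no effect on the timestamp unit, which is the behaviour described above.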

TomAugspurger · 2020-02-04T15:34:21Z

pandas/tests/io/test_parquet.py

+    @td.skip_if_no("pyarrow", min_version="0.14")
+    def test_timestamp_nanoseconds(self, pa):
+        # with version 2.0, pyarrow defaults to writing the nanoseconds, so
+        # this should work with error


Should this say "work without error"?

yes, of course ... !

doc/source/whatsnew/v1.1.0.rst

Co-Authored-By: Tom Augspurger <[email protected]>

jreback · 2020-02-05T01:09:03Z

thanks @jorisvandenbossche

mnylen · 2020-10-01T09:34:19Z

For us this change caused a nasty bug after upgrading from pandas 1.0.3 to 1.1.x. Apparently AWS Athena/Presto doesn't support nanosecond precision, so our timestamps started appearing with year 52000 when we created the parquet files using pandas 1.1.x. Just FYI for anyone else having the same issue. Adding coerce_timestamps="ms" (the previous default) to the to_parquet() call fixes the issue.
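
A rough back-of-the-envelope check of the "year 52000" symptom, using only the standard library. The key assumption, not confirmed anywhere in this thread, is that the reader interprets the stored int64 nanosecond value as if it were microseconds. The sample date and the variable names are made up for illustration.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year, good enough for an estimate

ts = datetime(2020, 10, 1, tzinfo=timezone.utc)  # arbitrary sample timestamp
epoch_s = int(ts.timestamp())

as_ns = epoch_s * 10**9  # value stored with nanosecond precision
as_ms = epoch_s * 10**3  # value stored after coercing to milliseconds

# ASSUMPTION: the reader treats the nanosecond integer as microseconds,
# so the value is 1000x too large when converted to seconds.
misread_seconds = as_ns // 10**6
misread_year = 1970 + misread_seconds // SECONDS_PER_YEAR

# A millisecond value read back as milliseconds round-trips cleanly.
roundtrip = datetime.fromtimestamp(as_ms / 1000, tz=timezone.utc)

print(misread_year)     # a year in the 52000s
print(roundtrip == ts)  # True
```

Under that assumption the misread year lands in the 52000s, which is consistent with the symptom reported above and with coerce_timestamps="ms" making it go away.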

findepi · 2020-10-01T20:28:55Z

@mnylen FWIW, Presto has recently gained support for nanosecond precision in date/time types, including timestamps (see trinodb/trino#1284 for more info).
However, this seems to be a different problem. If you can find the time, would you be able to answer a question in trinodb/trino#4662 (comment) so that we know what needs to be improved?

mnylen · 2020-10-03T05:07:49Z

@findepi I mentioned Presto only because AWS Athena uses that underneath. Athena is based on Presto 0.172, so it's likely that even if Presto nowadays supports nanoseconds for timestamps, it'll take a long time before Athena upgrades to the newer version.

BUG: avoid specifying default coerce_timestamps in to_parquet

15181f9

jorisvandenbossche added the Bug and IO Parquet (parquet, feather) labels Feb 4, 2020

jorisvandenbossche added this to the 1.1 milestone Feb 4, 2020

TomAugspurger reviewed Feb 4, 2020

type + whatsnew

a0921b8

TomAugspurger reviewed Feb 4, 2020

doc/source/whatsnew/v1.1.0.rst

TomAugspurger approved these changes Feb 4, 2020

Update doc/source/whatsnew/v1.1.0.rst

e659169

Co-Authored-By: Tom Augspurger <[email protected]>

jreback merged commit be9ee6d into pandas-dev:master Feb 5, 2020

jorisvandenbossche deleted the parquet-timestamp branch February 5, 2020 07:04

alimcmaster1 mentioned this pull request Apr 19, 2020

how to keep nanosecond timestamps? #32447

Closed

tooptoop4 mentioned this pull request Aug 1, 2020

presto returns wrong values for timestamp fields in parquet files trinodb/trino#4662

Open

jorisvandenbossche mentioned this pull request Sep 10, 2020

BUG: Reading from parquet throws UnknownTimeZoneError using timezone-aware date in index #35997

Closed

2 tasks
