You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As per HyukjinKwon request on #38312 to backport fix into 3.3
### What changes were proposed in this pull request?
Handle `TimeUnit.NANOS` for parquet `Timestamps` addressing a regression in behaviour since 3.2
### Why are the changes needed?
Since version 3.2 reading parquet files that contain attributes with type `TIMESTAMP(NANOS,true)` is not possible as ParquetSchemaConverter returns
```
Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))
```
https://issues.apache.org/jira/browse/SPARK-34661 introduced a change matching on the `LogicalTypeAnnotation` which only covers Timestamp cases for `TimeUnit.MILLIS` and `TimeUnit.MICROS` meaning `TimeUnit.NANOS` would return `illegalType()`
Prior to 3.2 the matching used the `originalType` which for `TIMESTAMP(NANOS,true)` return `null` and therefore resulted to a `LongType`, the change proposed is too consider `TimeUnit.NANOS` and return `LongType` making behaviour the same as before.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added unit test covering this scenario.
Internally deployed to read parquet files that contain `TIMESTAMP(NANOS,true)`
Closes#39904 from awdavidson/ts-nanos-fix-3.3.
Lead-authored-by: alfreddavidson <[email protected]>
Co-authored-by: awdavidson <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Copy file name to clipboardExpand all lines: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
0 commit comments