BUG: convert_dtypes(dtype_backend="pyarrow") converts "timestamp[ms][pyarrow]" to "timestamp[ns][pyarrow]" #54191
Comments
I am volunteering to fix the reported issue with |
take |
I can confirm the issue exists in pandas version 2.0.3, but I could not reproduce it in the current dev version (2.1.0). I will investigate how it got fixed. |
You can switch to version 2.0.3 (or another version in which the bug exists), for example by using git to check out the specific tag. Then add a test to make sure the issue no longer occurs. |
The issue in version 2.0.3 was caused by pyarrow's to_pandas_dtype() always returning ns for the timestamp data type, regardless of the unit in the associated metadata. On the current main branch the error no longer occurs, because if conditionals were added to the inference part of the convert_dtypes() function to work around the pyarrow issue. The code in question is pandas/pandas/core/dtypes/dtypes.py, lines 2129 to 2142 in 4f55280.
Pyarrow has since fixed and closed the issue on its main branch (dev version 13.0). I tested against that dev version to check whether those if conditions could be removed without affecting the outcome. They could, so once pyarrow version 13 is released it should be safe to remove them. I also suggest tagging this issue with "Arrow". I will write the pytest tests for this. |
…timestamp/duration to and from pyarrow (pandas-dev#54356)
* add pyarrow conversion pytests
* make test match GH issue and move test to proper location
* fix naming of expected and result variables for assertion
* optimize test
* make test mirror GH pandas-dev#54191
* fix test logic
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
raises
Issue Description
"timestamp[ms][pyarrow]" is already a valid nullable pyarrow type, therefore convert_dtypes(dtype_backend="pyarrow") should leave it unaffected.
Expected Behavior
convert_dtypes(dtype_backend="pyarrow") should probably be a no-op for series that already have pyarrow data types.
Installed Versions