
BUG: Assigning datetime to pa.date32() type array should raise #58420


Closed
3 tasks done
WillAyd opened this issue Apr 25, 2024 · 5 comments
Labels: Arrow (pyarrow functionality), Bug, Datetime (Datetime data dtype)

Comments

@WillAyd
Member

WillAyd commented Apr 25, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

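import datetime

import pandas as pd
import pyarrow as pa

# Build a date32-typed Series from plain dates.
pa_arr = pa.array([
    datetime.date(2024, 1, 1),
    datetime.date(2024, 1, 2),
    datetime.date(2024, 1, 3),
])
ser = pd.Series(pa_arr, dtype=pd.ArrowDtype(pa.date32()))
# Assigning a datetime is accepted; the time-of-day part is dropped.
ser.iloc[0] = datetime.datetime(2024, 12, 31, 12, 20, 0)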

Issue Description

Assigning a datetime value to a pyarrow date-typed array seems to implicitly drop the time components.

Expected Behavior

Should raise TypeError

Installed Versions

In [24]: pd.__version__
Out[24]: '3.0.0.dev0+681.g434fda08cf'

In [25]: pa.__version__
Out[25]: '15.0.0'

@WillAyd WillAyd added Bug Datetime Datetime data dtype Arrow pyarrow functionality labels Apr 25, 2024
@udit5656

take

@PedroVerardo

take

@PedroVerardo

Hi @WillAyd, I'm new to the pandas repository, and I think I've found the problem with this issue. However, I don't know exactly how to change this transformation in the code.

This happens in ArrowExtensionArray, in the _box_pa_scalar function:

pa_scalar = pa.scalar(value, type=pa_type, from_pandas=True)

The only possibility I see is to remove this and raise an error.
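
As a rough illustration of what "raise an error" could look like, here is a minimal standard-library sketch of a guard that might run before the value reaches pa.scalar. The name ensure_pure_date is hypothetical and is not part of pandas or pyarrow; this is not the actual fix, only one possible shape for the check.

```python
import datetime


def ensure_pure_date(value):
    """Hypothetical guard for values headed into a date-typed array.

    datetime.datetime is a subclass of datetime.date, so a plain
    isinstance(value, datetime.date) check would let datetimes through.
    The datetime case therefore has to be tested first.
    """
    if isinstance(value, datetime.datetime):
        raise TypeError(
            f"Cannot assign datetime {value!r} to a date array; "
            "the time components would be lost"
        )
    if not isinstance(value, datetime.date):
        raise TypeError(f"Expected datetime.date, got {type(value).__name__}")
    return value


# A plain date passes through unchanged.
ensure_pure_date(datetime.date(2024, 1, 1))

# A datetime is rejected instead of being silently truncated.
try:
    ensure_pure_date(datetime.datetime(2024, 12, 31, 12, 20, 0))
except TypeError as exc:
    print(exc)
```

The ordering of the two isinstance checks is the important part: because of the subclass relationship, checking for datetime.date alone would accept the value from the reproducible example.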

@PedroVerardo

I made a change that solves the issue specifically in the "Reproducible Example," but I think there are some similar problems that need to be solved. It is necessary to explore this problem more.

For example, if you do the opposite, as in the example below, and assign a date-only datetime to a timestamp array, the same kind of implicit conversion happens, but this time the 'ns' unit will be inferred.

import pandas as pd
import pyarrow as pa
import datetime

pa_arr = pa.array([
    datetime.datetime(2024, 1, 1, 1, 1, 1),
    datetime.datetime(2024, 4, 5, 1, 1, 1),
    datetime.datetime(2024, 6, 7, 1, 1, 1),
])
ser = pd.Series(pa_arr, dtype=pd.ArrowDtype(pa.timestamp('ns')))
ser.iloc[0] = datetime.datetime(2024, 12, 31)
print(ser)

If needed, I would love to help @WillAyd.

@WillAyd
Member Author

WillAyd commented May 31, 2024

Looks like this is an issue upstream in arrow - see apache/arrow#41896

@WillAyd WillAyd closed this as completed May 31, 2024