-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DataFrame constructor reordering elements with ndarray from datetime dtype not datetime64[ns] #39442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…time dtype not datetime64[ns]
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -222,7 +222,7 @@ def ensure_datetime64ns(arr: ndarray, copy: bool=True): | |||
dtype = arr.dtype | |||
arr = arr.astype(dtype.newbyteorder("<")) | |||
|
|||
ivalues = arr.view(np.int64).ravel("K") | |||
ivalues = arr.view(np.int64).ravel("C") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to make a copy if arr has order="F" isnt it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah unfortunately, but not sure how we can avoid this here. Could transpose back before passing in, but I think that is not an ideal solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could arr.flat be useful here?
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -222,7 +222,7 @@ def ensure_datetime64ns(arr: ndarray, copy: bool=True): | |||
dtype = arr.dtype | |||
arr = arr.astype(dtype.newbyteorder("<")) | |||
|
|||
ivalues = arr.view(np.int64).ravel("K") | |||
ivalues = arr.view(np.int64).ravel("C") | |||
|
|||
result = np.empty(shape, dtype=DT64NS_DTYPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make a difference it we constructed this with result = np.empty_like(arr, dtype=...)
to ensure we keep the same contiguity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, fantastic. Thanks very much
@@ -1762,6 +1762,39 @@ def test_constructor_datetimes_with_nulls(self, arr): | |||
expected = Series([np.dtype("datetime64[ns]")]) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
@pytest.mark.parametrize("order", ["K", "A", "C", "F"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do the same with timedelta? not sure if that is broken as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses numpy astype under the hood, is fine. Added tests.
Is there a way to create a df without a numpy or pandas array which has a dtype other than timedelta64[ns]?
I casted back to compare them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm not sure i understand what you are asking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, here an example:
expected = DataFrame(
[
[Timedelta(days=1), Timedelta(days=2)],
[Timedelta(days=3), Timedelta(days=4)]
], dtype="timedelta64[ms]"
)
returns
0 1
0 86400000.0 172800000.0
1 259200000.0 345600000.0
with dtype float
while
na = np.array(
[
[np.timedelta64(1, 'D'), np.timedelta64(2, 'D')],
[np.timedelta64(4, 'D'), np.timedelta64(5, 'D')]
],
dtype="timedelta64[ms]",
)
df = DataFrame(na)
returns
0 1
0 1 days 2 days
1 4 days 5 days
where dtype is timedelta64[ms]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question is if we can create the expected in a way to have timedelta64[ms] without using numpy or pandas array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think so, no. if it were float or int we could use a memoryview directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Casting to timedelta64[ns] may seem a bit odd, but does what we want here, so I would keep the current layout. Thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah agree, i think what you did is fine for the tests.
thanks @phofl |
This was a regression, so I think worth backporting to 1.2.2. If we do that, we only need a small PR moving the whatsnew note to v1.2.2.rst (apart from the actual backport) |
sure we could backport @phofl if u wouldn't mind putting up a backport PR (need to edit the note) and then separate PR to move the note in master |
…time dtype not datetime64[ns] (pandas-dev#39442) (cherry picked from commit 624621b)
Thanks @phofl ! |
The ravel("K") was a problem when passing in a view on the underlying object, which was done here.