BUG: DataFrame constructor reordering elements with ndarray from datetime dtype not datetime64[ns] #39442

Merged
merged 5 commits into from
Jan 28, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -233,6 +233,7 @@ Datetimelike
- Bug in constructing a :class:`DataFrame` or :class:`Series` with mismatched ``datetime64`` data and ``timedelta64`` dtype, or vice-versa, failing to raise ``TypeError`` (:issue:`38575`, :issue:`38764`, :issue:`38792`)
- Bug in constructing a :class:`Series` or :class:`DataFrame` with a ``datetime`` object out of bounds for ``datetime64[ns]`` dtype or a ``timedelta`` object out of bounds for ``timedelta64[ns]`` dtype (:issue:`38792`, :issue:`38965`)
- Bug in :meth:`DatetimeIndex.intersection`, :meth:`DatetimeIndex.symmetric_difference`, :meth:`PeriodIndex.intersection`, :meth:`PeriodIndex.symmetric_difference` always returning object-dtype when operating with :class:`CategoricalIndex` (:issue:`38741`)
- Bug in :class:`DataFrame` constructor reordering elements when constructing from a datetime ndarray with dtype other than ``"datetime64[ns]"`` (:issue:`39422`)
- Bug in :meth:`Series.where` incorrectly casting ``datetime64`` values to ``int64`` (:issue:`37682`)
- Bug in :class:`Categorical` incorrectly typecasting ``datetime`` object to ``Timestamp`` (:issue:`38878`)
- Bug in comparisons between :class:`Timestamp` object and ``datetime64`` objects just outside the implementation bounds for nanosecond ``datetime64`` (:issue:`39221`)
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/conversion.pyx
@@ -224,7 +224,7 @@ def ensure_datetime64ns(arr: ndarray, copy: bool=True):

     ivalues = arr.view(np.int64).ravel("K")

-    result = np.empty(shape, dtype=DT64NS_DTYPE)
+    result = np.empty_like(arr, dtype=DT64NS_DTYPE)
     iresult = result.ravel("K").view(np.int64)

     if len(iresult) == 0:
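
To see why this one-line change matters, here is a minimal NumPy-only sketch of the pattern in the hunk above. It is not the pandas implementation: `seconds_to_ns` and `match_layout` are hypothetical names, and the unit conversion is hard-coded to seconds for illustration. Both buffers are traversed in memory order via `ravel("K")`, so the output buffer needs the same memory layout as the input.

```python
import numpy as np

NS_PER_SECOND = 1_000_000_000


def seconds_to_ns(arr, match_layout):
    """Toy stand-in (hypothetical helper): convert datetime64[s] to datetime64[ns]."""
    # Input values as int64, flattened in memory order.
    ivalues = arr.view(np.int64).ravel("K")
    if match_layout:
        # Same memory layout as the input (C or F), as in the fixed line.
        result = np.empty_like(arr, dtype="datetime64[ns]")
    else:
        # Always C-ordered, regardless of the input layout, as in the old line.
        result = np.empty(arr.shape, dtype="datetime64[ns]")
    # Output buffer as int64, also flattened in memory order (a view of result).
    iresult = result.ravel("K").view(np.int64)
    iresult[:] = ivalues * NS_PER_SECOND
    return result


arr = np.array(
    [
        ["2015-01-01", "2015-01-02", "2015-01-03"],
        ["2017-01-01", "2017-01-02", "2017-02-03"],
    ],
    dtype="datetime64[s]",
    order="F",
)
expected = arr.astype("datetime64[ns]")

old = seconds_to_ns(arr, match_layout=False)
fixed = seconds_to_ns(arr, match_layout=True)

print(np.array_equal(old, expected))
print(np.array_equal(fixed, expected))
```

With a Fortran-ordered input, the C-ordered buffer receives the values in column-major sequence but exposes them row by row, which is the reordering described in the issue title. With a C-ordered input both variants agree.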
64 changes: 64 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -1762,6 +1762,70 @@ def test_constructor_datetimes_with_nulls(self, arr):
        expected = Series([np.dtype("datetime64[ns]")])
        tm.assert_series_equal(result, expected)

    @pytest.mark.parametrize("order", ["K", "A", "C", "F"])
Contributor

Can you do the same with timedelta? Not sure if that is broken as well.

Member Author

This uses numpy astype under the hood, so it is fine. Added tests.

Is there a way to create a DataFrame with a dtype other than timedelta64[ns] without going through a numpy or pandas array?
I cast the result back to timedelta64[ns] to compare them.

Contributor

umm not sure i understand what you are asking

Member Author

Sorry, here is an example:

expected = DataFrame(
    [
        [Timedelta(days=1), Timedelta(days=2)],
        [Timedelta(days=3), Timedelta(days=4)]
    ], dtype="timedelta64[ms]"
)

returns

             0            1
0   86400000.0  172800000.0
1  259200000.0  345600000.0

with dtype float
while

na = np.array(
    [
        [np.timedelta64(1, 'D'), np.timedelta64(2, 'D')],
        [np.timedelta64(4, 'D'), np.timedelta64(5, 'D')]
    ],
    dtype="timedelta64[ms]",
)
df = DataFrame(na)

returns

       0      1
0 1 days 2 days
1 4 days 5 days

where dtype is timedelta64[ms]
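
The float values in the first output look like the day counts expressed in milliseconds. A small NumPy sketch of that arithmetic follows; it is illustrative only and does not reproduce what the `DataFrame` constructor does internally (the names `days`, `as_ms`, and `raw` are made up for this example).

```python
import numpy as np

# Day counts 1..4, as in the first example of the comment above.
days = np.array([[1, 2], [3, 4]]).astype("timedelta64[D]")

# Re-express the same durations in milliseconds.
as_ms = days.astype("timedelta64[ms]")

# Drop the unit and keep only the underlying integer counts.
raw = as_ms.astype(np.int64)
print(raw)
```

One day is 24 * 60 * 60 * 1000 = 86,400,000 ms, which matches the top-left entry of the float frame.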

Member Author

The question is whether we can create the expected frame with dtype timedelta64[ms] without using a numpy or pandas array.

Member

I don't think so, no. If it were float or int we could use a memoryview directly.

Member Author

Casting to timedelta64[ns] may seem a bit odd, but does what we want here, so I would keep the current layout. Thx

Contributor

yeah agree, i think what you did is fine for the tests.

    @pytest.mark.parametrize(
        "dtype",
        [
            "datetime64[M]",
            "datetime64[D]",
            "datetime64[h]",
            "datetime64[m]",
            "datetime64[s]",
            "datetime64[ms]",
            "datetime64[us]",
            "datetime64[ns]",
        ],
    )
    def test_constructor_datetimes_non_ns(self, order, dtype):
        na = np.array(
            [
                ["2015-01-01", "2015-01-02", "2015-01-03"],
                ["2017-01-01", "2017-01-02", "2017-02-03"],
            ],
            dtype=dtype,
            order=order,
        )
        df = DataFrame(na)
        expected = DataFrame(
            [
                ["2015-01-01", "2015-01-02", "2015-01-03"],
                ["2017-01-01", "2017-01-02", "2017-02-03"],
            ]
        )
        expected = expected.astype(dtype=dtype)
        tm.assert_frame_equal(df, expected)
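
A side note on the `order` parametrization: when `np.array` builds an array from a nested list rather than from an existing array, only `order="F"` should produce a Fortran-contiguous result, so that is presumably the case that exercised the bug. A quick sketch to check this (the `layouts` name is illustrative, not from the PR):

```python
import numpy as np

data = [
    ["2015-01-01", "2015-01-02", "2015-01-03"],
    ["2017-01-01", "2017-01-02", "2017-02-03"],
]

# Map each order flag to whether the resulting array is Fortran-contiguous.
layouts = {
    order: np.array(data, dtype="datetime64[D]", order=order).flags.f_contiguous
    for order in ["K", "A", "C", "F"]
}
print(layouts)
```

Keeping all four values in the test is still cheap and guards against layout-dependent regressions in either direction.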

    @pytest.mark.parametrize("order", ["K", "A", "C", "F"])
    @pytest.mark.parametrize(
        "dtype",
        [
            "timedelta64[D]",
            "timedelta64[h]",
            "timedelta64[m]",
            "timedelta64[s]",
            "timedelta64[ms]",
            "timedelta64[us]",
            "timedelta64[ns]",
        ],
    )
    def test_constructor_timedelta_non_ns(self, order, dtype):
        na = np.array(
            [
                [np.timedelta64(1, "D"), np.timedelta64(2, "D")],
                [np.timedelta64(4, "D"), np.timedelta64(5, "D")],
            ],
            dtype=dtype,
            order=order,
        )
        df = DataFrame(na).astype("timedelta64[ns]")
        expected = DataFrame(
            [
                [Timedelta(1, "D"), Timedelta(2, "D")],
                [Timedelta(4, "D"), Timedelta(5, "D")],
            ],
        )
        tm.assert_frame_equal(df, expected)

    def test_constructor_for_list_with_dtypes(self):
        # test list of lists/ndarrays
        df = DataFrame([np.arange(5) for x in range(5)])