Skip to content

BUG: DataFrame constructor reordering elements with ndarray from datetime dtype not datetime64[ns] #39442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 28, 2021

Conversation

phofl
Copy link
Member

@phofl phofl commented Jan 27, 2021

The ravel("K") was a problem when passing in a view on the underlying object, which was done here.

@phofl phofl added Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure labels Jan 27, 2021
@@ -222,7 +222,7 @@ def ensure_datetime64ns(arr: ndarray, copy: bool=True):
dtype = arr.dtype
arr = arr.astype(dtype.newbyteorder("<"))

ivalues = arr.view(np.int64).ravel("K")
ivalues = arr.view(np.int64).ravel("C")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to make a copy if arr has order="F" isnt it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah unfortunately, but not sure how we can avoid this here. Could transpose back before passing in, but I think that is not an ideal solution

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could arr.flat be useful here?

@@ -222,7 +222,7 @@ def ensure_datetime64ns(arr: ndarray, copy: bool=True):
dtype = arr.dtype
arr = arr.astype(dtype.newbyteorder("<"))

ivalues = arr.view(np.int64).ravel("K")
ivalues = arr.view(np.int64).ravel("C")

result = np.empty(shape, dtype=DT64NS_DTYPE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make a difference it we constructed this with result = np.empty_like(arr, dtype=...) to ensure we keep the same contiguity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, fantastic. Thanks very much

@jreback jreback added this to the 1.3 milestone Jan 28, 2021
@@ -1762,6 +1762,39 @@ def test_constructor_datetimes_with_nulls(self, arr):
expected = Series([np.dtype("datetime64[ns]")])
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("order", ["K", "A", "C", "F"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do the same with timedelta? not sure if that is broken as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses numpy astype under the hood, is fine. Added tests.

Is there a way to create a df without a numpy or pandas array which has a dtype other than timedelta64[ns]?
I casted back to compare them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm not sure i understand what you are asking

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, here an example:

expected = DataFrame(
    [
        [Timedelta(days=1), Timedelta(days=2)],
        [Timedelta(days=3), Timedelta(days=4)]
    ], dtype="timedelta64[ms]"
)

returns

             0            1
0   86400000.0  172800000.0
1  259200000.0  345600000.0

with dtype float
while

na = np.array(
    [
        [np.timedelta64(1, 'D'), np.timedelta64(2, 'D')],
        [np.timedelta64(4, 'D'), np.timedelta64(5, 'D')]
    ],
    dtype="timedelta64[ms]",
)
df = DataFrame(na)

returns

       0      1
0 1 days 2 days
1 4 days 5 days

where dtype is timedelta64[ms]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is if we can create the expected in a way to have timedelta64[ms] without using numpy or pandas array

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think so, no. if it were float or int we could use a memoryview directly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to timedelta64[ns] may seem a bit odd, but does what we want here, so I would keep the current layout. Thx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah agree, i think what you did is fine for the tests.

@jreback jreback merged commit 624621b into pandas-dev:master Jan 28, 2021
@jreback
Copy link
Contributor

jreback commented Jan 28, 2021

thanks @phofl

@phofl phofl deleted the 39422 branch January 28, 2021 20:14
@jorisvandenbossche
Copy link
Member

This was a regression, so I think worth backporting to 1.2.2. If we do that, we only need a small PR moving the whatsnew note to v1.2.2.rst (apart from the actual backport)

@jreback
Copy link
Contributor

jreback commented Jan 30, 2021

sure we could backport

@phofl if u wouldn't mind putting up a backport PR (need to edit the note)

and then separate PR to move the note in master

phofl added a commit to phofl/pandas that referenced this pull request Jan 30, 2021
…time dtype not datetime64[ns] (pandas-dev#39442)

(cherry picked from commit 624621b)
@phofl phofl modified the milestones: 1.3, 1.2.2 Jan 30, 2021
@phofl
Copy link
Member Author

phofl commented Jan 30, 2021

Done #39475 and #39476

jorisvandenbossche pushed a commit that referenced this pull request Jan 30, 2021
…h ndarray from datetime dtype not datetime64[ns (#39475)

(cherry picked from commit 624621b)
jorisvandenbossche pushed a commit that referenced this pull request Jan 30, 2021
@jorisvandenbossche
Copy link
Member

Thanks @phofl !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure
Projects
None yet
4 participants