Fix select_dtypes(include='int') for Windows. #36808
Changes from 5 commits
Changes to `test_select_dtypes_exclude_include_using_list_like` (hunk `@@ -82,6 +82,12 @@`):

```python
e = df[["b", "c", "e"]]
tm.assert_frame_equal(r, e)

exclude = (np.datetime64,)
include = np.bool_, "int"
r = df.select_dtypes(include=include, exclude=exclude)
```

Reviewer: Can you also add the case where `include='integer'`?
Author: Ok.

```python
e = df[["b", "e"]]
tm.assert_frame_equal(r, e)
```

Reviewer: Please separate this out into a dedicated test with a descriptive name.
Author: Ok.

```python
exclude = ("datetime",)
include = "bool", "int64", "int32"
r = df.select_dtypes(include=include, exclude=exclude)
```

Reviewer: Call these `result` and `expected` instead of `r` and `e`.
Author: Ok.
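The three review requests (a dedicated, descriptively named test, an extra `include='integer'` case, and `result`/`expected` naming) could be combined roughly as follows. This is a hedged sketch, not the test that was merged: the test names and the `make_frame` fixture are hypothetical, modelled on the columns the assertions above select, and it uses the public `pd.testing.assert_frame_equal` instead of `tm`. On Linux the `"int"` case passes whether or not the fix is applied, because `int` already resolves to `int64` there.

```python
import numpy as np
import pandas as pd


def make_frame() -> pd.DataFrame:
    # Hypothetical fixture: one column per dtype family of interest
    return pd.DataFrame(
        {
            "a": list("abc"),                                    # strings
            "b": np.array([1, 2, 3], dtype=np.int64),            # signed int64
            "c": np.arange(3, 6).astype("u1"),                   # unsigned uint8
            "d": np.arange(4.0, 7.0, dtype="float64"),           # float64
            "e": [True, False, True],                            # bool
            "f": pd.date_range("2020-01-01", periods=3).values,  # datetime64[ns]
        }
    )


def test_select_dtypes_include_int_exclude_datetime():
    df = make_frame()
    # "int" should pick the signed int64 column but not the uint8 one
    result = df.select_dtypes(include=(np.bool_, "int"), exclude=(np.datetime64,))
    expected = df[["b", "e"]]
    pd.testing.assert_frame_equal(result, expected)


def test_select_dtypes_include_integer_exclude_datetime():
    df = make_frame()
    # "integer" resolves to np.integer, so unsigned integers are selected too
    result = df.select_dtypes(include=(np.bool_, "integer"), exclude=(np.datetime64,))
    expected = df[["b", "c", "e"]]
    pd.testing.assert_frame_equal(result, expected)
```

Splitting the `"int"` and `"integer"` cases into separate tests makes it obvious which alias regressed if one of them fails on a given platform.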
Reviewer: Hmm, unfortunately, so is NumPy array construction.
Existing behaviour:
Would this workflow break on Windows with this change?
Author: Yes, it wouldn't work on Windows with these changes:
Author: In this case, on Windows pandas would interpret `int` as `np.int64`, while NumPy interprets it as `np.int32`. On Linux their behavior is identical. That said, based on the example in #36596, pandas was already mapping integers to `np.int64` on Windows by default.
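The platform difference described here can be checked directly. The minimal sketch below (the function name is hypothetical) reports what the builtin `int` and the string `"int"` resolve to in NumPy, and what pandas infers for a list of Python ints. Note that NumPy 2.0 changed the default integer on 64-bit Windows to `int64`, so with current NumPy the Windows/Linux gap discussed in this thread only shows up on NumPy 1.x.

```python
import platform

import numpy as np
import pandas as pd


def int_resolution_report() -> dict:
    """Show how the builtin `int` and the string "int" resolve on this machine."""
    return {
        "platform": platform.system(),
        "numpy": np.__version__,
        # NumPy maps `int` to np.int_ (the C long on NumPy 1.x, i.e. int32 on Windows)
        "np.dtype(int)": np.dtype(int),
        "np.dtype('int')": np.dtype("int"),
        # Array construction from Python ints uses the same default integer
        "np.array([1, 2, 3])": np.array([1, 2, 3]).dtype,
        # pandas infers int64 for a list of Python ints
        "pd.Series([1, 2, 3])": pd.Series([1, 2, 3]).dtype,
    }


if __name__ == "__main__":
    for key, value in int_resolution_report().items():
        print(f"{key:>22}: {value}")
```

Running it under NumPy 1.x on Windows and on Linux shows the `np.dtype(int)` and `np.array([1, 2, 3])` entries diverging while the pandas entry stays `int64`.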
Reviewer: We don't want to change this routine at all; it has far-reaching effects. Not averse to a tactical change in `.select_dtypes` itself.
Author: In that case, I can roll back the changes here and add a warning that `int` is ambiguous and that `np.int64` or `np.int32` should be used in `df.select_dtypes` instead. Another option would be to look into why pandas does not follow NumPy's approach and, in some cases, treats `int` as `int64` on all platforms (NumPy maps `int` to `np.int64` on Linux, but to `np.int32` on Windows).
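The first option (warning that `int` is ambiguous) might look something like the sketch below. The function name and the message wording are hypothetical; it only illustrates the proposal and leaves the selection itself to pandas unchanged.

```python
import warnings

import numpy as np
import pandas as pd


def select_dtypes_warn_on_int(df: pd.DataFrame, include=None, exclude=None) -> pd.DataFrame:
    """Hypothetical wrapper that flags the platform-dependent `int` alias."""
    for spec in (include, exclude):
        if spec is None:
            continue
        items = [spec] if isinstance(spec, (str, type, np.dtype)) else list(spec)
        # Only the builtin `int` or the literal string "int" is ambiguous;
        # explicit NumPy dtypes such as np.int64 are left alone.
        if any(item is int or (isinstance(item, str) and item == "int") for item in items):
            warnings.warn(
                "'int' is ambiguous (int64 on Linux, int32 on Windows with NumPy 1.x); "
                "pass np.int64 or np.int32 explicitly instead",
                UserWarning,
                stacklevel=2,
            )
            break
    # Behaviour is otherwise unchanged: defer to pandas
    return df.select_dtypes(include=include, exclude=exclude)
```

Compared with expanding the alias, a warning keeps existing results intact but pushes the fix onto every caller.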