-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Use IntergerArray to avoid forced conversion from integer to float #27335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Use IntergerArray to avoid forced conversion from integer to float #27335
Conversation
Is there a way to make this work without adding a keyword to the DataFrame constructor? That approach seems off to me Also be sure to add tests with this - its not entirely clear that this does anything as is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangyue12392 not what I wanted here. sure fixing maybe_convert_numeric to allow return integer_array might close #27283 , but the rest is completely orthogonal and we cannot propagate a keyword thru DataFrame.
@WillAyd @jreback If the original implementation of using DataFrame to do the expansion of I will add tests later, but I want to know what is the more accepted methods here first. |
@jiangyue12392 we are likely going to do the inference for expanding a int dtype with nulls to integer-na (rather than float) at some point, but currently this is NOT the default. We need some more infrastructure first (see the linked issue). |
@jiangyue12392 actually part of this PR could partially satisfy that issue |
@jreback I suppose the part where the change in |
so what you do is make the return string integer-na (this is a small api change) that’s it |
@jiangyue12392 can you merge master |
Sorry, I have been busy with other stuff recently, will merge as soon as I get my hands free |
@jiangyue12392 if you can merge master; I think parts of this patch we would want to merge. |
4ac8859
to
6a5f33b
Compare
Master merged. Sorry for the delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you just do the internal change in maybe_convert_objects and add a test for that we can get this in. pls merge master as wel.
pandas/core/frame.py
Outdated
@@ -397,6 +397,7 @@ def __init__( | |||
columns: Optional[Axes] = None, | |||
dtype: Optional[Dtype] = None, | |||
copy: bool = False, | |||
to_integer_array: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't have this keyword here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
External keywords removed
pandas/_libs/lib.pyx
Outdated
@@ -1951,7 +1953,7 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, | |||
@cython.wraparound(False) | |||
def maybe_convert_objects(ndarray[object] objects, bint try_float=0, | |||
bint safe=0, bint convert_datetime=0, | |||
bint convert_timedelta=0): | |||
bint convert_timedelta=0, to_integer_array=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok with a keyword (here as this is in internal funtion), but can you name
bint convert_to_nullable_integer=False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable name refactored.
I have made the required changes and added the test. |
# GH27335 | ||
arr = np.array([2, np.NaN], dtype=object) | ||
result = lib.maybe_convert_objects(arr, convert_to_nullable_integer=1) | ||
from pandas.core.arrays import IntegerArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can import at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shifted import statement
result = lib.maybe_convert_objects(arr, convert_to_nullable_integer=1) | ||
from pandas.core.arrays import IntegerArray | ||
|
||
exp = IntegerArray(np.array([2, 0], dtype="i8"), np.array([False, True])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize this test with an int64 array as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameterized as requested
from pandas.core.arrays import IntegerArray | ||
|
||
exp = IntegerArray(np.array([2, 0], dtype="i8"), np.array([False, True])) | ||
tm.assert_equal(result, exp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use tm.assert_extension_array_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion method changed
@@ -2085,11 +2094,19 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0, | |||
|
|||
if not seen.object_: | |||
if not safe: | |||
if seen.null_: | |||
if seen.null_ or seen.nan_: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate issue is we should change line 2083 to use a DatetimeArray (can be separate PR) or here if it works out.
can you also update the doc-string (well add it really :->) thanks for workign on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave line 2083 out as it is a separate issue.
Which version of the doc-string shall I modify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment, pls rebase and ping on green.
@@ -1955,7 +1957,8 @@ def maybe_convert_numeric(ndarray[object] values, set na_values, | |||
@cython.wraparound(False) | |||
def maybe_convert_objects(ndarray[object] objects, bint try_float=0, | |||
bint safe=0, bint convert_datetime=0, | |||
bint convert_timedelta=0): | |||
bint convert_timedelta=0, | |||
bint convert_to_nullable_integer=0): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you can update this doc-string here e.g. Returns / Parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc-string added
09c7168
to
e8591ef
Compare
Green now. |
thanks @jiangyue12392 nice patch. This should enable us to selectively convert to nullable integers with the user facing routines. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This is a new attempt to solve issue #16918 after an attempt with PR #27073