-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Infer integer-na in infer_dtype #27392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test or two would be good to have here I imagine.
6e252a6
to
f1b9408
Compare
@gfyoung Tests added |
26a9658
to
514a7c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. you can add a note on api-breaking changes that infer_dtype
will now return integer-na for mixed-floats
pandas/_libs/lib.pyx
Outdated
cdef inline bint is_value_typed(self, object value) except -1: | ||
return util.is_integer_object(value) or util.is_nan(value) | ||
|
||
cdef inline bint is_array_typed(self) except -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure you need this one, is it actually hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
pandas/_libs/lib.pyx
Outdated
@@ -1511,6 +1517,21 @@ cpdef bint is_integer_array(ndarray values): | |||
return validator.validate(values) | |||
|
|||
|
|||
cdef class IntegerNaValidator(Validator): | |||
cdef inline bint is_value_typed(self, object value) except -1: | |||
return util.is_integer_object(value) or util.is_nan(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need a is_valid_null as well (e.g. see how PeriodValidator is done)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed to util.is_nan(value) and util.is_float_object(value)
instead
pandas/core/indexes/base.py
Outdated
@@ -3196,7 +3197,10 @@ def is_int(v): | |||
self.get_loc(stop) | |||
is_positional = False | |||
except KeyError: | |||
if self.inferred_type == "mixed-integer-float": | |||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.inferred_type in [....]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -577,6 +577,16 @@ def test_integers(self): | |||
result = lib.infer_dtype(arr, skipna=True) | |||
assert result == "integer" | |||
|
|||
# GH 27392 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move to inside the definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -577,6 +577,16 @@ def test_integers(self): | |||
result = lib.infer_dtype(arr, skipna=True) | |||
assert result == "integer" | |||
|
|||
# GH 27392 | |||
def test_integer_na(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize over skipna?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameterization done
966ac20
to
55cb505
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatsnew change, otherwise lgtm.
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -825,6 +825,7 @@ Other API changes | |||
- :meth:`ExtensionArray.argsort` places NA values at the end of the sorted array. (:issue:`21801`) | |||
- :meth:`DataFrame.to_hdf` and :meth:`Series.to_hdf` will now raise a ``NotImplementedError`` when saving a :class:`MultiIndex` with extention data types for a ``fixed`` format. (:issue:`7775`) | |||
- Passing duplicate ``names`` in :meth:`read_csv` will now raise a ``ValueError`` (:issue:`17346`) | |||
- :meth:`infer_type` will now return integer-na for integer and np.nan mix (:issue:`27283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put integer-na in quotes here; move to 1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated and moved
55cb505
to
fdcaf6e
Compare
93cdef8
to
3f51da7
Compare
3f51da7
to
8123014
Compare
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -45,6 +45,7 @@ Backwards incompatible API changes | |||
Other API changes | |||
^^^^^^^^^^^^^^^^^ | |||
|
|||
- :meth:`infer_type` will now return "integer-na" for integer and np.nan mix (:issue:`27283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be :meth:`pandas.api.types.infer_dtype`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. small change in the whatsnew, ping on green.
@jreback Green now |
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -45,6 +45,7 @@ Backwards incompatible API changes | |||
Other API changes | |||
^^^^^^^^^^^^^^^^^ | |||
|
|||
- :meth:`pandas.api.types.infer_type` will now return "integer-na" for integer and np.nan mix (:issue:`27283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
infer_type -> infer_dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updated
@gfyoung ok here? |
16ce485
to
1c67efb
Compare
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -45,6 +45,7 @@ Backwards incompatible API changes | |||
Other API changes | |||
^^^^^^^^^^^^^^^^^ | |||
|
|||
- :meth:`pandas.api.types.infer_dtype` will now return "integer-na" for integer and np.nan mix (:issue:`27283`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double back-ticks around np.nan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
1c67efb
to
f4b6869
Compare
@@ -577,6 +577,28 @@ def test_integers(self): | |||
result = lib.infer_dtype(arr, skipna=True) | |||
assert result == "integer" | |||
|
|||
@pytest.mark.parametrize( | |||
"arr, skipna, expected", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to parameterize expected
here.
Notice how it's integer-na
when skipa
is False
and integer
when skipna
is True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sure, updated
f4b6869
to
cfed693
Compare
cfed693
to
fad37c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! @jreback back over to you
thanks @jiangyue12392 keep em coming! you are tackling non-trivial issues and making very nice PRs! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
As mentioned in #27335, this PR serves as an infrastructure for future changes in the default int type