Skip to content

ENH: Infer integer-na in infer_dtype #27392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 31, 2019

Conversation

jiangyue12392
Copy link
Contributor

@jiangyue12392 jiangyue12392 commented Jul 15, 2019

As mentioned in #27335, this PR serves as an infrastructure for future changes in the default int type

@gfyoung gfyoung added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Jul 15, 2019
Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test or two would be good to have here I imagine.

@jiangyue12392
Copy link
Contributor Author

@gfyoung Tests added

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. you can add a note on api-breaking changes that infer_dtype will now return integer-na for mixed-floats

cdef inline bint is_value_typed(self, object value) except -1:
return util.is_integer_object(value) or util.is_nan(value)

cdef inline bint is_array_typed(self) except -1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure you need this one, is it actually hit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@@ -1511,6 +1517,21 @@ cpdef bint is_integer_array(ndarray values):
return validator.validate(values)


cdef class IntegerNaValidator(Validator):
cdef inline bint is_value_typed(self, object value) except -1:
return util.is_integer_object(value) or util.is_nan(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a is_valid_null as well (e.g. see how PeriodValidator is done)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to util.is_nan(value) and util.is_float_object(value) instead

@@ -3196,7 +3197,10 @@ def is_int(v):
self.get_loc(stop)
is_positional = False
except KeyError:
if self.inferred_type == "mixed-integer-float":
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.inferred_type in [....]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -577,6 +577,16 @@ def test_integers(self):
result = lib.infer_dtype(arr, skipna=True)
assert result == "integer"

# GH 27392
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to inside the definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -577,6 +577,16 @@ def test_integers(self):
result = lib.infer_dtype(arr, skipna=True)
assert result == "integer"

# GH 27392
def test_integer_na(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize over skipna?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameterization done

@jiangyue12392 jiangyue12392 force-pushed the integer-na branch 2 times, most recently from 966ac20 to 55cb505 Compare July 23, 2019 10:06
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whatsnew change, otherwise lgtm.

@@ -825,6 +825,7 @@ Other API changes
- :meth:`ExtensionArray.argsort` places NA values at the end of the sorted array. (:issue:`21801`)
- :meth:`DataFrame.to_hdf` and :meth:`Series.to_hdf` will now raise a ``NotImplementedError`` when saving a :class:`MultiIndex` with extention data types for a ``fixed`` format. (:issue:`7775`)
- Passing duplicate ``names`` in :meth:`read_csv` will now raise a ``ValueError`` (:issue:`17346`)
- :meth:`infer_type` will now return integer-na for integer and np.nan mix (:issue:`27283`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put integer-na in quotes here; move to 1.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and moved

@jreback jreback added this to the 1.0 milestone Jul 25, 2019
@@ -45,6 +45,7 @@ Backwards incompatible API changes
Other API changes
^^^^^^^^^^^^^^^^^

- :meth:`infer_type` will now return "integer-na" for integer and np.nan mix (:issue:`27283`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be :meth:`pandas.api.types.infer_dtype`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. small change in the whatsnew, ping on green.

@jiangyue12392
Copy link
Contributor Author

@jreback Green now

@@ -45,6 +45,7 @@ Backwards incompatible API changes
Other API changes
^^^^^^^^^^^^^^^^^

- :meth:`pandas.api.types.infer_type` will now return "integer-na" for integer and np.nan mix (:issue:`27283`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

infer_type -> infer_dtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated

@jreback
Copy link
Contributor

jreback commented Jul 27, 2019

@gfyoung ok here?

@@ -45,6 +45,7 @@ Backwards incompatible API changes
Other API changes
^^^^^^^^^^^^^^^^^

- :meth:`pandas.api.types.infer_dtype` will now return "integer-na" for integer and np.nan mix (:issue:`27283`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double back-ticks around np.nan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@@ -577,6 +577,28 @@ def test_integers(self):
result = lib.infer_dtype(arr, skipna=True)
assert result == "integer"

@pytest.mark.parametrize(
"arr, skipna, expected",
Copy link
Member

@gfyoung gfyoung Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to parameterize expected here.

Notice how it's integer-na when skipa is False and integer when skipna is True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sure, updated

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! @jreback back over to you

@jreback jreback merged commit eb9a8e3 into pandas-dev:master Jul 31, 2019
@jreback
Copy link
Contributor

jreback commented Jul 31, 2019

thanks @jiangyue12392 keep em coming! you are tackling non-trivial issues and making very nice PRs!

quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: infer_dtype should infer integer-na
3 participants