Skip to content

Closes #36541 (BUG: ValueError: cannot convert float NaN to integer when resetting MultiIndex with NaT values) #36563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Oct 7, 2020

Conversation

ssche
Copy link
Contributor

@ssche ssche commented Sep 23, 2020

@@ -310,6 +310,7 @@ MultiIndex
^^^^^^^^^^

- Bug in :meth:`DataFrame.xs` when used with :class:`IndexSlice` raises ``TypeError`` with message `Expected label or tuple of labels` (:issue:`35301`)
- Bug in :func:`construct_1d_arraylike_from_scalar` when using :meth:`DataFrame.reset_index` with ``NaT`` values in index raises ``ValueError`` with message `cannot convert float NaN to integer` (:issue:`36541`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the first part - these are always user facing notes

@@ -1559,6 +1559,11 @@ def construct_1d_arraylike_from_scalar(
dtype = np.dtype("object")
if not isna(value):
value = ensure_str(value)
elif isinstance(dtype, np.dtype) and dtype.kind == "M" and value is NaT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_nat_for_dtype (is not exactly this same but similar)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_valid_nat_for_dtype is the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't just use is_valid_nat_for_dtype as it also returned true for is_valid_nat_for_dtype(nan, float64) (which was a problem for the test_modulo2 test case for instance (I didn't check others which also failed)).

result = result.reset_index()

expected = pd.DataFrame({"a": [pd.NaT, pd.NaT], "b": [1, 2], "x": [11, 12]})
pd.testing.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use tm.assert...

* simplified whatsnew to reflect intended users
* use `is_valid_nat_for_dtype`
* use `tm.assert_...`
@ssche ssche requested a review from jreback September 23, 2020 12:02
@dsaxton dsaxton added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex labels Sep 23, 2020
@ssche
Copy link
Contributor Author

ssche commented Sep 25, 2020

I might need some help regarding the build error (Arm64, Xenial, Py3.7). Seems like there are some files missing, but I’m not sure what I could do to fix that...?

# Conflicts:
#	doc/source/whatsnew/v1.2.0.rst

def test_nat_multi_index():
# GH36541: that reset_index() does not raise ValueError
ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=["a", "b"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try this with one element a Timestamp (and one a NaT)
also add a case where you have a Timedelta and a NaT (and I think we might need to change the code slightly)

can parameterize the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test cases. please have another look and advise.

@jreback jreback added this to the 1.2 milestone Oct 2, 2020
@@ -1559,6 +1559,11 @@ def construct_1d_arraylike_from_scalar(
dtype = np.dtype("object")
if not isna(value):
value = ensure_str(value)
elif dtype.kind == "M" and is_valid_nat_for_dtype(value, dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be dtype.kind in ['M', 'm'] (do the tests fail w/o this?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, they didn't before with "M" only (except for the ARM tests which I believe have an unrelated problem).

Trying with timedelta "m" now...

@jreback
Copy link
Contributor

jreback commented Oct 2, 2020

pls merge master as well

@ssche ssche requested a review from jreback October 4, 2020 11:32
@ssche
Copy link
Contributor Author

ssche commented Oct 5, 2020

ready to merge @jreback

@jreback jreback merged commit 3402233 into pandas-dev:master Oct 7, 2020
@jreback
Copy link
Contributor

jreback commented Oct 7, 2020

thanks @ssche very nice!

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
… integer when resetting MultiIndex with NaT values) (pandas-dev#36563)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: ValueError: cannot convert float NaN to integer when resetting MultiIndex with NaT values
3 participants