Skip to content

BUG: get_level_values() on int level upcasts to Float64Index if needed #17930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 10, 2017

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Oct 20, 2017

Maybe the check could be more general... but I can't think of any case in which this is needed other than int -> float.

@codecov
Copy link

codecov bot commented Oct 20, 2017

Codecov Report

Merging #17930 into master will decrease coverage by 0.01%.
The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17930      +/-   ##
==========================================
- Coverage   91.58%   91.57%   -0.02%     
==========================================
  Files         153      153              
  Lines       51307    51313       +6     
==========================================
- Hits        46991    46988       -3     
- Misses       4316     4325       +9
Flag Coverage Δ
#multiple 89.43% <88.88%> (ø) ⬆️
#single 40.71% <55.55%> (-0.12%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/numeric.py 97.33% <88.88%> (+0.07%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2db1cc0...a8ad746. Read the comment docs.

@@ -970,6 +970,7 @@ Indexing
- Fixes ``DataFrame.loc`` for setting with alignment and tz-aware ``DatetimeIndex`` (:issue:`16889`)
- Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`)
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
- Bug in ``MultiIndex.get_level_values()`` which would return an invalid index on level of ints with missing values (:issue:`17924`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could provide a link via :meth:`MultiIndex.get_level_values`

values = unique._shallow_copy(filled)

if not unique._can_hold_na and filled.dtype == 'float':
values = Index(filled, copy=False, **unique._get_attributes_dict())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for all this just use
_shallow_copy_with_infer

expected = np.array(['a', np.nan, 1], dtype=object)
tm.assert_numpy_array_equal(values.values, expected)
result = index.get_level_values(0)
# workaround for #17929 (should be 'float')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rip out this particular test case and xfail with the correct result
then when fixed it will xpass

values = index.get_level_values(0)
assert values.shape == (0, )
result = index.get_level_values(0)
assert result.shape == (0, )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u could assert index equal here as well

@toobaz toobaz force-pushed the get_level_values_cast branch from 10d7efa to 83a4b0f Compare October 21, 2017 00:07
@@ -906,7 +906,12 @@ def _get_level_values(self, level):
labels = self.labels[level]
filled = algos.take_1d(unique._values, labels,
fill_value=unique._na_value)
values = unique._shallow_copy(filled)

if not unique._can_hold_na and filled.dtype == 'float':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would always use the infer method no need to have an if

Copy link
Member Author

@toobaz toobaz Oct 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks several tests with dates (e.g. in pandas/tests/test_multilevel.py and pandas/tests/indexes/test_multi.py), in one case because "ValueError: Inferred frequency None from passed dates does not conform to passed frequency D", in other cases because DateTimeIndexes with time zone are _re_added the time zone (if I understand correctly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to see if you can find a general soln. we don't want to have if/then cases all over the place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only "general" solution I can think is a property e.g. _na_upcast of Index subclasses (inherited from Index) which is the subclass itself if _can_hold_na = True (that is, all but Int64Index), and is overridden to Float64Index for Int64Index. Then I would just call unique._na_upcast._shallow_copy(filled).

Do you have in mind anything better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no idea ; you need to see why other tests are failing and work around

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 21, 2017
@jreback
Copy link
Contributor

jreback commented Nov 23, 2017

can you rebase and we can revisit

@toobaz toobaz force-pushed the get_level_values_cast branch from 83a4b0f to 328e175 Compare December 4, 2017 08:41
@pep8speaks
Copy link

pep8speaks commented Dec 4, 2017

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 10, 2017 at 18:36 Hours UTC

@toobaz toobaz force-pushed the get_level_values_cast branch 2 times, most recently from 6fe6775 to 2ffeaa5 Compare December 4, 2017 08:49
@toobaz
Copy link
Member Author

toobaz commented Dec 4, 2017

Sidenote: @pep8speaks highlighted an error ("E129 visually indented line with same indent as next logical line") which neither my local flake8 nor the LINT=true travis run catched. So apparently the two (online) checks run with different configurations (suppressed errors).

@@ -1061,6 +1061,7 @@ Indexing
- Fixes ``DataFrame.loc`` for setting with alignment and tz-aware ``DatetimeIndex`` (:issue:`16889`)
- Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`)
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
- Bug in :meth:`MultiIndex.get_level_values` which would return an invalid index on level of ints with missing values (:issue:`17924`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.22

not is_integer_dtype(values) and
isna(values).any()):
attributes = self._get_attributes_dict()
attributes.update(kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too specific logic, rather call _shallow_copy_with_infer

@toobaz toobaz force-pushed the get_level_values_cast branch from 2ffeaa5 to 7ff2af4 Compare December 4, 2017 16:14
@toobaz
Copy link
Member Author

toobaz commented Dec 5, 2017

@jreback ping

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, do comments. Second would consider (new issue/PR) to split up test_multi.py into a sub-dir as getting pretty long.

@@ -62,6 +62,13 @@ def _maybe_cast_slice_bound(self, label, side, kind):
# we will try to coerce to integers
return self._maybe_cast_indexer(label)

@Appender(_index_shared_docs['_shallow_copy'])
def _shallow_copy(self, values=None, **kwargs):
if values is not None and not self._can_hold_na:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here on why we are doing this (yes its obvious to me, but not generally to future readers).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)
ping

tm.assert_index_equal(result, expected)

def test_get_level_values_all_na(self):
arrays = [[np.nan, np.nan, np.nan], ['a', np.nan, 1]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add GH 17924 here as well as a comment

@toobaz toobaz force-pushed the get_level_values_cast branch from 7ff2af4 to 78d967a Compare December 6, 2017 10:21
@jreback jreback added this to the 0.22.0 milestone Dec 10, 2017
@jreback
Copy link
Contributor

jreback commented Dec 10, 2017

resolved conflicts. ping on green.

@jreback jreback merged commit 37086a0 into pandas-dev:master Dec 10, 2017
@jreback
Copy link
Contributor

jreback commented Dec 10, 2017

thanks!

@toobaz toobaz deleted the get_level_values_cast branch December 10, 2017 23:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MultiIndex.get_leves_values() returns inconsistent Index if there are NaNs
4 participants