Duplicate index with FloatIndex giving 'ValueError: Length mismatch' #7143

Closed
jorisvandenbossche opened this issue May 16, 2014 · 6 comments · Fixed by #7149
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves
@jorisvandenbossche
Member

Not certain if this is a bug or defined behaviour (but then the error message is not clear in any case).

In 0.13.1:

In [28]: df = pd.DataFrame(np.random.randn(9).reshape(3,3), index=[0.1,0.2,0.2],
 columns=['a','b','c'])
In [29]: df
Out[29]:
            a         b         c
0.1  1.711117  1.218853 -1.322363
0.2  0.956266  0.230374 -1.005935
0.2 -0.137729 -0.993931 -0.902793

In [30]: df.ix[0.2,'a']
Out[30]: array([ 0.95626607, -0.13772877])

In [31]: df.ix[0.2]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: Length mismatch: Expected axis has 0 elements, new values have 2 elements

In master, both df.loc[0.2] and df.loc[0.2, 'a'] give this error message, while with an integer index this works.
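
For comparison, a minimal sketch of the same lookups on a pandas release that includes the fix (#7149). It uses .loc rather than .ix, since .ix was removed in later pandas versions, and fixed values instead of random ones. The names float_df and int_df are illustrative and do not come from the report.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data in the report.
values = np.arange(9, dtype=float).reshape(3, 3)

# Same shape of problem: a float index with a duplicated label (0.2).
float_df = pd.DataFrame(values, index=[0.1, 0.2, 0.2], columns=["a", "b", "c"])

# Integer-index counterpart, which already behaved as expected.
int_df = pd.DataFrame(values, index=[1, 2, 2], columns=["a", "b", "c"])

# A duplicated row label selects every matching row.
float_rows = float_df.loc[0.2]       # DataFrame with two rows
float_col = float_df.loc[0.2, "a"]   # Series with two entries

int_rows = int_df.loc[2]
int_col = int_df.loc[2, "a"]

print(float_rows)
print(float_col)
```

With the fix in place, the float-index and integer-index lookups should return objects of the same type and shape.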

@jreback
Contributor

jreback commented May 16, 2014

hmm...would say that's a bug (also your first example should return a Series).

@jreback jreback added this to the 0.14.1 milestone May 16, 2014
@jorisvandenbossche
Member Author

about the Series, that is indeed how it is with integer indices

@cpcloud
Member

cpcloud commented May 16, 2014

hm duplicate fun i'll take this ... @jreback is this a blocker?

@cpcloud
Member

cpcloud commented May 16, 2014

oh nvm i c u marked as 0.14.1

@cpcloud cpcloud self-assigned this May 16, 2014
@jreback
Contributor

jreback commented May 16, 2014

sure

@cpcloud
Member

cpcloud commented May 17, 2014

this has been an enlightening experience. @jreback you are a hero for squashing all of these duplicate indexing bugs.

couple of things:

  1. i've fixed the ValueError bug, just needed an annoying copypaste of the IndexEngine._maybe_get_bool_indexer with a type change and a subtle slicing KeyError because of duplicates. This should be fused-typed at some point to eliminate the copypasting between the different engine types on this method

  2. doesn't look like ix really ever returned a Series for duplicate indices, and that's because it calls self.obj.get_value(*key) which is designed for single element access and directly pulls the ndarray from the underlying index engine.

     BUT if you remove that line then _getitem_tuple does upcasting and the aptly named TestDataFrame.test_single_element_ix_dont_upcast breaks 😞

i see two ways to deal with this:

  1. break the test (bad)
  2. don't upcast (deeper into the rabbit hole)

I'll submit a PR for the ValueError and open a separate issue for the ix insanity.
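
The upcasting mentioned in point 2 can be illustrated with a small mixed-dtype frame. This is only a sketch using public accessors on a recent pandas (.loc and .at), not the internal get_value / _getitem_tuple path discussed above. The frame and its column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one float column.
mixed = pd.DataFrame({"i": [1, 2], "f": [1.5, 2.5]})

# Selecting a whole row builds a Series spanning both columns, so the
# integer value is upcast to float64 to fit a common dtype.
row = mixed.loc[0]
print(row.dtype)

# Single-element access reads the scalar from the integer column itself,
# so the value keeps its integer type.
scalar = mixed.at[0, "i"]
print(type(scalar))
```

This is the tension described above: a path that goes through row selection upcasts, while single-element access is expected to preserve the column's dtype.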
