DataFrame.loc[] returns inconsistent types depending on row count #11224

jerryatmda · 2015-10-02T18:00:01Z

If a DataFrame has a single row for a given index label, .loc[] returns a Series. If it
has two rows for that label, .loc[] returns a DataFrame. I believe that it should return a
DataFrame in either case, for consistency.

Image attached. I meant to attach a small dataframe and a notebook exhibiting the problem, but
I can't attach either one, even suffixing them with .txt (github barfs).
So I'm pasting the text fragment after the image.

import pandas as pd

txt = '''Locus,Decision,Group,Var,Region,Gene,Rows,Mutation,Profile
chr01:0018961727,Homopolymer,VS,CA,exonic,PAX7,1.0,synonymous SNV,000000010010001000000000001101
chr01:0027057772,Bad,IR-PM-VS,CA,exonic,ARID1A,1.0,nonsynonymous SNV,000000000000000000001000100000
chr01:0027057772,Bad,IR-PM-VS,CA,exonic,ARID1A,1.0,nonsynonymous SNV,000000000000000000011001110100
chr01:0027057772,Bad,IR-PM-VS,CA,exonic,ARID1A,1.0,nonsynonymous SNV,100000000001010000010001110110'''
# The first line is the header; the remaining lines are data rows.
rows = [line.split(',') for line in txt.split('\n')]
tdf = pd.DataFrame.from_records(rows[1:], columns=rows[0], index='Locus')
tdf
# One matching row gives a Series; three matching rows give a DataFrame.
type(tdf.loc['chr01:0018961727']), type(tdf.loc['chr01:0027057772'])

jerryatmda · 2015-10-02T18:04:16Z

Sorry, the [0:1] in the image might mislead (I was trying to return exactly the first matching line of the group), but note that the call in the text is simply for the .loc[]

A second thing I forgot to mention was tying this to this issue, because I think they may be related:
#5839

I say that because I first encountered the problem above in the context of .groupby() where a group of one row is a Series, and a group of two or more is a DataFrame.

Oh, and because I couldn't add the notebook, I forgot to mention this is pandas 0.16.2 with Python 2.7.

shoyer · 2015-10-02T18:48:48Z

I agree this might be a good idea, but it would certainly be a major break in the API. So I think it's unlikely to be feasible for pandas.

TomAugspurger · 2015-10-02T18:53:10Z

Agreed that this would be too disruptive a change.

jerryatmda · 2015-10-02T19:22:17Z

It certainly warrants a modification to
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html#pandas.DataFrame.loc
describing the differential return values.
Right now that page does not even discuss return values.

jreback · 2015-10-02T19:46:18Z

@jerryatmda

This is solely due to the fact that you have duplicates in the index.

With a unique index, you will always get the same type of data.

So a doc note is fine, but this is actually a very rare case.
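
To make the point about duplicates concrete, here is a minimal sketch. The two small frames and their labels are made up for illustration and are not the reporter's data; the behavior shown is what one would expect from scalar-label selection with `.loc`.

```python
import pandas as pd

# Hypothetical data: in `dup`, label 'a' appears once and label 'b' twice.
dup = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'b'])
uniq = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])

# Non-unique index: the return type depends on how many rows match the label.
dup_types = (type(dup.loc['a']).__name__, type(dup.loc['b']).__name__)

# Unique index: a scalar label selects exactly one row, returned as a Series.
uniq_types = tuple(type(uniq.loc[label]).__name__ for label in uniq.index)

print(dup.index.is_unique, dup_types)
print(uniq.index.is_unique, uniq_types)
```

Checking `index.is_unique` up front is one way to know whether a scalar lookup can change type from label to label.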

jreback · 2015-10-02T19:48:09Z

It is much simpler to use a guaranteed syntax, e.g. df.loc[[....]], which will always return a frame.

If you would like to add a note about using duplicates and selection (and how to use the guaranteed syntax) that would be fine.

jerryatmda · 2015-10-02T19:54:48Z

OK, that works, but I guess I don't understand "guaranteed syntax."
I just searched it in the docs, and came up with a single reference to the word "syntax."
This is pretty clearly a pandas term of art that has somehow escaped documentation in the manual thus far.
Since I don't know what it means, I am not the person to write that, sorry.

TomAugspurger · 2015-10-02T19:57:04Z

Jeff meant that passing in a list as an indexer will always return a DataFrame. So in your case it's tdf.loc[['chr01:0018961727']] (notice the two sets of square brackets).
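
A short sketch of the list-indexer form, assuming a made-up frame with one duplicated label (the names `df`, `one`, and `two` are illustrative only):

```python
import pandas as pd

# Hypothetical frame: label 'a' appears once, label 'b' twice.
df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'b'])

# Wrapping the label in a list keeps the result two-dimensional.
one = df.loc[['a']]  # one matching row
two = df.loc[['b']]  # two matching rows

print(type(one).__name__, one.shape)
print(type(two).__name__, two.shape)
```

Both results are DataFrames, so downstream code does not need to branch on the number of matching rows.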

jerryatmda · 2015-10-02T20:00:01Z

Oh, and yes, I agree, it's due to the duplicates in the index, and sure, the wise DBA normalizes his data to 4th normal form -- unless he wants to use it.

Thanks to all for your help. I will hope to propose a note for the docs, but I really don't know where to start.

Thanks again,
Jerry

jerryatmda mentioned this issue Oct 2, 2015

df.groupby().apply() with only one group returns wrong shape! #5839

Closed

TomAugspurger added the Indexing (Related to indexing on series/frames, not to indexes themselves) label Oct 2, 2015

TomAugspurger closed this as completed Oct 2, 2015

jreback added the Usage Question label Oct 2, 2015
