DataFrame.loc[] returns inconsistent types depending on row count #11224
Comments
Sorry, the [0:1] in the image might mislead (I was trying to return exactly the first matching line of the group), but note that the call in the text is simply the .loc[] lookup. A second thing I forgot to mention was tying this to this issue, because I think they may be related: I first encountered the problem above in the context of .groupby(), where a group of one row is a Series and a group of two or more is a DataFrame. Oh, and because I couldn't add the notebook, I forgot to mention this is pandas 0.16.2 with Python 2.7. |
I agree this might be a good idea, but it would certainly be a major break in the API. So I think it's unlikely to be feasible for pandas. |
Agreed that this would be too disruptive a change. |
It certainly warrants a modification to |
This is solely because you have duplicates in the index. With a unique index, you will always get the same type of data back. So a doc note is fine, but this is actually a very rare case. |
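To make the point about duplicates concrete, here is a minimal sketch. The two toy frames and the helper `loc_types` are made up for illustration and are not taken from the report; the sketch only assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical toy frames, not the reporter's data.
unique = pd.DataFrame({"gene": ["PAX7", "ARID1A"]}, index=["a", "b"])
dupes = pd.DataFrame({"gene": ["PAX7", "ARID1A", "ARID1A"]}, index=["a", "b", "b"])


def loc_types(df):
    """Map each distinct index label to the type name of df.loc[label]."""
    return {label: type(df.loc[label]).__name__ for label in df.index.unique()}


unique_types = loc_types(unique)
dupes_types = loc_types(dupes)

# Index.is_unique tells you up front whether the return type can vary.
print(unique.index.is_unique, unique_types)
print(dupes.index.is_unique, dupes_types)
```

Checking `df.index.is_unique` before doing scalar `.loc[]` lookups is one way to know whether a label can come back as a DataFrame instead of a Series.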
It is much simpler to use a guaranteed syntax eg.
If you would like to add a note about using duplicates and selection (and how to use the guaranteed syntax) that would be fine. |
OK, that works, but I guess I don't understand "guaranteed syntax." |
Jeff meant that passing in a list as an indexer will always return a DataFrame. So in your case it's |
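The exact snippet from this comment is not recoverable here, so the following is only a sketch of the list-indexer idea on a hypothetical frame (the labels and column are assumptions, not the reporter's data): wrapping the label in a list gives a DataFrame whether the label matches one row or several.

```python
import pandas as pd

# Hypothetical frame with a duplicated index label, for illustration only.
df = pd.DataFrame({"gene": ["PAX7", "ARID1A", "ARID1A"]}, index=["a", "b", "b"])

one_row = df.loc[["a"]]   # list indexer, label occurs once
two_rows = df.loc[["b"]]  # list indexer, label occurs twice

print(type(one_row).__name__, one_row.shape)
print(type(two_rows).__name__, two_rows.shape)
```

Compared with the scalar form `df.loc["a"]`, the only change is the extra pair of brackets, which keeps downstream code from having to branch on the result type.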
Oh, and yes, I agree, it's due to the duplicates in the index, and sure, the wise DBA normalizes his data to 4th normal form -- unless he wants to use it. Thanks to all for your help. I hope to propose a note for the docs, but I really don't know where to start. Thanks again, |
If a DataFrame has a single row for a given index label, .loc[] returns a Series. If it
has two or more rows for that label, it returns a DataFrame. I believe that it should return a
DataFrame in either case for consistency.
An image is attached. I meant to attach a small dataframe and a notebook exhibiting the problem as well,
but I can't attach either of them, even suffixing them with .txt (GitHub barfs).
So I'm pasting the text fragment after the image.
import pandas as pd

txt = '''
Locus,Decision,Group,Var,Region,Gene,Rows,Mutation,Profile
chr01:0018961727,Homopolymer,VS,CA,exonic,PAX7,1.0,synonymous SNV,000000010010001000000000001101
chr01:0027057772,Bad,IR-PM-VS,CA,exonic,ARID1A,1.0,nonsynonymous SNV,000000000000000000001000100000
chr01:0027057772,Bad,IR-PM-VS,CA,exonic,ARID1A,1.0,nonsynonymous SNV,000000000000000000011001110100
chr01:0027057772,Bad,IR-PM-VS,CA,exonic,ARID1A,1.0,nonsynonymous SNV,100000000001010000010001110110
'''
# Split the text into rows of fields, dropping the blank first and last lines
rows = [line.split(',') for line in txt.strip().split('\n')]
# The first row is the header; index on Locus, which contains duplicate labels
tdf = pd.DataFrame.from_records(rows[1:], columns=rows[0], index='Locus')
tdf
# The label that occurs once gives a Series; the duplicated label gives a DataFrame
type(tdf.loc['chr01:0018961727']), type(tdf.loc['chr01:0027057772'])