loc on Series returning scalar when result size is 1 #31381

Closed
omri374 opened this issue Jan 28, 2020 · 3 comments
Labels
Duplicate Report: Duplicate issue or pull request
Indexing: Related to indexing on series/frames, not to indexes themselves

Comments

omri374 commented Jan 28, 2020

Code Sample

import pandas as pd

# Create a DataFrame with one column ("val"), indexed by "id".
# Note that id 3 appears twice, so the index is not unique.
df = pd.DataFrame({"id": [1, 2, 3, 3], "val": ["abcde", "bbcde", "cbcde", "dbcde"]})
df = df.set_index("id")

Calling .loc when the result length is 2 returns a Series (as expected):

type(df['val'].loc[3])

Output:

pandas.core.series.Series

Calling .loc when the result length is 1 returns a str:

# use .loc when the result length is 1:
type(df['val'].loc[1])

Output:

str

Problem description

Because the same expression returns different types depending on the data, downstream code behaves inconsistently. For example, len on the result gives 2 for id=3 (the number of matching rows) but 5 for id=1 or id=2 (the length of the string):

len(df['val'].loc[3])

Output:

2

When calling loc[1]:

len(df['val'].loc[1])

Output:

5
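
To make the cause of the difference explicit, here is a minimal sketch that rebuilds the example above and compares, for each label, how many index entries it matches with the type that .loc returns. The variable names `s` and `matches` are illustrative only and do not appear in the original report.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 3], "val": ["abcde", "bbcde", "cbcde", "dbcde"]})
s = df.set_index("id")["val"]

# Number of index entries matching each label (the index is not unique).
matches = s.index.value_counts()

for label in (1, 3):
    result = s.loc[label]
    # A label matching one entry yields the element itself;
    # a label matching several entries yields a Series.
    print(label, matches.loc[label], type(result).__name__)
```

With this data, label 1 matches a single entry and label 3 matches two, which is exactly where the return type switches between str and Series.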

Expected Output

.loc should return a Series regardless of the size. Otherwise this causes bugs when trying to evaluate the length of the output.

# Expected output:
type(df['val'].loc[1])

Output:

pandas.core.series.Series

len(df['val'].loc[1])

Output:

1
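
As a possible workaround under the current behavior, passing a list of labels to .loc returns a Series even when only one row matches. This is a sketch under that assumption, not something proposed in this report; the names `one` and `two` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 3], "val": ["abcde", "bbcde", "cbcde", "dbcde"]})
s = df.set_index("id")["val"]

# A list-like indexer keeps the result a Series regardless of match count.
one = s.loc[[1]]  # one matching row
two = s.loc[[3]]  # two matching rows

print(type(one).__name__, len(one))
print(type(two).__name__, len(two))
```

With the list form, len reports the number of matching rows in both cases rather than the length of the string.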

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 0.25.3
numpy            : 1.17.4
pytz             : 2019.1
dateutil         : 2.8.0
pip              : 19.3.1
setuptools       : 42.0.2.post20191203
Cython           : None
pytest           : 5.3.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.10.2
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.3.2
sqlalchemy       : 1.3.12
tables           : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None

jreback (Contributor) commented Jan 28, 2020

This is a duplicate issue. Please do a detailed search.

omri374 (Author) commented Jan 28, 2020

Couldn't find the issue you refer to.

omri374 closed this as completed Jan 28, 2020
omri374 reopened this Jan 28, 2020

jreback (Contributor) commented Jan 28, 2020

Oldie but goodie: #9519

You're welcome to have a go at an implementation. Note that this would be a breaking change, so for now the existing behavior would need to be deprecated rather than changed outright.

jreback closed this as completed Jan 28, 2020
jreback added the Duplicate Report and Indexing labels Jan 28, 2020
jreback added this to the No action milestone Jan 28, 2020