loc on Series returning scalar when result size is 1 #31381

omri374 · 2020-01-28T07:38:23Z

Code Sample

import pandas as pd

# Create a DataFrame with index and one Series
df = pd.DataFrame({"id":[1,2,3,3],"val":["abcde","bbcde","cbcde","dbcde"]})
df.set_index("id",inplace=True)

Calling .loc when the result length is 2 returns a Series (as expected):

type(df['val'].loc[3])

Output:

pandas.core.series.Series

Calling .loc when the result length is 1 returns a str:

# use .loc when the result length is 1:
type(df['val'].loc[1])

Output:

str

Problem description

The fact that the same code returns different types causes inconsistencies. If I run len on the result, I would get 2 for id=3 and 5 for id=1 or id=2, since the string length is 5:

len(df['val'].loc[3])

Output:

2

When calling loc[1]:

len(df['val'].loc[1])

Output:

5

Expected Output

.loc should return a Series regardless of the size. Otherwise this causes bugs when trying to evaluate the length of the output.
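Until such a change exists, one way to get a consistent type is to pass the label inside a list. This is a minimal sketch using the same data as the code sample above; the variable names `single` and `double` are only illustrative. A list indexer asks `.loc` for a selection of labels rather than a single label, so the result is a Series whether one row matches or several.

```python
import pandas as pd

# Same data as the code sample: label 3 is duplicated, labels 1 and 2 are unique.
df = pd.DataFrame({"id": [1, 2, 3, 3], "val": ["abcde", "bbcde", "cbcde", "dbcde"]})
df = df.set_index("id")

# A list indexer ([1] instead of 1) returns a Series even for one matching row.
single = df["val"].loc[[1]]
double = df["val"].loc[[3]]

print(type(single).__name__, len(single))  # Series 1
print(type(double).__name__, len(double))  # Series 2
```

With this form, `len` counts matching rows in both cases instead of returning the string length for a unique label.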

# Expected output:
type(df['val'].loc[1])

Output:

pandas.core.series.Series

len(df['val'].loc[1])

Output:

1

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : None
pytest : 5.3.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.12
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None


jreback · 2020-01-28T09:48:06Z

this is a duplicate issue - pls do a detailed search

omri374 · 2020-01-28T11:47:27Z

Couldn't find the issue you refer to.

jreback · 2020-01-28T11:56:23Z

oldie but goodie
#9519

welcome to have a go at an implementation
note this would be a breaking change, so for now the current behavior would need to be deprecated first

omri374 closed this as completed Jan 28, 2020

omri374 reopened this Jan 28, 2020

jreback closed this as completed Jan 28, 2020

jreback added the Duplicate Report (duplicate issue or pull request) and Indexing (related to indexing on series/frames, not to indexes themselves) labels Jan 28, 2020

jreback added this to the No action milestone Jan 28, 2020
