Wrong slicing behavior for DataFrame loc #14900

Ajeet-Ganga · 2016-12-17T00:04:14Z

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 21, 51, 61], [2, 22, 52, 62], [3, 23, 53, 63]])
>>> df
   0   1   2   3
0  1  21  51  61
1  2  22  52  62
2  3  23  53  63
>>> df[0:0]
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
>>> df.loc[0:0]
   0   1   2   3
0  1  21  51  61

For loc, slicing is incorrect

The behavior exhibited by loc slicing is inconsistent with Python list slicing.
For [0:0] it should have returned an empty result, but it returns a row.

Expected Output

Similar to the code shown below for a Python list, pd.DataFrame.loc slicing should produce an empty result.

E.g. the following code yields an empty slice for [0:0]:

>>> l = [0,1,2]
>>> l[0:0]
[]

Output of `pd.show_versions()`

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 25.2.0
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.2.0-b1
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

jreback · 2016-12-17T00:10:25Z

http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label

label slicing always includes the endpoint

positional (iloc) slicing matches python semantics
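A minimal sketch of the difference, assuming the frame from the report is rebuilt by hand (the variable names `by_label` and `by_position` are only illustrative):

```python
import pandas as pd

# Rebuild the 3x4 frame from the report; the default RangeIndex gives labels 0, 1, 2.
df = pd.DataFrame([[1, 21, 51, 61], [2, 22, 52, 62], [3, 23, 53, 63]])

# Label-based slice: both endpoints are labels and the stop label is included,
# so 0:0 selects the single row labelled 0.
by_label = df.loc[0:0]

# Position-based slice: follows Python's half-open convention, so 0:0 is empty.
by_position = df.iloc[0:0]

print(len(by_label), len(by_position))
```

With a default integer index the labels happen to coincide with the positions, which is why the two calls look as if they should agree.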

Ajeet-Ganga · 2016-12-17T00:12:17Z

Isn't the example below producing an empty slice?
Shouldn't loc be consistent with the following behavior?

>>> l = [0,1,2]
>>> l[0:0]
[]

jreback · 2016-12-17T00:20:26Z

read the docs

Ajeet-Ganga · 2016-12-17T00:31:25Z

What would we lose by being consistent?
Just because we put something in the documentation doesn't make it right.

Performance over consistent behavior is not even a contest.

jreback · 2016-12-17T00:42:58Z

you are missing the point

.iloc is positional indexing
.loc is label indexing

these are two separate and distinct ways of selecting data
numpy and python only have positional concepts and thus have only 1 way of indexing

pandas can also index by labels; since these can be for example strings (or datetimes or integers)
you now have different semantics

again the docs are very complete on these concepts

and if you choose not to use label indexing, then just use .iloc and it will feel like python/numpy
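To illustrate why labels call for different semantics, here is a small sketch with string labels. The frame, its `value` column, and the index labels are made up for the example and do not come from the report:

```python
import pandas as pd

# Hypothetical frame with string labels instead of integers.
frame = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# Label slice: "a" through "b", with the stop label included.
labelled = frame.loc["a":"b"]

# Positional slice: positions 0 up to (not including) 1, as in Python/numpy.
positional = frame.iloc[0:1]

print(list(labelled.index))
print(list(positional.index))
```

With string labels there is no obvious "next label" to name as an exclusive stop, which is one reason an inclusive endpoint is convenient for .loc.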

jreback closed this as completed Dec 17, 2016

jreback added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Usage Question labels Dec 17, 2016

jreback added this to the No action milestone Dec 17, 2016

ghost mentioned this issue Jun 26, 2019

Discussion: Why is loc label-based slicing right-inclusive? #27059

Closed
