
Wrong slicing behavior for DataFrame loc #14900

Closed
Ajeet-Ganga opened this issue Dec 17, 2016 · 5 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Usage Question

Comments

@Ajeet-Ganga

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 21, 51, 61], [2, 22, 52, 62], [3, 23, 53, 63]])
>>> df
   0   1   2   3
0  1  21  51  61
1  2  22  52  62
2  3  23  53  63
>>> df[0:0]
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
>>> df.loc[0:0]
   0   1   2   3
0  1  21  51  61

For loc, slicing is incorrect

The slicing behavior of loc is inconsistent with Python list slicing.
For [0:0] it should have returned an empty result, but it returns a row.

Expected Output

As with the Python list code shown below, pd.DataFrame.loc slicing should produce an empty result.

E.g., the following code yields an empty slice for [0:0]:

>>> l = [0,1,2]
>>> l[0:0]
[]

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 25.2.0
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.2.0-b1
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Dec 17, 2016

http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label

label slicing always includes the endpoint

positional (iloc) slicing matches python semantics
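
A minimal sketch of the distinction described above, reusing the frame from the original report. The variable names are illustrative and not from the thread; the default integer index means the labels happen to coincide with the positions, which is what makes the two results easy to confuse.

```python
import pandas as pd

# Same data as in the report; the default index gives labels 0, 1, 2.
df = pd.DataFrame([[1, 21, 51, 61], [2, 22, 52, 62], [3, 23, 53, 63]])

# Label-based slice: both endpoints are labels, and the end label is included.
by_label = df.loc[0:0]

# Position-based slice: half-open like a Python list, so 0:0 is empty.
by_position = df.iloc[0:0]

print(len(by_label))     # one row (the row labelled 0)
print(len(by_position))  # no rows
```

If Python-style half-open slicing is what you want, `.iloc` is the accessor to reach for.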

@jreback jreback closed this as completed Dec 17, 2016
@Ajeet-Ganga
Author

Isn't the example below producing an empty slice?
Shouldn't loc be consistent with this behavior?

>>> l = [0,1,2]
>>> l[0:0]
[]

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Usage Question labels Dec 17, 2016
@jreback jreback added this to the No action milestone Dec 17, 2016
@jreback
Contributor

jreback commented Dec 17, 2016

read the docs

@Ajeet-Ganga
Author

What would we lose by being consistent?
Just because we put something in the documentation doesn't make it right.

Performance over consistent behavior is not even a contest.

@jreback
Contributor

jreback commented Dec 17, 2016

you are missing the point

.iloc is positional indexing
.loc is label indexing

these are two separate and distinct ways of selecting data
numpy and python only have positional concepts and thus have only 1 way of indexing

pandas can also index by labels; since these can be for example strings (or datetimes or integers)
you now have different semantics

again the docs are very complete on these concepts

and if you choose not to use label indexing, then just use .iloc and it will feel like python/numpy
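
To illustrate why the two accessors behave differently, here is a small sketch with labels that are not positions. The series `s`, the frame `shifted`, and their labels are made up for illustration. With string labels there is no obvious "one past the end" label to write, which is one plausible reading of why label slices include the endpoint.

```python
import pandas as pd

# Hypothetical data: string labels, so a label is clearly not a position.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

label_slice = s.loc["b":"c"]  # labels b through c, end label included
pos_slice = s.iloc[1:2]       # positions 1 up to (not including) 2

# Integer labels that differ from positions show the same split.
shifted = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])
rows_by_label = shifted.loc[10:20]  # labels 10 and 20
rows_by_pos = shifted.iloc[0:1]     # first row only

print(label_slice.tolist(), pos_slice.tolist())
print(len(rows_by_label), len(rows_by_pos))
```

In the original report the index labels were 0, 1, 2, identical to the positions, so `df.loc[0:0]` looked like a positional slice even though it selects by label.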
