
DOC: undocumented head(n), tail(n) accept negative values. But not on GroupBy #30192

Closed
smcinerney opened this issue Dec 10, 2019 · 4 comments

Comments

@smcinerney

DOC: undocumented behavior that works: head(n) and tail(n) accept negative values for both DataFrame and Series. GroupBy.head(n) and GroupBy.tail(n), however, do not handle negative values correctly.

This is quite useful and could be documented.

  • head(-n) returns all rows (or elements) except the last n, equivalent to df[:-n].
  • tail(-n) returns all rows (or elements) except the first n, equivalent to df[n:].
  • If n is larger than the length of the DataFrame (or Series), no error is raised; an empty result is returned silently.
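
To make the slicing equivalence concrete, here is a minimal sketch that checks it against positional slices. It reuses the DataFrame from the code sample in this issue; the variable n is only an illustrative name, and the comparison with iloc is a check of observed behavior, not a statement about how pandas implements head and tail.

```python
import pandas as pd

df = pd.DataFrame({"x": range(18, 0, -1), "y": [i % 6 for i in range(18)]})
n = 15  # illustrative value, matching the sample in this issue

# head(-n) keeps everything except the last n rows
assert df.head(-n).equals(df.iloc[:-n])

# tail(-n) keeps everything except the first n rows
assert df.tail(-n).equals(df.iloc[n:])

# The same holds for a Series
assert df["x"].head(-n).equals(df["x"].iloc[:-n])
assert df["x"].tail(-n).equals(df["x"].iloc[n:])

# An n larger than the frame gives an empty result, not an error
assert df.head(-25).empty
assert df.tail(-25).empty
```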

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'x': range(18, 0, -1), 'y': [x % 6 for x in range(18)]})

df.head(-25)
Empty DataFrame
Columns: [x, y]
Index: []

df.head(-15)
    x  y
0  18  0
1  17  1
2  16  2

df.tail(-15)
    x  y
15  3  3
16  2  4
17  1  5

df['x'].tail(-15)
15    3
16    2
17    1

# But GroupBy.head(n) and GroupBy.tail(n) do not: both return an empty result
df.groupby('y').head(-1)
Empty DataFrame
Columns: [x, y]
Index: []

df.groupby('y').tail(-1)
Empty DataFrame
Columns: [x, y]
Index: []
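
Where GroupBy.head and GroupBy.tail return an empty result for negative n, as in the output above, a per-group equivalent can be sketched with GroupBy.cumcount. This is a workaround under the assumption that "all but the last n rows of each group" is the intended meaning; the names from_end, from_start, all_but_last_n and all_but_first_n are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"x": range(18, 0, -1), "y": [i % 6 for i in range(18)]})
n = 1  # number of rows to drop from each group

# Position of each row counted from the end of its group (last row is 0)
from_end = df.groupby("y").cumcount(ascending=False)
# Per-group analogue of head(-n): drop the last n rows of every group
all_but_last_n = df[from_end >= n]

# Position of each row counted from the start of its group (first row is 0)
from_start = df.groupby("y").cumcount()
# Per-group analogue of tail(-n): drop the first n rows of every group
all_but_first_n = df[from_start >= n]

print(all_but_last_n)
print(all_but_first_n)
```

Each group here has three rows, so both results keep two rows per group and preserve the original row order.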

Problem description

Add the negative-n behavior of head(n) and tail(n) on DataFrame and Series to the documentation.

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 5.2.4
pip: 19.3.1
setuptools: 41.6.0.post20191030
Cython: 0.29.14
numpy: 1.17.3
scipy: 1.3.1
pyarrow: None
xarray: None
IPython: 7.9.0
sphinx: 2.2.1
patsy: 0.5.1
dateutil: 2.8.1
pytz: 2019.3
blosc: None
bottleneck: 1.3.1
tables: 3.6.1
numexpr: 2.7.0
feather: None
matplotlib: 3.1.1
openpyxl: 3.0.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.2.6
lxml.etree: 4.4.1
bs4: 4.8.1
html5lib: 1.0.1
sqlalchemy: 1.3.11
pymysql: None
psycopg2: 2.8.4 (dt dec pq3 ext lo64)
jinja2: 2.10.3
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@smcinerney
Author

Subsequently found out this might be a duplicate of #9214, except its title and body don't make clear that it's suggesting the undocumented behavior is useful and should be documented.

@TomAugspurger
Contributor

TomAugspurger commented Dec 11, 2019

Happy to have this documented / implemented on groupby. Are you interested in working on it?

@smcinerney
Author

smcinerney commented Dec 25, 2019

This is marked as a docbug. I was mainly interested in documenting, in both the docstrings and pandas-docs, that this works on DataFrame and Series; I'm happy to do that if you can tell me how. I don't care much about implementing it on groupby.

@MarcoGorelli
Member

Seems that this was closed by #30556, feel free to reopen if you disagree
