List required for percentiles kwarg in DataFrame.describe when median is not present as opposed to array-like #14908

Closed
pbreach opened this issue Dec 18, 2016 · 1 comment
Labels
Bug, Dtype Conversions (unexpected or buggy dtype conversions)
Milestone

Comments

@pbreach
Contributor

pbreach commented Dec 18, 2016

Code Sample, a copy-pastable example if possible


In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.random((1000, 4)))

In [4]: percentiles = np.linspace(0, 0.99, 10)

In [5]: df.describe(percentiles=percentiles)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-83616b318eba> in <module>()
----> 1 df.describe(percentiles=percentiles)

C:\Users\pbreach\Anaconda3\lib\site-packages\pandas\core\generic.py in describe(self, percentiles, include, exclude)
   5194             # median should always be included
   5195             if 0.5 not in percentiles:
-> 5196                 percentiles.append(0.5)
   5197             percentiles = np.asarray(percentiles)
   5198         else:

AttributeError: 'numpy.ndarray' object has no attribute 'append'

Problem description

The documentation says the percentiles kwarg accepts array-like input. However, passing a numpy array that does not contain the median (0.5) raises an AttributeError, as if a list were expected. If a list is required when the median is not found, should there be an explicit conversion to list before the median is appended?
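To make the root cause concrete, here is a minimal sketch that reproduces the failing step outside of pandas. It assumes only that numpy is installed; the variable names are illustrative and are not taken from the pandas source.

```python
import numpy as np

percentiles = np.linspace(0, 0.99, 10)

# A numpy array has no in-place .append method, unlike a list,
# which is why the call in describe() raises AttributeError.
print(hasattr(percentiles, "append"))

# Workaround on the caller's side: convert to a list first, so the
# median can be appended when it is missing.
as_list = percentiles.tolist()
if 0.5 not in as_list:
    as_list.append(0.5)

print(len(percentiles), len(as_list))
```

This is the same thing `list(percentiles)` does in the expected-output call below; `tolist()` is simply the numpy-native spelling.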

Expected Output

In [6]: df.describe(percentiles=list(percentiles))
Out[6]:
                 0            1            2            3
count  1000.000000  1000.000000  1000.000000  1000.000000
mean      0.500730     0.501185     0.498594     0.498648
std       0.289616     0.286023     0.290509     0.292264
min       0.001290     0.000822     0.000459     0.001975
0%        0.001290     0.000822     0.000459     0.001975
11%       0.119319     0.124990     0.107683     0.114136
22%       0.211321     0.232740     0.209913     0.227046
33%       0.331405     0.325820     0.336409     0.311294
44%       0.439314     0.446085     0.443036     0.431923
50%       0.500374     0.505759     0.499125     0.491579
55.0%     0.553634     0.552899     0.552896     0.544990
66%       0.666159     0.647926     0.661797     0.661387
77%       0.777984     0.774892     0.776067     0.773342
88%       0.883761     0.874293     0.872860     0.884350
99%       0.985795     0.989234     0.991083     0.993646
max       0.998623     0.999924     0.999723     0.999185

Output of pd.show_versions()

In [7]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

@jreback
Contributor

jreback commented Dec 18, 2016

This is a bug. There should be a list() conversion first.

pull-requests are welcome.
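For anyone picking this up, the suggested fix could look roughly like the sketch below. `with_median` is a hypothetical helper written for illustration, not the actual pandas code, and the final sort is an assumption added only to keep the output readable.

```python
def with_median(percentiles):
    """Return the percentiles as a sorted list that always contains 0.5.

    Hypothetical helper for illustration only, not part of pandas.
    """
    # Explicit conversion: works for lists, tuples, numpy arrays, and
    # any other iterable, and avoids mutating the caller's object.
    pcts = list(percentiles)
    if 0.5 not in pcts:
        pcts.append(0.5)
    return sorted(pcts)


# A tuple is array-like but, like an ndarray, has no .append method.
print(with_median((0.25, 0.75)))  # [0.25, 0.5, 0.75]
```

Converting up front also means a list passed by the caller is not modified as a side effect.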

jreback added the Bug, Dtype Conversions, and Difficulty Novice labels on Dec 18, 2016
jreback added this to the 0.20.0 milestone on Dec 18, 2016
ShaharBental pushed a commit to ShaharBental/pandas that referenced this issue Dec 26, 2016
Explicit conversion to list for `percentiles`. Fixes the case where
`percentiles` is passed as a numpy array with no median (0.5) present.
Closes pandas-dev#14908.

Author: pbreach

Closes pandas-dev#14914 from pbreach/df-describe-percentile-ndarray-no-median and squashes the following commits:

5c8199b [pbreach] Minor test fix
b5d09a6 [pbreach] Added test for median insertion with ndarray
72fe0cb [pbreach] Added what's new entry
f954392 [pbreach] Moved conversion to if percentiles is not None
d192ac7 [pbreach] Fixed whitespace issue
a06794d [pbreach] BUG: Fixed bug in DataFrame.describe when percentiles are passed as array with no median