List required for `percentiles` kwarg in `DataFrame.describe` when median is not present as opposed to array-like #14908

pbreach · 2016-12-18T19:30:38Z

Code Sample, a copy-pastable example if possible


In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.random((1000, 4)))

In [4]: percentiles = np.linspace(0, 0.99, 10)

In [5]: df.describe(percentiles=percentiles)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-83616b318eba> in <module>()
----> 1 df.describe(percentiles=percentiles)

C:\Users\pbreach\Anaconda3\lib\site-packages\pandas\core\generic.py in describe(self, percentiles, include, exclude)
   5194             # median should always be included
   5195             if 0.5 not in percentiles:
-> 5196                 percentiles.append(0.5)
   5197             percentiles = np.asarray(percentiles)
   5198         else:

AttributeError: 'numpy.ndarray' object has no attribute 'append'

Problem description

The documentation says the `percentiles` kwarg accepts an array-like. However, passing a NumPy array that does not contain the median (0.5) raises an `AttributeError`, because `describe` calls `percentiles.append(0.5)`, which only works on a list. If a list is expected whenever the median is missing, shouldn't the input be explicitly converted to a list before the median is appended?
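A minimal NumPy-only illustration of the failure mode, assuming only that NumPy is installed (pandas is not needed to see it). It uses the same percentiles as the report and shows that an `ndarray` has no `append` method, while the `list(...)` workaround used in the Expected Output does.

```python
import numpy as np

# Same percentiles as in the report: 0, 0.11, ..., 0.99 (0.5 is absent).
percentiles = np.linspace(0, 0.99, 10)

# The median check in the traceback is True, so describe() tries to append...
print(0.5 in percentiles)             # False: median is missing

# ...but ndarray has no .append method, hence the AttributeError.
print(hasattr(percentiles, "append"))  # False

# Converting to a list first gives an object that supports .append.
as_list = list(percentiles)
as_list.append(0.5)
print(len(as_list))                    # 11
```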

Expected Output

In [6]: df.describe(percentiles=list(percentiles))
Out[6]:
                 0            1            2            3
count  1000.000000  1000.000000  1000.000000  1000.000000
mean      0.500730     0.501185     0.498594     0.498648
std       0.289616     0.286023     0.290509     0.292264
min       0.001290     0.000822     0.000459     0.001975
0%        0.001290     0.000822     0.000459     0.001975
11%       0.119319     0.124990     0.107683     0.114136
22%       0.211321     0.232740     0.209913     0.227046
33%       0.331405     0.325820     0.336409     0.311294
44%       0.439314     0.446085     0.443036     0.431923
50%       0.500374     0.505759     0.499125     0.491579
55.0%     0.553634     0.552899     0.552896     0.544990
66%       0.666159     0.647926     0.661797     0.661387
77%       0.777984     0.774892     0.776067     0.773342
88%       0.883761     0.874293     0.872860     0.884350
99%       0.985795     0.989234     0.991083     0.993646
max       0.998623     0.999924     0.999723     0.999185

Output of `pd.show_versions()`

In [7]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

jreback · 2016-12-18T20:20:17Z

This is a bug; there should be a `list()` conversion first.

Pull requests are welcome.

Explicit conversion to list for `percentiles`. Fixes the case where `percentiles` is passed as a NumPy array with no median (0.5) present. Closes pandas-dev#14908.

Author: pbreach <[email protected]>

Closes pandas-dev#14914 from pbreach/df-describe-percentile-ndarray-no-median and squashes the following commits:

5c8199b [pbreach] Minor test fix
b5d09a6 [pbreach] Added test for median insertion with ndarray
72fe0cb [pbreach] Added what's new entry
f954392 [pbreach] Moved conversion to if percentiles is not None
d192ac7 [pbreach] Fixed whitespace issue
a06794d [pbreach] BUG: Fixed bug in DataFrame.describe when percentiles are passed as array with no median
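The fix described above (convert to a list inside the `percentiles is not None` branch, then append the median) can be sketched as a standalone function. `refine_percentiles` is a hypothetical name used for illustration, not pandas API. The final sort is an assumption made so the median lands in order, consistent with the row order in the Expected Output; when `percentiles` is `None` the sketch returns `describe`'s documented default of `[.25, .5, .75]`.

```python
import numpy as np


def refine_percentiles(percentiles=None):
    """Hypothetical helper mirroring the list() conversion fix.

    Accepts any array-like (list, tuple, ndarray), makes sure the
    median (0.5) is included, and returns a sorted float ndarray.
    """
    if percentiles is None:
        # describe()'s documented default percentiles
        return np.array([0.25, 0.5, 0.75])
    # Explicit list() conversion so .append works for ndarray input too.
    percentiles = list(percentiles)
    # Median should always be included.
    if 0.5 not in percentiles:
        percentiles.append(0.5)
    # Assumed here: sort so 0.5 sits between its neighbours.
    return np.sort(np.asarray(percentiles, dtype=float))


print(refine_percentiles(np.linspace(0, 0.99, 10)))
print(refine_percentiles([0.1, 0.9]))
```

With the report's `np.linspace(0, 0.99, 10)` input this returns 11 values, with 0.5 placed between the 44% and 55% entries, and an ndarray no longer triggers the `AttributeError`.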

jreback added the Bug, Dtype Conversions (unexpected or buggy dtype conversions), and Difficulty Novice labels Dec 18, 2016

jreback added this to the 0.20.0 milestone Dec 18, 2016

pbreach mentioned this issue Dec 19, 2016

BUG: Fixed DataFrame.describe percentiles are ndarray w/ no median #14914 (closed)

jreback closed this as completed in 8e630b6 Dec 19, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

List required for `percentiles` kwarg in `DataFrame.describe` when median is not present as opposed to array-like #14908

List required for `percentiles` kwarg in `DataFrame.describe` when median is not present as opposed to array-like #14908

pbreach commented Dec 18, 2016

INSTALLED VERSIONS

jreback commented Dec 18, 2016

Uh oh!

Uh oh!

List required for percentiles kwarg in DataFrame.describe when median is not present as opposed to array-like #14908

List required for percentiles kwarg in DataFrame.describe when median is not present as opposed to array-like #14908

Comments

pbreach commented Dec 18, 2016

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

jreback commented Dec 18, 2016

Uh oh!

List required for `percentiles` kwarg in `DataFrame.describe` when median is not present as opposed to array-like #14908

List required for `percentiles` kwarg in `DataFrame.describe` when median is not present as opposed to array-like #14908

Output of `pd.show_versions()`