BUG: pd.qcut doesn't seem to support ndarray #18173

pratapvardhan · 2017-11-08T16:23:40Z

pd.qcut doesn't seem to support the ndarray type, even though the docstring documents x : ndarray or Series.

In [46]: d = range(5)

In [47]: d
Out[47]: [0, 1, 2, 3, 4]

In [48]: pd.qcut(d, [0, 1])
Out[48]:
[(-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0]]
Categories (1, interval[float64]): [(-0.001, 4.0]]

In [49]: d = np.array(range(5))

In [50]: d
Out[50]: array([0, 1, 2, 3, 4])

In [51]: pd.qcut(d, [0, 1])
Out[51]:
[(-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0]]
Categories (1, interval[float64]): [(-0.001, 4.0]]

In [52]: d = np.array([[x] for x in range(5)])

In [53]: type(d)
Out[53]: numpy.ndarray

In [54]: d
Out[54]:
array([[0],
       [1],
       [2],
       [3],
       [4]])

In [55]: pd.qcut(d, [0, 1])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-b172e74b0ffd> in <module>()
----> 1 pd.qcut(d, [0, 1])

e:\github\pandas\pandas\core\reshape\tile.pyc in qcut(x, q, labels, retbins, precision, duplicates)
    206     fac, bins = _bins_to_cuts(x, bins, labels=labels,
    207                               precision=precision, include_lowest=True,
--> 208                               dtype=dtype, duplicates=duplicates)
    209
    210     return _postprocess_for_cut(fac, bins, retbins, x_is_series,

e:\github\pandas\pandas\core\reshape\tile.pyc in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates)
    258
    259         np.putmask(ids, na_mask, 0)
--> 260         result = algos.take_nd(labels, ids - 1)
    261
    262     else:

e:\github\pandas\pandas\core\algorithms.pyc in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
   1318     if is_categorical(arr):
   1319         return arr.take_nd(indexer, fill_value=fill_value,
-> 1320                            allow_fill=allow_fill)
   1321     elif is_datetimetz(arr):
   1322         return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)

e:\github\pandas\pandas\core\categorical.pyc in take_nd(self, indexer, allow_fill, fill_value)
   1703         assert isna(fill_value)
   1704
-> 1705         codes = take_1d(self._codes, indexer, allow_fill=True, fill_value=-1)
   1706         result = self._constructor(codes, categories=self.categories,
   1707                                    ordered=self.ordered, fastpath=True)

e:\github\pandas\pandas\core\algorithms.pyc in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
   1381     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,
   1382                                  mask_info=mask_info)
-> 1383     func(arr, indexer, out, fill_value)
   1384
   1385     if flip_order:

e:\github\pandas\pandas\_libs\algos_take_helper.pxi in pandas._libs.algos.take_1d_int8_int8 (pandas\_libs\algos.c:75951)()
    560 @cython.boundscheck(False)
    561 def take_1d_int8_int8(ndarray[int8_t, ndim=1] values,
--> 562                               int64_t[:] indexer,
    563                               int8_t[:] out,
    564                               fill_value=np.nan):

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Problem description

pd.qcut doesn't seem to support the ndarray type, even though the docstring documents x : ndarray or Series.

Is this expected behavior, or does ndarray in the docstring mean arrays of shape (n,) only?

I can confirm this was working in earlier versions.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0.dev0+16.gd70526b
pytest: 3.2.0
pip: 9.0.1
setuptools: 36.2.7
Cython: 0.24.1
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: 0.7.9.None
psycopg2: 2.7.3.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


chris-b1 · 2017-11-08T16:29:00Z

The docs could be clearer here: qcut does accept an ndarray, but it must be 1-dimensional.

In [6]: pd.qcut(d[:, 0], [0, 1])
Out[6]: 
[(-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0]]
Categories (1, interval[float64]): [(-0.001, 4.0]]
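
For readers hitting the same error, here is a minimal sketch of the workaround described above: flatten the column vector to one dimension before calling qcut. The use of `ravel()` is an illustrative choice (the comment above uses `d[:, 0]`), and the variable names `flat` and `result` are not from the original report.

```python
import numpy as np
import pandas as pd

# Column vector of shape (5, 1), as in the original report.
d = np.array([[x] for x in range(5)])

# qcut expects 1-dimensional input, so flatten to shape (5,) first.
# ravel() is one option; d[:, 0] works equally well for a single column.
flat = d.ravel()

# Same call as in the report, now on 1-D data: one bin covering all values.
result = pd.qcut(flat, [0, 1])

print(flat.shape)
print(result)
```

Either form gives qcut a shape (n,) array; `ravel()` also flattens arrays with more than one column, which may or may not be what you want.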

jdoepfert · 2017-11-10T10:28:26Z

I am going to update the docstring today

chris-b1 added the Difficulty Novice, Docs, and Error Reporting (Incorrect or improved errors from pandas) labels Nov 8, 2017

chris-b1 added this to the Next Major Release milestone Nov 8, 2017

jdoepfert mentioned this issue Nov 10, 2017: Add requirement for a 1-dimensional ndarray in the pd.qcut docstring #18211 (Merged)

jreback modified the milestones: Next Major Release, 0.22.0, 0.21.1 Nov 10, 2017

jreback closed this as completed in #18211 Nov 10, 2017
