BUG: pd.qcut doesn't seem to support ndarray #18173

Closed
pratapvardhan opened this issue Nov 8, 2017 · 2 comments · Fixed by #18211
Labels
Docs, Error Reporting (Incorrect or improved errors from pandas)
@pratapvardhan
Contributor

pratapvardhan commented Nov 8, 2017

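pd.qcut doesn't seem to support every ndarray, although the docstring documents the parameter as x : ndarray or Series. A list and a 1-D array both work, but a 2-D array of shape (n, 1) raises a ValueError: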

In [46]: d = range(5)

In [47]: d
Out[47]: [0, 1, 2, 3, 4]

In [48]: pd.qcut(d, [0, 1])
Out[48]:
[(-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0]]
Categories (1, interval[float64]): [(-0.001, 4.0]]

In [49]: d = np.array(range(5))

In [50]: d
Out[50]: array([0, 1, 2, 3, 4])

In [51]: pd.qcut(d, [0, 1])
Out[51]:
[(-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0]]
Categories (1, interval[float64]): [(-0.001, 4.0]]

In [52]: d = np.array([[x] for x in range(5)])

In [53]: type(d)
Out[53]: numpy.ndarray

In [54]: d
Out[54]:
array([[0],
       [1],
       [2],
       [3],
       [4]])

In [55]: pd.qcut(d, [0, 1])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-b172e74b0ffd> in <module>()
----> 1 pd.qcut(d, [0, 1])

e:\github\pandas\pandas\core\reshape\tile.pyc in qcut(x, q, labels, retbins, precision, duplicates)
    206     fac, bins = _bins_to_cuts(x, bins, labels=labels,
    207                               precision=precision, include_lowest=True,
--> 208                               dtype=dtype, duplicates=duplicates)
    209
    210     return _postprocess_for_cut(fac, bins, retbins, x_is_series,

e:\github\pandas\pandas\core\reshape\tile.pyc in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates)
    258
    259         np.putmask(ids, na_mask, 0)
--> 260         result = algos.take_nd(labels, ids - 1)
    261
    262     else:

e:\github\pandas\pandas\core\algorithms.pyc in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
   1318     if is_categorical(arr):
   1319         return arr.take_nd(indexer, fill_value=fill_value,
-> 1320                            allow_fill=allow_fill)
   1321     elif is_datetimetz(arr):
   1322         return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)

e:\github\pandas\pandas\core\categorical.pyc in take_nd(self, indexer, allow_fill, fill_value)
   1703         assert isna(fill_value)
   1704
-> 1705         codes = take_1d(self._codes, indexer, allow_fill=True, fill_value=-1)
   1706         result = self._constructor(codes, categories=self.categories,
   1707                                    ordered=self.ordered, fastpath=True)

e:\github\pandas\pandas\core\algorithms.pyc in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
   1381     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,
   1382                                  mask_info=mask_info)
-> 1383     func(arr, indexer, out, fill_value)
   1384
   1385     if flip_order:

e:\github\pandas\pandas\_libs\algos_take_helper.pxi in pandas._libs.algos.take_1d_int8_int8 (pandas\_libs\algos.c:75951)()
    560 @cython.boundscheck(False)
    561 def take_1d_int8_int8(ndarray[int8_t, ndim=1] values,
--> 562                               int64_t[:] indexer,
    563                               int8_t[:] out,
    564                               fill_value=np.nan):

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Problem description

pd.qcut doesn't seem to support 2-D ndarray input, although the docstring documents the parameter as x : ndarray or Series.

Is this expected behavior? Does "ndarray" in the docstring mean arrays of shape (n,) only?

I can confirm this was working in earlier versions.
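
To make the question concrete, here is a minimal sketch of the kind of up-front dimensionality check that would turn the Cython buffer error above into a readable message. The wrapper name `qcut_1d` and its error text are hypothetical and chosen only for illustration; this is not pandas API and not necessarily what any pandas fix looks like.

```python
import numpy as np
import pandas as pd


def qcut_1d(x, q, **kwargs):
    """Hypothetical helper (not part of pandas): reject non-1-D input early."""
    arr = np.asarray(x)
    if arr.ndim != 1:
        # Fail with an explicit message instead of a low-level buffer error.
        raise ValueError(
            "qcut_1d expects 1-dimensional input, got shape {}".format(arr.shape)
        )
    # Delegate to pandas unchanged once the shape is known to be 1-D.
    return pd.qcut(x, q, **kwargs)


one_dim = np.arange(5)                         # shape (5,)
two_dim = np.array([[v] for v in range(5)])    # shape (5, 1)

binned = qcut_1d(one_dim, [0, 1])

try:
    qcut_1d(two_dim, [0, 1])
    rejected = False
except ValueError:
    rejected = True
```

With a check like this, the (5, 1) array is rejected before any binning happens, while the (5,) array goes through to pd.qcut as usual.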

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0.dev0+16.gd70526b
pytest: 3.2.0
pip: 9.0.1
setuptools: 36.2.7
Cython: 0.24.1
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: 0.7.9.None
psycopg2: 2.7.3.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@chris-b1
Contributor

chris-b1 commented Nov 8, 2017

The docs could be clearer here: qcut does accept an ndarray, but requires it to be 1-dimensional.

In [6]: pd.qcut(d[:, 0], [0, 1])
Out[6]: 
[(-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0], (-0.001, 4.0]]
Categories (1, interval[float64]): [(-0.001, 4.0]]
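
As a sketch of the same workaround in script form, assuming the input is a column-shaped array like the one in the report: `d[:, 0]` selects the single column, and `ravel()` flattens the array to shape (n,) more generally. The variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the report: a 2-D array of shape (5, 1).
d = np.array([[x] for x in range(5)])

# Two ways to get a 1-D view of the single column.
col = d[:, 0]      # explicit column selection, shape (5,)
flat = d.ravel()   # flattens any shape to 1-D, shape (5,)

# qcut accepts the 1-D array; quantiles [0, 1] give a single bin.
result = pd.qcut(flat, [0, 1])
```

Note that `ravel()` flattens every column into one sequence, so for arrays with more than one column, selecting the column explicitly is the safer choice.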

chris-b1 added the Difficulty Novice, Docs, and Error Reporting (Incorrect or improved errors from pandas) labels on Nov 8, 2017
chris-b1 added this to the Next Major Release milestone on Nov 8, 2017

@jdoepfert
Contributor

I am going to update the docstring today
