ERR: cut/qcut need better error message when passing invalid input #13318

simonm3 · 2016-05-29T20:02:46Z

labels=False means use integers as category names.

To use the category labels I would expect to say labels=True but instead you have to say labels=None.

It seems illogical to say labels=None when you want labels.

jreback · 2016-05-30T12:44:09Z

Not sure what you are referring to. labels=True is not valid; it can only accept None, False, or a list-like.

True actually causes an error, so we should check for that. pls send a pull-request!

In [18]: pd.qcut(range(5), 4, labels=['good','bad','ugly','terrible'])
Out[18]: 
[good, good, bad, ugly, terrible]
Categories (4, object): [good < bad < ugly < terrible]

In [19]: pd.qcut(range(5), 4, labels=None)
Out[19]: 
[[0, 1], [0, 1], (1, 2], (2, 3], (3, 4]]
Categories (4, object): [[0, 1] < (1, 2] < (2, 3] < (3, 4]]

In [20]: pd.qcut(range(5), 4, labels=False)
Out[20]: array([0, 0, 1, 2, 3])

In [21]: pd.qcut(range(5), 4, labels=['good','bad','ugly','terrible'])
Out[21]: 
[good, good, bad, ugly, terrible]
Categories (4, object): [good < bad < ugly < terrible]

In [22]: pd.qcut(range(5), 4, labels=True)
TypeError: object of type 'bool' has no len()
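
For anyone following along, here is a minimal sketch of the kind of up-front check being asked for. It is not the pandas implementation: `check_labels` and `qcut_checked` are hypothetical names, and the error wording is only an illustration. It relies on `pandas.api.types.is_list_like`, which returns False for booleans and strings.

```python
import pandas as pd
from pandas.api.types import is_list_like


def check_labels(labels):
    """Hypothetical pre-check: allow only None, False, or a list-like."""
    if labels is None or labels is False or is_list_like(labels):
        return labels
    # Anything else (including True) gets an explicit message instead of
    # the opaque "object of type 'bool' has no len()".
    raise ValueError(
        "labels must be None, False, or a list-like of bin labels; "
        f"got {labels!r}"
    )


def qcut_checked(x, q, labels=None, **kwargs):
    """Hypothetical wrapper that validates labels before calling pd.qcut."""
    return pd.qcut(x, q, labels=check_labels(labels), **kwargs)
```

With this wrapper, `qcut_checked(list(range(5)), 4, labels=True)` raises the ValueError above, while None, False, and a list of four labels pass straight through to `pd.qcut`.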

simonm3 · 2016-05-30T12:55:03Z

Exactly. I was thinking labels are things like age20-30. So labels=False
means no labels, use 1 2 3 4. labels=(a,b,c) means use user-defined
labels, and labels=True would mean use the system-defined labels.

labels=None suggests to me no labels. labels=True suggests add labels.

I reckon most people would expect True to mean add labels rather than fail
with an error.

jreback · 2016-05-30T12:58:58Z

doc-string

In [5]: pd.qcut?
Signature: pd.qcut(x, q, labels=None, retbins=False, precision=3)
Docstring:
Quantile-based discretization function. Discretize variable into
equal-sized buckets based on rank or based on sample quantiles. For example
1000 values for 10 quantiles would produce a Categorical object indicating
quantile membership for each data point.

Parameters
----------
x : ndarray or Series
q : integer or array of quantiles
    Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately
    array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles
labels : array or boolean, default None
    Used as labels for the resulting bins. Must be of the same length as
    the resulting bins. If False, return only integer indicators of the
    bins.
retbins : bool, optional
    Whether to return the bins or not. Can be useful if bins is given
    as a scalar.
precision : int
    The precision at which to store and display the bins labels

Returns
-------
out : Categorical or Series or array of integers if labels is False
    The return type (Categorical or Series) depends on the input: a Series
    of type category if input is a Series else Categorical. Bins are
    represented as categories when categorical data is returned.
bins : ndarray of floats
    Returned only if `retbins` is True.

Notes
-----
Out of bounds values will be NA in the resulting Categorical object

simonm3 · 2016-05-30T13:02:08Z

Yes it does what it says in the docs.

What I am saying is that it would be much clearer if labels=True were
defined as "yes please, add some labels".

jreback · 2016-05-30T13:10:59Z

labels=True doesn't make any sense; you have to pass IN the labels. What does "pls have labels" mean? I think you can accidentally think that the bin integers are actual labels. I would rather have a nice error message for labels=True.

simonm3 · 2016-05-30T16:58:20Z

Just my feedback as a new user of cut/qcut as to how it could be made more
intuitive.

Suggest you ask other new users what they think. I would imagine most would
say that if you want labels then setting labels=None does not seem
intuitive.

onesandzeroes · 2016-06-01T06:45:24Z

@jreback I'm not seeing why labels=True is obviously wrong. The docstring does say it accepts 'array or boolean', so I can see why people would try passing True.

But then OP is suggesting that labels=True should produce new behaviour with automatic labels like '{varname}{group_min}-{group_max}'. That seems reasonable enough as a default set of labels.

If we don't want the new behaviour maybe the docstring should just explicitly say 'array or False' so people don't try to pass True.
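
To make the automatic-label idea concrete, here is a rough sketch of how such labels could be built today from the bin edges. `qcut_with_range_labels` and its `prefix` argument are hypothetical, not part of pandas, and the sketch assumes the input has no missing values (otherwise the integer codes would contain NaN).

```python
import pandas as pd


def qcut_with_range_labels(x, q, prefix=""):
    """Hypothetical helper: label each quantile bin as '<prefix><low>-<high>'."""
    # labels=False gives integer bin indicators; retbins=True also returns the edges.
    codes, edges = pd.qcut(x, q, labels=False, retbins=True)
    names = [
        f"{prefix}{low:g}-{high:g}" for low, high in zip(edges[:-1], edges[1:])
    ]
    # Map each integer indicator back to its generated label.
    return pd.Categorical.from_codes(codes, categories=names, ordered=True)


result = qcut_with_range_labels([0, 1, 2, 3, 4], 4, prefix="age")
print(result)
```

Here the categories come out as age0-1, age1-2, age2-3, age3-4, which carry the same information as the default interval labels but without the open/closed bracket notation.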

jreback · 2016-06-01T11:08:41Z

The purpose of this issue is to fix the doc-string and raise an appropriate message on labels=True. The default IS to provide labels if labels are not overridden.

In [1]: pd.qcut(range(5), 4, labels=None)
Out[1]: 
[[0, 1], [0, 1], (1, 2], (2, 3], (3, 4]]
Categories (4, object): [[0, 1] < (1, 2] < (2, 3] < (3, 4]]

In [2]: pd.qcut(range(5), 4, labels=None).categories
Out[2]: Index([u'[0, 1]', u'(1, 2]', u'(2, 3]', u'(3, 4]'], dtype='object')

I suppose you could change this to default to labels=True to mean labels=None now. I think this would be backward compat (as it is specifically checking for False and not None).

Further, I'm not really sure labels=False is that useful anymore (before Categoricals were first class it might have been, to provide numpy compat).

So if one of you wants to take this up and see what's possible w/o breaking anything (or just raise appropriately on labels=True) - go for it
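
As a rough illustration of the first option (treating True the same as None), here is a user-side sketch. `qcut_true_means_default` is a hypothetical wrapper, not a pandas function, and it says nothing about how this would be done inside pandas itself.

```python
import pandas as pd


def qcut_true_means_default(x, q, labels=None, **kwargs):
    """Hypothetical wrapper: treat labels=True as a request for default labels."""
    if labels is True:
        # None is what pd.qcut already uses to auto-generate interval labels.
        labels = None
    return pd.qcut(x, q, labels=labels, **kwargs)


auto = qcut_true_means_default([0, 1, 2, 3, 4], 4, labels=True)
print(auto.categories)
```

Because only the literal True is remapped, existing calls that pass None, False, or a list-like behave exactly as before.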

simonm3 · 2016-06-01T11:25:47Z

> not really sure labels=False is that useful anymore

I agree. It is the False that is confusing because it implies there is a
True; and the False is unnecessary as you can just use Range instead.

jreback · 2016-06-01T12:07:23Z

Well, the point is you don't normally need to specify labels, as they are auto-generated by default.

labels=False just turns this off (which is what I say is a bit counter-intuitive). If you didn't allow a boolean there (the False), I don't think we would be having this discussion. labels would be just to specify your own specific ones.

> as you can just use Range instead.

This is not very convenient; unless you are also passing in bins, you don't want to have the nbins parameter floating around in 2 different places.
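
A small sketch of the difference, with `values` and `n_bins` as made-up example names: all three calls yield the same integer indicators, but the explicit-range version has to repeat the bin count to build the labels.

```python
import pandas as pd

values = [0, 1, 2, 3, 4]
n_bins = 4

# labels=False: integer bin indicators, with the bin count given once.
codes = pd.qcut(values, n_bins, labels=False)

# Explicit integer labels: n_bins appears a second time to build the labels.
labelled = pd.qcut(values, n_bins, labels=list(range(n_bins)))

# Default labels: the same integers are still available as category codes.
default = pd.qcut(values, n_bins)

print(list(codes), list(labelled.codes), list(default.codes))
```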

ryankarlos · 2020-01-04T19:13:57Z

take

jreback added the Reshaping, Dtype Conversions, Error Reporting, and Difficulty Novice labels May 30, 2016

jreback added this to the Next Major Release milestone May 30, 2016

jreback changed the title ~~qcut and cut labels=True gives error~~ ERR: cut/qcut need better error message when passing invalid input May 30, 2016

Treesbark mentioned this issue Jul 16, 2017

ERR: Improved error message and updated doc in cut/qcut (issue 13318) #16982

Closed

3 tasks

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

jbrockmendel removed the Effort Low label Oct 21, 2019

jbrockmendel added the quantile quantile method label Nov 1, 2019

ryankarlos mentioned this issue Jan 4, 2020

ERR: Improve error message and doc for invalid labels in cut/qcut #30691

Merged

5 tasks

github-actions bot assigned ryankarlos Jan 4, 2020

jreback modified the milestones: Contributions Welcome, 1.0 Jan 7, 2020

jreback closed this as completed in #30691 Jan 7, 2020
