Skip to content

Allow drop bins when using the cut function #20947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
May 10, 2018
Merged

Conversation

FANGOD
Copy link
Contributor

@FANGOD FANGOD commented May 4, 2018

  • if cut(x, bins=[0, 1, 2, 3, 4, 5, 6, 6, 8, 8, 10], labels=False, retbins=True, right=False) will raise ValueError: You can drop duplicate edges by setting the 'duplicates' kwarg, so add 'duplicates' parameters to the cut function.

@codecov
Copy link

codecov bot commented May 4, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@8e2a4a9). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20947   +/-   ##
=========================================
  Coverage          ?   91.82%           
=========================================
  Files             ?      153           
  Lines             ?    49488           
  Branches          ?        0           
=========================================
  Hits              ?    45443           
  Misses            ?     4045           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.22% <ø> (?)
#single 41.85% <ø> (?)
Impacted Files Coverage Δ
pandas/core/reshape/tile.py 93.37% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e2a4a9...46c9bca. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok would need:

  • a doc-strings update
  • tests, see how testing is done for qcut

@jreback jreback added Enhancement Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode Interval Interval data type labels May 4, 2018
@pep8speaks
Copy link

pep8speaks commented May 5, 2018

Hello @FANGOD! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 10, 2018 at 18:27 Hours UTC

@FANGOD
Copy link
Contributor Author

FANGOD commented May 5, 2018

duplicates=drop drop non-uniques

>>> pd.cut(s, [0, 2, 4, 6, 10, 10], labels=False, retbins=True,
...    right=False, duplicates='drop')
... # doctest: +ELLIPSIS
(a    0.0
 b    1.0
 c    2.0
 d    3.0
 e    3.0
 dtype: float64, array([0, 2, 4, 6, 8]))

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the doc-string is fine, but we need an actual test (checking the duplicates kw in test_tile.py), follow the pattern of how qcut is tested

also need a whatsnew entry

@@ -65,6 +65,8 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
The precision at which to store and display the bins labels.
include_lowest : bool, default False
Whether the first interval should be left-inclusive or not.
duplicates : {default 'raise', 'drop'}, optional
If bin edges are not unique, raise ValueError or drop non-uniques.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versionadded tag

add a Raises part of the doc-string

@FANGOD
Copy link
Contributor Author

FANGOD commented May 9, 2018

I don't know what to do. Your ci is too 666.

@@ -65,6 +65,12 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
The precision at which to store and display the bins labels.
include_lowest : bool, default False
Whether the first interval should be left-inclusive or not.
duplicates : {default 'raise', 'drop'}, optional
If bin edges are not unique, raise ValueError("Bin edges must be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you dont need to show the error message here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead add a Returns section in the doc-string, showing that a ValueError can be raised

e 4.0
dtype: float64, array([0, 2, 4, 6, 8]))

``duplicates=drop`` drop non-uniques
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a sentence



class TestCut(object):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add a new file, put in pandas/tests/reshaping/test_tile.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't find it before.

@jreback
Copy link
Contributor

jreback commented May 10, 2018

ok linted and added a test. ping on green.

@jreback jreback added this to the 0.23.0 milestone May 10, 2018
@jreback jreback merged commit c30f0bb into pandas-dev:master May 10, 2018
@jreback
Copy link
Contributor

jreback commented May 10, 2018

thanks @FANGOD

@FANGOD
Copy link
Contributor Author

FANGOD commented May 11, 2018

Finally. Good experience.

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018
topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants