BUG: Issue with pd.cut on Series with duplicate index #42448

Merged: jreback merged 9 commits into pandas-dev:master on Jul 28, 2021

debnathshoham
Member

Creating a new PR. I had messed up the earlier one by mistake.
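
For readers without the linked issue at hand, here is a minimal sketch of the scenario named in the title and in the linked issue: `pd.cut` called with `include_lowest=True` on a Series whose index contains a repeated label. The data values are made up for illustration. On a pandas release that includes this fix (the PR is on the 1.4 milestone) the call is expected to return a categorical Series that keeps the original, duplicated index; earlier releases are reported to raise.

```python
import pandas as pd

# Illustrative data: index label 0 appears twice (the duplicate-index case).
s = pd.Series([0, 1, 2, 3, 0], index=[0, 1, 2, 3, 0])

# include_lowest=True is the combination named in the linked issue title.
# Assumes a pandas version that already contains this fix.
result = pd.cut(s, bins=[0, 2, 4], include_lowest=True)

print(result)
```

The result should still carry the duplicated index labels, so downstream alignment with `s` is unchanged.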

Member

datapythonista left a comment

lgtm, thanks for the fix @debnathshoham

If you want, and if it's easy, it would be useful to add type annotations, at least to x and bins, and maybe a short docstring explaining what _bins_to_cuts does. After working on this you surely have a clear idea of it, and that would be very useful for future contributions.

Thanks!

@@ -44,7 +45,7 @@


 def cut(
-    x,
+    x: AnyArrayLike,
Contributor

Don't add type annotations in the same PR; please revert this.

jreback added the Bug and Indexing (Related to indexing on series/frames, not to indexes themselves) labels on Jul 14, 2021
jreback added this to the 1.4 milestone on Jul 14, 2021
debnathshoham requested a review from jreback on July 16, 2021 18:42
@debnathshoham
Member Author

please review

@datapythonista
Member

Can you merge master, and fix the problems in the CI please? Ping us when it's green. Thanks!

@debnathshoham
Member Author

@datapythonista @jreback CI is green.

@@ -417,11 +419,11 @@ def _bins_to_cuts(
         else:
             bins = unique_bins
 
-    side = "left" if right else "right"
+    side: Union[Literal["left"], Literal["right"]] = "left" if right else "right"
Contributor

I don't think the typing is actually useful here.

Member Author

Actually, without this there was a mypy error (9814800), which was resolved in 2211736, where I added this typing.

Contributor

Please rebase on master. I don't think this is necessary.

Member Author

Rebased, and removed the type annotation.
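
For context on the exchange above: the exact mypy message is not quoted in this thread. A minimal sketch of one plausible cause, assuming the value is later passed to a parameter typed as a literal "left"/"right", is that mypy infers plain `str` for an unannotated conditional expression. The names `Side`, `pick_side`, and `bin_position` are hypothetical, and the standard-library `bisect` functions stand in for a searchsorted-style lookup on sorted bin edges; none of this is the pandas source.

```python
from bisect import bisect_left, bisect_right
from typing import Literal, Sequence

# Equivalent to Union[Literal["left"], Literal["right"]] as written in the diff.
Side = Literal["left", "right"]


def pick_side(right: bool) -> Side:
    # Without the annotation, a type checker would typically infer `str` here,
    # which a Literal-typed parameter would then reject.
    side: Side = "left" if right else "right"
    return side


def bin_position(edges: Sequence[float], value: float, right: bool = True) -> int:
    """Hypothetical stand-in for a searchsorted call on sorted bin edges."""
    if pick_side(right) == "left":
        return bisect_left(edges, value)
    return bisect_right(edges, value)


edges = [0.0, 2.0, 4.0]
# A value sitting exactly on an edge shows why the side matters.
on_edge_right_closed = bin_position(edges, 2.0, right=True)   # position 1: (0, 2]
on_edge_left_closed = bin_position(edges, 2.0, right=False)   # position 2: [2, 4)
```

This also illustrates the seemingly inverted mapping in the diff: right-closed bins use a "left" search so that a value equal to an edge lands in the interval that ends at that edge.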

@@ -265,6 +265,7 @@ Groupby/resample/rolling
 Reshaping
 ^^^^^^^^^
 - :func:`concat` creating :class:`MultiIndex` with duplicate level entries when concatenating a :class:`DataFrame` with duplicates in :class:`Index` and multiple keys (:issue:`42651`)
+-Bug in :meth:`pandas.cut` on :class:`Series` with duplicate indices (:issue:`42185`) and non-exact :meth:`pandas.CategoricalIndex` (:issue:`42425`)
Contributor

This is failing the CI:

/home/runner/work/pandas/pandas/doc/source/whatsnew/v1.4.0.rst:282: WARNING: Bullet list ends without a blank line; unexpected unindent.

That is, the new entry needs a space after the '-'.

Please rebase once again; CI should then be green.
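
The warning comes from the fact that reStructuredText only treats "-" as a bullet marker when whitespace follows it, so "-Bug in ..." reads as stray text directly after the list. A rough sketch of a local pre-check for this specific slip follows; `find_unspaced_bullets` is a hypothetical helper, not part of the pandas CI, and the sample text is shortened for illustration.

```python
import re
from typing import List

# A leading "-" immediately followed by a non-space, non-dash character is
# not a valid reStructuredText bullet marker.
UNSPACED_BULLET = re.compile(r"^-[^\s-]")


def find_unspaced_bullets(text: str) -> List[int]:
    """Return 1-based line numbers that start with '-' but lack the space."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if UNSPACED_BULLET.match(line)
    ]


# Shortened sample modelled on the whatsnew entries in the diff above.
sample = (
    "- :func:`concat` entry (:issue:`42651`)\n"
    "-Bug in :meth:`pandas.cut` entry (:issue:`42185`)\n"
)
bad_lines = find_unspaced_bullets(sample)
print(bad_lines)
```

The check is deliberately narrow: it only catches the missing space, not other list-formatting problems that a full docs build would report.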

Member Author

Rebased. I don't think the doctest failures are related, although the mypy typing check is still failing.

/home/runner/work/pandas/pandas/pandas/io/formats/style_render.py:649: DocTestFailure
/home/runner/work/pandas/pandas/pandas/io/formats/style_render.py:1254: UnexpectedException

jreback merged commit fda3162 into pandas-dev:master on Jul 28, 2021
Contributor

jreback commented Jul 28, 2021

thanks @debnathshoham

debnathshoham deleted the duplicate-cut branch on July 28, 2021 03:41
CGe0516 pushed a commit to CGe0516/pandas that referenced this pull request Jul 29, 2021
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021

Successfully merging this pull request may close these issues.

BUG: pd.cut raises with duplicated index and include_lowest=True
3 participants