BUG: pd.cut with duplicate index Series lowest included #42425

Closed

Changes from all commits (21 commits)
47e3db1
BUG: pd.cut with duplicate index Series lowest inclued
debnathshoham Jul 7, 2021
a9b6e5f
REF: disambiguate get_loc_level k variable (#42378)
jbrockmendel Jul 7, 2021
0a9cfcf
CI: update vm image version for Azure (#42419)
fangchenli Jul 7, 2021
62147ea
added tests
debnathshoham Jul 8, 2021
f8bcd67
updated whatsnew
debnathshoham Jul 8, 2021
da4ea96
corrected # GH code to 42185
debnathshoham Jul 8, 2021
2500a23
DEPR: treating dt64 as UTC in Timestamp constructor (#42288)
jbrockmendel Jul 8, 2021
82eb380
PERF/REGR: revert #41785 (#42338)
jbrockmendel Jul 8, 2021
ec0fdb7
Update doc/source/whatsnew/v1.3.1.rst
debnathshoham Jul 8, 2021
35b338e
BUG: .loc failing to drop first level (#42435)
jbrockmendel Jul 8, 2021
8d64fe9
BUG: truncate has incorrect behavior when index has only one unique v…
neelmraman Jul 8, 2021
487aafb
CLN: clean doc validation script (#42436)
fangchenli Jul 8, 2021
afeb35e
Fix Formatting Issue (#42438)
9t8 Jul 8, 2021
29e6dc0
DOC: Add more instructions for updating the whatsnew (#42427)
rhshadrach Jul 8, 2021
4653b6a
DOC fix the incorrect doc style in 1.2.1 (#42386)
debnathshoham Jul 8, 2021
8240473
BUG: pd.cut with duplicate index Series lowest inclued
debnathshoham Jul 7, 2021
52f53fd
added tests
debnathshoham Jul 8, 2021
9cb8dbb
updated whatsnew
debnathshoham Jul 8, 2021
a74c2f6
corrected # GH code to 42185
debnathshoham Jul 8, 2021
57b5ddb
updated test func name and code
debnathshoham Jul 8, 2021
6498e50
made suggested changes
debnathshoham Jul 8, 2021
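
For orientation before the file-by-file diff, a minimal sketch of the scenario this PR targets: pd.cut called on a Series whose index contains duplicate labels. The values, labels, and bins below are illustrative, not copied from GH42185.

import pandas as pd

# A Series whose index contains duplicate labels.
ser = pd.Series([0, 1, 2, 3, 4], index=["a", "a", "b", "c", "d"])

# include_lowest=True should place the minimum value (0) in the first bin.
# Per the PR title and the v1.3.1 whatsnew entry further down, results could
# be incorrect before this fix when the index held duplicates; see GH42185.
binned = pd.cut(ser, bins=[0, 2, 4], include_lowest=True)
print(binned)
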
4 changes: 2 additions & 2 deletions azure-pipelines.yml
@@ -21,12 +21,12 @@ jobs:
- template: ci/azure/posix.yml
parameters:
name: macOS
vmImage: macOS-10.14
vmImage: macOS-10.15

- template: ci/azure/windows.yml
parameters:
name: Windows
vmImage: vs2017-win2016
vmImage: windows-2019

- job: py38_32bit
pool:
16 changes: 15 additions & 1 deletion doc/source/development/contributing_codebase.rst
@@ -812,7 +812,21 @@ Changes should be reflected in the release notes located in ``doc/source/whatsne
This file contains an ongoing change log for each release. Add an entry to this file to
document your fix, enhancement or (unavoidable) breaking change. Make sure to include the
GitHub issue number when adding your entry (using ``:issue:`1234``` where ``1234`` is the
issue/pull request number).
issue/pull request number). Your entry should be written using full sentences and proper
grammar.

When mentioning parts of the API, use a Sphinx ``:func:``, ``:meth:``, or ``:class:``
directive as appropriate. Not all public API functions and methods have a
documentation page; ideally links would only be added if they resolve. You can
usually find similar examples by checking the release notes for one of the previous
versions.

If your code is a bugfix, add your entry to the relevant bugfix section. Avoid
adding to the ``Other`` section; only in rare cases should entries go there.
Being as concise as possible, the description of the bug should include how the
user may encounter it and an indication of the bug itself, e.g.
"produces incorrect results" or "incorrectly raises". It may be necessary to also
indicate the new behavior.

If your code is an enhancement, it is most likely necessary to add usage
examples to the existing documentation. This can be done following the section
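
As a concrete instance of the guidance above, the whatsnew entry this PR adds to doc/source/whatsnew/v1.3.1.rst (shown later in this diff) follows the recommended pattern: a concise bugfix description, Sphinx roles for the API objects involved, and the issue numbers via ``:issue:``:

- Bug in :meth:`pandas.cut` on :class:`Series` with duplicate indices (:issue:`42185`) and non-exact :meth:`pandas.CategoricalIndex` (:issue:`42425`)
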
90 changes: 52 additions & 38 deletions doc/source/whatsnew/v1.0.0.rst
@@ -338,19 +338,20 @@ maps labels to their new names along the default axis, is allowed to be passed b

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> df = pd.DataFrame([[1]])
>>> df.rename({0: 1}, {0: 2})
In [1]: df = pd.DataFrame([[1]])
In [2]: df.rename({0: 1}, {0: 2})
Out[2]:
FutureWarning: ...Use named arguments to resolve ambiguity...
2
1 1

*pandas 1.0.0*

.. code-block:: python
.. code-block:: ipython

>>> df.rename({0: 1}, {0: 2})
In [3]: df.rename({0: 1}, {0: 2})
Traceback (most recent call last):
...
TypeError: rename() takes from 1 to 2 positional arguments but 3 were given
@@ -359,26 +360,28 @@ Note that errors will now be raised when conflicting or potentially ambiguous ar

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> df.rename({0: 1}, index={0: 2})
In [4]: df.rename({0: 1}, index={0: 2})
Out[4]:
0
1 1

>>> df.rename(mapper={0: 1}, index={0: 2})
In [5]: df.rename(mapper={0: 1}, index={0: 2})
Out[5]:
0
2 1

*pandas 1.0.0*

.. code-block:: python
.. code-block:: ipython

>>> df.rename({0: 1}, index={0: 2})
In [6]: df.rename({0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

>>> df.rename(mapper={0: 1}, index={0: 2})
In [7]: df.rename(mapper={0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'
Expand All @@ -405,12 +408,12 @@ Extended verbose info output for :class:`~pandas.DataFrame`

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> df = pd.DataFrame({"int_col": [1, 2, 3],
In [1]: df = pd.DataFrame({"int_col": [1, 2, 3],
... "text_col": ["a", "b", "c"],
... "float_col": [0.0, 0.1, 0.2]})
>>> df.info(verbose=True)
In [2]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
@@ -440,14 +443,16 @@ Extended verbose info output for :class:`~pandas.DataFrame`

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> pd.array(["a", None])
In [1]: pd.array(["a", None])
Out[1]:
<PandasArray>
['a', None]
Length: 2, dtype: object

>>> pd.array([1, None])
In [2]: pd.array([1, None])
Out[2]:
<PandasArray>
[1, None]
Length: 2, dtype: object
@@ -470,15 +475,17 @@ As a reminder, you can specify the ``dtype`` to disable all inference.

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> a = pd.array([1, 2, None], dtype="Int64")
>>> a
In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

>>> a[2]
In [3]: a[2]
Out[3]:
nan

*pandas 1.0.0*
@@ -499,9 +506,10 @@ will now raise.

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> np.asarray(a, dtype="float")
In [1]: np.asarray(a, dtype="float")
Out[1]:
array([ 1., 2., nan])

*pandas 1.0.0*
@@ -525,9 +533,10 @@ will now be ``pd.NA`` instead of ``np.nan`` in presence of missing values

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> pd.Series(a).sum(skipna=False)
In [1]: pd.Series(a).sum(skipna=False)
Out[1]:
nan

*pandas 1.0.0*
@@ -543,9 +552,10 @@ integer dtype for the values.

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
In [1]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[1]:
dtype('int64')

*pandas 1.0.0*
@@ -565,15 +575,17 @@ Comparison operations on a :class:`arrays.IntegerArray` now returns a

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> a = pd.array([1, 2, None], dtype="Int64")
>>> a
In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

>>> a > 1
In [3]: a > 1
Out[3]:
array([False, True, False])

*pandas 1.0.0*
@@ -640,9 +652,10 @@ scalar values in the result are instances of the extension dtype's scalar type.

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> df.resample("2D").agg(lambda x: 'a').A.dtype
In [1]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[1]:
CategoricalDtype(categories=['a', 'b'], ordered=False)

*pandas 1.0.0*
@@ -657,9 +670,10 @@ depending on how the results are cast back to the original dtype.

*pandas 0.25.x*

.. code-block:: python
.. code-block:: ipython

>>> df.resample("2D").agg(lambda x: 'c')
In [1]: df.resample("2D").agg(lambda x: 'c')
Out[1]:

A
0 NaN
@@ -871,10 +885,10 @@ matplotlib directly rather than :meth:`~DataFrame.plot`.

To use pandas formatters with a matplotlib plot, specify

.. code-block:: python
.. code-block:: ipython

>>> import pandas as pd
>>> pd.options.plotting.matplotlib.register_converters = True
In [1]: import pandas as pd
In [2]: pd.options.plotting.matplotlib.register_converters = True

Note that plots created by :meth:`DataFrame.plot` and :meth:`Series.plot` *do* register the converters
automatically. The only behavior change is when plotting a date-like object via ``matplotlib.pyplot.plot``
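
The nullable-integer hunks above all trace back to one change: missing values in ``Int64`` arrays propagate as ``pd.NA`` from pandas 1.0 onward. A minimal sketch of the 1.0+ behaviour, with the expected results noted in comments (per the 1.0 release notes, not re-run here):

import pandas as pd

a = pd.array([1, 2, None], dtype="Int64")

a[2]                             # pd.NA (was np.nan in 0.25.x)
a > 1                            # nullable BooleanArray: [False, True, <NA>]
pd.Series(a).sum(skipna=False)   # <NA> instead of nan
pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype  # Int64, not int64
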
32 changes: 19 additions & 13 deletions doc/source/whatsnew/v1.2.1.rst
@@ -52,30 +52,34 @@ DataFrame / Series combination) would ignore the indices, only match
the inputs by shape, and use the index/columns of the first DataFrame for
the result:

.. code-block:: python
.. code-block:: ipython

>>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
... df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
>>> df1
In [1]: df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
In [2]: df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
In [3]: df1
Out[3]:
a b
0 1 3
1 2 4
>>> df2
In [4]: df2
Out[4]:
a b
1 1 3
2 2 4

>>> np.add(df1, df2)
In [5]: np.add(df1, df2)
Out[5]:
a b
0 2 6
1 4 8

This contrasts with how other pandas operations work, which first align
the inputs:

.. code-block:: python
.. code-block:: ipython

>>> df1 + df2
In [6]: df1 + df2
Out[6]:
a b
0 NaN NaN
1 3.0 7.0
@@ -94,20 +98,22 @@ objects (eg ``np.add(s1, s2)``) already aligns and continues to do so.
To avoid the warning and keep the current behaviour of ignoring the indices,
convert one of the arguments to a NumPy array:

.. code-block:: python
.. code-block:: ipython

>>> np.add(df1, np.asarray(df2))
In [7]: np.add(df1, np.asarray(df2))
Out[7]:
a b
0 2 6
1 4 8

To obtain the future behaviour and silence the warning, you can align manually
before passing the arguments to the ufunc:

.. code-block:: python
.. code-block:: ipython

>>> df1, df2 = df1.align(df2)
>>> np.add(df1, df2)
In [8]: df1, df2 = df1.align(df2)
In [9]: np.add(df1, df2)
Out[9]:
a b
0 NaN NaN
1 3.0 7.0
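
For convenience, a self-contained version of the alignment example from the hunk above (same frames, same calls), runnable as a script:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])

# Keep the current positional behaviour (and silence the warning) by passing
# one argument as a NumPy array.
np.add(df1, np.asarray(df2))

# Opt in to the future behaviour by aligning before calling the ufunc.
df1_aligned, df2_aligned = df1.align(df2)
np.add(df1_aligned, df2_aligned)
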
4 changes: 3 additions & 1 deletion doc/source/whatsnew/v1.3.1.rst
@@ -16,6 +16,7 @@ Fixed regressions
~~~~~~~~~~~~~~~~~
- Pandas could not be built on PyPy (:issue:`42355`)
- :class:`DataFrame` constructed with an older version of pandas could not be unpickled (:issue:`42345`)
- Performance regression in constructing a :class:`DataFrame` from a dictionary of dictionaries (:issue:`42338`)
-

.. ---------------------------------------------------------------------------
@@ -24,7 +25,8 @@ Fixed regressions

Bug fixes
~~~~~~~~~
-

- Bug in :meth:`pandas.cut` on :class:`Series` with duplicate indices (:issue:`42185`) and non-exact :meth:`pandas.CategoricalIndex` (:issue:`42425`)
-

.. ---------------------------------------------------------------------------
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v1.4.0.rst
@@ -150,6 +150,7 @@ Deprecations
- Deprecated :meth:`Index.is_type_compatible` (:issue:`42113`)
- Deprecated ``method`` argument in :meth:`Index.get_loc`, use ``index.get_indexer([label], method=...)`` instead (:issue:`42269`)
- Deprecated treating integer keys in :meth:`Series.__setitem__` as positional when the index is a :class:`Float64Index` not containing the key, a :class:`IntervalIndex` with no entries containing the key, or a :class:`MultiIndex` with leading :class:`Float64Index` level not containing the key (:issue:`33469`)
- Deprecated treating ``numpy.datetime64`` objects as UTC times when passed to the :class:`Timestamp` constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use ``Timestamp(dt64).tz_localize("UTC").tz_convert(tz)`` (:issue:`24559`)
-

.. ---------------------------------------------------------------------------
@@ -212,7 +213,7 @@ Interval
Indexing
^^^^^^^^
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` when passing a string, the return type depended on whether the index was monotonic (:issue:`24892`)
-
- Bug in :meth:`DataFrame.truncate` and :meth:`Series.truncate` when the object's Index has a length greater than one but only one unique value (:issue:`42365`)

Missing
^^^^^^^
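
The Timestamp deprecation entry in the hunk above names a workaround; a short sketch of it follows. The datetime value and timezone are placeholders.

import numpy as np
from pandas import Timestamp

dt64 = np.datetime64("2021-07-08 12:00:00")

# Deprecated: Timestamp(dt64, tz="US/Eastern") treats dt64 as a UTC time;
# a future version will treat it as a wall time in that timezone.

# To keep the old UTC-based behaviour explicitly, as the entry suggests:
ts = Timestamp(dt64).tz_localize("UTC").tz_convert("US/Eastern")
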
1 change: 1 addition & 0 deletions pandas/_libs/lib.pyi
@@ -51,6 +51,7 @@ def is_string_array(values: np.ndarray, skipna: bool = False): ...
def is_float_array(values: np.ndarray, skipna: bool = False): ...
def is_integer_array(values: np.ndarray, skipna: bool = False): ...
def is_bool_array(values: np.ndarray, skipna: bool = False): ...
def fast_multiget(mapping: dict, keys: np.ndarray, default=np.nan) -> np.ndarray: ...
def fast_unique_multiple_list_gen(gen: Generator, sort: bool = True) -> list: ...
def fast_unique_multiple_list(lists: list, sort: bool = True) -> list: ...
def fast_unique_multiple(arrays: list, sort: bool = True) -> list: ...
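
The ``fast_multiget`` stub restored above (part of the #42338 revert) describes a vectorised dict lookup with a default. A rough pure-Python equivalent, for orientation only (not the actual Cython implementation):

import numpy as np

def multiget(mapping: dict, keys: np.ndarray, default=np.nan) -> np.ndarray:
    # Look each key up in the mapping, falling back to `default` when absent.
    return np.array([mapping.get(key, default) for key in keys], dtype=object)

multiget({"a": 1, "b": 2}, np.array(["a", "x"], dtype=object))
# -> array([1, nan], dtype=object)
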