Commit caa4a83

Merge branch 'main' into bug_46673
2 parents 8e53d06 + a62897a commit caa4a83

18 files changed, +189 -288 lines changed

doc/source/user_guide/sparse.rst

+2 -2
@@ -266,8 +266,8 @@ have no replacement.

 .. _sparse.scipysparse:

-Interaction with scipy.sparse
------------------------------
+Interaction with *scipy.sparse*
+-------------------------------

 Use :meth:`DataFrame.sparse.from_spmatrix` to create a :class:`DataFrame` with sparse values from a sparse matrix.

doc/source/user_guide/timeseries.rst

+3 -3
@@ -388,7 +388,7 @@ We subtract the epoch (midnight at January 1, 1970 UTC) and then floor divide by

 .. _timeseries.origin:

-Using the ``origin`` Parameter
+Using the ``origin`` parameter
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Using the ``origin`` parameter, one can specify an alternative starting point for creation
@@ -1523,7 +1523,7 @@ or calendars with additional rules.

 .. _timeseries.advanced_datetime:

-Time series-related instance methods
+Time Series-related instance methods
 ------------------------------------

 Shifting / lagging
@@ -2601,7 +2601,7 @@ Transform nonexistent times to ``NaT`` or shift the times.

 .. _timeseries.timezone_series:

-Time zone series operations
+Time zone Series operations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 A :class:`Series` with time zone **naive** values is

doc/source/user_guide/visualization.rst

+2 -2
@@ -3,7 +3,7 @@
 {{ header }}

 *******************
-Chart Visualization
+Chart visualization
 *******************

 This section demonstrates visualization through charting. For information on
@@ -1746,7 +1746,7 @@ Andrews curves charts:

    plt.close("all")

-Plotting directly with matplotlib
+Plotting directly with Matplotlib
 ---------------------------------

 In some situations it may still be preferable or necessary to prepare plots

doc/source/user_guide/window.rst

+2 -2
@@ -3,7 +3,7 @@
 {{ header }}

 ********************
-Windowing Operations
+Windowing operations
 ********************

 pandas contains a compact set of APIs for performing windowing operations - an operation that performs
@@ -490,7 +490,7 @@ For all supported aggregation functions, see :ref:`api.functions_expanding`.

 .. _window.exponentially_weighted:

-Exponentially Weighted window
+Exponentially weighted window
 -----------------------------

 An exponentially weighted window is similar to an expanding window but with each prior point

doc/source/whatsnew/v1.5.0.rst

+1
@@ -1028,6 +1028,7 @@ Reshaping
 Sparse
 ^^^^^^
 - Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``SparseDtype`` failing to retain the array's ``fill_value`` (:issue:`45691`)
+- Bug in :meth:`SparseArray.unique` fails to keep original elements order (:issue:`47809`)
 -

 ExtensionArray

pandas/core/arrays/sparse/array.py

+16 -6
@@ -821,7 +821,7 @@ def shift(self: SparseArrayT, periods: int = 1, fill_value=None) -> SparseArrayT

     def _first_fill_value_loc(self):
         """
-        Get the location of the first missing value.
+        Get the location of the first fill value.

         Returns
         -------
@@ -834,14 +834,24 @@ def _first_fill_value_loc(self):
         if not len(indices) or indices[0] > 0:
             return 0

-        diff = indices[1:] - indices[:-1]
-        return np.searchsorted(diff, 2) + 1
+        # a number larger than 1 should be appended to
+        # the last in case of fill value only appears
+        # in the tail of array
+        diff = np.r_[np.diff(indices), 2]
+        return indices[(diff > 1).argmax()] + 1

     def unique(self: SparseArrayT) -> SparseArrayT:
         uniques = algos.unique(self.sp_values)
-        fill_loc = self._first_fill_value_loc()
-        if fill_loc >= 0:
-            uniques = np.insert(uniques, fill_loc, self.fill_value)
+        if len(self.sp_values) != len(self):
+            fill_loc = self._first_fill_value_loc()
+            # Inorder to align the behavior of pd.unique or
+            # pd.Series.unique, we should keep the original
+            # order, here we use unique again to find the
+            # insertion place. Since the length of sp_values
+            # is not large, maybe minor performance hurt
+            # is worthwhile to the correctness.
+            insert_loc = len(algos.unique(self.sp_values[:fill_loc]))
+            uniques = np.insert(uniques, insert_loc, self.fill_value)
         return type(self)._from_sequence(uniques, dtype=self.dtype)

     def _values_for_factorize(self):
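
The logic changed in this hunk can be illustrated outside pandas. What follows is a minimal pure-Python sketch under stated assumptions, not the pandas implementation. The helper names `first_fill_value_loc` and `sparse_unique` are hypothetical. The sparse index is modelled as a sorted list of positions that hold non-fill values. The early `-1` returns are inferred from the test cases in this commit, since that part of the method is not shown in the diff. Fill values are compared with `!=`, so NaN fill values are not handled here.

```python
def first_fill_value_loc(stored_positions, length):
    """Return the first position holding the fill value, or -1 if there is none.

    stored_positions: sorted positions of explicitly stored (non-fill) values.
    length: total length of the dense array.
    """
    # Assumption (inferred from the tests): empty or fully stored arrays give -1.
    if length == 0 or len(stored_positions) == length:
        return -1
    # Nothing stored, or the first stored value is not at position 0:
    # the fill value sits at position 0.
    if not stored_positions or stored_positions[0] > 0:
        return 0
    # Gaps between consecutive stored positions; a gap > 1 means a fill value
    # lies in between. The trailing 2 is a sentinel for the case where the
    # fill value appears only in the tail of the array.
    gaps = [b - a for a, b in zip(stored_positions, stored_positions[1:])]
    gaps.append(2)
    first_gap = next(i for i, gap in enumerate(gaps) if gap > 1)
    return stored_positions[first_gap] + 1


def sparse_unique(dense, fill_value):
    """Order-preserving unique that inserts fill_value at its first-seen rank."""
    stored_positions = [i for i, v in enumerate(dense) if v != fill_value]
    stored_values = [dense[i] for i in stored_positions]
    # dict.fromkeys keeps first-appearance order, standing in for algos.unique.
    uniques = list(dict.fromkeys(stored_values))
    if len(stored_values) != len(dense):
        fill_loc = first_fill_value_loc(stored_positions, len(dense))
        # Count the distinct stored values that occur before the first fill value.
        insert_loc = len(dict.fromkeys(stored_values[:fill_loc]))
        uniques.insert(insert_loc, fill_value)
    return uniques


print(first_fill_value_loc([0, 1, 2], 5))            # 3
print(sparse_unique([0, 0, 1, 2, 1], fill_value=1))  # [0, 1, 2]
```

In the second call the fill value 1 first appears at position 2, after one distinct stored value (0), so it is inserted at rank 1 rather than at raw position 2.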

pandas/tests/arrays/sparse/test_array.py

+23 -10
@@ -391,23 +391,36 @@ def test_setting_fill_value_updates():


 @pytest.mark.parametrize(
-    "arr, loc",
+    "arr,fill_value,loc",
     [
-        ([None, 1, 2], 0),
-        ([0, None, 2], 1),
-        ([0, 1, None], 2),
-        ([0, 1, 1, None, None], 3),
-        ([1, 1, 1, 2], -1),
-        ([], -1),
+        ([None, 1, 2], None, 0),
+        ([0, None, 2], None, 1),
+        ([0, 1, None], None, 2),
+        ([0, 1, 1, None, None], None, 3),
+        ([1, 1, 1, 2], None, -1),
+        ([], None, -1),
+        ([None, 1, 0, 0, None, 2], None, 0),
+        ([None, 1, 0, 0, None, 2], 1, 1),
+        ([None, 1, 0, 0, None, 2], 2, 5),
+        ([None, 1, 0, 0, None, 2], 3, -1),
+        ([None, 0, 0, 1, 2, 1], 0, 1),
+        ([None, 0, 0, 1, 2, 1], 1, 3),
     ],
 )
-def test_first_fill_value_loc(arr, loc):
-    result = SparseArray(arr)._first_fill_value_loc()
+def test_first_fill_value_loc(arr, fill_value, loc):
+    result = SparseArray(arr, fill_value=fill_value)._first_fill_value_loc()
     assert result == loc


 @pytest.mark.parametrize(
-    "arr", [[1, 2, np.nan, np.nan], [1, np.nan, 2, np.nan], [1, 2, np.nan]]
+    "arr",
+    [
+        [1, 2, np.nan, np.nan],
+        [1, np.nan, 2, np.nan],
+        [1, 2, np.nan],
+        [np.nan, 1, 0, 0, np.nan, 2],
+        [np.nan, 0, 0, 1, 2, 1],
+    ],
 )
 @pytest.mark.parametrize("fill_value", [np.nan, 0, 1])
 def test_unique_na_fill(arr, fill_value):

pandas/tests/scalar/timedelta/test_timedelta.py

+2
@@ -14,6 +14,7 @@
     iNaT,
 )
 from pandas._libs.tslibs.dtypes import NpyDatetimeUnit
+from pandas.compat import IS64
 from pandas.errors import OutOfBoundsTimedelta

 import pandas as pd
@@ -690,6 +691,7 @@ def test_round_implementation_bounds(self):
         with pytest.raises(OverflowError, match=msg):
             Timedelta.max.ceil("s")

+    @pytest.mark.xfail(not IS64, reason="Failing on 32 bit build", strict=False)
     @given(val=st.integers(min_value=iNaT + 1, max_value=lib.i8max))
     @pytest.mark.parametrize(
         "method", [Timedelta.round, Timedelta.floor, Timedelta.ceil]

pandas/tests/scalar/timestamp/test_unary_ops.py

+2
@@ -21,6 +21,7 @@
 )
 from pandas._libs.tslibs.dtypes import NpyDatetimeUnit
 from pandas._libs.tslibs.period import INVALID_FREQ_ERR_MSG
+from pandas.compat import IS64
 import pandas.util._test_decorators as td

 import pandas._testing as tm
@@ -297,6 +298,7 @@ def test_round_implementation_bounds(self):
         with pytest.raises(OverflowError, match=msg):
             Timestamp.max.ceil("s")

+    @pytest.mark.xfail(not IS64, reason="Failing on 32 bit build", strict=False)
     @given(val=st.integers(iNaT + 1, lib.i8max))
     @pytest.mark.parametrize(
         "method", [Timestamp.round, Timestamp.floor, Timestamp.ceil]

web/pandas/config.yml

+26 -8
@@ -118,17 +118,27 @@ sponsors:
       url: https://www.twosigma.com/
       logo: /static/img/partners/two_sigma.svg
       kind: partner
-      description: "Phillip Cloud, Jeff Reback"
-    - name: "Ursa Labs"
-      url: https://ursalabs.org/
-      logo: /static/img/partners/ursa_labs.svg
+      description: "Jeff Reback"
+    - name: "Voltron Data"
+      url: https://voltrondata.com/
+      logo: /static/img/partners/voltron_data.svg
       kind: partner
-      description: "Wes McKinney, Joris Van den Bossche"
+      description: "Joris Van den Bossche"
     - name: "d-fine GmbH"
       url: https://www.d-fine.com/en/
       logo: /static/img/partners/dfine.svg
       kind: partner
       description: "Patrick Hoefler"
+    - name: "Quansight"
+      url: https://quansight.com/
+      logo: /static/img/partners/quansight_labs.svg
+      kind: partner
+      description: "Marco Gorelli"
+    - name: "Nvidia"
+      url: https://www.nvidia.com
+      logo: /static/img/partners/nvidia.svg
+      kind: partner
+      description: "Matthew Roeschke"
     - name: "Tidelift"
       url: https://tidelift.com
       logo: /static/img/partners/tidelift.svg
@@ -139,6 +149,11 @@ sponsors:
       logo: /static/img/partners/czi.svg
       kind: regular
       description: "<i>pandas</i> is funded by the Essential Open Source Software for Science program of the Chan Zuckerberg Initiative. The funding is used for general maintenance, improve extension types, and a efficient string type."
+    - name: "Bodo"
+      url: https://www.bodo.ai/
+      logo: /static/img/partners/bodo.svg
+      kind: regular
+      description: "Bodo's parallel computing platform uses pandas API, and Bodo financially supports pandas development to help improve pandas, in particular the pandas API"
   inkind: # not included in active so they don't appear in the home page
     - name: "OVH"
       url: https://us.ovhcloud.com/
@@ -152,10 +167,13 @@ sponsors:
       kind: partner
     - name: "Anaconda"
       url: https://www.anaconda.com/
-      logo: /static/img/partners/anaconda.svg
       kind: partner
     - name: "RStudio"
       url: https://www.rstudio.com/
-      logo: /static/img/partners/r_studio.svg
       kind: partner
-      description: "Wes McKinney"
+    - name: "Ursa Labs"
+      url: https://ursalabs.org/
+      kind: partner
+    - name: "Gousto"
+      url: https://www.gousto.co.uk/
+      kind: partner
