TST: Add test for union with duplicates #40967

phofl · 2021-04-15T20:24:41Z

tests added / passed
Ensure all linting tests pass, see here for how to run them

jbrockmendel · 2021-04-15T20:58:13Z

pandas/core/algorithms.py

        left values which is ordered in front.
-    rvals: np.ndarray
+    rvals: ArrayLike


if we're getting EA here, then the np.append call below is likely going to do something unwanted like cast to object

pd_array casts back in this case

right, but object casting is costly

Ah got you now.
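To make the concern in this thread concrete, here is a minimal sketch using only public NumPy/pandas calls. It does not reproduce the internals of `union_with_duplicates`, and the sample values are invented for illustration. It shows that `np.append` on two Categoricals produces a plain object-dtype ndarray, so the extension dtype is lost and would have to be rebuilt afterwards.

```python
import numpy as np
import pandas as pd

# Illustrative inputs (assumed, not taken from the PR); both share categories.
lvals = pd.Categorical(["a", "b", "b"], categories=["a", "b", "c"])
rvals = pd.Categorical(["b", "c"], categories=["a", "b", "c"])

# np.append converts its inputs to ndarrays before concatenating,
# so the Categoricals are materialized as plain NumPy arrays first.
appended = np.append(lvals, rvals)

print(type(appended).__name__, appended.dtype)
```

Casting back to an extension array after this step is possible, but the round trip through object dtype is the cost being pointed out above.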

jbrockmendel · 2021-04-15T21:00:08Z

pandas/core/indexes/base.py

-            # "Union[ExtensionArray, ndarray]"; expected "ndarray"
-            # error: Argument 2 to "union_with_duplicates" has incompatible type
-            # "Union[ExtensionArray, ndarray]"; expected "ndarray"
-            result = algos.union_with_duplicates(lvals, rvals)  # type: ignore[arg-type]


I think we should only get here with ndarray[object] for both. If that can be confirmed, then we can just do a cast.

Unfortunately with categorical we get there with an EA

So I think the right way to handle this will be to override ExtensionIndex._union (or maybe just CI._union).
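As a reminder of why the cast is only safe if the ndarray assumption really holds, here is a small sketch (sample values are invented). `typing.cast` only informs the type checker; at runtime it returns its argument unchanged, so an extension array that reaches this point stays an extension array.

```python
from typing import cast

import numpy as np
import pandas as pd

# Illustrative value (assumed): a Categorical standing in for lvals.
vals = pd.Categorical(["a", "b"])

# cast() performs no conversion and no runtime check.
claimed = cast(np.ndarray, vals)

print(type(claimed).__name__)
```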

jreback · 2021-04-16T00:44:40Z

pandas/core/indexes/base.py:3003: error: unused 'type: ignore' comment

jbrockmendel · 2021-04-16T23:09:55Z

pandas/core/algorithms.py

        right values ordered after lvals.

    Returns
    -------
-    np.ndarray containing the unsorted union of both arrays
+    ArrayLike containing the unsorted union of both arrays


I'm finding myself getting here on another branch. I think the pd_array call is causing problems. In the case I'm looking at, lvals and rvals are both all-string Categoricals, so the pd_array call gives a StringArray, which gums up the works later in Index.union.
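The inference described here can be seen with the public `pd.array` constructor, which is assumed in this sketch to behave like the internal `pd_array` alias. The values are invented; the point is only that all-string object data is inferred as a string extension dtype rather than staying an object ndarray.

```python
import numpy as np
import pandas as pd

# Illustrative all-string object array (assumed), e.g. what remains after
# two string Categoricals have been combined through NumPy.
values = np.array(["a", "b", "b", "c"], dtype=object)

# With no dtype given, pd.array infers an extension dtype from the values.
inferred = pd.array(values)

print(type(inferred).__name__, inferred.dtype)
```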

If we say that we should not get here with an extension dtype except categorical, we could remove the pd.array call?

Alternatively if we overwrite the categorical _union, this would be solved too?

> Alternatively if we overwrite the categorical _union, this would be solved too?

I think we do this on ExtensionIndex, that way we can do cast(np.ndarray, lvals) in Index._union. All ExtensionIndex subclasses other than CategoricalIndex already override _union, so it should be equivalent

So I think the following works for CategoricalIndex._union:

    def _union(self, other: CategoricalIndex, sort) -> CategoricalIndex:
        # we only get here with matching dtypes
        lidx = self.astype(self.categories.dtype)
        ridx = other.astype(other.categories.dtype)
        result = lidx._union(ridx, sort=sort)
        cat = Categorical(result, dtype=self.dtype)
        return type(self)._simple_new(cat, name=self.name)

(this might also make it possible to remove a Categorical kludge in Index._wrap_setop_result)

The downside here is that the astype both makes a copy and loses potentially-cached is_monotonic/has_duplicates/is_unique information. It may be better to make a method just for _union_non_unique and only override that.

I take it back, that _union messes up when self.dtype.order is non-standard.
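The following is a hypothetical public-API analogue of the `_union` proposed in this thread, written only to illustrate the discussion; `union_via_categories` is an invented name and the data is made up. Under one plausible reading of the "non-standard order" remark, the problem is that sorting happens on the plain values after `astype`, so the result follows lexical order rather than the category order.

```python
import pandas as pd


def union_via_categories(left, right, sort=None):
    """Hypothetical sketch: union two CategoricalIndex objects via their values.

    Assumes both inputs share the same CategoricalDtype.
    """
    # astype copies the data and drops cached properties such as is_unique.
    lidx = left.astype(left.categories.dtype)
    ridx = right.astype(right.categories.dtype)
    result = lidx.union(ridx, sort=sort)
    cat = pd.Categorical(result, dtype=left.dtype)
    return pd.CategoricalIndex(cat, name=left.name)


# Categories deliberately listed in reverse lexical order (invented data).
dtype = pd.CategoricalDtype(["c", "b", "a"], ordered=True)
left = pd.CategoricalIndex(["c", "a"], dtype=dtype)
right = pd.CategoricalIndex(["b"], dtype=dtype)

res = union_via_categories(left, right)
print(list(res), res.is_monotonic_increasing)
```

Here the union is sorted as plain strings before being wrapped again, so the resulting index is not increasing with respect to its own ordered categories.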

phofl · 2021-04-20T20:23:22Z

Thx @jbrockmendel

I could still add a test here, but there are no typing changes left.

jreback · 2021-04-20T22:51:42Z

thanks @phofl

Fix typing for union_with_duplicates

9189c75

jbrockmendel reviewed Apr 15, 2021

jreback added the Typing type annotations, mypy/pyright type checking label Apr 16, 2021

jreback added this to the 1.3 milestone Apr 16, 2021

jbrockmendel reviewed Apr 16, 2021

jbrockmendel mentioned this pull request Apr 19, 2021

REF: remove Categorical._shallow_copy #41030

Merged

4 tasks

phofl added 2 commits April 20, 2021 22:18

Merge branch 'master' of https://github.com/pandas-dev/pandas into ty…

5af30b6

…p_union Conflicts: pandas/core/algorithms.py pandas/core/indexes/base.py

Change test

df9664d

phofl changed the title ~~Fix typing for union_with_duplicates~~ TST: Add test for union with duplicates Apr 20, 2021

phofl added Testing pandas testing functions or related to the test suite and removed Typing type annotations, mypy/pyright type checking labels Apr 20, 2021

jreback merged commit 8b1430d into pandas-dev:master Apr 20, 2021

yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request Apr 21, 2021

TST: Add test for union with duplicates (pandas-dev#40967)

a3425dd

phofl deleted the typ_union branch April 21, 2021 20:40

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

TST: Add test for union with duplicates (pandas-dev#40967)

d696148
