BUG: df.loc setitem-with-expansion with duplicate index #40096

jbrockmendel · 2021-02-27T02:25:44Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

also CategoricalIndex.insert
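
For readers skimming the thread, the behavior named in the title can be sketched as below. This is only an illustration under assumptions: the frame, the labels and the value are made up here and are not taken from the PR's tests, and it assumes a pandas version that includes this change.

```python
import pandas as pd

# Hypothetical frame whose index contains a duplicate label ("a").
df = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "a", "b"])

# Setting-with-expansion: "c" is not in the index, so .loc appends a row.
df.loc["c", "A"] = 10

new_index = list(df.index)
new_value = df.loc["c", "A"]
```

The existing rows, including both "a" rows, should be left in place and the new label appended at the end.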

jbrockmendel · 2021-02-27T02:26:24Z

phofl · 2021-02-27T18:04:25Z

pandas/tests/indexing/test_loc.py

+
+        exp_index = index.insert(len(index), key)
+        if isinstance(index, MultiIndex):
+            exp_index = index.insert(len(index), key)


I think this is not needed?

LGTM otherwise

yep, thanks

pandas/core/indexes/extension.py

jreback · 2021-02-27T18:30:39Z

pandas/tests/indexing/test_categorical.py

-            df.loc["d", "A"] = 10
-        with pytest.raises(TypeError, match=msg):
-            df.loc["d", "C"] = 10
+        # Setting-with-expansion with a new key "d" that is not among caegories


might be worth splitting this test (e.g. move the error cases to a new test), and possibly these cases too
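
A rough sketch of the non-error case discussed in this hunk, assuming a pandas version that includes this change. The data is illustrative rather than the fixture used in test_categorical.py; the point is that a new key outside the categories expands the frame instead of raising.

```python
import pandas as pd

# Illustrative CategoricalIndex with duplicates; "d" is not a category.
cat_index = pd.CategoricalIndex(list("aabbca"), categories=list("cab"), name="B")
df = pd.DataFrame({"A": [0, 1, 2, 3, 4, 5]}, index=cat_index)

# Add a new row under an existing column with a key that is not a category.
df.loc["d", "A"] = 10

labels_after = list(df.index)
still_categorical = isinstance(df.index, pd.CategoricalIndex)
```

Because "d" cannot be represented with the existing categories, the resulting index is expected to be a plain (non-categorical) Index.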

jreback · 2021-02-27T18:30:53Z

pandas/tests/indexing/test_loc.py

+        expected = DataFrame(exp_data, index=exp_index, columns=[0])
+
+        # Add new row, but no new columns
+        df = orig.copy()


same here, might be worth splitting this up a bit

jreback · 2021-02-27T18:31:07Z

pandas/core/indexing.py

@@ -1641,7 +1641,12 @@ def _setitem_with_indexer(self, indexer, value, name="iloc"):
                    # so the object is the same
                    index = self.obj._get_axis(i)
                    labels = index.insert(len(index), key)
-                    self.obj._mgr = self.obj.reindex(labels, axis=i)._mgr
+                    taker = list(range(len(index))) + [-1]


can you add comments on what is happening here
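
As background for the request above, here is a minimal sketch of why the removed `reindex` line is a problem when the index has duplicates. The frame and key are hypothetical, and the only public behavior relied on is that `DataFrame.reindex` raises `ValueError` when the existing axis contains duplicate labels.

```python
import pandas as pd

# Hypothetical frame with a duplicate index label.
df = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "a", "b"])

# Same construction as in the hunk: append the new key to the index.
labels = df.index.insert(len(df.index), "c")

# Reindexing by label is ambiguous for duplicates, so pandas refuses.
try:
    df.reindex(labels)
    reindex_failed = False
except ValueError:
    reindex_failed = True
```

Taking rows by position (with -1 marking the new, empty row) sidesteps the label lookup entirely, which appears to be what the added `taker` line sets up.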

prob doesn't make much difference but can you just use np.arange here?

In [2]: %timeit list(np.arange(10))
1.6 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit list(range(10))
238 ns ± 7.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

sure that's small sample, is it always really small like this?

needs to have ensure_int_platform generally (you can directly construct the np array like that)

we always use np.arange for this so changing patterns is not helpful

> sure that's small sample, is it always really small like this?

In [2]: N = 10**6

In [3]: %timeit list(np.arange(N))
79.6 ms ± 3.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit list(range(N))
32.9 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

wait, better idea, never mind.

N = 10

In [5]: %timeit arr = np.asarray(list(range(N)) + [1], dtype=np.intp)
2.52 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [6]: %timeit arr = np.arange(N + 1); arr[-1] = -1
533 ns ± 11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

N = 10**6

In [8]: %timeit arr = np.arange(N + 1); arr[-1] = -1
397 µs ± 24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit arr = np.asarray(list(range(N)) + [1], dtype=np.intp)
98.6 ms ± 437 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
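
The pattern the timings above converge on can be sketched as follows. `expansion_taker` is a hypothetical helper name used only for illustration, and the public `pandas.api.extensions.take` stands in for whatever internal call the PR actually uses; treat this as a sketch of the idea, not the merged code.

```python
import numpy as np
from pandas.api.extensions import take


def expansion_taker(n):
    """Positions 0..n-1 followed by -1, built without a Python list."""
    taker = np.arange(n + 1, dtype=np.intp)
    taker[-1] = -1  # -1 marks the new row, to be filled rather than taken
    return taker


values = np.array([1.0, 2.0, 3.0])
taker = expansion_taker(len(values))

# With allow_fill=True, -1 means "missing" instead of "last element".
expanded = take(values, taker, allow_fill=True)
```

Allocating the array once with `np.arange` and overwriting the final slot avoids building and converting a Python list, which is where the list-based variants lose time at large N.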

jreback · 2021-02-27T18:31:43Z

I think we should deprecate indexing expansion, pretty sure we have an issue for this.

jbrockmendel · 2021-03-02T15:36:23Z

updated per comments + greenish

jbrockmendel added 2 commits February 26, 2021 18:24

BUG: df.loc setitem-with-expansion with duplicate index

3507373

GH ref

09eb6c5

Merge branch 'master' into bug-ci-insert

05e9fc3

phofl reviewed Feb 27, 2021

jreback requested changes Feb 27, 2021

jreback added the Indexing label (Related to indexing on series/frames, not to indexes themselves) Feb 27, 2021

jbrockmendel added 2 commits March 1, 2021 19:52

Merge branch 'master' into bug-ci-insert

8a9ee17

comments, split test

0afec63

jreback added this to the 1.3 milestone Mar 2, 2021

avoid constructing list

a280932

jreback approved these changes Mar 2, 2021

jreback merged commit 13a97c2 into pandas-dev:master Mar 2, 2021

jbrockmendel deleted the bug-ci-insert branch March 2, 2021 22:16

jbrockmendel mentioned this pull request Mar 2, 2021

BUG: Index.insert should coerce to the appropriate type #16277

Closed
