BUG: _nsorted incorrect with duplicated values in index (#13412) #14707

mroeschke · 2016-11-22T04:26:16Z

closes BUG: in _nsorted for frame with duplicated values index #13412
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

DataFrame.nlargest and DataFrame.nsmallest do not behave correctly when there are duplicate values in the index.
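A minimal sketch of why a duplicated index is hazardous for this kind of selection. It assumes (the thread does not confirm this) that the affected code path maps the top-n labels back onto the frame by label. A label lookup returns every row that carries the label, so more than n rows can come back. Selecting by position avoids the problem.

```python
import pandas as pd

# Index labels repeat: 0, 0, 1, 1
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 3, 1]}, index=[0, 0, 1, 1])

# The two largest values of column "a" are 4 and 3; both rows are labelled 1
top = df["a"].nlargest(2)

# Label-based lookup matches *every* row labelled 1, once per requested label,
# so four rows come back instead of two
by_label = df.loc[top.index]

# Position-based lookup returns exactly the two intended rows (positions 3 and 2)
by_position = df.iloc[[3, 2]]

print(len(by_label), len(by_position))  # prints: 4 2
```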

Any feedback is appreciated!

jreback · 2016-11-22T23:53:03Z

I am not sold on this impl. Can you see if there are perf tests (asv) for this, and check them?

mroeschke · 2016-11-23T18:52:32Z

There wasn't an existing benchmark for DataFrame.nlargest. I wrote my own and I got an after/before ratio of 1.37, so there is some speed regression.
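For reference, an asv benchmark is a plain class: asv calls `setup` to build the data and times every method whose name starts with `time_`. A sketch of one for `DataFrame.nlargest` on a frame with a heavily duplicated index; the class name, sizes, and seed are illustrative assumptions, not the benchmark that was actually added in this PR.

```python
import numpy as np
import pandas as pd


class FrameNlargestDuplicateIndex:
    """Hypothetical asv-style benchmark for nlargest with a duplicated index."""

    def setup(self):
        rng = np.random.default_rng(0)
        n = 100_000
        # Only 100 distinct labels across 100k rows, so labels repeat heavily
        index = rng.integers(0, 100, size=n)
        self.df = pd.DataFrame(
            {"a": rng.standard_normal(n), "b": rng.standard_normal(n)},
            index=index,
        )

    def time_nlargest_one_column(self):
        self.df.nlargest(100, "a")

    def time_nlargest_two_columns(self):
        self.df.nlargest(100, ["a", "b"])
```

Comparing the before/after timings of methods like these is what produces a ratio such as the 1.37 reported here.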

@jreback, do you have any performance recommendations if I keep the merge method? Or should I look into a new solution?

codecov-io · 2016-11-23T19:00:24Z

Current coverage is 85.27% (diff: 100%)

Merging #14707 into master will decrease coverage by <.01%

@@             master     #14707   diff @@
==========================================
  Files           144        144          
  Lines         50946      50949     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43445      43447     +2   
- Misses         7501       7502     +1   
  Partials          0          0

Powered by Codecov. Last update c0e13d1...d03be72

jreback · 2016-11-23T20:46:16Z

can you add similar tests for Series when duplicates are present? The algo might work already, idk.

Further, I would move _nsorted to algorithms; maybe rename select_n -> select_n_series and call this one select_n_frame.
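One way to make a frame-level selection independent of the index is to rank rows by position and only touch labels at the very end. A minimal sketch under that approach; `select_n_frame_sketch` is a hypothetical name, it covers `keep='first'` semantics only (via a stable sort), and it is not the code that was merged.

```python
import pandas as pd


def select_n_frame_sketch(frame, columns, n, method="nlargest"):
    """Hypothetical helper: top/bottom n rows of `frame` ordered by `columns`.

    Ties keep the earlier row first (keep='first' semantics only).
    """
    if isinstance(columns, str):
        columns = [columns]
    columns = list(columns)
    ascending = method == "nsmallest"
    # Sort a copy carrying a fresh 0..len-1 RangeIndex: its labels are exactly
    # the row positions, so duplicated labels in frame.index cannot collide.
    keyed = frame[columns].reset_index(drop=True)
    order = keyed.sort_values(columns, ascending=ascending, kind="mergesort").index
    # Select by position; the original (possibly duplicated) labels come along.
    return frame.iloc[order.to_numpy()[:n]]
```

Because the final step is `iloc`, each selected position yields exactly one row, whatever the index looks like.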

jreback · 2016-11-23T20:46:42Z

the perf is fine. this is a pretty cheap operation.

mroeschke · 2016-11-24T04:54:25Z

Tests passed for Series & reorganized the code @jreback
(Looks like a Travis-CI job failed due to connectivity)
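The Series check requested in the review can be expressed as a comparison against a full sort followed by `head(n)`. A minimal sketch; the test name and data are illustrative assumptions, not the test that was committed.

```python
import pandas as pd
import pandas.testing as tm


def test_series_nlargest_nsmallest_duplicate_index():
    # Hypothetical test: index labels repeat (0, 0, 1, 1)
    s = pd.Series([1, 2, 3, 4], index=[0, 0, 1, 1])

    # nlargest/nsmallest should agree with a full sort followed by head(n)
    tm.assert_series_equal(s.nlargest(2), s.sort_values(ascending=False).head(2))
    tm.assert_series_equal(s.nsmallest(2), s.sort_values().head(2))
```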

jreback · 2016-11-25T13:32:07Z

pandas/core/algorithms.py

@@ -717,6 +717,17 @@ def select_n(series, n, keep, method):
    return dropped.iloc[inds]


+def select_n_frame(frame, columns, n, method, keep):
+    from pandas.core.series import Series
+    if not is_list_like(columns):


thanks for moving. If you are interested, in a next PR I would want to clean up this code w.r.t. largest/smallest a bit more anyhow; IOW, make it cleaner / nicer. (I think we also have an outstanding perf issue with Series.)

jreback · 2016-11-25T13:32:41Z

pandas/core/algorithms.py

@@ -717,6 +717,17 @@ def select_n(series, n, keep, method):
    return dropped.iloc[inds]


+def select_n_frame(frame, columns, n, method, keep):
+    from pandas.core.series import Series


can you add docstrings to the select_n_* methods? Otherwise lgtm. ping on green.

mroeschke · 2016-11-26T03:05:47Z

ping @jreback. Thanks for your help! I'll also look into reorganizing the nlargest/nsmallest code.

jreback · 2016-12-04T18:02:17Z

pandas/tests/frame/test_analytics.py

@@ -1323,6 +1323,34 @@ def test_nsmallest_multiple_columns(self):
        expected = df.sort_values(['a', 'c']).head(5)
        tm.assert_frame_equal(result, expected)

+    def test_nsmallest_nlargest_duplicate_index(self):
+        df = pd.DataFrame({'a': [1, 2, 3, 4],


can you add the issue number as a comment
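A completed version of such a test with the issue number as a comment, following the `# GH <number>` convention common in pandas tests. The hunk above is truncated, so the columns and expectations after `'a': [1, 2, 3, 4]` are illustrative assumptions, not the test as merged.

```python
import pandas as pd
import pandas.testing as tm


def test_nsmallest_nlargest_duplicate_index():
    # GH 13412
    df = pd.DataFrame({"a": [1, 2, 3, 4],
                       "b": [4, 3, 3, 1]},
                      index=[0, 0, 1, 1])

    result = df.nsmallest(4, "a")
    expected = df.sort_values("a").head(4)
    tm.assert_frame_equal(result, expected)

    result = df.nlargest(4, "a")
    expected = df.sort_values("a", ascending=False).head(4)
    tm.assert_frame_equal(result, expected)

    # Ties in "b" are broken by "a"
    result = df.nlargest(2, ["b", "a"])
    expected = df.sort_values(["b", "a"], ascending=False).head(2)
    tm.assert_frame_equal(result, expected)
```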

jreback · 2016-12-04T18:02:57Z

trivial change (and needs rebase). ping on green.


mroeschke · 2016-12-05T05:32:14Z

ping @jreback. Thanks!

jreback · 2016-12-06T11:35:01Z

thanks!


jreback added Bug and Reshaping labels Nov 22, 2016

mroeschke force-pushed the fix_13412 branch from 9b1ca18 to fdfaa97 November 23, 2016 18:48

mroeschke force-pushed the fix_13412 branch from fdfaa97 to b414a3e November 24, 2016 00:39

jreback added this to the 0.19.2 milestone Nov 25, 2016

jreback reviewed Nov 25, 2016

mroeschke force-pushed the fix_13412 branch from b414a3e to 0e6ea02 November 25, 2016 20:29

jreback reviewed Dec 4, 2016

BUG: _nsorted incorrect with duplicated values in index (pandas-dev#1…

d03be72

…3412) Add note to whatsnew Add nlargest benchmark Add tests for Series organize nsorted methods pep 8 fixes passed test and pep8 add docstrings add github issue

mroeschke force-pushed the fix_13412 branch from 0e6ea02 to d03be72 December 4, 2016 19:10

jreback closed this in 6e514da Dec 6, 2016

jorisvandenbossche pushed a commit that referenced this pull request Dec 15, 2016

BUG: _nsorted incorrect with duplicated values in index

11eb8ab

closes #13412 closes #14707 (cherry picked from commit 6e514da)

mroeschke deleted the fix_13412 branch December 20, 2017 01:57
