BUG: cant modify df with duplicate index (#17105) #20939
Codecov Report
@@ Coverage Diff @@
## master #20939 +/- ##
=========================================
Coverage ? 91.81%
=========================================
Files ? 153
Lines ? 49479
Branches ? 0
=========================================
Hits ? 45428
Misses ? 4051
Partials ? 0
doc/source/whatsnew/v0.23.0.txt
@@ -1243,6 +1243,7 @@ Indexing
- Bug in ``Series.is_unique`` where extraneous output in stderr is shown if Series contains objects with ``__ne__`` defined (:issue:`20661`)
- Bug in ``.loc`` assignment with a single-element list-like incorrectly assigns as a list (:issue:`19474`)
- Bug in partial string indexing on a ``Series/DataFrame`` with a monotonic decreasing ``DatetimeIndex`` (:issue:`19362`)
- Fixed to allow modifying ``DataFrame`` with duplicate Index (:issue:`17105`)
Bug in performing in-place operations on a DataFrame with a duplicate Index.
def test_modify_with_duplicate_index():
    trange = pd.date_range(start=pd.Timestamp(year=2017, month=1, day=1),
Can you add the issue number here? Also, move the test to test_loc.py (same dir).
# modify the value for the duplicate index entry
df.loc[trange[bool_idx], "A"] = 7
Use assert_frame_equal; construct the expected frame and compare.
df = pd.DataFrame(0, index=trange, columns=["A", "B"])
bool_idx = np.array([False, False, False, False, False, True])

# modify the value for the duplicate index entry
This is fine as a test, but please add another case like the original issue (e.g. +=).
thanks! added the requested changes, hopefully it's better now :)
moved the test around, will merge on green.
thanks @fersarr
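For readers following the review thread, here is a minimal sketch of the kind of test the comments ask for: build the expected frame, compare with `assert_frame_equal`, and cover both plain assignment and the `+=` case. It is an illustration, not the test that was merged. The variable names `df_assign`, `df_inplace`, and `expected` are hypothetical, and it uses the public `pandas.testing` module rather than pandas' internal test helpers.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm  # public testing helpers (assumption: not what the PR itself imports)

# Five daily timestamps, then the last one inserted again to create a duplicate label.
trange = pd.date_range(start="2017-01-01", end="2017-01-05")
trange = trange.insert(loc=5, item=pd.Timestamp("2017-01-05"))

# Select only the final (duplicated) label.
bool_idx = np.array([False, False, False, False, False, True])

# Both rows carrying the duplicated label should be updated.
expected = pd.DataFrame(
    {"A": [0, 0, 0, 0, 7, 7], "B": [0, 0, 0, 0, 0, 0]}, index=trange
)

# Case 1: plain assignment through .loc
df_assign = pd.DataFrame(0, index=trange, columns=["A", "B"])
df_assign.loc[trange[bool_idx], "A"] = 7
tm.assert_frame_equal(df_assign, expected)

# Case 2: in-place operation, as in the original issue
df_inplace = pd.DataFrame(0, index=trange, columns=["A", "B"])
df_inplace.loc[trange[bool_idx], "A"] += 7
tm.assert_frame_equal(df_inplace, expected)
```

Because `.loc` selects by label, the single selected timestamp matches two rows, which is why the expected frame has two modified entries in column `A`.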
git diff upstream/master -u -- "*.py" | flake8 --diff
This fix allows modifying DataFrames that have duplicate elements in the index. Previously such a modification would fail; see #17105 for a code snippet.
Replacing `zeros_like(objarray)` with `zeros()` because the first unnecessarily returns an array of zeros with the same type as `objarray`. We only want the zeros, not the type, to be able to later compare against -1 and get an array as a result:

- The result of `zeros_like()` with dates gives a boolean after comparison.
- The result of `zeros_like()` with numbers gives an array after comparison.
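
The dtype difference described above can be illustrated with plain NumPy. This is a sketch under assumptions, not the pandas code touched by this PR: the array names are hypothetical, and `np.intp` is chosen here only as a typical integer dtype for indexers. How a datetime-typed array compares against -1 depends on the NumPy version, so the sketch only shows the dtypes and the integer comparisons.

```python
import numpy as np

# Hypothetical inputs: one datetime array and one integer array.
dates = np.array(["2017-01-01", "2017-01-02", "2017-01-02"], dtype="datetime64[ns]")
numbers = np.array([1, 2, 2])

# zeros_like copies the dtype of its argument ...
like_dates = np.zeros_like(dates)      # dtype datetime64[ns]
like_numbers = np.zeros_like(numbers)  # integer dtype

# ... while zeros takes only a shape and an explicit dtype.
plain = np.zeros(len(dates), dtype=np.intp)

# Comparing integer zeros against -1 is elementwise and yields a boolean array.
numbers_mask = like_numbers == -1
plain_mask = plain == -1

print(like_dates.dtype, like_numbers.dtype, plain.dtype)
print(numbers_mask, plain_mask)
```

With `zeros()`, the comparison result has the same shape regardless of what `dates` contains, which is the property the description relies on.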