
BUG: cant modify df with duplicate index (#17105) #20939


Merged: 5 commits merged into pandas-dev:master on May 8, 2018


@fersarr (Contributor) commented May 3, 2018

This fixes modification of DataFrames whose index contains duplicate entries. Previously, such an assignment failed with:

AttributeError: 'bool' object has no attribute 'any'

See #17105 for a code snippet.

This replaces zeros_like(objarray) with zeros(), because zeros_like() returns an array of zeros with the same dtype as objarray, which is unnecessary here. We only need the zeros, not the dtype, so that a later comparison against -1 yields an array:

With datetimes, comparing the result of zeros_like() against -1 gives a single boolean:

>>> myarr_fromindex = np.zeros_like(pd.DatetimeIndex([2,3]))
>>> myarr_fromindex
array(['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
>>> 
>>> type(myarr_fromindex)
<type 'numpy.ndarray'>
>>>
>>> myarr_fromindex == -1
False

With numbers, comparing the result of zeros_like() against -1 gives an element-wise array:

>>> myarr_fromarr = np.zeros_like([2,3])
>>> myarr_fromarr
array([0, 0])
>>> type(myarr_fromarr)
<type 'numpy.ndarray'>
>>> myarr_fromarr == -1
array([False, False])
>>> 
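A minimal sketch of why the change matters, assuming current NumPy and pandas: np.zeros_like() keeps the datetime64[ns] dtype of its input, whereas np.zeros(len(...)) always yields a numeric array, so the comparison against -1 is element-wise and .any() is available. The names objarr, missing, check and mask are illustrative stand-ins (the positions in missing are made up), and dtype=np.intp is an assumption; this is not necessarily the exact code in pandas/core/indexing.py.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the requested labels (datetime64[ns] values).
objarr = np.asarray(pd.DatetimeIndex(["2017-01-01", "2017-01-05"]))

# zeros_like keeps the input dtype, so its "zeros" are epoch timestamps.
like = np.zeros_like(objarr)
print(like.dtype)  # datetime64[ns]

# zeros(len(...)) ignores the input dtype and yields plain integers here
# (dtype=np.intp is an assumption; the default float64 compares the same way).
check = np.zeros(len(objarr), dtype=np.intp)

# Hypothetical positions of labels that were not found in the index.
missing = np.array([1], dtype=np.intp)
check[missing] = -1

# Comparing a numeric array against -1 is element-wise, so .any() exists.
mask = check == -1
print(mask, mask.any())  # [False  True] True
```

By contrast, a plain Python bool such as the False shown in the datetime example above has no .any() method, which matches the AttributeError reported in the issue.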

codecov bot commented May 3, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@620784f).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master   #20939   +/-   ##
=========================================
  Coverage          ?   91.81%           
=========================================
  Files             ?      153           
  Lines             ?    49479           
  Branches          ?        0           
=========================================
  Hits              ?    45428           
  Misses            ?     4051           
  Partials          ?        0

Flag                       Coverage Δ
#multiple                  90.2% <100%> (?)
#single                    41.85% <0%> (?)

Impacted Files             Coverage Δ
pandas/core/indexing.py    93.55% <100%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 620784f...cf5ec7d.

@@ -1243,6 +1243,7 @@ Indexing
- Bug in ``Series.is_unique`` where extraneous output in stderr is shown if Series contains objects with ``__ne__`` defined (:issue:`20661`)
- Bug in ``.loc`` assignment with a single-element list-like incorrectly assigns as a list (:issue:`19474`)
- Bug in partial string indexing on a ``Series/DataFrame`` with a monotonic decreasing ``DatetimeIndex`` (:issue:`19362`)
- Fixed to allow modifying ``DataFrame`` with duplicate Index (:issue:`17105`)

Contributor review comment:

Bug in performing in-place operations on a DataFrame with a duplicate Index.



def test_modify_with_duplicate_index():
    trange = pd.date_range(start=pd.Timestamp(year=2017, month=1, day=1),

Contributor review comment:

Can you add the issue number here? Also, move the test to test_loc.py (same directory).


# modify the value for the duplicate index entry
df.loc[trange[bool_idx], "A"] = 7

Contributor review comment:

Use assert_frame_equal: construct the expected frame and compare against it.
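Following that suggestion, a sketch of the test might build the expected frame explicitly and compare it with assert_frame_equal. The construction of trange (a daily range from 2017-01-01 to 2017-01-05 with the last date inserted a second time) is an assumption, since the excerpt of the test is truncated; the sketch also uses the public pandas.testing module rather than pandas' internal test helpers.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_modify_with_duplicate_index():
    # GH 17105
    # Assumption: five daily timestamps, then 2017-01-05 inserted again at
    # the end, so the index has one duplicated label.
    trange = pd.date_range(start=pd.Timestamp(year=2017, month=1, day=1),
                           end=pd.Timestamp(year=2017, month=1, day=5))
    trange = trange.insert(5, pd.Timestamp(year=2017, month=1, day=5))

    df = pd.DataFrame(0, index=trange, columns=["A", "B"])
    bool_idx = np.array([False, False, False, False, False, True])

    # trange[bool_idx] selects the label 2017-01-05; .loc assigns by label,
    # so both rows carrying that duplicated label are updated.
    df.loc[trange[bool_idx], "A"] = 7

    expected = pd.DataFrame({"A": [0, 0, 0, 0, 7, 7],
                             "B": [0, 0, 0, 0, 0, 0]},
                            index=trange)
    tm.assert_frame_equal(df, expected)
```

Run it under pytest, or call test_modify_with_duplicate_index() directly; it returns None on success and raises AssertionError if the frames differ.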

df = pd.DataFrame(0, index=trange, columns=["A", "B"])
bool_idx = np.array([False, False, False, False, False, True])

# modify the value for the duplicate index entry

Contributor review comment:

This is fine as a test, but please add another case like the original issue (e.g. using +=).
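A sketch of the requested in-place case, under the same kind of assumptions: the index is assumed to be daily from 2017-01-01 to 2017-01-05 with 2017-01-05 inserted a second time, and the increment of 6 is an arbitrary illustrative value. With the fix, += reads both rows carrying the duplicated label, adds to them, and writes them back.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_modify_with_duplicate_index_inplace():
    # GH 17105: in-place (+=) variant, closer to the original report.
    # Assumption: daily index 2017-01-01..2017-01-05, with 2017-01-05
    # inserted a second time at the end.
    trange = pd.date_range(start="2017-01-01", end="2017-01-05")
    trange = trange.insert(5, pd.Timestamp("2017-01-05"))

    df = pd.DataFrame(0, index=trange, columns=["A", "B"])
    bool_idx = np.array([False, False, False, False, False, True])

    # Reads both 2017-01-05 rows of column "A", adds 6, and assigns back.
    df.loc[trange[bool_idx], "A"] += 6

    expected = pd.DataFrame({"A": [0, 0, 0, 0, 6, 6],
                             "B": [0, 0, 0, 0, 0, 0]},
                            index=trange)
    tm.assert_frame_equal(df, expected)
```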

@jreback added the Bug and Indexing labels May 3, 2018
@fersarr (Contributor, Author) commented May 3, 2018

Thanks! I added the requested changes; hopefully it's better now :)

@jreback added this to the 0.23.0 milestone May 4, 2018
@jreback (Contributor) commented May 4, 2018

Moved the test around; will merge on green.

@jreback merged commit d15c104 into pandas-dev:master May 8, 2018
@jreback (Contributor) commented May 8, 2018

thanks @fersarr

jreback added a commit to jreback/pandas that referenced this pull request May 8, 2018
jreback added a commit that referenced this pull request May 8, 2018
Labels: Bug, Indexing (Related to indexing on series/frames, not to indexes themselves)

Successfully merging this pull request may close these issues:
indexing.py: "'bool' object has no attribtute 'any'" with duplicate time index

2 participants