BUG: Replace internal use of loc with reindex in DataFrame append #26022

Merged
merged 1 commit into from
Apr 21, 2019

Conversation

Contributor

@cbertinato cbertinato commented Apr 7, 2019

The issue here was that the DataFrame append method was using .loc internally with labels that may be missing. That currently only emits a FutureWarning, but it would raise a KeyError once the deprecation goes into effect. This PR swaps that use of .loc for .reindex.
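
To illustrate the difference described above, here is a minimal sketch (not the code changed in this PR). The frame, the column names, and the `wanted` list are made up for illustration. Selecting a list that contains a missing label with `.loc` emits a FutureWarning in the pandas versions discussed here and raises a KeyError in later releases, whereas `.reindex` accepts the missing label and fills the new column with NaN.

```python
import warnings

import pandas as pd

# Hypothetical frame; "baz" is deliberately not one of its columns.
df = pd.DataFrame({"foo": [1.0, 2.0], "bar": [3.0, 4.0]})
wanted = ["foo", "bar", "baz"]

# .loc with a list containing a missing label: a FutureWarning in the
# pandas versions discussed here, a KeyError in later releases.
loc_raised = False
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    try:
        df.loc[:, wanted]
    except KeyError:
        loc_raised = True

# .reindex tolerates missing labels and fills the new column with NaN.
reindexed = df.reindex(columns=wanted)
print(reindexed)
```

Whether `loc_raised` ends up True depends on the installed pandas version; the `.reindex` result does not.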

@codecov

codecov bot commented Apr 7, 2019

Codecov Report

Merging #26022 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26022      +/-   ##
==========================================
- Coverage   91.82%   91.81%   -0.01%     
==========================================
  Files         175      175              
  Lines       52551    52551              
==========================================
- Hits        48256    48252       -4     
- Misses       4295     4299       +4
Flag Coverage Δ
#multiple 90.38% <100%> (ø) ⬆️
#single 40.72% <0%> (-0.14%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 96.79% <100%> (-0.12%) ⬇️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c613c8...1bce70f. Read the comment docs.

@codecov

codecov bot commented Apr 7, 2019

Codecov Report

Merging #26022 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26022      +/-   ##
==========================================
- Coverage   91.82%   91.82%   -0.01%     
==========================================
  Files         175      175              
  Lines       52539    52539              
==========================================
- Hits        48246    48242       -4     
- Misses       4293     4297       +4
Flag Coverage Δ
#multiple 90.38% <100%> (ø) ⬆️
#single 40.73% <0%> (-0.14%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 96.79% <100%> (-0.12%) ⬇️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6de8133...4944b99. Read the comment docs.

Contributor

@jreback jreback left a comment

always write tests first; how else would you know if this works?

@jreback jreback added the Bug, Indexing (related to indexing on series/frames, not to indexes themselves), and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Apr 7, 2019
@cbertinato
Contributor Author

Sorry about that. In progress.

result = df.append(dicts, ignore_index=True, sort=True)
expected = df.append(DataFrame(dicts), ignore_index=True, sort=True)
assert_frame_equal(result, expected)

Member

Separate out test and reference issue number as a comment under test function def.

@cbertinato
Contributor Author

Prior to this fix, the test would still have passed, but a FutureWarning would have been emitted and, once the deprecation took effect, a KeyError raised. Should I also throw in a check that a FutureWarning and/or KeyError is not thrown?

@codecov

codecov bot commented Apr 8, 2019

Codecov Report

Merging #26022 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26022      +/-   ##
==========================================
- Coverage   91.98%   91.97%   -0.01%     
==========================================
  Files         175      175              
  Lines       52377    52377              
==========================================
- Hits        48180    48176       -4     
- Misses       4197     4201       +4
Flag Coverage Δ
#multiple 90.53% <100%> (ø) ⬆️
#single 40.71% <0%> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 96.9% <100%> (-0.12%) ⬇️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc86509...ff96e0c. Read the comment docs.

@gfyoung
Member

gfyoung commented Apr 8, 2019

Should I also throw in a check that a FutureWarning and/or KeyError is not thrown?

The way that the test is written is fine. Just need to separate it out.

@@ -325,7 +325,7 @@ Indexing
^^^^^^^^

- Improved exception message when calling :meth:`DataFrame.iloc` with a list of non-numeric objects (:issue:`25753`).
-
- Bug in which :meth:`DataFrame.append` produced a warning indicating that a KeyError will be thrown in the future when the data to be appended contains new columns (:issue:`22252`).
Contributor

use double backticks on KeyError. make it clear that this warning should not have been shown.

@@ -167,6 +167,15 @@ def test_append_list_of_series_dicts(self):
expected = df.append(DataFrame(dicts), ignore_index=True, sort=True)
assert_frame_equal(result, expected)

# GH22252
Contributor

make this a new test

@cbertinato cbertinato force-pushed the 22252-df-append branch 2 times, most recently from f050f5f to 9ce63ff Compare April 11, 2019 17:06
@cbertinato
Contributor Author

Anything else to do here?

Contributor

@jreback jreback left a comment

test comment, ping on green.

df = DataFrame(np.random.randn(5, 4),
               columns=['foo', 'bar', 'baz', 'qux'])

dicts = [{'foo': 9}, {'bar': 10}]
Contributor

can you put an assert_produces_warning(None) around this
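
The kind of check requested here can be approximated with only the standard library. The sketch below is an assumption-laden stand-in, not the test added in this PR: it records warnings with `warnings.catch_warnings(record=True)` instead of using the pandas test helper `assert_produces_warning` named above, and it wraps a `reindex` call on a made-up frame rather than the append call under review.

```python
import warnings

import pandas as pd

# Hypothetical frame standing in for the one built in the test.
df = pd.DataFrame({"foo": [1.0, 2.0], "bar": [3.0, 4.0]})

with warnings.catch_warnings(record=True) as caught:
    # Record every warning, even ones that would normally be suppressed.
    warnings.simplefilter("always")
    # Stand-in for the call under test: reindex to a superset of the columns.
    result = df.reindex(columns=["bar", "baz", "foo"])

# Keep only FutureWarnings, the category this discussion is concerned with.
future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
assert not future_warnings, "unexpected FutureWarning"
```

Filtering on FutureWarning keeps the check focused on the deprecation in question; a stricter variant could assert that `caught` is empty.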

@jreback jreback added this to the 0.25.0 milestone Apr 19, 2019
@cbertinato cbertinato force-pushed the 22252-df-append branch 2 times, most recently from 5f1ac55 to f7d372d Compare April 20, 2019 03:41
@jreback
Contributor

jreback commented Apr 20, 2019

lgtm. can you merge master; ping on green.

@cbertinato
Contributor Author

All green!

@jreback jreback merged commit 84fa2ef into pandas-dev:master Apr 21, 2019
@jreback
Contributor

jreback commented Apr 21, 2019

thanks @cbertinato

ryanreh99 added a commit to ryanreh99/pandas that referenced this pull request Apr 22, 2019
Labels
Bug; Indexing (related to indexing on series/frames, not to indexes themselves); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A future error warning and a problem with autoconversion of types
3 participants