DOC: add warning to append about inefficiency #16956


Closed
shanral wants to merge 2 commits into master from shanral:append_doc

Conversation

@shanral (Contributor) commented Jul 15, 2017

@pep8speaks commented Jul 15, 2017

Hello @shanral! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 16, 2017 at 16:02 UTC

@codecov bot commented Jul 15, 2017

Codecov Report

Merging #16956 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16956      +/-   ##
==========================================
+ Coverage   90.98%   90.99%   +<.01%     
==========================================
  Files         161      161              
  Lines       49288    49288              
==========================================
+ Hits        44846    44849       +3     
+ Misses       4442     4439       -3
Flag Coverage Δ
#multiple 88.76% <ø> (+0.02%) ⬆️
#single 40.2% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.76% <ø> (-0.1%) ⬇️
pandas/core/series.py 94.89% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 65.05% <0%> (+1.81%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 96168ef...1106007.

@codecov bot commented Jul 15, 2017

Codecov Report

Merging #16956 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16956      +/-   ##
==========================================
- Coverage   90.98%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49288    49293       +5     
==========================================
- Hits        44846    44844       -2     
- Misses       4442     4449       +7
Flag Coverage Δ
#multiple 88.74% <100%> (ø) ⬆️
#single 40.2% <58.33%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.76% <100%> (-0.1%) ⬇️
pandas/core/series.py 94.89% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/datetimes.py 95.23% <0%> (-0.1%) ⬇️
pandas/core/dtypes/cast.py 86.89% <0%> (ø) ⬆️
pandas/core/generic.py 92.29% <0%> (ø) ⬆️
pandas/core/config_init.py 96% <0%> (+0.03%) ⬆️
pandas/plotting/_core.py 82.72% <0%> (+0.2%) ⬆️
... and 1 more

Last update 96168ef...ea4cc30.

@gfyoung added the Docs and Performance labels Jul 15, 2017
for nb, content in contents.items():
    with open(nb, 'wt') as f:
        f.write(content)
try:
Member
Why this try-except here?

Contributor Author
This was to fix an issue where a doc build failed halfway through due to a missing dependency or ^C. I worked on this fix with @TomAugspurger, but am no longer able to reproduce its success. I am removing it from further pull requests.
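A rough sketch of the kind of cleanup such a block could provide; this is purely illustrative, and run_build is a hypothetical stand-in, not the actual doc/make.py code:

try:
    run_build()  # hypothetical: the sphinx build step that may die halfway
finally:
    # restore the saved notebook contents even after a failure or ^C
    for nb, content in contents.items():
        with open(nb, 'wt') as f:
            f.write(content)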

-----
Iteratively appending to a series can be more computationally intense
than a single concatenate. A better solution is to append values to a
list then concatenate the list with the original series all at once.
Member
series --> Series
intense --> intensive
"list then" --> "list and then"

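For illustration, a minimal doctest-style sketch of the pattern the proposed text describes, using the Series.append API of that era; the data and names are made up, not taken from the diff:

Less efficient (each append copies the whole Series):

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> for v in [4, 5, 6]:
...     s = s.append(pd.Series([v]), ignore_index=True)

More efficient (collect the new values in a list and concatenate once):

>>> s = pd.Series([1, 2, 3])
>>> s = pd.concat([s, pd.Series([4, 5, 6])], ignore_index=True)
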
Iteratively appending rows to a Dataframe can be more computationally
intense than a single concatenate. A better solution is to append those
rows to a list then concatenate the list with the original Dataframe
all at once.
Member
intense --> intensive
"list then" --> "list and then"

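The DataFrame version of the same pattern, again purely illustrative and not taken from the diff:

Less efficient (the whole frame is recopied on every iteration):

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2]})
>>> for row in [{'A': 3}, {'A': 4}]:
...     df = df.append(row, ignore_index=True)

More efficient (gather the rows in a list and concatenate once):

>>> df = pd.DataFrame({'A': [1, 2]})
>>> df = pd.concat([df, pd.DataFrame([{'A': 3}, {'A': 4}])],
...                ignore_index=True)
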
Contributor
can you show some pseudo code

@@ -4653,6 +4653,32 @@ def append(self, other, ignore_index=False, verify_integrity=False):
2 5 6
3 7 8

The following, while not a recommended method for generating a
DataFrame, illustrates how to efficiently generate a DataFrame from
Member
You can't call this "efficient" since you just said it wasn't efficient.

multiple data sources.

Less efficient:
>>> df = pd.DataFrame(columns=['A'])
Member
Add a newline between "Less efficient" and ">>> df = ..."

than a single concatenate. A better solution is to append values to a
list then concatenate the list with the original series all at once.
list and then concatenate the list with the original series all at
Member
series --> Series
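Putting the review comments together, the paired docstring examples might read roughly as follows; this is a sketch, not necessarily the exact text this PR carried:

Less efficient:

>>> df = pd.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)

More efficient:

>>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
...           ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4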

@shanral closed this Jul 19, 2017
@shanral deleted the append_doc branch July 19, 2017 01:46
Labels
Docs, Performance (Memory or execution speed performance)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: add warning to append about inefficiency
4 participants