PERF: avoid printing object in Dtype.construct_from_string message #26776

Merged

Conversation

jorisvandenbossche
Member

@codecov
codecov bot commented Jun 11, 2019

Codecov Report

Merging #26776 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26776      +/-   ##
==========================================
- Coverage   91.72%   91.72%   -0.01%     
==========================================
  Files         178      178              
  Lines       50779    50781       +2     
==========================================
- Hits        46578    46577       -1     
- Misses       4201     4204       +3

Flag         Coverage Δ
#multiple    90.31% <100%> (ø) ⬆️
#single      41.05% <100%> (-0.23%) ⬇️

Impacted Files                Coverage Δ
pandas/core/dtypes/base.py    100% <100%> (ø) ⬆️
pandas/io/gbq.py              78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py          96.88% <0%> (-0.12%) ⬇️
pandas/util/testing.py        90.94% <0%> (+0.1%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 157a4e3...f2017d7.

@TomAugspurger
Contributor

Can you post the ASV for one of the affected benchmarks?

jreback added the Performance (Memory or execution speed performance) label on Jun 11, 2019
jreback added this to the 0.25.0 milestone on Jun 11, 2019
@jorisvandenbossche
Member Author

jorisvandenbossche commented Jun 11, 2019

Master:

In [1]: import pandas as pd
   ...: import scipy.sparse

In [2]: N = 1000 
   ...: sparse = scipy.sparse.rand(N, N, 0.005)

In [3]: import warnings

In [4]: warnings.simplefilter('ignore', FutureWarning)

In [5]: %timeit pd.SparseDataFrame(sparse)
2.7 s ± 512 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR:

In [5]: %timeit pd.SparseDataFrame(sparse)
430 ms ± 9.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
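
The PR title attributes the cost to printing an object inside an error message. The following is a minimal, pure-Python sketch of that general effect, not the actual pandas code: `ExpensiveRepr`, `construct_slow`, `construct_fast`, and `try_construct` are hypothetical names invented for illustration. It assumes the error is raised and caught routinely (for example, while probing several candidate types), so the message is built every time even though nobody reads it.

```python
import timeit


class ExpensiveRepr:
    """Hypothetical stand-in for a large object whose repr is slow to build."""

    def __init__(self, n):
        self.values = list(range(n))

    def __repr__(self):
        return "ExpensiveRepr({})".format(self.values)


def construct_slow(obj):
    # Formats the whole object into the message, so the cost grows with
    # the size of the object even if the caller discards the message.
    raise TypeError("Cannot construct a dtype from '{}'".format(obj))


def construct_fast(obj):
    # Reports only the type name, so the cost is independent of object size.
    raise TypeError(
        "Cannot construct a dtype from an object of type '{}'".format(
            type(obj).__name__
        )
    )


def try_construct(func, obj):
    # Mimics a caller that probes a constructor and swallows the failure.
    try:
        func(obj)
    except TypeError as exc:
        return str(exc)


obj = ExpensiveRepr(100_000)
slow = timeit.timeit(lambda: try_construct(construct_slow, obj), number=5)
fast = timeit.timeit(lambda: try_construct(construct_fast, obj), number=5)
print("object in message: {:.4f}s, type name only: {:.6f}s".format(slow, fast))
```

When such a failure path runs once per column, as it might for a 1000-column frame, the formatting cost is paid 1000 times, which is one plausible reading of the gap between the two timings above.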

@jreback
Contributor

jreback commented Jun 11, 2019

Well, that's a difference!

Though why does constructing the SDF still take so long (even now)?

@jorisvandenbossche
Member Author

Though why does constructing the SDF still take so long (even now)?

It's quite a big matrix / DataFrame (1000 columns), and we need to create a SparseArray for each column of the sparse matrix, so some conversion is needed. It might be possible to optimize this further, but DataFrame.sparse.from_spmatrix, which uses a completely different implementation, also takes around 400 ms, so the two are quite similar.
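
As a rough illustration of the two routes mentioned in this comment, the sketch below builds the same frame column by column and via `DataFrame.sparse.from_spmatrix`. It assumes pandas 0.25 or later and SciPy are installed. The per-column loop is a simplified stand-in written for this example, not the actual `SparseDataFrame` constructor, and the matrix is smaller than the one benchmarked above to keep the run short.

```python
import numpy as np
import pandas as pd
import scipy.sparse

N = 200  # smaller than the 1000 x 1000 benchmark, for a quick run
mat = scipy.sparse.random(N, N, density=0.005, format="csc", random_state=0)

# Route 1 (illustrative): one SparseArray per column of the sparse matrix.
columns = {}
for j in range(N):
    dense_col = mat[:, j].toarray().ravel()  # densify a single column
    columns[j] = pd.arrays.SparseArray(dense_col, fill_value=0.0)
df_loop = pd.DataFrame(columns)

# Route 2: the accessor-based constructor named in the comment.
df_direct = pd.DataFrame.sparse.from_spmatrix(mat)

# Both routes should hold the same values as the original matrix.
dense_loop = df_loop.sparse.to_dense().to_numpy()
dense_direct = df_direct.sparse.to_dense().to_numpy()
print(df_loop.shape, df_direct.shape)
print(np.allclose(dense_loop, dense_direct))
```

Either route has to touch every column once, which fits the observation that both end up in the same timing range.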

jreback merged commit 646ff0b into pandas-dev:master on Jun 12, 2019
@jreback
Contributor

jreback commented Jun 12, 2019

thanks @jorisvandenbossche
