PERF: faster unstacking #15510

Closed

jreback
Contributor

@jreback jreback commented Feb 26, 2017

closes #15503

On a non-masked unstack (in other words, a full-product MultiIndex, for example), this is now just
a simple reshape. On a masked unstack, it now has a much lower constant factor, since the work is done in Cython with the GIL released.

0.19.2 / master

In [2]: m = 100
   ...: n = 1000
   ...: 
   ...: levels = np.arange(m)
   ...: index = pd.MultiIndex.from_product([levels]*2)
   ...: columns = np.arange(n)
   ...: values = np.arange(m*m*n).reshape(m*m, n)
   ...: df = pd.DataFrame(values, index, columns)
   ...: 

In [3]: %timeit df.unstack()
1 loop, best of 3: 285 ms per loop

In [4]: df2 = df.iloc[:-1]

In [5]: %timeit df2.unstack()
1 loop, best of 3: 306 ms per loop

PR

In [2]: %timeit df.unstack()
10 loops, best of 3: 70 ms per loop

In [3]: df2 = df.iloc[:-1]

# & releasing the GIL here.
In [4]: %timeit df2.unstack()
1 loop, best of 3: 191 ms per loop
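
The "simple reshape" claim for a full-product index can be checked with a small sketch. This is an illustration rather than the code from this PR: it uses the same setup as the benchmark above with smaller, arbitrarily chosen sizes, and assumes the unstacked columns are ordered with the original column as the outer level and the unstacked index level as the inner one.

```python
import numpy as np
import pandas as pd

# Same construction as the benchmark above, with small sizes (chosen for illustration).
m, n = 3, 4
levels = np.arange(m)
index = pd.MultiIndex.from_product([levels] * 2)
columns = np.arange(n)
values = np.arange(m * m * n).reshape(m * m, n)
df = pd.DataFrame(values, index, columns)

# View the rows as (outer level, inner level, column), move the inner level
# behind the column axis, then flatten the last two axes into the new columns.
expected = values.reshape(m, m, n).swapaxes(1, 2).reshape(m, m * n)

result = df.unstack()
print(np.array_equal(result.values, expected))
```

Because every (outer, inner) pair is present exactly once and the rows are already sorted, no cell needs filling and the reshape alone reproduces the unstacked values.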

@jreback jreback added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 26, 2017
@jreback jreback added this to the 0.20.0 milestone Feb 26, 2017
@jreback jreback force-pushed the reshape3 branch 3 times, most recently from 21958b7 to 1bfa04c Compare February 26, 2017 18:44
@codecov-io

codecov-io commented Feb 26, 2017

Codecov Report

Merging #15510 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15510      +/-   ##
==========================================
- Coverage   91.04%   91.02%   -0.03%     
==========================================
  Files         136      136              
  Lines       49088    49105      +17     
==========================================
+ Hits        44694    44698       +4     
- Misses       4394     4407      +13
Impacted Files            Coverage Δ
pandas/core/reshape.py    99.27% <100%> (+0.02%)
pandas/io/gbq.py          25% <0%> (-58.34%)
pandas/tools/merge.py     91.78% <0%> (-0.35%)
pandas/util/testing.py    81.87% <0%> (-0.19%)
pandas/core/frame.py      97.82% <0%> (-0.1%)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cd67704...ec29226.

@jreback
Contributor Author

jreback commented Feb 27, 2017

cc @wesm if you have a chance.

Member

@wesm wesm left a comment

Very nice. Don't love expansions of our sprawling Cython codebase, but this seems like a solid win as a pretty central data manipulation.

@jreback jreback closed this in 09360d8 Mar 5, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#15503

Author: Jeff Reback <[email protected]>

Closes pandas-dev#15510 from jreback/reshape3 and squashes the following commits:

ec29226 [Jeff Reback] PERF: faster unstacking
@@ -182,9 +185,21 @@ def get_new_values(self):
stride = values.shape[1]
result_width = width * stride
result_shape = (length, result_width)
mask = self.mask
mask_all = mask.all()
Member

@jreback could use some help grokking how we get here. In Block._unstack, adding the assertion assert mask.all() doesn't break any tests. Is that something we can rely on? (If so, we can simplify the code a good bit.) If not, how can we construct a counter-example?
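
For readers following the hunk above, here is a minimal NumPy sketch of the two branches that a mask.all() check separates. It is an assumption-laden illustration, not the pandas implementation: unstack_values, row_codes, col_codes and fill_value are hypothetical names, and the sketch assumes unique (row, column) pairs with rows sorted by them. It does not settle whether Block._unstack can ever observe a mask that is not all True; it only shows what such a mask means at this level.

```python
import numpy as np

def unstack_values(values, row_codes, col_codes, length, width, fill_value=np.nan):
    """Hypothetical helper: pivot `width` inner positions into columns."""
    stride = values.shape[1]
    result_shape = (length, width * stride)

    # mask[r * width + c] is True when a source row exists for slot (r, c).
    mask = np.zeros(length * width, dtype=bool)
    mask[row_codes * width + col_codes] = True

    if mask.all():
        # Every slot is filled: a pure reshape, no filling required.
        new_values = (values.reshape(length, width, stride)
                            .swapaxes(1, 2)
                            .reshape(result_shape))
        return new_values, mask

    # Some slots are empty: start from fill_value and scatter the rows in.
    new_values = np.full(result_shape, fill_value, dtype=float)
    out = new_values.reshape(length, stride, width)  # view onto new_values
    out[row_codes, :, col_codes] = values
    return new_values, mask

# Full product of 2 outer x 3 inner positions, 2 original columns.
row_codes = np.repeat(np.arange(2), 3)
col_codes = np.tile(np.arange(3), 2)
values = np.arange(12).reshape(6, 2)

full, full_mask = unstack_values(values, row_codes, col_codes, 2, 3)

# Dropping the last row (as with df.iloc[:-1] in the description) leaves one slot empty.
partial, partial_mask = unstack_values(values[:-1], row_codes[:-1], col_codes[:-1], 2, 3)
print(full_mask.all(), partial_mask.all())
```

In this sketch the mask stops being all True as soon as one (row, column) combination is missing from the index, which is the situation the masked path has to handle.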

Successfully merging this pull request may close these issues.

PERF/API: fast paths for product MultiIndex?