
PERF: faster dataframe construction from recarray #44827


Merged (3 commits), Dec 9, 2021


GYHHAHA (Contributor) commented Dec 9, 2021

import pandas as pd

n = int(1e6)
df = pd.DataFrame({"A": [.0]*n})
arr = df.to_records(index=False)  # numpy.recarray with one float64 field, "A"

In [1]: %timeit pd.DataFrame(arr)
3.64 s ± 77.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- master
2.23 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) <- PR
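The benchmark above builds the frame from a NumPy record array. One way to see why a fast path is possible: indexing a structured array by field name returns that field as a 1-D array, so a frame can be assembled column by column without creating a Python object per record. The sketch below assumes this field-by-field approach purely for illustration; it is not taken from the PR's diff, and the helper `frame_from_records_by_field` is hypothetical.

```python
import numpy as np
import pandas as pd


def frame_from_records_by_field(arr: np.ndarray) -> pd.DataFrame:
    """Hypothetical helper: build a DataFrame one field (column) at a time.

    arr[name] returns a 1-D array holding just that field, so no
    per-record Python objects are created along the way.
    """
    if arr.dtype.names is None:
        raise TypeError("expected a structured (record) array")
    return pd.DataFrame({name: arr[name] for name in arr.dtype.names})


n = 1_000_000
df = pd.DataFrame({"A": [0.0] * n})
arr = df.to_records(index=False)

result = frame_from_records_by_field(arr)
# Sanity check: the field-wise result should match the stock constructor.
pd.testing.assert_frame_equal(result, pd.DataFrame(arr))
```

A correctness check like the `assert_frame_equal` call is worth pairing with any timing comparison, since a faster constructor is only useful if it produces the same frame.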

GYHHAHA (Contributor, Author) commented Dec 9, 2021

How do I add a test case for a performance PR? I have no experience with this.

jbrockmendel (Member) commented

> How do I add a test case for a performance PR? I have no experience with this.

For this, you can just show some %timeit results comparing the PR to the status quo (master).
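`%timeit` is an IPython magic. Outside IPython, the standard-library `timeit` module gives comparable numbers. A minimal sketch, assuming you run the same script once on master and once on the PR branch and paste both outputs into the PR; the repeat count of 7 mirrors the `%timeit` output above, while `number = 10` loops per run is an arbitrary choice.

```python
import statistics
import timeit

import pandas as pd

n = 1_000_000
df = pd.DataFrame({"A": [0.0] * n})
arr = df.to_records(index=False)

# Like %timeit: several runs, each executing the statement a fixed
# number of times, then report per-loop statistics across runs.
number = 10
run_totals = timeit.repeat(lambda: pd.DataFrame(arr), repeat=7, number=number)
per_loop = [total / number for total in run_totals]

best = min(per_loop)
mean = statistics.mean(per_loop)
stdev = statistics.stdev(per_loop)
print(
    f"pandas {pd.__version__}: {mean * 1e3:.2f} ms ± {stdev * 1e3:.2f} ms per loop "
    f"(mean ± std. dev. of {len(per_loop)} runs, {number} loops each); "
    f"best {best * 1e3:.2f} ms"
)
```

Printing `pd.__version__` alongside the timing makes it easy to tell which build produced which line when the two outputs are pasted side by side.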

GYHHAHA (Contributor, Author) commented Dec 9, 2021

Added the %timeit results to the description, @jbrockmendel.

jbrockmendel (Member) left a comment


LGTM

jreback added this to the 1.4 milestone Dec 9, 2021
jreback added the Performance (Memory or execution speed performance) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Dec 9, 2021
jreback merged commit a2316f3 into pandas-dev:master Dec 9, 2021
jreback (Contributor) commented Dec 9, 2021

Thanks @GYHHAHA

GYHHAHA deleted the patch-1 branch Dec 10, 2021 03:51
Labels: Performance (Memory or execution speed performance), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Projects: none
Development: successfully merging this pull request may close the issue "PERF: dataframe construction from recarray is slow"
Participants: 3