PERF: don't create the skiprows set if using the c-parser #13005

Closed. jreback wants to merge 1 commit.

jreback commented Apr 26, 2016

In [4]: DataFrame(np.random.randn(1000000,1)).to_csv('test.csv',index=False)

On this branch:

In [1]: %memit pd.read_csv('test.csv',skiprows=999999)
peak memory: 65.74 MiB, increment: 1.59 MiB

In [2]: %memit pd.read_csv('test.csv',skiprows=999999)
peak memory: 65.89 MiB, increment: 0.22 MiB

In [3]: %memit pd.read_csv('test.csv',skiprows=999999)
peak memory: 65.98 MiB, increment: 0.28 MiB

On master:

In [1]: %memit pd.read_csv('test.csv',skiprows=999999)
peak memory: 169.84 MiB, increment: 105.79 MiB

In [2]: %memit pd.read_csv('test.csv',skiprows=999999)
peak memory: 171.27 MiB, increment: 24.11 MiB

In [3]: %memit pd.read_csv('test.csv',skiprows=999999)
peak memory: 173.39 MiB, increment: 24.63 MiB
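
The gap between the two runs is consistent with the PR title: an integer `skiprows` no longer has to be expanded into a set of row indices before parsing. The following is a minimal, standard-library-only sketch of that idea, not the pandas implementation. The helper names `skip_with_set` and `skip_with_count` are hypothetical, and the assumption that master expanded the integer into one entry per skipped row is inferred from the title rather than shown in this thread.

```python
import sys


def skip_with_set(row, skip_set):
    """Hypothetical helper: membership test against a materialized set."""
    return row in skip_set


def skip_with_count(row, n_skip):
    """Hypothetical helper: skip the first n_skip rows with one comparison."""
    return row < n_skip


n_skip = 999_999

# Assumed "before" behaviour: one set entry per skipped row.
skip_set = set(range(n_skip))

# sys.getsizeof reports only the container, not the int objects it holds,
# so this understates the real cost of the set.
set_bytes = sys.getsizeof(skip_set)
int_bytes = sys.getsizeof(n_skip)

# Both strategies give the same answer around the boundary.
rows_to_check = (0, 1, n_skip - 1, n_skip, n_skip + 1)
same_answers = all(
    skip_with_set(r, skip_set) == skip_with_count(r, n_skip)
    for r in rows_to_check
)

print(f"set container: {set_bytes} bytes, plain int: {int_bytes} bytes")
print(f"same skip decisions: {same_answers}")
```

The comparison only shows why avoiding the set saves memory when the skipped rows are a contiguous prefix. A list of arbitrary row indices passed as `skiprows` would still need some form of lookup structure.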

@jreback added the Performance (memory or execution speed performance) and IO CSV (read_csv, to_csv) labels Apr 26, 2016
@jreback added this to the 0.18.1 milestone Apr 26, 2016

jreback commented Apr 26, 2016

@gfyoung I believe this is handled internally in the c-parser.

gfyoung commented Apr 27, 2016

@jreback : Travis and I both agree. LGTM otherwise.

@jreback closed this in b8921ac Apr 27, 2016