DataFrame.__setitem__ performance regression with object dtype #19299

Closed
TomAugspurger opened this issue Jan 18, 2018 · 2 comments · Fixed by #55938
Labels
Benchmark (Performance (ASV) benchmarks), good first issue, Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@TomAugspurger
Contributor

TomAugspurger commented Jan 18, 2018

This regressed between 0.20.3 and 0.21.1. I haven't narrowed it down further yet. Note that:

  • It's specific to setitem; getitem is fine
  • It's specific to object dtype

In [1]: import pandas as pd
   ...: pd.__version__
   ...:
Out[1]: '0.20.3'

In [2]: df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)

In [3]: %timeit df.loc[0, 1] = 1.0
132 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [1]: import pandas as pd
   ...: pd.__version__
   ...:
Out[1]: '0.21.1'

In [2]: df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)

In [3]: %timeit df.loc[0, 1] = 1.0
1.64 ms ± 70.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This is present on master. I haven't tried profiling yet.
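
For anyone who wants to check the two observations above outside IPython, here is a minimal sketch using the standard-library `timeit` module. The helper names (`make_frame`, `per_call_seconds`, `compare`) are made up for illustration and are not part of pandas; the frame shape and the `df.loc[0, 1] = 1.0` statement are taken from the report. Absolute numbers will differ by machine and pandas version, so only the ratios between rows are meaningful.

```python
import timeit

import pandas as pd


def make_frame(dtype):
    """Build the 1000 x 100 all-missing frame used in the report."""
    return pd.DataFrame(index=range(1000), columns=range(100), dtype=dtype)


def per_call_seconds(func, number=1000, repeat=5):
    """Best-of-`repeat` average wall time per call, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def compare(number=1000, repeat=5):
    """Time scalar .loc setitem and getitem for object and float64 frames."""
    results = {}
    for dtype in ("object", "float64"):
        df = make_frame(dtype)

        def set_cell(df=df):
            df.loc[0, 1] = 1.0

        def get_cell(df=df):
            return df.loc[0, 1]

        results[(dtype, "setitem")] = per_call_seconds(set_cell, number, repeat)
        results[(dtype, "getitem")] = per_call_seconds(get_cell, number, repeat)
    return results


if __name__ == "__main__":
    print("pandas", pd.__version__)
    for (dtype, op), seconds in compare().items():
        print(f"{dtype:>8} {op:>8}: {seconds * 1e6:8.1f} µs per call")
```

Running the same script under 0.20.3 and 0.21.1 should show whether only the object/setitem row changes, which is what the report describes.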

@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance labels Jan 18, 2018
@pandas-dev pandas-dev deleted a comment from desasaliho Jan 18, 2018
@mroeschke
Member

Performance looks better now. Could use an asv benchmark.

In [46]: df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)
    ...:
    ...: %timeit df.loc[0, 1] = 1.0
45.3 µs ± 313 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
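
A rough sketch of what such an asv benchmark could look like, following asv's convention of a class with a `setup` method and `time_`-prefixed methods. The class and method names are hypothetical, and where it would live in the pandas benchmark suite is not stated in this thread.

```python
import pandas as pd


class LocSetitemObject:
    # Hypothetical benchmark class; the name is an assumption, not taken
    # from the pandas asv suite.

    def setup(self):
        # asv calls setup() before the timed method, so building the frame
        # is not included in the measurement.
        self.df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)

    def time_loc_setitem_scalar(self):
        # The statement from the original report.
        self.df.loc[0, 1] = 1.0
```

The class needs no asv import; asv discovers it by naming convention when the file sits in the benchmark directory.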

@mroeschke mroeschke added Benchmark Performance (ASV) benchmarks good first issue and removed Performance Memory or execution speed performance labels Jun 12, 2021
@jreback
Contributor

jreback commented Jun 12, 2021

that's a cached result though, but it looks fine
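
The thread does not say what exactly is cached. Assuming the concern is that `%timeit` repeats the assignment on the same frame, so later iterations benefit from state warmed up by earlier ones, one way to sidestep that is to time a single assignment on a freshly built frame each time. The function name `first_assignment_times` is made up for this sketch.

```python
import time

import pandas as pd


def first_assignment_times(samples=20):
    """Time the first scalar .loc assignment on a fresh object frame, per sample."""
    results = []
    for _ in range(samples):
        # A new frame for every sample, so nothing from a previous
        # assignment can be reused.
        df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)
        start = time.perf_counter()
        df.loc[0, 1] = 1.0
        results.append(time.perf_counter() - start)
    return results


if __name__ == "__main__":
    times = first_assignment_times()
    print(f"min {min(times) * 1e6:.1f} µs, max {max(times) * 1e6:.1f} µs")
```

Single-shot timings are noisier than `%timeit` averages, so the spread across samples is worth reporting alongside the minimum.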
