PERF: 3x speedup in Series of dicts with datetime keys by not having error message scale with input #24743

Merged: 1 commit merged on Jan 13, 2019

@qwhelan qwhelan commented Jan 12, 2019

Surprisingly, the majority of the time spent in constructing a Series from a dict with datetime-like keys is spent formatting the keys into strings for an error message that gets suppressed.

As there's a test ensuring the string case makes it into the error message, we preserve the behavior there. In non-string cases, we mirror how the other Dtypes handle this case and do not include the input.
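The effect described above can be sketched in isolation. This is a minimal illustration, not the pandas code: `CountingKeys`, `construct_before`, `construct_after`, `error_message`, and the `ExampleDtype` message text are all hypothetical names chosen for the example. It counts how often a large non-string input is rendered to text when the error message always interpolates the input versus only when the input is a string.

```python
class CountingKeys:
    """Hypothetical stand-in for a large non-string input.

    Counts how many times it is rendered to text, which is the O(n) step.
    """

    def __init__(self, n):
        self.keys = list(range(n))
        self.format_calls = 0

    def __repr__(self):
        self.format_calls += 1
        return repr(self.keys)


def construct_before(value):
    # Always interpolates the input, so the cost grows with the input size.
    raise TypeError("Could not construct ExampleDtype from '{}'".format(value))


def construct_after(value):
    # Only strings are echoed back; anything else gets a constant message.
    if isinstance(value, str):
        raise TypeError("Could not construct ExampleDtype from '{}'".format(value))
    raise TypeError("Could not construct ExampleDtype")


def error_message(func, value):
    """Call func(value) and return the text of the TypeError it raises."""
    try:
        func(value)
    except TypeError as err:
        return str(err)
    return None


before_keys = CountingKeys(10000)
before_message = error_message(construct_before, before_keys)

after_keys = CountingKeys(10000)
after_message = error_message(construct_after, after_keys)

string_message = error_message(construct_after, "not-a-dtype")

print(before_keys.format_calls, after_keys.format_calls)
```

In this sketch the "before" variant renders the whole input once per failed construction, while the "after" variant never touches it, and string inputs still appear in the message.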

$ asv compare v0.24.0rc1 HEAD -s
       before           after         ratio
     [fdc4db25]       [5c82eab5]
     <v0.24.0rc1^0>       <series_dict_speedup>
-      1.14±0.02s          303±4ms     0.27  series_methods.SeriesConstructor.time_constructor('dict') 
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

# TODO(py3): Change this pass to `raise TypeError(msg) from e`
pass
raise TypeError(msg.format(string))
else:
Contributor

you don’t need an else

Contributor

where is this caught? should be checking the error message

Contributor Author

It's primarily caught by pandas.core.dtypes.base._DtypeOpsMixin.is_dtype(), which doesn't care about error message at all:

        try:
            return cls.construct_from_string(dtype) is not None
        except TypeError:
            return False

I'll check for direct calls that might be sensitive to the error message contents.
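As a rough illustration of why the message contents do not matter to that caller, here is a self-contained sketch of the same try/except pattern. `DtypeOpsSketch`, `VerboseDtype`, and `TerseDtype` are hypothetical classes written for this example and are not pandas classes; only the body of `is_dtype` follows the snippet quoted above.

```python
class DtypeOpsSketch:
    """Hypothetical mixin reproducing the is_dtype pattern quoted above."""

    @classmethod
    def construct_from_string(cls, string):
        raise NotImplementedError

    @classmethod
    def is_dtype(cls, dtype):
        try:
            return cls.construct_from_string(dtype) is not None
        except TypeError:
            # The exception, and whatever message it carries, is discarded.
            return False


class VerboseDtype(DtypeOpsSketch):
    @classmethod
    def construct_from_string(cls, string):
        if string == "example":
            return cls()
        # Builds a message from the input even though is_dtype ignores it.
        raise TypeError("Could not construct VerboseDtype from '{}'".format(string))


class TerseDtype(DtypeOpsSketch):
    @classmethod
    def construct_from_string(cls, string):
        if string == "example":
            return cls()
        # Constant message: no work proportional to the input.
        raise TypeError("Could not construct TerseDtype")


big_input = list(range(100000))
verbose_result = VerboseDtype.is_dtype(big_input)
terse_result = TerseDtype.is_dtype(big_input)
```

Both calls return `False` here; the only difference is that the verbose variant pays to format `big_input` into a string that nobody reads.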

Contributor Author

I'm not seeing any calls reliant on the specific error message, so should be safe to change.

codecov bot commented Jan 12, 2019

Codecov Report

Merging #24743 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24743      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52358    52360       +2     
==========================================
+ Hits        48373    48375       +2     
  Misses       3985     3985
Flag        Coverage Δ
#multiple   90.81% <100%> (ø) ⬆️
#single     42.91% <81.81%> (-0.16%) ⬇️

Impacted Files                 Coverage Δ
pandas/core/dtypes/dtypes.py   95.6% <100%> (+0.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 33f91d8...5c82eab.

@qwhelan qwhelan force-pushed the series_dict_speedup branch from 5c82eab to 65b2f13 Compare January 12, 2019 19:06
Contributor

@jreback jreback left a comment

can u add some tests that hit this (and the original)

@qwhelan qwhelan force-pushed the series_dict_speedup branch from 65b2f13 to b0b5006 Compare January 13, 2019 00:55

pep8speaks commented Jan 13, 2019

Hello @qwhelan! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 13, 2019 at 00:56 Hours UTC

@qwhelan qwhelan force-pushed the series_dict_speedup branch from b0b5006 to e98fb51 Compare January 13, 2019 00:56
Contributor Author

qwhelan commented Jan 13, 2019

@jreback Done - original already had a test.

@gfyoung gfyoung added the Datetime, Performance, and Dtype Conversions labels Jan 13, 2019
@jreback jreback added this to the 0.24.0 milestone Jan 13, 2019
@jreback jreback merged commit 8f5c9e3 into pandas-dev:master Jan 13, 2019
Contributor

jreback commented Jan 13, 2019

thanks @qwhelan

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019