BUG: Timestamp.replace chaining not compat with datetime.replace #17356

Closed
wants to merge 2 commits into from

sscherfke

This is a clean version of #16110 and the last thing I’m going to do with this issue.
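For reference, the behaviour the title asks Timestamp.replace to match is the standard library's: datetime.replace only swaps the fields it is given, so attaching a tzinfo and then removing it again returns the original wall-clock time. Below is a minimal stdlib-only sketch of that reference behaviour. It does not exercise pandas, and the fixed UTC+1 offset `CET_STANDARD` is an assumption standing in for the pytz 'CET' tzinfo used later in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset stands in for pytz's 'CET' (standard time).
# The point is only the semantics of datetime.replace, not DST handling.
CET_STANDARD = timezone(timedelta(hours=1), "CET")

naive = datetime(2016, 3, 27, 1)

# replace(tzinfo=...) attaches the zone without shifting the wall-clock fields.
aware = naive.replace(tzinfo=CET_STANDARD)

# Chaining a second replace that drops the zone restores the original value.
round_trip = aware.replace(tzinfo=None)

print(aware)       # 2016-03-27 01:00:00+01:00
print(round_trip)  # 2016-03-27 01:00:00
```

A compatible Timestamp.replace would be expected to satisfy the same round-trip property for chained calls.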

@gfyoung
Member

gfyoung commented Aug 28, 2017

@sscherfke: This looks pretty good (test-wise), save for a minor linting issue, which you can see here:

https://travis-ci.org/pandas-dev/pandas/jobs/269100110#L1809

@codecov

codecov bot commented Aug 28, 2017

Codecov Report

Merging #17356 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##           master   #17356      +/-   ##
==========================================
- Coverage   91.03%   91.02%   -0.02%
==========================================
  Files         162      162
  Lines       49567    49567
==========================================
- Hits        45125    45116       -9
- Misses       4442     4451       +9
```

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | 88.8% <ø> (ø) | ⬆️ |
| #single | 40.24% <ø> (-0.07%) | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/io/gbq.py | 25% <0%> (-58.34%) | ⬇️ |
| pandas/core/frame.py | 97.72% <0%> (-0.1%) | ⬇️ |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 473a7f3...02297a5.
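The header's "decrease coverage by 0.01%" and the diff's "-0.02%" look contradictory. Recomputing coverage as hits / lines from the diff suggests they come from the same data. The displayed 91.03% and 91.02% match truncation to two decimals (the exact master value is about 91.038%, which would round to 91.04%). The -0.02% matches rounding the exact delta of about -0.018%. This is an inference from the numbers, not documented Codecov behaviour. A small arithmetic sketch:

```python
import math

# Figures copied from the coverage diff above.
LINES = 49_567
HITS = {"master": 45_125, "pr": 45_116}
MISSES = {"master": 4_442, "pr": 4_451}


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent: hits / lines * 100."""
    return 100.0 * hits / lines


def truncate_2dp(x: float) -> float:
    """Drop (do not round) everything past two decimal places."""
    return math.floor(x * 100) / 100


# Sanity check: hits + misses account for every tracked line on both sides.
assert all(HITS[k] + MISSES[k] == LINES for k in HITS)

master = coverage_pct(HITS["master"], LINES)
pr = coverage_pct(HITS["pr"], LINES)

print(f"master {master:.4f}%  PR {pr:.4f}%")
print("exact delta, rounded to 2 dp:", round(pr - master, 2))
print("difference of truncated values:", round(truncate_2dp(master) - truncate_2dp(pr), 2))
```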

@sscherfke
Author

My bad. Fixed it.

@sscherfke
Author

Can this PR please be merged? This function has been computing wrong results for more than half a year now!

@TomAugspurger
Contributor

TomAugspurger commented Sep 4, 2017

From #16110 (comment)

change the definition of tz_convert_single to cpdef int64_t

Can you try that?

@TomAugspurger
Contributor

I took this branch, changed tz_convert_single to a cpdef int64_t, and ran a small benchmark (with profiling enabled):

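```python
from datetime import datetime
import pytz
from pandas import Timestamp
import cProfile
import pstats

# Naive wall time shortly before the CET spring-forward transition on 2016-03-27
dt = datetime(2016, 3, 27, 1)
# The concrete pytz tzinfo (standard time, is_dst=False) that localize() picks for dt
tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo

# Reference result from datetime.replace (not used by the timing loop)
result_dt = dt.replace(tzinfo=tzinfo)
ts = Timestamp(dt)

# Statement profiled: 10,000 calls to Timestamp.replace
code = '''
for _ in range(10000):
    ts.replace(tzinfo=tzinfo)
'''

cProfile.runctx(code, globals(), locals(), "Profile.prof")

# Print the profile sorted by internal (own) time
s = pstats.Stats("Profile.prof")
s.strip_dirs().sort_stats("time").print_stats()
```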
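The script above prints the whole table. To pull a single figure out programmatically, for example how many times datetime.datetime.replace runs per iteration, the profile can be queried through `pstats.Stats.get_stats_profile()` (Python 3.9+). This is a sketch under stated assumptions. `three_replaces` is a hypothetical stand-in workload with exactly three datetime.replace calls, not pandas code. Matching on the name fragment relies on CPython's label for C methods, the same `{method 'replace' of 'datetime.datetime' objects}` that appears in the PR profile.

```python
import cProfile
import pstats
from datetime import datetime, timedelta, timezone

UTC_PLUS_1 = timezone(timedelta(hours=1))


def three_replaces(dt: datetime) -> datetime:
    # Hypothetical stand-in workload (NOT pandas' Timestamp.replace):
    # exactly three datetime.replace calls per invocation.
    aware = dt.replace(tzinfo=UTC_PLUS_1)  # 1st call
    naive = aware.replace(tzinfo=None)     # 2nd call
    return naive.replace(microsecond=0)    # 3rd call


def total_calls(stats: pstats.Stats, name_fragment: str) -> int:
    """Sum ncalls over every profiled function whose name contains name_fragment."""
    total = 0
    for func_name, prof in stats.get_stats_profile().func_profiles.items():
        if name_fragment in func_name:
            # ncalls is a string: "N", or "N/P" (total/primitive) for recursion.
            total += int(prof.ncalls.split("/")[0])
    return total


ITERATIONS = 1_000
start = datetime(2016, 3, 27, 1)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(ITERATIONS):
    three_replaces(start)
profiler.disable()

stats = pstats.Stats(profiler)
replace_calls = total_calls(stats, "'replace' of 'datetime.datetime'")
print(f"datetime.replace calls per iteration: {replace_calls / ITERATIONS:g}")
```

Profiling the real Timestamp.replace in place of `three_replaces` would let the same helper report the per-call count directly, instead of reading it off the printed table.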

Master:

```
Mon Sep  4 08:16:09 2017    Profile.prof

         150005 function calls in 0.104 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.039    0.000    0.072    0.000 tslib.pyx:4210(tz_convert_single)
    10000    0.015    0.000    0.097    0.000 tslib.pyx:677(replace)
    20000    0.015    0.000    0.026    0.000 tslib.pyx:1708(_get_zone)
    20000    0.009    0.000    0.009    0.000 tslib.pyx:4280(_treat_tz_as_dateutil)
        1    0.008    0.008    0.105    0.105 <string>:2(<module>)
    10000    0.006    0.000    0.006    0.000 tslib.pyx:110(create_timestamp_from_ts)
    10000    0.004    0.000    0.005    0.000 tslib.pyx:4322(_get_dst_info)
    20000    0.003    0.000    0.003    0.000 tslib.pyx:1705(_is_utc)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:236(_is_tzlocal)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:4289(_tz_cache_key)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1043(__get__)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1770(_check_dts_bounds)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1044(__get__)
        1    0.000    0.000    0.105    0.105 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 tslib.pyx:4402(_unbox_utcoffsets)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 tslib.pyx:4276(_treat_tz_as_pytz)
```

PR:

```
Mon Sep  4 08:19:48 2017    Profile.prof

         210003 function calls in 0.149 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    30000    0.043    0.000    0.043    0.000 {method 'replace' of 'datetime.datetime' objects}
    10000    0.030    0.000    0.143    0.000 tslib.pyx:676(replace)
    10000    0.020    0.000    0.104    0.000 tslib.pyx:1439(convert_to_tsobject)
    10000    0.010    0.000    0.045    0.000 tzinfo.py:179(fromutc)
    10000    0.010    0.000    0.010    0.000 tslib.pyx:3966(_delta_to_nanoseconds)
        1    0.007    0.007    0.150    0.150 <string>:2(<module>)
    10000    0.007    0.000    0.067    0.000 tzinfo.py:189(normalize)
    10000    0.006    0.000    0.006    0.000 tslib.pyx:109(create_timestamp_from_ts)
    10000    0.004    0.000    0.004    0.000 {built-in method _bisect.bisect_right}
    10000    0.002    0.000    0.002    0.000 {built-in method builtins.max}
    10000    0.002    0.000    0.002    0.000 tslib.pyx:1745(maybe_get_tz)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1704(_is_utc)
    20000    0.001    0.000    0.001    0.000 tslib.pyx:1769(_check_dts_bounds)
    10000    0.001    0.000    0.001    0.000 datetime.pxd:144(_pydatetime_to_dts)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1432(_get_utcoffset)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1429(__get__)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1042(__get__)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1330(is_timestamp)
    10000    0.001    0.000    0.001    0.000 tslib.pyx:1043(__get__)
        1    0.000    0.000    0.150    0.150 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```

I'm out of time now, but some quick thoughts:

  • datetime.datetime.replace is now the most expensive function call
  • We hit it 3 times per .replace call (30000 calls for 10000 iterations); can we reduce that?
  • We no longer call tz_convert_single, which is good (though this should be re-run with a tz-aware Timestamp)
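The per-call figures behind these observations can be read straight off the two profiles. Here is a small arithmetic sketch using only the numbers printed in them (10,000 iterations each). The totals include cProfile's own overhead, so they only indicate relative cost.

```python
# Figures copied from the master and PR profiles (10,000 iterations each).
ITERATIONS = 10_000
MASTER = {"total_s": 0.104, "function_calls": 150_005}
PR = {"total_s": 0.149, "function_calls": 210_003}
# ncalls reported for {method 'replace' of 'datetime.datetime' objects} in the PR profile
PR_DATETIME_REPLACE_CALLS = 30_000


def per_iteration(profile: dict) -> tuple[float, float]:
    """Return (microseconds, function calls) per Timestamp.replace call."""
    micros = profile["total_s"] / ITERATIONS * 1e6
    calls = profile["function_calls"] / ITERATIONS
    return micros, calls


master_us, master_calls = per_iteration(MASTER)
pr_us, pr_calls = per_iteration(PR)

print(f"master: {master_us:.1f} us, {master_calls:.1f} calls per replace")
print(f"PR:     {pr_us:.1f} us, {pr_calls:.1f} calls per replace")
print(f"slowdown: {pr_us / master_us:.2f}x")
print(f"datetime.replace calls per replace: {PR_DATETIME_REPLACE_CALLS / ITERATIONS:.0f}")
```

This works out to roughly 10.4 µs versus 14.9 µs per profiled call, about a 1.43x slowdown, with three datetime.replace calls per Timestamp.replace on the PR branch.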

jreback changed the title from "Fix issue #15683" to "BUG: Timestamp.replace chaining not compat with datetime.replace" on Sep 7, 2017
@jreback
Contributor

jreback commented Sep 13, 2017

replaced by #17507

jreback closed this on Sep 13, 2017
Labels: Bug, Timezones (Timezone data dtype)

Linked issue that merging this pull request may close: BUG: Timestamp.replace chaining not compat with datetime.replace

4 participants