BUG: Timestamp.replace chaining not compat with datetime.replace #15683 #16110
doc/source/whatsnew/v0.20.0.txt
@@ -22,6 +22,7 @@ Highlights include:
- Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
- Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here <whatsnew_0200.api_breaking.gbq>`
- Switched the test framework to use `pytest <http://doc.pytest.org/en/latest>`__ (:issue:`13097`)
- Fixed an issue with ``Timestamp.replace(tzinfo)`` around DST changes (:issue:`15683`)
You can move this down to the Bug Fixes section (the Conversion subsection).
@@ -1280,6 +1280,25 @@ def test_ambiguous_compat(self):
        self.assertEqual(result_pytz.to_pydatetime().tzname(),
                         result_dateutil.to_pydatetime().tzname())

    def test_tzreplace_issue_15683(self):
        """Regression test for issue 15683."""
Usually we just leave a comment on the first line of the function like # GH 15683
        result_dt = dt.replace(tzinfo=tzinfo)
        result_pd = Timestamp(dt).replace(tzinfo=tzinfo)

        self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
We've recently switched to pytest for our test-runner, so you can write these as
assert result_dt.timestamp() == result_pd.timestamp()
...
Oh, nice. :)
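For context on what "compat with datetime.replace" means here: the stdlib contract is that replace(tzinfo=...) only swaps the tzinfo attribute and leaves the wall-clock fields untouched, so chaining .replace(tzinfo=tz).replace(tzinfo=None) must round-trip. A minimal stdlib sketch (the fixed +01:00 zone is an illustrative stand-in for a real CET zone):

```python
from datetime import datetime, timedelta, timezone

dt = datetime(2016, 3, 27, 1)              # naive wall time just before the CET DST jump
cet = timezone(timedelta(hours=1), "CET")  # fixed +01:00 offset, stand-in for a real CET zone
aware = dt.replace(tzinfo=cet)             # attaches tzinfo only; the wall clock is untouched
assert aware.hour == 1
assert aware.utcoffset() == timedelta(hours=1)
naive_again = aware.replace(tzinfo=None)   # chaining back to naive restores the original
assert naive_again == dt
```

This is the behaviour the regression test above compares Timestamp against.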
Codecov Report
@@ Coverage Diff @@
## master #16110 +/- ##
==========================================
+ Coverage 90.84% 90.86% +0.01%
==========================================
Files 159 159
Lines 50771 50771
==========================================
+ Hits 46122 46132 +10
+ Misses 4649 4639 -10
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #16110 +/- ##
==========================================
- Coverage 91.04% 91.01% -0.03%
==========================================
Files 162 162
Lines 49555 49567 +12
==========================================
- Hits 45115 45114 -1
- Misses 4440 4453 +13
Continue to review full report at Codecov.
what happened to the original change? this just needs a localization I think
        _TSObject ts

    # set to naive if needed
    _tzinfo = self.tzinfo
    value = self.value
    if _tzinfo is not None:
        value = tz_convert_single(value, 'UTC', _tzinfo)
        value_tz = tz_convert_single(value, _tzinfo, 'UTC')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't make any sense
change the definition of tz_convert_single to cpdef int64_t, and here make value_tz and offset int64_t definitions
@@ -726,16 +728,14 @@ class Timestamp(_Timestamp):
            _tzinfo = tzinfo

        # reconstruct & check bounds
        value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &dts)
        ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
what exactly are you doing here?
When you changed the tzinfo from old_tz to new_tz in the old implementation, the code would first do a conversion "from UTC to old_tz" and later "from new_tz to UTC" in order to calculate a correct offset for the value.
However, this was the reason for the problems I discussed in the issue (e.g., off-by-1h offsets were calculated and invalid "date does not exist in timezone" errors were raised when converting "from new_tz to UTC"). These issues only occurred around DST changes when the new offset resulted in a DST switch, so they were not covered before.
What I did now was simply construct a new Timestamp instance and let the pandas internals do the correct calculations for value. Unfortunately, the low-level (C) stuff is not very well documented, so my solution for creating a correct result Timestamp might not be the most optimal one.
However, correct code is better than optimal code, so I hope this PR is good enough for now.
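The DST-edge failure mode described here can be reproduced with the stdlib alone. A sketch assuming Python 3.9+ (zoneinfo) and an available system tz database, with Europe/Berlin standing in for the CET zone from the issue: a wall time inside the spring-forward gap silently shifts by the DST offset when round-tripped through UTC:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9; needs a system tz database

berlin = ZoneInfo("Europe/Berlin")
# 02:30 on 2016-03-27 does not exist locally: clocks jump from 02:00 to 03:00
missing = datetime(2016, 3, 27, 2, 30, tzinfo=berlin)
# Converting through UTC and back shifts the nonexistent wall time into the valid range
roundtrip = missing.astimezone(timezone.utc).astimezone(berlin)
assert (roundtrip.hour, roundtrip.minute) == (3, 30)
```

This is the same class of silent one-hour shift that the old UTC/old_tz/new_tz conversion chain produced.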
pls run the asv suite and see if this regresses anything. perf is a big deal and this goes thru a few more conversions; this is why I am harping on the impl.
Is there a fast low-level function that takes a year, month, day, hour, minute, second, nano and a tzinfo and calculates value so that I can avoid creating a datetime object? Something like convert_to_tsobject() but with a different signature? Or can I call this function in a different way?
you can trace what these are doing and just replicate it. there is some overhead to creating it in this way.
    def test_tzreplace_issue_15683(self):
        """Regression test for issue 15683."""
        dt = datetime(2016, 3, 27, 1)
        tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo
this needs testing with multiple tz's that hit DST
Looks like the tests are failing b/c I used
@jreback It's not only a compatibility issue.
@sscherfke this is like adjusting, then readjusting things again. Start with the code I put up before.
@jreback The proposed
Since all my attempts to fix localizing the timestamp failed, I just thought "why not just calculate a new value for the given time-struct and tzinfo?". I know that this code is not the very best solution, but it was the best that I could come up with based on the documentation available:

    # What I actually want to do is:
    #
    #     value = value_for(dts, tzinfo)
    #
    # The easiest thing I found that does this is creating a tsobject
    # from a datetime() and taking its "value":
    ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
                        dts.sec, dts.us, tzinfo=_tzinfo)
    ts = convert_to_tsobject(ts_input, _tzinfo, None, 0, 0)
    # Since datetime() has no nanoseconds, add them manually:
    value = ts.value + (dts.ps // 1000)

The good thing about this approach is that we don't calculate any wrong offsets and that there also won't be any false "ambiguous time" errors. I'd gladly improve the above code with anything that correctly calculates
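The idea in the snippet above — derive the epoch value from the broken-out fields plus the tzinfo, then add the sub-microsecond part separately — can be sketched with stdlib types only. Everything below is a hypothetical stand-in for the dts struct and the pandas internals, not the actual implementation:

```python
from datetime import datetime, timezone

# Hypothetical broken-out fields, standing in for the pandas datetimestruct
year, month, day, hour, minute, sec, us, ns = 2016, 3, 27, 1, 0, 0, 0, 500

dt = datetime(year, month, day, hour, minute, sec, us, tzinfo=timezone.utc)
# Epoch nanoseconds: whole seconds from the aware datetime, plus the sub-second
# fields; datetime resolves only microseconds, so ns must be added manually
value = int(dt.timestamp()) * 10**9 + us * 1000 + ns
assert value == 1459040400 * 10**9 + 500
```

The cost being debated in this thread is exactly the construction of that intermediate datetime object.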
How do we proceed now? My solution is not optimal but unfortunately, I have no idea how to improve it. Nonetheless, it fixes an obvious bug and the tests are passing now, so I’d be really happy if this gets merged.
doc/source/whatsnew/v0.20.0.txt
@@ -1576,6 +1576,7 @@ Conversion

- Bug in ``Timestamp.replace`` now raises ``TypeError`` when incorrect argument names are given; previously this raised ``ValueError`` (:issue:`15240`)
- Bug in ``Timestamp.replace`` with compat for passing long integers (:issue:`15030`)
- Bug in ``Timestamp.replace`` when replacing ``tzinfo`` around DST changes (:issue:`15683`)
- Bug in ``Timestamp`` returning UTC based time/date attributes when a timezone was provided (:issue:`13303`)
this will need to move to 0.20.1
Okay, np
I moved it to 0.20.2 since there was no file for 0.20.1.
@@ -726,16 +728,14 @@ class Timestamp(_Timestamp):
            _tzinfo = tzinfo

        # reconstruct & check bounds
        value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &dts)
        ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
pls run the asv suite and see if this regresses anything. perf is a big deal and this goes thru a few more conversions; this is why I am harping on the impl.
@sscherfke can you run the asv's and post any discrepancies.
@jreback I ran
        result_dt = dt.replace(tzinfo=tzinfo)
        result_pd = Timestamp(dt).replace(tzinfo=tzinfo)

        if hasattr(result_dt, 'timestamp'):  # New method in Py 3.3
what is this? 'New method in Py 3.3'?
The datetime.datetime.timestamp() method is new in 3.3. I developed the patch on 3.6 and then the GitHub CI failed because of this method.
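For reference (a quick stdlib check, nothing pandas-specific): datetime.timestamp() was indeed added in Python 3.3 and returns POSIX seconds as a float; on an aware datetime the result is unambiguous:

```python
from datetime import datetime, timezone

dt = datetime(2016, 3, 27, 1, tzinfo=timezone.utc)
# POSIX seconds for 2016-03-27 01:00 UTC
assert dt.timestamp() == 1459040400.0
```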
and why are you doing this? what does it do
I am just doing various tests to make sure pandas timestamps behave like datetimes, which they did not always do before.
and what exactly does this do? pls show an example of calling timestamp on a datetime.datetime.
I am just checking if both pandas and datetime dates are equal in direct comparison and if converted to timestamps. The latter check only works on Python 3, so I put a check around it. I could have written a "timestamp" function for Python 2, but that would have been just too much non-trivial work for very little benefit.
My question would be if that timestamp check is needed? If the timestamp values (which give the POSIX time) differ, wouldn't the actual datetime.datetime objects also evaluate to not equal?
In the issue, I had some cases where the datetimes differed but the timestamps were equal. Since the timestamps were always equal even if the datetimes differed, this check might not be necessary but I wanted to do the extra check just to be sure.
The issue that this patch closes was not detected by the existing (very extensive) test suite, so I thought more checks won't hurt. :)
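A stdlib sketch of why the two checks usually agree: equality between aware datetimes already compares the underlying UTC instant, which is the same quantity timestamp() exposes as POSIX seconds:

```python
from datetime import datetime, timedelta, timezone

a = datetime(2016, 3, 27, 1, tzinfo=timezone.utc)
b = datetime(2016, 3, 27, 2, tzinfo=timezone(timedelta(hours=1)))  # same instant
assert a == b                          # aware datetimes compare as UTC instants
assert a.timestamp() == b.timestamp()  # identical POSIX values
```

So a disagreement between the two checks would point at one of the objects not honouring the aware-datetime comparison contract.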
pls run ALL the benchmarks. The issue is that this could affect a LOT of things (prob not, but checking).
I've run the complete asv suite three times now, but I got strange/random-looking results. E.g.:
And here are the bad results of the third run (I failed to copy the results of the second run before they vanished beyond my bash history):
As you can see, there's very little overlap between the two datasets. Anyway, I still think that correct code is more important than a few percent of performance.
@sscherfke there is often quite some variation in the timings from one build to another (which sometimes makes it difficult to interpret a single run). So ratios around 1.3 can most of the time be considered 'normal', unless they show up consistently. In the first run the first two are related to datetimes, so maybe you could repeat those to see whether the slowdown is consistent.
Yep, but running the asv suite is also a way to at least get an idea (and to indicate where a possible problem could be), and if there is a performance problem, to look into ways to improve it while keeping the correctness.
        result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
        result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)

        if hasattr(result_dt, 'timestamp'):  # New method in Py 3.3
the correct 3rd comparison is Timestamp(result_dt) == Timestamp(result_pd); in fact, to make this more clear I would do:

    for left, right in [(Timestamp, Timestamp),
                        (lambda x: x, lambda x: x),
                        (lambda x: x, lambda x: x.to_pydatetime())]:
        assert left(result_dt) == right(result_pd)
Imho, this is a lot harder to read and understand.
well, you need to make one of the changes
I agree the loop is in this case harder to read
BTW, @jreback Timestamp(dt) and dt.timestamp() are not equivalent. dt.timestamp() is more like Timestamp.value (although I am not sure how dt.timestamp() handles timezones)
we don't use dt.timestamp() anywhere, so a test of this is irrelevant. Timestamp(result_dt) == Timestamp(result_pd) is the actual test that needs to pass.
I’m really getting tired of this. I literally spent days trying to understand pandas and to find a solution that works correctly. I had very little constructive help in the process. When I finally created the PR, I hoped for a "Thanks for your efforts, merged" but all I’m getting is pointless nit-picking on my solution. If that's the way you treat new contributors, good luck for the future. Do whatever you like with this PR. I'm out.
@sscherfke I am very sorry you feel that way. And thanks for your feedback on the process, as it is certainly important to us how new (or old) contributors experience this. But you also have to understand that a PR will almost never be just "merged, thanks". On the one hand, we want to keep/obtain a high-quality code base where new changes meet certain requirements, do not introduce other bugs, and do not cause large performance regressions (not saying your PR would not meet that! just pointing out in general). We also want to keep a certain consistency in the codebase (just merging all PRs as we get them would result in too much variety). But at the same time, we certainly have to be aware that such a process can become nitpicky, and try to avoid that! For this PR, I hope you would like to reconsider trying to finish it, as this is a regression that would be nice to see fixed.
I know. I exaggerated a bit to make my point. I’ve been contributing to and maintaining my own OSS projects for many years now. The main drawback of my patch is the creation of a
the general approach is ok as long as there are no perf regressions. This is a very, very narrow code change, so yes, certainly correctness is first, BUT if this causes perf issues with the vast majority of code it would need another soln. I don't think perf is actually a problem, but it needs some checking.
While looking into the PR / this issue, I wondered the following:
So the
Any chance that this gets eventually merged? None of the core devs has come up with a better or more performant solution, and the bug is still a bug …
I haven't gone through the whole discussion, but IIRC we needed a new ASV benchmark in addition to the existing ones. I added a simple one to https://github.com/TomAugspurger/pandas/tree/sscherfke-master

    class TimeTimestamp(object):
        goal_time = 0.5

        def setup(self):
            self.ts = pd.Timestamp("2016-03-28")  # after DST
            self.tzinfo = pytz.timezone("CET")

        def time_timestamp_replace(self):
            self.ts.replace(tzinfo=self.tzinfo).replace(tzinfo=None)

Output

So there's some work to be done there. I can take a look after Scipy next week.
doc/source/whatsnew/v0.20.2.txt
@@ -37,6 +37,7 @@ Bug Fixes

Conversion
^^^^^^^^^^

- Bug in ``Timestamp.replace`` when replacing ``tzinfo`` around DST changes (:issue:`15683`)
needs to go in 0.21.0
This question is still unanswered.
if you can rebase and move docs to 0.21.0
I think this problem (#16110 (comment)) is still a problem for updating this? (unless the current one using stdlib
Yes, it would still be a problem, but none of the core devs has come up with a good solution yet. I don’t know enough about pandas to improve my solution.
Maybe you just gotta decide whether in this particular case you want good performance or correct results (at least for now). If you should decide that correct results are more important, I’d gladly update my PR for the next release.
try the change about typing and see how perf is
        _TSObject ts

    # set to naive if needed
    _tzinfo = self.tzinfo
    value = self.value
    if _tzinfo is not None:
        value = tz_convert_single(value, 'UTC', _tzinfo)
        value_tz = tz_convert_single(value, _tzinfo, 'UTC')
change the definition of tz_convert_single to cpdef int64_t, and here make value_tz and offset int64_t definitions
#17356 is a clean version of this PR.
git diff upstream/master --name-only -- '*.py' | flake8 --diff