Skip to content

simplify+unify offset.apply logic #18263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 16, 2017

Conversation

jbrockmendel
Copy link
Member

Implement _get_lastbday mirroring _get_firstbday.

Implement bstart and bend cases in shift month. Use these to simplify a bunch of the BusinessFoo offsets.

@jbrockmendel
Copy link
Member Author

Running asv now.

@jbrockmendel
Copy link
Member Author

taskset 6 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b timedelta -b period
[...]
       before           after         ratio
     [7f4c960c]       [f6aebe37]
+           44.3s            1.07m     1.45  gil.nogil_datetime_fields.time_period_to_datetime
+           15.0s            17.1s     1.14  gil.nogil_datetime_fields.time_datetime_to_period
+         760±2ns          851±5ns     1.12  period.Properties.time_daysinmonth
-      30.6±0.3μs       27.8±0.2μs     0.91  timeseries.Offsets.time_custom_bday_decr
-     1.87±0.06ms      1.66±0.01ms     0.89  timeseries.DatetimeIndex.time_add_timedelta
-         811±5ns          720±4ns     0.89  period.Properties.time_day
-         836±8ns          736±3ns     0.88  period.PeriodProperties.time_hour
-         204±2ns        154±0.1ns     0.76  timedelta.TimedeltaProperties.time_timedelta_days
-        191±10ms       27.2±0.2ms     0.14  timeseries.AsOfDataFrame.time_asof_nan
-        202±10ms       24.2±0.3ms     0.12  timeseries.AsOfDataFrame.time_asof

taskset 6 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b timedelta -b period
[...]
       before           after         ratio
     [7f4c960c]       [f6aebe37]
+      15.8±0.6μs       20.2±0.1μs     1.28  timeseries.Offsets.time_custom_bday_apply
+      27.1±0.2μs       32.8±0.1μs     1.21  timeseries.Offsets.time_custom_bday_cal_decr
+         725±5ns          833±9ns     1.15  period.PeriodProperties.time_minute
+     16.7±0.08μs       19.0±0.1μs     1.14  timeseries.AsOf.time_asof_single_early
+      17.9±0.1μs       20.3±0.3μs     1.13  timeseries.Offsets.time_timeseries_day_apply
+         102±2μs          114±1μs     1.12  timeseries.Offsets.time_custom_bmonthend_decr_n
+       134±0.9ms          150±1ms     1.12  timeseries.ToDatetime.time_iso8601_tz_spaceformat
+     18.9±0.06μs       21.0±0.2μs     1.11  timeseries.SemiMonthOffset.time_end_incr_n
-           44.6s            39.6s     0.89  gil.nogil_datetime_fields.time_period_to_datetime
-      20.3±0.1μs       17.0±0.1μs     0.84  timeseries.Offsets.time_custom_bday_apply_dt64
-        28.3±2ms           15.7ms     0.56  timeseries.DatetimeIndex.time_dti_tz_factorize
-        28.1±1ms       6.93±0.6ms     0.25  timeseries.DatetimeIndex.time_dti_factorize

taskset 6 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b timedelta -b period
[...]
       before           after         ratio
     [7f4c960c]       [f6aebe37]
+           14.5s            17.0s     1.17  gil.nogil_datetime_fields.time_datetime_to_period
-      8.14±0.5ms      7.00±0.05ms     0.86  timeseries.DatetimeIndex.time_add_offset_fast
-        29.1±1ms       6.95±0.5ms     0.24  timeseries.DatetimeIndex.time_dti_tz_factorize
-        33.2±5ms       6.44±0.4ms     0.19  timeseries.DatetimeIndex.time_dti_factorize
-         301±7ms         57.2±1ms     0.19  timeseries.AsOfDataFrame.time_asof
-         318±9ms       59.7±0.6ms     0.19  timeseries.AsOfDataFrame.time_asof_nan

taskset 6 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries -b timedelta -b period
[...]
      before           after         ratio
     [7f4c960c]       [f6aebe37]
+     12.0±0.04μs      13.4±0.08μs     1.12  timeseries.SemiMonthOffset.time_begin_apply
+      10.5±0.1μs      11.6±0.02μs     1.11  timeseries.Offsets.time_timeseries_year_incr
-      28.0±0.1μs      24.1±0.07μs     0.86  timeseries.Offsets.time_custom_bday_cal_incr
-      29.5±0.6ms       6.74±0.3ms     0.23  timeseries.DatetimeIndex.time_dti_tz_factorize

The optimist in me thinks the time_dti_tz_factorize might be real, but for the most part this looks like a wash.

@jbrockmendel
Copy link
Member Author

       before           after         ratio
     [7f4c960c]       [f6aebe37]
+      46.2±0.4ms       51.8±0.6ms     1.12  timedelta.ToTimedelta.time_convert_string
-           2.37s            2.02s     0.85  timeseries.AsOfDataFrame.time_asof_nan
-      18.4±0.1μs      15.6±0.05μs     0.85  timeseries.SemiMonthOffset.time_end_incr
-       170±0.4ms        138±0.6ms     0.81  timeseries.ToDatetime.time_iso8601_tz_spaceformat

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Nov 13, 2017

These numbers are against an out-of-date master. Will update. update well maybe not that out of date.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments. nice removing code! can you confirm that we have sufficient test coverage of things that you are changing (I think yes, but have a look)


If it's a saturday or sunday, increment first business day to reflect this

days_in_month arg is a dummy so that this has the same signature as
_get_lastbday.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can type these

@@ -416,15 +425,21 @@ cpdef datetime shift_month(datetime stamp, int months, object day_opt=None):
dy -= 1
year = stamp.year + dy

dim = monthrange(year, month)[1]
(wkday, days_in_month) = monthrange(year, month)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no parens

elif day_opt == 'start':
day = 1
elif day_opt == 'end':
day = dim
day = days_in_month
elif day_opt == 'bstart':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about business_start/end


wkday, days_in_month = tslib.monthrange(other.year, other.month)
lastBDay = days_in_month - max(((wkday + days_in_month - 1)
% 7) - 4, 0)
lastBDay = _get_lastbday(wkday, days_in_month)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok to de-privatize get_lastbday/first

@jreback jreback added Frequency DateOffsets Clean labels Nov 14, 2017
@jbrockmendel
Copy link
Member Author

Making edits per comments now. As to test coverage (here and in the other two similar PRs): as I went through these I added equality assertions for new vs old versions followed by assert False to make sure they were actually hit.

@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18263 into master will decrease coverage by 51.95%.
The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #18263       +/-   ##
===========================================
- Coverage    91.4%   39.44%   -51.96%     
===========================================
  Files         164      164               
  Lines       49880    49857       -23     
===========================================
- Hits        45592    19667    -25925     
- Misses       4288    30190    +25902
Flag Coverage Δ
#multiple ?
#single 39.44% <0%> (-0.04%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 34.33% <0%> (-62.78%) ⬇️
pandas/io/formats/style.py 0% <0%> (-100%) ⬇️
pandas/tools/hashing.py 0% <0%> (-100%) ⬇️
pandas/types/concat.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/io/json/json.py 0% <0%> (-100%) ⬇️
pandas/parser.py 0% <0%> (-100%) ⬇️
pandas/lib.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.08%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.28%) ⬇️
... and 114 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77f10f0...0b1ecaf. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18263 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18263      +/-   ##
==========================================
- Coverage   91.39%   91.36%   -0.04%     
==========================================
  Files         164      164              
  Lines       49854    49781      -73     
==========================================
- Hits        45566    45484      -82     
- Misses       4288     4297       +9
Flag Coverage Δ
#multiple 89.17% <100%> (-0.02%) ⬇️
#single 39.49% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/offsets.py 96.91% <100%> (-0.16%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/plotting/_core.py 82.45% <0%> (ø) ⬆️
pandas/core/reshape/reshape.py 100% <0%> (ø) ⬆️
pandas/core/indexes/period.py 92.9% <0%> (ø) ⬆️
pandas/core/indexes/numeric.py 97.27% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 91.14% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.48% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c799e2...45d9ddd. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can you unit test some these?

"""
wkday is the result of monthrange(year, month)
(wkday, days_in_month) is the result of monthrange(year, month)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update doc-strings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to reflect what?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean update to a more proper doc-string

@jbrockmendel
Copy link
Member Author

also can you unit test some these?

Yes, but since I just added a test_liboffsets file in #18280 I'd rather close these out separately and then add unit tests in a shared follow-up.

@jorisvandenbossche
Copy link
Member

@jbrockmendel general comment: can you try to use a bit more descriptive PR titles? "Tslibs offsets paramd" does not say much to me :-)

@jbrockmendel
Copy link
Member Author

can you try to use a bit more descriptive PR titles

You got it.

@jbrockmendel jbrockmendel changed the title Tslibs offsets paramd simplify+unify offset.apply logic Nov 15, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some unit tests

"""
wkday is the result of monthrange(year, month)
(wkday, days_in_month) is the result of monthrange(year, month)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean update to a more proper doc-string

@jbrockmendel
Copy link
Member Author

I mean update to a more proper doc-string

Sure thing.

@jbrockmendel
Copy link
Member Author

pls hold off on this pending double-checking cross-DST behavior.

@jbrockmendel
Copy link
Member Author

OK, I've convinced myself the DST issue is handled.

@jreback jreback added this to the 0.22.0 milestone Nov 16, 2017
@jreback jreback merged commit 498a1e1 into pandas-dev:master Nov 16, 2017
@jreback
Copy link
Contributor

jreback commented Nov 16, 2017

thanks, as always more tests are welcome! (w.r.t. DST)

@jbrockmendel
Copy link
Member Author

thanks, as always more tests are welcome! (w.r.t. DST)

I'm psyched to have this merged. There's some follow-ups headed you're way I think you'll like.

w/r/t DST shift_month (and shift_months) is more fragile than I'd like, are safe largely because all existing usages are wrapped with apply_wraps. But I don't want to rely on that always being the case. So expect tests+fixes.

Specifically w/r/t to DST tests I've been thinking of trying hypothesis (#17978)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants