-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Make *_range functions consistent #17482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make *_range functions consistent #17482
Conversation
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -310,6 +310,59 @@ Previously, :func:`to_datetime` did not localize datetime ``Series`` data when ` | |||
|
|||
Additionally, DataFrames with datetime columns that were parsed by :func:`read_sql_table` and :func:`read_sql_query` will also be localized to UTC only if the original SQL columns were timezone aware datetime columns. | |||
|
|||
.. _whatsnew_0200.api.consistency_of_range_functions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean "0210"
pandas/core/indexes/datetimes.py
Outdated
@@ -2030,7 +2031,7 @@ def date_range(start=None, end=None, periods=None, freq='D', tz=None, | |||
|
|||
Notes | |||
----- | |||
2 of start, end, or periods must be specified | |||
Exactly two of start, end, or periods must be specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you're trying to say here, but it's a little confusing the final parameter is plural. Say something like this (and propagate to places where relevant):
"Of the three parameters, start, end, and periods, exactly two must be specified."
For the doc-strings, I would even consider ticking them for additional clarity.
@@ -146,14 +146,26 @@ def test_date_range_businesshour(self): | |||
|
|||
def test_range_misspecified(self): | |||
# GH #1095 | |||
with pytest.raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know what the error message is here (you just added it). Let's make this test (and all other relevant ones) stronger by checking the error message too using tm.assert_raises_regexp
.
Codecov Report
@@ Coverage Diff @@
## master #17482 +/- ##
==========================================
- Coverage 91.15% 91.14% -0.02%
==========================================
Files 163 163
Lines 49534 49531 -3
==========================================
- Hits 45153 45144 -9
- Misses 4381 4387 +6
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17482 +/- ##
==========================================
+ Coverage 91.18% 91.19% +<.01%
==========================================
Files 163 163
Lines 49545 49582 +37
==========================================
+ Hits 45179 45214 +35
- Misses 4366 4368 +2
Continue to review full report at Codecov.
|
523dcea
to
8713abb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments. looks really good. a couple of in-line questions. ping when pushed.
doc/source/whatsnew/v0.21.0.txt
Outdated
--------------------------------------------------------------------------- | ||
ValueError: Of the three parameters, start, end, and periods, exactly two must be specified | ||
|
||
Additionally, the endpoint parameter ``end`` was not included in the intervals produced by ``interval_range``. However, all other range functions include ``end`` in their output. To promote consistency among the range functions, ``interval_range`` will now include ``end`` as the right endpoint of the final interval, except if ``freq`` is specified in a way which skips ``end``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this last sentence true for period as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think periods
can cause a skip. Since we're forcing exactly two of start
/ end
/periods
to be passed, if periods
is passed then only one of start
/end
can be passed. The _range functions then compute the non-passed using periods
/ freq
/passed, so by construction nothing can get skipped. I think the only case a skip is possible is end
being skipped when start
/end
/freq
are all passed.
pandas/core/indexes/datetimes.py
Outdated
@@ -412,7 +412,8 @@ def __new__(cls, data=None, | |||
def _generate(cls, start, end, periods, name, offset, | |||
tz=None, normalize=False, ambiguous='raise', closed=None): | |||
if com._count_not_none(start, end, periods) != 2: | |||
raise ValueError('Must specify two of start, end, or periods') | |||
raise ValueError('Of the three parameters, start, end, and ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of the three parameters: start, end, and periods,
comma -> :
pandas/core/indexes/datetimes.py
Outdated
@@ -2030,7 +2031,8 @@ def date_range(start=None, end=None, periods=None, freq='D', tz=None, | |||
|
|||
Notes | |||
----- | |||
2 of start, end, or periods must be specified | |||
Of the three parameters, ``start``, ``end``, and ``periods``, exactly two |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same (and all other occurrences)
pandas/core/indexes/interval.py
Outdated
@@ -1039,43 +1039,37 @@ def interval_range(start=None, end=None, freq=None, periods=None, | |||
Left bound for generating data | |||
end : string or datetime-like, default None | |||
Right bound for generating data | |||
freq : interger, string or DateOffset, default 1 | |||
periods : interger, default None | |||
freq : integer, string or DateOffset, default 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhere mention (if not already) that freq must be consistent with the type of the start/end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also I think there are tests but for this, but pls confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed that there are tests for mixed units in TestIntervalRange.test_errors
. Adding one additional test to be safe.
period_range(periods=5) | ||
|
||
with tm.assert_raises_regex(ValueError, msg): | ||
period_range() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add inconsistent freq w.r.t. start/end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current behavior on 0.20.3 (and this PR as-is) is not to raise, but to convert to freq
:
In [2]: p1 = pd.Period('2017-01-01', freq='W')
...: p2 = pd.Period('2017-12-31', freq='W')
...: pd.period_range(start=p1, end=p2, freq='Q')
...:
Out[2]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4'], dtype='period[Q-DEC]', freq='Q-DEC')
When freq
isn't specified, it converts to daily (default is freq='D'
):
In [5]: pd.period_range(start=p1, end=p2)
Out[5]:
PeriodIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
'2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
'2017-01-09', '2017-01-10',
...
'2017-12-22', '2017-12-23', '2017-12-24', '2017-12-25',
'2017-12-26', '2017-12-27', '2017-12-28', '2017-12-29',
'2017-12-30', '2017-12-31'],
dtype='period[D]', length=365, freq='D')
Same behavior with different frequencies for all of start
/end
/freq
:
In [4]: p3 = pd.Period('2017-12-31', freq='M')
...: pd.period_range(start=p1, end=p3, freq='Q')
...:
Out[4]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4'], dtype='period[Q-DEC]', freq='Q-DEC')
Should this raise if any of start
/end
/freq
are passed with different frequencies? If start
/end
are passed with the same frequency, and freq
isn't specified, should freq
be set to match start
/end
instead of using the daily default? (same thing if one of start
/end
is passed with a frequency and periods
is passed)
Couldn't find a direct comparison with the other range functions to check against for consistency. interval_range
will raise if freq
isn't compatible with they type of start
/end
, but that seems a bit different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is meant to be the 'new' freq, while the start/end serve as the anchor endpoints. this seems fine then. (maybe adding the above example to the docs would be helpful though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will add tests for this behavior.
8713abb
to
09498fd
Compare
ping @jreback Made the review changes, as well as a few additional things I noticed:
I also noticed string/datetime-like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of minor comments.
doc/source/whatsnew/v0.21.0.txt
Outdated
|
||
In [2]: pd.interval_range(start=0, end=4, periods=6) | ||
--------------------------------------------------------------------------- | ||
ValueError: Of the three parameters, start, end, and periods, exactly two must be specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use : after parameters here
pandas/core/indexes/datetimes.py
Outdated
periods : integer or None, default None | ||
If None, must specify start and end | ||
periods : integer, default None | ||
Number of dates to generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of periods
pandas/core/indexes/datetimes.py
Outdated
freq : string or DateOffset, default 'B' (business daily) | ||
Frequency strings can have multiples, e.g. '5H' | ||
Frequency strings can have multiples, e.g. '5, default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the example here looks wrong
pandas/core/indexes/interval.py
Outdated
'exactly two must be specified') | ||
|
||
# must all be same units or None | ||
arr = np.array(list(com._not_none(start, end, freq))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just do this above and compare len(arr) != 2
pandas/core/indexes/interval.py
Outdated
The length of each interval. Must be consistent with the | ||
type of start and end | ||
name : string, default None | ||
Name of the resulting IntervalIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a couple of examples?
closed : string or None, default None | ||
name : string, default None | ||
Name of the resulting TimedeltaIndex | ||
closed : string, default None | ||
Make the interval closed with respect to the given frequency to | ||
the 'left', 'right', or both sides (None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some examples
these are all fine: #17482 (comment) thanks!
Datetimelike should work with |
ping on push / green. |
d34b82c
to
925c72c
Compare
ping @jreback Made the review changes, and additionally:
|
pandas/core/indexes/interval.py
Outdated
raise ValueError("must specify 2 of start, end, periods") | ||
if is_number(start): | ||
is_datetime_interval = False | ||
elif start is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This not correct for the start is None (and end). these must be number, datetime, or timedelta. For datetime, timedelta, then you can wrap the Timestamp/Timedelta around.
925c72c
to
549c402
Compare
ping @jreback Updated datetime-like support. For some reason I originally thought that datetime-like didn't include timedelta, so I shortcut to just parsing timestamps. Updated with support for timedelta and added timedelta related tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see if you can condense the constructor a bit. yeah I know a lot is going on,.
@@ -2091,7 +2094,7 @@ def bdate_range(start=None, end=None, periods=None, freq='B', tz=None, | |||
def cdate_range(start=None, end=None, periods=None, freq='C', tz=None, | |||
normalize=True, name=None, closed=None, **kwargs): | |||
""" | |||
**EXPERIMENTAL** Return a fixed frequency datetime index, with | |||
**EXPERIMENTAL** Return a fixed frequency DatetimeIndex, with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side issue we can prob remove the experimental (but new PR for that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this in #17554, as there were other doc related fixes that needed to be made.
pandas/core/indexes/interval.py
Outdated
|
||
iv_type = {'numeric': True, 'timestamp': True, 'timedelta': True} | ||
|
||
start = com._maybe_box_datetimelike(start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about a helper function to validate this instead of repeating code.
maybe
if is_number(start) or is_number(end):
boxer = type(start) if start is not None else type(end)
elif isinstance(start, Timestamp) or isinstance(end, Timestamp):
boxer = Timestamp
elif .....
....
else:
raise
@jreback : cleaned up the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking really good. just a couple of minor doc changes and a question
doc/source/whatsnew/v0.21.0.txt
Outdated
Consistency of Range Functions | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
In previous versions, there were some inconsistencies between the various range functions: ``date_range``, ``bdate_range``, ``cdate_range``, ``interval_range``, ``period_range``, and ``timedelta_range``. (:issue:`17471`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add interval_range to the api.rst (might as well add cdate_range too), can you add these as :func:`date_range`
as well.
msg = 'end must be numeric or datetime-like, got {end}' | ||
raise ValueError(msg.format(end=end)) | ||
|
||
if is_float(periods): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much cleaner, nice!
raise ValueError("must specify 2 of start, end, periods") | ||
if is_number(endpoint): | ||
if periods is None: | ||
periods = int((end - start) // freq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this what we do elsewhere? this is effectively floored not that this is bad, just wondering if its consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's floored elsewhere too, e.g.
pandas/pandas/core/indexes/period.py
Lines 199 to 200 in f11bbf2
if is_float(periods): | |
periods = int(periods) |
pandas/pandas/core/indexes/datetimes.py
Lines 292 to 293 in fa557f7
if is_float(periods): | |
periods = int(periods) |
Clean error message, check for error in tests, fix typo in whatsnew.
addressed jreback's review, cleaned up docstrings, added documention, update to interval_range endpoint fix, additional tests
Made review related changes, implemented support for datetime-like input in interval_range.
d5579cb
to
f6cc860
Compare
@jreback : Made the doc changes. Didn't see an obvious location for |
thanks @jschendel nice PR, and very responsive! keep em' comin! |
@@ -218,10 +218,19 @@ Top-level dealing with datetimelike | |||
to_timedelta | |||
date_range | |||
bdate_range | |||
cdate_range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is not in the public API, so adding it here does not work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, see #17554
Consistency of Range Functions | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
In previous versions, there were some inconsistencies between the various range functions: func:`date_range`, func:`bdate_range`, func:`cdate_range`, func:`period_range`, func:`timedelta_range`, and func:`interval_range`. (:issue:`17471`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are missing :
's for all references: :func:..
instead of func:..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and also cdate_range here does not work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, see #17554
git diff upstream/master -u -- "*.py" | flake8 --diff
Comments:
period_range
, so I created a new filetest_period_range.py
in a similar fashion to the existing filestest_date_range.py
andtest_timedelta_range.py
.with
context where appropriate, etc.).