-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Separate out non-scalar tests from scalar tests; move to ?? in follow-up #18142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #18142 +/- ##
==========================================
- Coverage 91.38% 91.36% -0.02%
==========================================
Files 164 164
Lines 49790 49790
==========================================
- Hits 45501 45492 -9
- Misses 4289 4298 +9
Continue to review full report at Codecov.
|
@@ -640,27 +640,6 @@ def conv(v): | |||
# invalid | |||
pytest.raises(ValueError, ct, '- 1days, 00') | |||
|
|||
def test_overflow(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so rather than move these within the same file I would like to create another directory level (like we have for indexes) and put them in separate files, e.g.
pandas/tests/scalar/timedeltas/test_basic.py
.....test_slicing.py
etc. much more organized to have separate units of tests in separate named files (rather than classes). be sure to update setup.py with the additional directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm amenable to that in general, but the relevantissue here is that the new classes (which I agree don't belong here) are not scalar tests at all. Some are for DatetimeIndex/TimedeltaIndex which I can figure out destinations for. I haven't looked at Series/DataFrame tests to see where those might go. Or there could be something like tests/tslibs/vectorized/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the would rather u do a PR which actually moves them to the right location rather than a temporary one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
Just pushed a commit moving these tests to the appropriate modules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to be very careful with this, move less at a time maybe
Series, DataFrame, DatetimeIndex) | ||
|
||
|
||
class TestDataFrameTimestamps(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we already have test_timeseries.py, is there a reason you are creating a new module here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, will move.
ts = Timestamp(t) | ||
data[ts] = np.nan # works | ||
|
||
def test_to_html_timestamp(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob go with other html output tests (which I think are in formats)
result = df.to_html() | ||
assert '2000-01-01' in result | ||
|
||
def test_compare_invalid(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has nothing to do with frame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it looks like an unnecessary use of the constructor, am assuming that someone had a reason.
s = Series(date_range('1/1/2000', periods=10)) | ||
|
||
def f(x): | ||
return (x.hour, x.day, x.month) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It relies on DataFrame (although the whole thing is a smoke test). Maybe rename test_series_map_box_timestamps--> test_map_box_timestamps?
def test_frame_setitem_timestamp(self): | ||
# GH#2155 | ||
columns = DatetimeIndex(start='1/1/2012', end='2/1/2012', | ||
freq=offsets.BDay()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? the columns
above are explicitly passed to the DataFrame constructor...
# extra fields from DatetimeIndex like quarter and week | ||
idx = tm.makeDateIndex(100) | ||
|
||
fields = ['dayofweek', 'dayofyear', 'week', 'weekofyear', 'quarter', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are in ops or misc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this class to test_ops. Keeping it together because I like the fact that it is reasonably circumscribed as "vectorized versions of Timestamp operations", i.e. except for the DatetimeIndex constructor, most of what is being tested is in tslibs.
periods=5).tz_localize('UTC').tz_convert('US/Eastern') | ||
result = dti.round('D') | ||
expected = date_range('20130101', periods=5).tz_localize('US/Eastern') | ||
tm.assert_index_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ops
Hah that was kinda the idea with the original version. |
return (x.hour, x.day, x.month) | ||
|
||
# it works! | ||
s.map(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split this test: 1 for series (move to series dir); leave other here but i think it goes in ops
or wherever apply tests are
# GH 8058 | ||
df = DataFrame(np.random.randn(5, 2)) | ||
a = df[0] | ||
b = Series(np.random.randn(5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a series test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Any complaint if I change the DataFrame call to use Series instead?
ts = Timestamp('20090415', tz=pytz.timezone('US/Eastern'), freq='D') | ||
assert ts == stamp | ||
|
||
def test_date_range_timestamp_equiv_explicit_dateutil(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be made a decorator FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a note to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the decorator usage work? grepping didn't turn up any examples.
|
||
def test_date_range_timestamp_equiv_explicit_dateutil(self): | ||
tm._skip_if_windows_python_3() | ||
from pandas._libs.tslibs.timezones import dateutil_gettz as gettz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import should be at the top
assert idx.freq == Timestamp(idx[-1], idx.freq).freq | ||
assert idx.freqstr == Timestamp(idx[-1], idx.freq).freqstr | ||
|
||
def test_round(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move with similar tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other test_round in this module already needs refactoring to use pytest.parametrize. Will follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that’s fine but move it next to it
class TestTimedeltaIndexVectorizedTimedelta(object): | ||
def test_contains(self): | ||
# Checking for any NaT-like objects | ||
# GH 13603 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an indexing test
for v in [pd.NaT, None, float('nan'), np.nan]: | ||
assert (v in td) | ||
|
||
def test_nat_converters(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a section on nat tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many of these spread around. I really think we should focus for now on getting non-scalar tests out of scalar and recognize that this is a more-than-one-PR task.
assert all(hash(td) == hash(td.to_pytimedelta()) for td in tds) | ||
|
||
def test_round(self): | ||
t1 = timedelta_range('1 days', periods=3, freq='1 min 2 s 3 us') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to like tests
@@ -410,17 +406,6 @@ def test_tz(self): | |||
assert conv.hour == 19 | |||
|
|||
def test_tz_localize_ambiguous(self): | |||
|
|||
ts = Timestamp('2014-11-02 01:00') | |||
ts_dst = ts.tz_localize('US/Eastern', ambiguous=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you moving this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this portion of the test was moved to a non-scalar class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a scalar test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal here is to isolate tests for tslibs. The portion of the test that got moved uses date_range
, which puts it in the DatetimeIndex tests.
@@ -401,3 +401,44 @@ def test_series_box_timedelta(self): | |||
s = Series(rng) | |||
assert isinstance(s[1], Timedelta) | |||
assert isinstance(s.iat[2], Timedelta) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make a note to parametrize this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear what "this" is in context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the calling of testit, simple enough should be to paramerize the units which its iterating
ts = Timestamp('20090415', tz=pytz.timezone('US/Eastern'), freq='D') | ||
assert ts == stamp | ||
|
||
def test_date_range_timestamp_equiv_explicit_dateutil(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a note to do this
@@ -502,66 +469,6 @@ def test_round(self): | |||
for freq in ['Y', 'M', 'foobar']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more round that should move?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the scalar test, pretty sure its in the right place.
@@ -676,9 +562,6 @@ def test_timedelta_hash_equality(self): | |||
d = {td: 2} | |||
assert d[v] == 2 | |||
|
|||
tds = timedelta_range('1 second', periods=20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did u move this? and not all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did u move this?
timedelta_range --> not scalar
and not all
You stopped in the middle of a
repr(series.index[0]) | ||
|
||
def test_getitem_pydatetime_tz(self): | ||
index = date_range(start='2012-12-24 16:00', end='2012-12-24 18:00', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indexing test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test uses methods of this class; let's be extra careful before breaking it up.
|
||
s = Series(np.random.randn(len(dr)), index=dr) | ||
|
||
# it works! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this belongs elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have something in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/tests/series/test_timeseries.py where asfreq is tested.
|
||
|
||
class TestDataFrameTimeZones(object): | ||
timezones = ['UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of GH issues related to timezones/timeseries/... and many of them involve behavior that is specific to DataFrame/Series/FooIndex. The goal for this PR (and presumably its a multi-PR goal) is to separate these cases to test in isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. this just collects all of the test_timezones tests that use pd.DataFrame.
scalar_idx_ser_df
can you rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do this in several stages. You are moving too much. Let's do the non-controversial things first.
@@ -11,6 +11,17 @@ | |||
class TestTimedeltaIndex(object): | |||
_multiprocess_can_split_ = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
# Checking for any NaT-like objects | ||
# GH 13603 | ||
td = pd.to_timedelta(range(5), unit='d') + pd.offsets.Hour(1) | ||
for v in [pd.NaT, None, float('nan'), np.nan]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe should consolidate these types of tests in scalar/test_nat.py (even though you are testing the index itself here).
@@ -401,3 +401,44 @@ def test_series_box_timedelta(self): | |||
s = Series(rng) | |||
assert isinstance(s[1], Timedelta) | |||
assert isinstance(s.iat[2], Timedelta) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the calling of testit, simple enough should be to paramerize the units which its iterating
t1a = timedelta_range('1 days', periods=3, freq='1 min 2 s') | ||
t1c = pd.TimedeltaIndex([1, 1, 1], unit='D') | ||
|
||
# note that negative times round DOWN! so don't give whole numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also should parametrize this (TODO)
assert_series_equal(result, expected1) | ||
|
||
def test_localized_at_time_between_time(self): | ||
from datetime import time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imports to the top
|
||
s = Series(np.random.randn(len(dr)), index=dr) | ||
|
||
# it works! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/tests/series/test_timeseries.py where asfreq is tested.
@@ -65,6 +65,145 @@ def cmptz(self, tz1, tz2): | |||
# tests. | |||
return tz1.zone == tz2.zone | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to audit what you are moving here. A lot of these are fairly generic routines for series/dataframe that are actually testing the operation in question, and NOT the timezone per-se. Moving them here is really confusing.
Sounds good. |
closing as scope out of hand; replaced by multiple smaller prs |
The goal I have in mind is getting to the point where we can test (and measure coverage) tslibs in isolation. That means isolating DataFrame and Series tests from everything else.
This PR doesn't change any tests and doesn't move anything across modules, just puts DataFrame and Series tests in their own classes. Exactly what modules those belong in I haven't thought out.
If I did this right, it shouldn't have any overlap with other outstanding PRs.