Skip to content

Separate out non-scalar tests from scalar tests; move to ?? in follow-up #18142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

jbrockmendel
Copy link
Member

The goal I have in mind is getting to the point where we can test (and measure coverage) tslibs in isolation. That means isolating DataFrame and Series tests from everything else.

This PR doesn't change any tests and doesn't move anything across modules, just puts DataFrame and Series tests in their own classes. Exactly what modules those belong in I haven't thought out.

If I did this right, it shouldn't have any overlap with other outstanding PRs.

@gfyoung gfyoung added Clean Testing pandas testing functions or related to the test suite labels Nov 6, 2017
@codecov
Copy link

codecov bot commented Nov 6, 2017

Codecov Report

Merging #18142 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18142      +/-   ##
==========================================
- Coverage   91.38%   91.36%   -0.02%     
==========================================
  Files         164      164              
  Lines       49790    49790              
==========================================
- Hits        45501    45492       -9     
- Misses       4289     4298       +9
Flag Coverage Δ
#multiple 89.17% <ø> (ø) ⬆️
#single 39.49% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 774030c...16ccef4. Read the comment docs.

@@ -640,27 +640,6 @@ def conv(v):
# invalid
pytest.raises(ValueError, ct, '- 1days, 00')

def test_overflow(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so rather than move these within the same file I would like to create another directory level (like we have for indexes) and put them in separate files, e.g.

pandas/tests/scalar/timedeltas/test_basic.py
.....test_slicing.py

etc. much more organized to have separate units of tests in separate named files (rather than classes). be sure to update setup.py with the additional directories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm amenable to that in general, but the relevantissue here is that the new classes (which I agree don't belong here) are not scalar tests at all. Some are for DatetimeIndex/TimedeltaIndex which I can figure out destinations for. I haven't looked at Series/DataFrame tests to see where those might go. Or there could be something like tests/tslibs/vectorized/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the would rather u do a PR which actually moves them to the right location rather than a temporary one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

@jbrockmendel
Copy link
Member Author

Just pushed a commit moving these tests to the appropriate modules.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to be very careful with this, move less at a time maybe

Series, DataFrame, DatetimeIndex)


class TestDataFrameTimestamps(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we already have test_timeseries.py, is there a reason you are creating a new module here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, will move.

ts = Timestamp(t)
data[ts] = np.nan # works

def test_to_html_timestamp(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob go with other html output tests (which I think are in formats)

result = df.to_html()
assert '2000-01-01' in result

def test_compare_invalid(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has nothing to do with frame

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it looks like an unnecessary use of the constructor, am assuming that someone had a reason.

s = Series(date_range('1/1/2000', periods=10))

def f(x):
return (x.hour, x.day, x.month)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be in series

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It relies on DataFrame (although the whole thing is a smoke test). Maybe rename test_series_map_box_timestamps--> test_map_box_timestamps?

def test_frame_setitem_timestamp(self):
# GH#2155
columns = DatetimeIndex(start='1/1/2012', end='2/1/2012',
freq=offsets.BDay())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

series

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? the columns above are explicitly passed to the DataFrame constructor...

# extra fields from DatetimeIndex like quarter and week
idx = tm.makeDateIndex(100)

fields = ['dayofweek', 'dayofyear', 'week', 'weekofyear', 'quarter',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are in ops or misc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this class to test_ops. Keeping it together because I like the fact that it is reasonably circumscribed as "vectorized versions of Timestamp operations", i.e. except for the DatetimeIndex constructor, most of what is being tested is in tslibs.

periods=5).tz_localize('UTC').tz_convert('US/Eastern')
result = dti.round('D')
expected = date_range('20130101', periods=5).tz_localize('US/Eastern')
tm.assert_index_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops

@jbrockmendel
Copy link
Member Author

you need to be very careful with this, move less at a time maybe

Hah that was kinda the idea with the original version.

return (x.hour, x.day, x.month)

# it works!
s.map(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split this test: 1 for series (move to series dir); leave other here but i think it goes in ops
or wherever apply tests are

# GH 8058
df = DataFrame(np.random.randn(5, 2))
a = df[0]
b = Series(np.random.randn(5))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a series test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Any complaint if I change the DataFrame call to use Series instead?

ts = Timestamp('20090415', tz=pytz.timezone('US/Eastern'), freq='D')
assert ts == stamp

def test_date_range_timestamp_equiv_explicit_dateutil(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be made a decorator FYI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a note to do this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the decorator usage work? grepping didn't turn up any examples.


def test_date_range_timestamp_equiv_explicit_dateutil(self):
tm._skip_if_windows_python_3()
from pandas._libs.tslibs.timezones import dateutil_gettz as gettz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import should be at the top

assert idx.freq == Timestamp(idx[-1], idx.freq).freq
assert idx.freqstr == Timestamp(idx[-1], idx.freq).freqstr

def test_round(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move with similar tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other test_round in this module already needs refactoring to use pytest.parametrize. Will follow up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that’s fine but move it next to it

class TestTimedeltaIndexVectorizedTimedelta(object):
def test_contains(self):
# Checking for any NaT-like objects
# GH 13603
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an indexing test

for v in [pd.NaT, None, float('nan'), np.nan]:
assert (v in td)

def test_nat_converters(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a section on nat tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many of these spread around. I really think we should focus for now on getting non-scalar tests out of scalar and recognize that this is a more-than-one-PR task.

assert all(hash(td) == hash(td.to_pytimedelta()) for td in tds)

def test_round(self):
t1 = timedelta_range('1 days', periods=3, freq='1 min 2 s 3 us')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to like tests

@@ -410,17 +406,6 @@ def test_tz(self):
assert conv.hour == 19

def test_tz_localize_ambiguous(self):

ts = Timestamp('2014-11-02 01:00')
ts_dst = ts.tz_localize('US/Eastern', ambiguous=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you moving this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this portion of the test was moved to a non-scalar class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a scalar test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to isolate tests for tslibs. The portion of the test that got moved uses date_range, which puts it in the DatetimeIndex tests.

@@ -401,3 +401,44 @@ def test_series_box_timedelta(self):
s = Series(rng)
assert isinstance(s[1], Timedelta)
assert isinstance(s.iat[2], Timedelta)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a note to parametrize this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear what "this" is in context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the calling of testit, simple enough should be to paramerize the units which its iterating

ts = Timestamp('20090415', tz=pytz.timezone('US/Eastern'), freq='D')
assert ts == stamp

def test_date_range_timestamp_equiv_explicit_dateutil(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a note to do this

@@ -502,66 +469,6 @@ def test_round(self):
for freq in ['Y', 'M', 'foobar']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more round that should move?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the scalar test, pretty sure its in the right place.

@@ -676,9 +562,6 @@ def test_timedelta_hash_equality(self):
d = {td: 2}
assert d[v] == 2

tds = timedelta_range('1 second', periods=20)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did u move this? and not all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did u move this?

timedelta_range --> not scalar

and not all

You stopped in the middle of a

repr(series.index[0])

def test_getitem_pydatetime_tz(self):
index = date_range(start='2012-12-24 16:00', end='2012-12-24 18:00',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexing test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses methods of this class; let's be extra careful before breaking it up.


s = Series(np.random.randn(len(dr)), index=dr)

# it works!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this belongs elsewhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have something in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas/tests/series/test_timeseries.py where asfreq is tested.



class TestDataFrameTimeZones(object):
timezones = ['UTC', 'Asia/Tokyo', 'US/Eastern', 'dateutil/US/Pacific']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of GH issues related to timezones/timeseries/... and many of them involve behavior that is specific to DataFrame/Series/FooIndex. The goal for this PR (and presumably its a multi-PR goal) is to separate these cases to test in isolation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. this just collects all of the test_timezones tests that use pd.DataFrame.

@jreback
Copy link
Contributor

jreback commented Nov 17, 2017

can you rebase

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do this in several stages. You are moving too much. Let's do the non-controversial things first.

@@ -11,6 +11,17 @@
class TestTimedeltaIndex(object):
_multiprocess_can_split_ = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

# Checking for any NaT-like objects
# GH 13603
td = pd.to_timedelta(range(5), unit='d') + pd.offsets.Hour(1)
for v in [pd.NaT, None, float('nan'), np.nan]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should consolidate these types of tests in scalar/test_nat.py (even though you are testing the index itself here).

@@ -401,3 +401,44 @@ def test_series_box_timedelta(self):
s = Series(rng)
assert isinstance(s[1], Timedelta)
assert isinstance(s.iat[2], Timedelta)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the calling of testit, simple enough should be to paramerize the units which its iterating

t1a = timedelta_range('1 days', periods=3, freq='1 min 2 s')
t1c = pd.TimedeltaIndex([1, 1, 1], unit='D')

# note that negative times round DOWN! so don't give whole numbers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also should parametrize this (TODO)

assert_series_equal(result, expected1)

def test_localized_at_time_between_time(self):
from datetime import time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports to the top


s = Series(np.random.randn(len(dr)), index=dr)

# it works!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas/tests/series/test_timeseries.py where asfreq is tested.

@@ -65,6 +65,145 @@ def cmptz(self, tz1, tz2):
# tests.
return tz1.zone == tz2.zone


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to audit what you are moving here. A lot of these are fairly generic routines for series/dataframe that are actually testing the operation in question, and NOT the timezone per-se. Moving them here is really confusing.

@jbrockmendel
Copy link
Member Author

I would do this in several stages. You are moving too much. Let's do the non-controversial things first.

Sounds good.

@jreback
Copy link
Contributor

jreback commented Nov 20, 2017

closing as scope out of hand; replaced by multiple smaller prs

@jreback jreback closed this Nov 20, 2017
@jbrockmendel jbrockmendel deleted the scalar_idx_ser_df branch December 8, 2017 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants