-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: Made s3 related tests mock boto #17388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17388 +/- ##
==========================================
+ Coverage 91.18% 91.2% +0.01%
==========================================
Files 163 163
Lines 49545 49545
==========================================
+ Hits 45179 45188 +9
+ Misses 4366 4357 -9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not add moto to EVERY ci file, this completely defeats the purpose. only add where s3fs
is atm. we specifically have certain types of tests marked as disabled. you need to handle moto
NOT being installed (via a pytest.importorskip..
). on windows only add on a single build, ok with adding the pip install but only add for the 3.6 build.
ci/requirements-2.7.pip
Outdated
@@ -8,3 +8,4 @@ py | |||
PyCrypto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make sure to add a return at the end of the line; that's the red character in the display
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you adding tips files (as .csv) that already exist elsewhere?
ci/requirements-2.7_BUILD_TEST.pip
Outdated
@@ -5,3 +5,4 @@ pandas_gbq | |||
pandas_datareader | |||
statsmodels | |||
scikit-learn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
ci/requirements-2.7_COMPAT.pip
Outdated
@@ -2,3 +2,4 @@ html5lib==1.0b2 | |||
beautifulsoup4==4.2.0 | |||
openpyxl | |||
argparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
ci/requirements-2.7_LOCALE.pip
Outdated
@@ -1,3 +1,4 @@ | |||
html5lib==1.0b2 | |||
beautifulsoup4==4.2.1 | |||
blosc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
ci/requirements-2.7_SLOW.pip
Outdated
@@ -0,0 +1 @@ | |||
moto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
ci/requirements-2.7_WIN.pip
Outdated
@@ -0,0 +1 @@ | |||
moto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove but leave the file
ci/requirements-3.6_LOCALE_SLOW.pip
Outdated
@@ -0,0 +1 @@ | |||
moto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
ci/requirements-3.6_NUMPY_DEV.pip
Outdated
@@ -0,0 +1 @@ | |||
moto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
ci/requirements-3.6_WIN.pip
Outdated
@@ -0,0 +1 @@ | |||
moto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok, add a return
|
||
import pandas.util.testing as tm | ||
from pandas import DataFrame | ||
from pandas.io.parsers import read_csv, read_table | ||
|
||
TIPS_FILE = os.path.join(tm.get_data_path(), 'tips.csv') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a fixture
|
||
@pytest.fixture(scope='module') | ||
def salaries_table(): | ||
path = os.path.join(tm.get_data_path(), 'salaries.csv') | ||
return read_table(path) | ||
|
||
|
||
@pytest.fixture(scope='module') | ||
def test_s3_resource(request): | ||
pytest.importorskip('s3fs') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip on moto not installed
import pytest | ||
import six | ||
from moto import mock_s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import may have to be inside the fixture, so that the test module can be imported without moto being installed.
I don't think we have compressed versions of those in the repo. |
c4264a2
to
23aa1d1
Compare
@jreback what is the purpose I've defeated if I may ask? I suppose I can understand skipping some tests based on s3fs not being installed to test that pandas can handle import errors / can run without optional dependencies or something like that, but installing different test dependencies per build is unnecessary and potentially dangerous. It also makes the build process needlessly complicated. I spent an extensive amount of time reading the build scripts and studying the failed build output to figure out which requirements files were being read in each build. I was finally frustrated enough that I just added it to every .pip file so at least all the environments would have that test-only dependency. Most of my time was spent waiting for the build since it took about 25 minutes or so to witness a failure. Skipping tests based on test-only dependencies can also be dangerous as you could implement a test that has a skip, and never realize the test would actually fail when it should have due to a bug the test would have otherwise found. 48 requirements files seems a bit excessive. Skipping based on test-only dependencies is a bad idea, IMHO. |
Part of the reason for the complicated requirements files is the need to support multiple versions of libraries like numpy, matplotlib, etc. We'll need one file per environment for them. For optional deps though, it may be simpler to install them in every environment, especially test-only deps like moto or mock. I like how dask handles checking for accidental use of optional dependencies here |
@TomAugspurger dask's handling of that is nice! Good call on the different versions; that makes sense. |
conn.create_bucket(Bucket='cant_get_it', ACL='private') | ||
add_tips_files('cant_get_it') | ||
|
||
def teardown(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that as of pytest 3.0, the preferred way of doing teardown / finalization is to simply yield
and then run the teardown: https://docs.pytest.org/en/latest/fixture.html#fixture-finalization-executing-teardown-code. It's a bit simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it that way originally, but saw this on that same page
Finalizers will always be called regardless if the fixture setup code raises an exception. This is handy to properly close all resources created by a fixture even if one of them fails to be created/acquired
For this case that probably doesn't matter, but I wasn't sure which was preferred. I can change that to a yield.
@@ -50,151 +93,142 @@ def check_compressed_urls(salaries_table, compression, extension, mode, | |||
tm.assert_frame_equal(url_table, salaries_table) | |||
|
|||
|
|||
class TestS3(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll still keep the class TestS3
, simply for the namespacing, so you can do pytest pandas/tests/io/parser/test_network::TestS3
to hit all the s3 tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. We may want to add something to the contributing docs then to ward people like me from changing existing tests to the 'more functional style' then.
https://pandas.pydata.org/pandas-docs/stable/contributing.html#transitioning-to-pytest
Going forward, we are moving to a more functional style using the pytest framework, which offers a richer testing framework that will facilitate testing and developing. Thus, instead of writing test classes, we will write test functions like this...
# boto3 is a dependency of s3fs | ||
import boto3 | ||
client = boto3.client("s3") | ||
def test_read_csv__handles_boto_s3_object(test_s3_resource, tips_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra _
between csv
and handles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a habit I've formed at my current position. We name our tests like test_name_of_function__the_thing_your_testing
. I can remove that.
result = read_csv(s3_object["Body"]) | ||
assert isinstance(result, DataFrame) | ||
assert not result.empty | ||
result = read_csv(six.BytesIO(s3_object["Body"].read()), encoding='utf8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
six
isn't a dependency of pandas (though it probably is transitively), but you can do from pandas.compat import BytesIO
.
I'm curious why you needed it now, when we didn't before. Does simply passing encoding='utf-8'
to read_csv
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why that stopped working when I starting using moto...
The error I get using result = read_csv(s3_object["Body"], encoding='utf8')
is
if not is_file_like(filepath_or_buffer):
msg = "Invalid file path or buffer object type: {_type}"
> raise ValueError(msg.format(_type=type(filepath_or_buffer)))
E ValueError: Invalid file path or buffer object type: <class 'botocore.response.StreamingBody'>
pandas/io/common.py:219: ValueError
Making it a BytesIO explicitly makes pandas happy about it.
I'm also running python 3.6.0 locally.
def salaries_table(): | ||
path = os.path.join(tm.get_data_path(), 'salaries.csv') | ||
return read_table(path) | ||
|
||
|
||
@pytest.fixture(scope='module') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a test and not a fixture
pytest.importorskip('s3fs') | ||
moto = pytest.importorskip('moto') | ||
moto.mock_s3().start() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u r trying to test 3 things then use parametrize
('tips.csv.bz2', tips_file + '.bz2'), | ||
] | ||
|
||
def add_tips_files(bucket_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this becomes inline when u use parametrize
moto.mock_s3().stop() | ||
request.addfinalizer(teardown) | ||
|
||
return conn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like you need to create a proper yield fixture that does the setup and tear down
the test itself is pretty simple
pytest.importorskip('s3fs') | ||
# more of an integration test due to the not-public contents portion | ||
# can probably mock this though. | ||
for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use parametrize
tm.get_data_path('tips.csv')).iloc[:10], df) | ||
|
||
|
||
def test_parse_public_s3_bucket_nrows(test_s3_resource): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since u have so many s3 tests would be ok with making a new file just for them (ok leaving them here too)
# Read with a chunksize | ||
chunksize = 5 | ||
local_tips = read_csv(tm.get_data_path('tips.csv')) | ||
for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use parametrize
# Read public file from bucket with not-public contents | ||
df = read_csv('s3://cant_get_it/tips.csv') | ||
def test_parse_public_s3_bucket_python(test_s3_resource): | ||
for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
175389d
to
84565d9
Compare
Hello @kirkhansen! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on September 14, 2017 at 02:26 Hours UTC |
@kirkhansen with regards to #17388 (comment) we have many builds and have a fairly comprehensive test infrastructure in order to support many goals.
so we don't need every dependency running on every single build. sure the vast majority of them do, but the io sub-systems in particular are somewhat segregated to specific builds. Imagine you are a contributor diagnosing failures, having some builds succeed, while having failures isolated to a single or small number of builds is generally a good thing. testing everything on every build is nice to do in a smaller project, but becomes infeasible in larger ones. finally not having a test mock in every build forces the test writer care of mocking properly (e.g. making sure imports are satisfied and such, making testing more robust). |
That's true for some dependencies, but I think docs and test-only dependencies should always be installed:
How so? The behavior of
is the same, regardless of whether moto is installed or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kirkhansen about the BytesIO
from here, that test wasn't being run, since before, since it didn't start with test_
😬 . Thanks for fixing that.
ok I'll buy that we should make |
It appears pytest is being installed in the |
dont' install the conda version, just pip install, its right under the conda line (e.g. we pip install pytest-xdist) |
2e5e761
to
c5176d6
Compare
@@ -5,3 +5,4 @@ cython | |||
pytest>=3.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the no change files e.g
git checkout master ci/requirements-3.6_NUMPY_DEV.pip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added those files when I was attempting to install moto via pip in those environments; they don't exist in master. Do we want to get rid of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually these are empty, so ok
c5176d6
to
eb4f6ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor spelling change. otherwise lgtm. ping on green.
@@ -19,6 +26,40 @@ def salaries_table(): | |||
return read_table(path) | |||
|
|||
|
|||
@pytest.fixture(scope='module') | |||
def test_s3_resource(tips_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename to s3_resources (test is reserved for an actual test)
Kept a couple around for testing things like accessing a private bucket as that's hard to mock. Try the pip counterparts Some more merge request changes
eb4f6ef
to
8f149d8
Compare
thanks @kirkhansen nice PR! |
Thanks Kirk!
…On Thu, Sep 14, 2017 at 5:15 AM, Jeff Reback ***@***.***> wrote:
thanks @kirkhansen <https://github.com/kirkhansen> nice PR!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17388 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIvN6bbTTSr_TXApF5fDpRpRMGjeFks5siPyqgaJpZM4PIPA_>
.
|
my pleasure! |
Kept a couple tests un-mocked for testing things like accessing a private bucket as that's hard to mock.
git diff upstream/master -u -- "*.py" | flake8 --diff