TST: Made s3 related tests mock boto #17388

kirkhansen · 2017-08-31T02:50:06Z

Kept a couple tests un-mocked for testing things like accessing a private bucket as that's hard to mock.

closes TST/CI: Use moto to mock S3 calls #17325
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2017-08-31T03:41:28Z

Codecov Report

Merging #17388 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17388      +/-   ##
==========================================
+ Coverage   91.18%    91.2%   +0.01%     
==========================================
  Files         163      163              
  Lines       49545    49545              
==========================================
+ Hits        45179    45188       +9     
+ Misses       4366     4357       -9

Flag	Coverage Δ
#multiple	`88.98% <ø> (+0.03%)`	⬆️
#single	`40.21% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.77% <0%> (-0.1%)`	⬇️
pandas/core/indexes/datetimes.py	`95.43% <0%> (-0.1%)`	⬇️
pandas/io/common.py	`68.64% <0%> (+0.84%)`	⬆️
pandas/io/s3.py	`85% <0%> (+85%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa557f7...8f149d8. Read the comment docs.

jreback

do not add moto to EVERY ci file, this completely defeats the purpose. only add where s3fs is atm. we specifically have certain types of tests marked as disabled. you need to handle moto NOT being installed (via a pytest.importorskip..). on windows only add on a single build, ok with adding the pip install but only add for the 3.6 build.

jreback · 2017-08-31T10:27:36Z

ci/requirements-2.7.pip

@@ -8,3 +8,4 @@ py
 PyCrypto


pls make sure to add a return at the end of the line; that's the red character in the display

jreback

why are you adding tips files (as .csv) that already exist elsewhere?

jreback · 2017-08-31T10:27:44Z

ci/requirements-2.7_BUILD_TEST.pip

@@ -5,3 +5,4 @@ pandas_gbq
 pandas_datareader
 statsmodels
 scikit-learn


jreback · 2017-08-31T10:27:53Z

ci/requirements-2.7_COMPAT.pip

@@ -2,3 +2,4 @@ html5lib==1.0b2
 beautifulsoup4==4.2.0
 openpyxl
 argparse


jreback · 2017-08-31T10:27:58Z

ci/requirements-2.7_LOCALE.pip

@@ -1,3 +1,4 @@
 html5lib==1.0b2
 beautifulsoup4==4.2.1
 blosc


jreback · 2017-08-31T10:28:10Z

ci/requirements-2.7_SLOW.pip

@@ -0,0 +1 @@
+moto


jreback · 2017-08-31T10:28:35Z

ci/requirements-2.7_WIN.pip

@@ -0,0 +1 @@
+moto


remove but leave the file

jreback · 2017-08-31T10:29:48Z

ci/requirements-3.6_LOCALE_SLOW.pip

@@ -0,0 +1 @@
+moto


jreback · 2017-08-31T10:29:52Z

ci/requirements-3.6_NUMPY_DEV.pip

@@ -0,0 +1 @@
+moto


jreback · 2017-08-31T10:30:01Z

ci/requirements-3.6_WIN.pip

@@ -0,0 +1 @@
+moto


this is ok, add a return

jreback · 2017-08-31T10:30:22Z

pandas/tests/io/parser/test_network.py


 import pandas.util.testing as tm
 from pandas import DataFrame
 from pandas.io.parsers import read_csv, read_table

+TIPS_FILE = os.path.join(tm.get_data_path(), 'tips.csv')


should be a fixture

jreback · 2017-08-31T10:30:48Z

pandas/tests/io/parser/test_network.py


 @pytest.fixture(scope='module')
 def salaries_table():
    path = os.path.join(tm.get_data_path(), 'salaries.csv')
    return read_table(path)


+@pytest.fixture(scope='module')
+def test_s3_resource(request):
+    pytest.importorskip('s3fs')


skip on moto not installed

TomAugspurger · 2017-08-31T11:43:38Z

pandas/tests/io/parser/test_network.py

 import pytest
+import six
+from moto import mock_s3


This import may have to be inside the fixture, so that the test module can be imported without moto being installed.

TomAugspurger · 2017-08-31T11:47:24Z

why are you adding tips files (as .csv) that already exist elsewhere?

I don't think we have compressed versions of those in the repo.

kirkhansen · 2017-09-01T02:11:44Z

do not add moto to EVERY ci file, this completely defeats the purpose. only add where s3fs is atm. we specifically have certain types of tests marked as disabled. you need to handle moto NOT being installed (via a pytest.importorskip..). on windows only add on a single build, ok with adding the pip install but only add for the 3.6 build.

@jreback what is the purpose I've defeated if I may ask? I suppose I can understand skipping some tests based on s3fs not being installed to test that pandas can handle import errors / can run without optional dependencies or something like that, but installing different test dependencies per build is unnecessary and potentially dangerous. It also makes the build process needlessly complicated. I spent an extensive amount of time reading the build scripts and studying the failed build output to figure out which requirements files were being read in each build. I was finally frustrated enough that I just added it to every .pip file so at least all the environments would have that test-only dependency. Most of my time was spent waiting for the build since it took about 25 minutes or so to witness a failure.

Skipping tests based on test-only dependencies can also be dangerous as you could implement a test that has a skip, and never realize the test would actually fail when it should have due to a bug the test would have otherwise found. 48 requirements files seems a bit excessive. Skipping based on test-only dependencies is a bad idea, IMHO.

TomAugspurger · 2017-09-01T11:54:23Z

Part of the reason for the complicated requirements files is the need to support multiple versions of libraries like numpy, matplotlib, etc. We'll need one file per environment for them.

For optional deps though, it may be simpler to install them in every environment, especially test-only deps like moto or mock.

I like how dask handles checking for accidental use of optional dependencies here

kirkhansen · 2017-09-01T21:57:50Z

@TomAugspurger dask's handling of that is nice! Good call on the different versions; that makes sense.
I'm still fairly new to the github interface for pull requests. Is there a proper protocol to notify @jreback that I believe the requested changes have been dealt with?

TomAugspurger · 2017-09-04T19:56:01Z

pandas/tests/io/parser/test_network.py

+    conn.create_bucket(Bucket='cant_get_it', ACL='private')
+    add_tips_files('cant_get_it')
+
+    def teardown():


I think that as of pytest 3.0, the preferred way of doing teardown / finalization is to simply yield and then run the teardown: https://docs.pytest.org/en/latest/fixture.html#fixture-finalization-executing-teardown-code. It's a bit simpler.

I had it that way originally, but saw this on that same page

Finalizers will always be called regardless if the fixture setup code raises an exception. This is handy to properly close all resources created by a fixture even if one of them fails to be created/acquired

For this case that probably doesn't matter, but I wasn't sure which was preferred. I can change that to a yield.

TomAugspurger · 2017-09-04T19:57:49Z

pandas/tests/io/parser/test_network.py

@@ -50,151 +93,142 @@ def check_compressed_urls(salaries_table, compression, extension, mode,
    tm.assert_frame_equal(url_table, salaries_table)


-class TestS3(object):


I think we'll still keep the class TestS3, simply for the namespacing, so you can do pytest pandas/tests/io/parser/test_network::TestS3 to hit all the s3 tests.

That's fine. We may want to add something to the contributing docs then to ward people like me from changing existing tests to the 'more functional style' then.

https://pandas.pydata.org/pandas-docs/stable/contributing.html#transitioning-to-pytest

Going forward, we are moving to a more functional style using the pytest framework, which offers a richer testing framework that will facilitate testing and developing. Thus, instead of writing test classes, we will write test functions like this...

TomAugspurger · 2017-09-04T20:01:54Z

pandas/tests/io/parser/test_network.py

-        # boto3 is a dependency of s3fs
-        import boto3
-        client = boto3.client("s3")
+def test_read_csv__handles_boto_s3_object(test_s3_resource, tips_file):


extra _ between csv and handles

That's a habit I've formed at my current position. We name our tests like test_name_of_function__the_thing_your_testing. I can remove that.

TomAugspurger · 2017-09-04T20:10:08Z

pandas/tests/io/parser/test_network.py

-        result = read_csv(s3_object["Body"])
-        assert isinstance(result, DataFrame)
-        assert not result.empty
+    result = read_csv(six.BytesIO(s3_object["Body"].read()), encoding='utf8')


six isn't a dependency of pandas (though it probably is transitively), but you can do from pandas.compat import BytesIO.

I'm curious why you needed it now, when we didn't before. Does simply passing encoding='utf-8' to read_csv work?

I'm not sure why that stopped working when I starting using moto...
The error I get using result = read_csv(s3_object["Body"], encoding='utf8') is

if not is_file_like(filepath_or_buffer): msg = "Invalid file path or buffer object type: {_type}" > raise ValueError(msg.format(_type=type(filepath_or_buffer))) E ValueError: Invalid file path or buffer object type: <class 'botocore.response.StreamingBody'> pandas/io/common.py:219: ValueError

Making it a BytesIO explicitly makes pandas happy about it.
I'm also running python 3.6.0 locally.

jreback · 2017-09-04T21:55:54Z

pandas/tests/io/parser/test_network.py

 def salaries_table():
    path = os.path.join(tm.get_data_path(), 'salaries.csv')
    return read_table(path)


+@pytest.fixture(scope='module')


this is a test and not a fixture

jreback · 2017-09-04T21:56:17Z

pandas/tests/io/parser/test_network.py

+    pytest.importorskip('s3fs')
+    moto = pytest.importorskip('moto')
+    moto.mock_s3().start()
+


if u r trying to test 3 things then use parametrize

jreback · 2017-09-04T21:56:49Z

pandas/tests/io/parser/test_network.py

+        ('tips.csv.bz2', tips_file + '.bz2'),
+    ]
+
+    def add_tips_files(bucket_name):


this becomes inline when u use parametrize

jreback · 2017-09-04T21:58:25Z

pandas/tests/io/parser/test_network.py

+        moto.mock_s3().stop()
+    request.addfinalizer(teardown)
+
+    return conn


it looks like you need to create a proper yield fixture that does the setup and tear down

the test itself is pretty simple

jreback · 2017-09-04T21:58:48Z

pandas/tests/io/parser/test_network.py

+    pytest.importorskip('s3fs')
+    # more of an integration test due to the not-public contents portion
+    # can probably mock this though.
+    for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]:


use parametrize

jreback · 2017-09-04T22:00:06Z

pandas/tests/io/parser/test_network.py

+        tm.get_data_path('tips.csv')).iloc[:10], df)
+
+
+def test_parse_public_s3_bucket_nrows(test_s3_resource):


since u have so many s3 tests would be ok with making a new file just for them (ok leaving them here too)

jreback · 2017-09-04T22:00:20Z

pandas/tests/io/parser/test_network.py

+    # Read with a chunksize
+    chunksize = 5
+    local_tips = read_csv(tm.get_data_path('tips.csv'))
+    for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]:


use parametrize

jreback · 2017-09-04T22:00:29Z

pandas/tests/io/parser/test_network.py

-        # Read public file from bucket with not-public contents
-        df = read_csv('s3://cant_get_it/tips.csv')
+def test_parse_public_s3_bucket_python(test_s3_resource):
+    for ext, comp in [('', None), ('.gz', 'gzip'), ('.bz2', 'bz2')]:


pep8speaks · 2017-09-04T22:06:36Z

Hello @kirkhansen! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 14, 2017 at 02:26 Hours UTC

jreback · 2017-09-05T10:28:24Z

@kirkhansen with regards to #17388 (comment)

we have many builds and have a fairly comprehensive test infrastructure in order to support many goals.

backcompat testing of specific versions
slow/fast testing
having builds not take excessive time
easy interpretation of failing results

so we don't need every dependency running on every single build. sure the vast majority of them do, but the io sub-systems in particular are somewhat segregated to specific builds.

Imagine you are a contributor diagnosing failures, having some builds succeed, while having failures isolated to a single or small number of builds is generally a good thing.

testing everything on every build is nice to do in a smaller project, but becomes infeasible in larger ones.

finally not having a test mock in every build forces the test writer care of mocking properly (e.g. making sure imports are satisfied and such, making testing more robust).

TomAugspurger · 2017-09-05T12:48:23Z

so we don't need every dependency running on every single build.

That's true for some dependencies, but I think docs and test-only dependencies should always be installed:

pytest
pytest-xdist
pytest-cov
flake8
sphinx
nbsphinx
moto

finally not having a test mock in every build forces the test writer care of mocking properly

How so? The behavior of

ImportError if s3fs isn't installed
succeed if it is

is the same, regardless of whether moto is installed or not.

TomAugspurger

@kirkhansen about the BytesIO from here, that test wasn't being run, since before, since it didn't start with test_ 😬 . Thanks for fixing that.

jreback · 2017-09-06T12:12:47Z

ok I'll buy that we should make moto a standard test install. so remove moto from all .pip/.build files, and simply add where we install pytest (for both windows and linux), then it will always be available. would be ok removing the import skips as well. pls add to the ci\requirements_dev.txt though

kirkhansen · 2017-09-12T02:59:39Z

so remove moto from all .pip/.build files, and simply add where we install pytest

It appears pytest is being installed in the install_travis.sh, install_circle.sh, and appveyor.yml via conda. It also appears that the conda-forge channel is only added to some tests causing lots of build fails. I assume there is good reason to not include the conda-forge channel on all builds. How would you like me to proceed, @jreback?

jreback · 2017-09-12T11:14:10Z

It appears pytest is being installed in the install_travis.sh, install_circle.sh, and appveyor.yml via conda. It also appears that the conda-forge channel is only added to some tests causing lots of build fails. I assume there is good reason to not include the conda-forge channel on all builds. How would you like me to proceed, @jreback?

dont' install the conda version, just pip install, its right under the conda line (e.g. we pip install pytest-xdist)

jreback · 2017-09-13T00:18:40Z

ci/requirements_dev.txt

@@ -5,3 +5,4 @@ cython
 pytest>=3.1.0


remove the no change files e.g

git checkout master ci/requirements-3.6_NUMPY_DEV.pip

I added those files when I was attempting to install moto via pip in those environments; they don't exist in master. Do we want to get rid of them?

actually these are empty, so ok

jreback

minor spelling change. otherwise lgtm. ping on green.

jreback · 2017-09-13T12:28:52Z

pandas/tests/io/parser/test_network.py

@@ -19,6 +26,40 @@ def salaries_table():
    return read_table(path)


+@pytest.fixture(scope='module')
+def test_s3_resource(tips_file):


can you rename to s3_resources (test is reserved for an actual test)

Kept a couple around for testing things like accessing a private bucket as that's hard to mock. Try the pip counterparts Some more merge request changes

jreback · 2017-09-14T10:14:51Z

thanks @kirkhansen nice PR!

TomAugspurger · 2017-09-14T14:58:51Z

Thanks Kirk!

…

On Thu, Sep 14, 2017 at 5:15 AM, Jeff Reback ***@***.***> wrote: thanks @kirkhansen <https://github.com/kirkhansen> nice PR! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#17388 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIvN6bbTTSr_TXApF5fDpRpRMGjeFks5siPyqgaJpZM4PIPA_> .

kirkhansen · 2017-09-15T13:31:23Z

my pleasure!

jreback requested changes Aug 31, 2017

View reviewed changes

jreback reviewed Aug 31, 2017

View reviewed changes

jreback requested changes Aug 31, 2017

View reviewed changes

TomAugspurger reviewed Aug 31, 2017

View reviewed changes

gfyoung added Testing pandas testing functions or related to the test suite CI Continuous Integration labels Aug 31, 2017

kirkhansen force-pushed the moto-s3-tests branch from c4264a2 to 23aa1d1 Compare September 1, 2017 01:33

TomAugspurger reviewed Sep 4, 2017

View reviewed changes

jreback requested changes Sep 4, 2017

View reviewed changes

kirkhansen force-pushed the moto-s3-tests branch from 175389d to 84565d9 Compare September 4, 2017 22:06

TomAugspurger approved these changes Sep 5, 2017

View reviewed changes

kirkhansen force-pushed the moto-s3-tests branch from 2e5e761 to c5176d6 Compare September 12, 2017 23:45

jreback reviewed Sep 13, 2017

View reviewed changes

kirkhansen force-pushed the moto-s3-tests branch from c5176d6 to eb4f6ef Compare September 13, 2017 04:12

jreback requested changes Sep 13, 2017

View reviewed changes

jreback approved these changes Sep 13, 2017

View reviewed changes

jreback added this to the 0.21.0 milestone Sep 13, 2017

TST: Made s3 related tests mock boto

8f149d8

Kept a couple around for testing things like accessing a private bucket as that's hard to mock. Try the pip counterparts Some more merge request changes

kirkhansen force-pushed the moto-s3-tests branch from eb4f6ef to 8f149d8 Compare September 14, 2017 02:26

jreback merged commit 97abd2c into pandas-dev:master Sep 14, 2017

kirkhansen deleted the moto-s3-tests branch September 15, 2017 13:31

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

TST: Made s3 related tests mock boto (pandas-dev#17388)

59437ea

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

TST: Made s3 related tests mock boto (pandas-dev#17388)

b707680

		@@ -50,151 +93,142 @@ def check_compressed_urls(salaries_table, compression, extension, mode,
		tm.assert_frame_equal(url_table, salaries_table)


		class TestS3(object):

		tm.get_data_path('tips.csv')).iloc[:10], df)


		def test_parse_public_s3_bucket_nrows(test_s3_resource):

TST: Made s3 related tests mock boto #17388

TST: Made s3 related tests mock boto #17388

Conversation

kirkhansen commented Aug 31, 2017

codecov bot commented Aug 31, 2017 • edited Loading

Codecov Report

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Aug 31, 2017

kirkhansen commented Sep 1, 2017

TomAugspurger commented Sep 1, 2017

kirkhansen commented Sep 1, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kirkhansen Sep 4, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pep8speaks commented Sep 4, 2017 • edited Loading

Comment last updated on September 14, 2017 at 02:26 Hours UTC

jreback commented Sep 5, 2017

TomAugspurger commented Sep 5, 2017

TomAugspurger left a comment

Choose a reason for hiding this comment

jreback commented Sep 6, 2017

kirkhansen commented Sep 12, 2017 • edited Loading

jreback commented Sep 12, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Sep 14, 2017

TomAugspurger commented Sep 14, 2017 via email

kirkhansen commented Sep 15, 2017

codecov bot commented Aug 31, 2017 •

edited

Loading

kirkhansen Sep 4, 2017 •

edited

Loading

pep8speaks commented Sep 4, 2017 •

edited

Loading

kirkhansen commented Sep 12, 2017 •

edited

Loading