
BF: mark two tests (test_{statsmodels,seaborn}) as requiring network #19754


Closed
yarikoptic wants to merge 1 commit

Conversation

yarikoptic
Contributor

  • [notworthit] closes #xxxx
  • [nowtheywill] tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • [notworthit] whatsnew entry

@@ -73,6 +74,7 @@ def test_scikit_learn(df):
clf.predict(digits.data[-1:])


@tm.network
def test_seaborn():
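The patch above adds pandas' `@tm.network` decorator so the test can be deselected on machines without internet access. As a minimal sketch of the same idea under plain pytest (the marker name and helper below are illustrative, not pandas' actual `tm.network` implementation):

```python
import pytest

# Tag a test as network-dependent with a custom pytest marker.
network = pytest.mark.network

@network
def test_seaborn_like():
    # the real test would call seaborn.load_dataset("tips") here
    pass

def is_network_test(func):
    """True if a function carries the network marker, so a runner
    invoked with a skip-network option can deselect it."""
    return any(m.name == "network" for m in getattr(func, "pytestmark", []))
```

A test runner (or a `conftest.py` hook) can then skip every function for which `is_network_test` returns True when the suite runs offline.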
Contributor


are these failing somewhere?
these actually require network under the hood?

Contributor Author


regarding this one -- not sure but I guess so -- we had to skip it in Debian.

The next one (statsmodels) -- positive: you can see it requesting a dataset there. Anyways -- here is a "hands-on demonstration" (without those decorators):

$> http_proxy=http://1.2.3.4 https_proxy=http://13.2.4.4 HOME=/tmp/temphome pytest -s -v usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py
============================================= test session starts ==============================================
platform linux2 -- Python 2.7.14+, pytest-3.2.1, py-1.4.34, pluggy-0.4.0 -- /usr/bin/python
cachedir: ../../.cache
rootdir: /home/yoh/deb/gits/pkg-exppsy/pandas, inifile: setup.cfg
plugins: xdist-1.18.2, localserver-0.3.7, cov-2.5.1, hypothesis-3.44.1, celery-4.1.0
collected 9 items                                                                                               

usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_dask SKIPPED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_xarray SKIPPED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_statsmodels FAILED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_scikit_learn PASSED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_seaborn FAILED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_pandas_gbq SKIPPED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_geopandas SKIPPED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_pyarrow SKIPPED
usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_pandas_datareader <- debian/tmp/usr/lib/python2.7/dist-packages/pandas/util/testing.py SKIPPED

=================================================== FAILURES ===================================================
_______________________________________________ test_statsmodels _______________________________________________

    def test_statsmodels():
    
        statsmodels = import_module('statsmodels')  # noqa
        import statsmodels.api as sm
        import statsmodels.formula.api as smf
>       df = sm.datasets.get_rdataset("Guerry", "HistData").data

usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py:60: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib/python2.7/dist-packages/statsmodels/datasets/utils.py:290: in get_rdataset
    data, from_cache = _get_data(data_base_url, dataname, cache)
/usr/lib/python2.7/dist-packages/statsmodels/datasets/utils.py:221: in _get_data
    data, from_cache = _urlopen_cached(url, cache)
/usr/lib/python2.7/dist-packages/statsmodels/datasets/utils.py:212: in _urlopen_cached
    data = urlopen(url).read()
/usr/lib/python2.7/urllib2.py:154: in urlopen
    return opener.open(url, data, timeout)
/usr/lib/python2.7/urllib2.py:429: in open
    response = self._open(req, data)
/usr/lib/python2.7/urllib2.py:447: in _open
    '_open', req)
/usr/lib/python2.7/urllib2.py:407: in _call_chain
    result = func(*args)
/usr/lib/python2.7/urllib2.py:1241: in https_open
    context=self._context)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <urllib2.HTTPSHandler instance at 0x7f9f8393d3f8>
http_class = <class httplib.HTTPSConnection at 0x7f9f94521328>
req = <urllib2.Request instance at 0x7f9f8393d488>, http_conn_args = {'context': None}, host = '13.2.4.4'
h = <httplib.HTTPSConnection instance at 0x7f9f8393dab8>, tunnel_headers = {}
proxy_auth_hdr = 'Proxy-Authorization', err = error(110, 'Connection timed out')

    def do_open(self, http_class, req, **http_conn_args):
        """Return an addinfourl object for the request, using http_class.
    
            http_class must implement the HTTPConnection API from httplib.
            The addinfourl return value is a file-like object.  It also
            has methods and attributes including:
                - info(): return a mimetools.Message object for the headers
                - geturl(): return the original request URL
                - code: HTTP status code
            """
        host = req.get_host()
        if not host:
            raise URLError('no host given')
    
        # will parse host:port
        h = http_class(host, timeout=req.timeout, **http_conn_args)
        h.set_debuglevel(self._debuglevel)
    
        headers = dict(req.unredirected_hdrs)
        headers.update(dict((k, v) for k, v in req.headers.items()
                            if k not in headers))
    
        # We want to make an HTTP/1.1 request, but the addinfourl
        # class isn't prepared to deal with a persistent connection.
        # It will try to read all remaining data from the socket,
        # which will block while the server waits for the next request.
        # So make sure the connection gets closed after the (only)
        # request.
        headers["Connection"] = "close"
        headers = dict(
            (name.title(), val) for name, val in headers.items())
    
        if req._tunnel_host:
            tunnel_headers = {}
            proxy_auth_hdr = "Proxy-Authorization"
            if proxy_auth_hdr in headers:
                tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]
                # Proxy-Authorization should not be sent to origin
                # server.
                del headers[proxy_auth_hdr]
            h.set_tunnel(req._tunnel_host, headers=tunnel_headers)
    
        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
        except socket.error, err: # XXX what error?
            h.close()
>           raise URLError(err)
E           URLError: <urlopen error [Errno 110] Connection timed out>

/usr/lib/python2.7/urllib2.py:1198: URLError
_________________________________________________ test_seaborn _________________________________________________

    def test_seaborn():
    
        seaborn = import_module('seaborn')
>       tips = seaborn.load_dataset("tips")

usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/lib/python2.7/dist-packages/seaborn/utils.py:394: in load_dataset
    urlretrieve(full_path, cache_path)
/usr/lib/python2.7/urllib.py:98: in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
/usr/lib/python2.7/urllib.py:245: in retrieve
    fp = self.open(url, data)
/usr/lib/python2.7/urllib.py:213: in open
    return getattr(self, name)(url)
/usr/lib/python2.7/urllib.py:350: in open_http
    h.endheaders(data)
/usr/lib/python2.7/httplib.py:1038: in endheaders
    self._send_output(message_body)
/usr/lib/python2.7/httplib.py:882: in _send_output
    self.send(msg)
/usr/lib/python2.7/httplib.py:844: in send
    self.connect()
/usr/lib/python2.7/httplib.py:821: in connect
    self.timeout, self.source_address)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

address = ('13.2.4.4', 80), timeout = <object object at 0x7f9fa5ca80c0>, source_address = None

    def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
                          source_address=None):
        """Connect to *address* and return the socket object.
    
        Convenience function.  Connect to *address* (a 2-tuple ``(host,
        port)``) and return the socket object.  Passing the optional
        *timeout* parameter will set the timeout on the socket instance
        before attempting to connect.  If no *timeout* is supplied, the
        global default timeout setting returned by :func:`getdefaulttimeout`
        is used.  If *source_address* is set it must be a tuple of (host, port)
        for the socket to bind as a source address before making the connection.
        A host of '' or port 0 tells the OS to use the default.
        """
    
        host, port = address
        err = None
        for res in getaddrinfo(host, port, 0, SOCK_STREAM):
            af, socktype, proto, canonname, sa = res
            sock = None
            try:
                sock = socket(af, socktype, proto)
                if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
                    sock.settimeout(timeout)
                if source_address:
                    sock.bind(source_address)
                sock.connect(sa)
                return sock
    
            except error as _:
                err = _
                if sock is not None:
                    sock.close()
    
        if err is not None:
>           raise err
E           IOError: [Errno socket error] [Errno 110] Connection timed out

/usr/lib/python2.7/socket.py:575: IOError
=============================================== warnings summary ===============================================
debian/tmp/usr/lib/python2.7/dist-packages/pandas/tests/test_downstream.py::test_statsmodels
  /usr/lib/python2.7/dist-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
    from pandas.core import datetools

-- Docs: http://doc.pytest.org/en/latest/warnings.html
========================== 2 failed, 1 passed, 6 skipped, 1 warnings in 31.78 seconds ==========================

@jreback jreback added Testing pandas testing functions or related to the test suite IO Network Local or Cloud (AWS, GCS, etc.) IO Issues labels Feb 19, 2018
@jreback jreback added this to the 0.23.0 milestone Feb 19, 2018
@jreback
Contributor

jreback commented Feb 19, 2018

ok this is fine
ping on green

@yarikoptic
Contributor Author

Would you mind if I suggest a PR which would use such a fake proxy setup for those runs with --skip-network? This would allow detecting such tests "hot".
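The proposal above could be wired up roughly like this in a `conftest.py`: when the suite is invoked with `--skip-network`, point the proxy environment variables at an unroutable address so any unmarked test that silently reaches the network fails fast instead of passing. The option wiring, helper name, and address are a hypothetical sketch, not pandas' actual code.

```python
import os

DEAD_PROXY = "http://10.255.255.1"  # assumed-unroutable address

def block_network(environ=None):
    """Redirect HTTP(S) traffic to a dead proxy; returns the env mapping."""
    environ = os.environ if environ is None else environ
    for var in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        environ[var] = DEAD_PROXY
    return environ

def pytest_configure(config):
    # conftest.py hook: activate the dead proxy for --skip-network runs
    # (assumes --skip-network was registered via pytest_addoption)
    if config.getoption("--skip-network", default=False):
        block_network()
```

This is the same trick as the `http_proxy=http://1.2.3.4` demonstration earlier in the thread, just applied automatically whenever network tests are meant to be skipped.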

@jreback
Contributor

jreback commented Feb 19, 2018

sure @yarikoptic always happy to have a mock for network testing (that shows if we are not doing it)

@yarikoptic
Contributor Author

ok, closing in favor of #19757 which includes this patch + http_proxy's setup for travis

@yarikoptic yarikoptic closed this Feb 19, 2018
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Feb 22, 2018