CI failures on master #28612

Closed
TomAugspurger opened this issue Sep 25, 2019 · 4 comments
Labels
CI Continuous Integration

Comments

@TomAugspurger
Contributor

TomAugspurger commented Sep 25, 2019

I retriggered a passing build on master, and it failed: https://travis-ci.org/pandas-dev/pandas/builds/588926447.

Last passing build was https://travis-ci.org/pandas-dev/pandas/builds/588919832.

@jbrockmendel do you have time to look into this today?

=================================== FAILURES ===================================
____________________________ test_with_s3_url[gzip] ____________________________
[gw1] linux -- Python 3.7.4 /home/travis/miniconda3/envs/pandas-dev/bin/python

compression = 'gzip', s3_resource = s3.ServiceResource()

    @td.skip_if_not_us_locale
    def test_with_s3_url(compression, s3_resource):
        # Bucket "pandas-test" created in tests/io/conftest.py

        df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')

        with tm.ensure_clean() as path:
            df.to_json(path, compression=compression)
            with open(path, "rb") as f:
                s3_resource.Bucket("pandas-test").put_object(Key="test-1", Body=f)

>       roundtripped_df = pd.read_json("s3://pandas-test/test-1", compression=compression)

pandas/tests/io/json/test_compression.py:48:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/io/json/_json.py:602: in read_json
    compression=compression,
pandas/io/json/_json.py:665: in __init__
    self.data = self._preprocess_data(data)
pandas/io/json/_json.py:676: in _preprocess_data
    data = data.read()
../../../miniconda3/envs/pandas-dev/lib/python3.7/gzip.py:276: in read
    return self._buffer.read(size)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <gzip._GzipReader object at 0x7fb226df7450>, size = 8192

    def read(self, size=-1):
        if size < 0:
            return self.readall()
        # size=0 is special because decompress(max_length=0) is not supported
        if not size:
            return b""

        # For certain input data, a single
        # call to decompress() may not return
        # any data. In this case, retry until we get some data or reach EOF.
        while True:
            if self._decompressor.eof:
                # Ending case: we've come to the end of a member in the file,
                # so finish up this member, and read a new gzip header.
                # Check the CRC and file size, and set the flag so we read
                # a new member
                self._read_eof()
                self._new_member = True
                self._decompressor = self._decomp_factory(
                    **self._decomp_args)

            if self._new_member:
                # If the _new_member flag is set, we have to
                # jump to the next member, if there is one.
                self._init_read()
                if not self._read_gzip_header():
                    self._size = self._pos
                    return b""
                self._new_member = False

            # Read a chunk of data from the file
            buf = self._fp.read(io.DEFAULT_BUFFER_SIZE)

            uncompress = self._decompressor.decompress(buf, size)
            if self._decompressor.unconsumed_tail != b"":
                self._fp.prepend(self._decompressor.unconsumed_tail)
            elif self._decompressor.unused_data != b"":
                # Prepend the already read bytes to the fileobj so they can
                # be seen by _read_eof() and _read_gzip_header()
                self._fp.prepend(self._decompressor.unused_data)

            if uncompress != b"":
                break
            if buf == b"":
>               raise EOFError("Compressed file ended before the "
                               "end-of-stream marker was reached")
E               EOFError: Compressed file ended before the end-of-stream marker was reached

../../../miniconda3/envs/pandas-dev/lib/python3.7/gzip.py:482: EOFError
____________________________ test_with_s3_url[bz2] _____________________________

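For anyone trying to reproduce this outside the test suite, the failing path boils down to uploading a gzip-compressed JSON file and reading it back through read_json. A minimal sketch, assuming the "pandas-test" bucket from tests/io/conftest.py exists (against real S3 you would need your own bucket and credentials):

    import pandas as pd
    import s3fs

    # same frame as the test fixture
    df = pd.read_json('{"a": [1, 2, 3], "b": [4, 5, 6]}')

    # write a gzip-compressed JSON file locally and upload the raw bytes
    df.to_json("test-1.json.gz", compression="gzip")
    fs = s3fs.S3FileSystem()
    fs.put("test-1.json.gz", "pandas-test/test-1")

    # with the broken s3fs/fsspec combination this is where
    # "EOFError: Compressed file ended before the end-of-stream marker was reached" surfaces
    roundtripped = pd.read_json("s3://pandas-test/test-1", compression="gzip")
    print(roundtripped)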
TomAugspurger added the CI Continuous Integration label Sep 25, 2019
@WillAyd
Member

WillAyd commented Sep 25, 2019

Haven't followed this all the way through, but something similar came up in #28206, so maybe we need an s3fs pin.

@WillAyd
Member

WillAyd commented Sep 25, 2019

fsspec/s3fs#225 suggests this would have been fixed by the s3fs 0.3.4 release, but that's the version failing on Travis, so maybe fsspec is the real culprit.
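A quick way to confirm what the Travis environment actually resolved, using nothing beyond the packages' standard __version__ attributes:

    import fsspec
    import s3fs

    # s3fs 0.3.4 paired with an older fsspec (e.g. 0.4.0) is the suspected bad combination
    print("s3fs:", s3fs.__version__)
    print("fsspec:", fsspec.__version__)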

@jbrockmendel
Member

I'll take a look after some caffeine.

@russellbrooks

russellbrooks commented Sep 25, 2019

@WillAyd I also think it's related to fsspec. Related issue in #28490:

I've also experienced many issues with pandas reading S3-based parquet files ever since s3fs refactored the file system components into fsspec.
Many of the most recent errors appear to be resolved by forcing fsspec>=0.5.1, which was released 4 days ago. Otherwise s3fs was resolving to fsspec 0.4.0 for me under conda without other constraints.

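If the real fix is the fsspec floor rather than s3fs itself, a minimal guard along these lines would make that constraint explicit until the CI dependency files are pinned. This is only a sketch of the workaround described above, not anything in the pandas codebase:

    from distutils.version import LooseVersion

    import fsspec

    # enforce the minimum version reported as working (fsspec>=0.5.1)
    if LooseVersion(fsspec.__version__) < LooseVersion("0.5.1"):
        raise ImportError(
            f"fsspec>=0.5.1 is required, found {fsspec.__version__}"
        )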