TST/DOC: test pyarrow tz data + doc / enable cross compat tests for pyarrow/fastparquet #18662

Merged

jorisvandenbossche
Member

Commits:

TST: add parquet test with tz datetime data for pyarrow

  • clean-up basic data types tests: make common dataframe with types
    supported by both pyarrow and fastparquet

DOC: document differences between pyarrow and fastparquet in supported data types

TST: enable pyarrow/fastparquet cross compatibility tests on smaller subset of dataframe

Closes #17448
Also adds a test for #18628

jorisvandenbossche added the IO Parquet and Testing labels Dec 6, 2017

pep8speaks commented Dec 6, 2017

Hello @jorisvandenbossche! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 08, 2017 at 08:47 Hours UTC


codecov bot commented Dec 7, 2017

Codecov Report

Merging #18662 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18662      +/-   ##
==========================================
+ Coverage   91.57%   91.57%   +<.01%     
==========================================
  Files         153      153              
  Lines       51210    51210              
==========================================
+ Hits        46894    46895       +1     
+ Misses       4316     4315       -1

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.43% <ø> (+0.01%) ⬆️ |
| #single | 40.67% <ø> (-0.11%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/util/testing.py | 81.82% <0%> (-0.2%) ⬇️ |
| pandas/core/frame.py | 97.81% <0%> (-0.1%) ⬇️ |
| pandas/plotting/_converter.py | 65.25% <0%> (+1.81%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fdba133...b05ae5d.


codecov bot commented Dec 7, 2017

Codecov Report

Merging #18662 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #18662      +/-   ##
==========================================
+ Coverage   91.57%   91.57%   +<.01%     
==========================================
  Files         153      153              
  Lines       51210    51212       +2     
==========================================
+ Hits        46894    46899       +5     
+ Misses       4316     4313       -3

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.43% <ø> (+0.02%) ⬆️ |
| #single | 40.67% <ø> (-0.11%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/core/frame.py | 97.81% <0%> (-0.1%) ⬇️ |
| pandas/core/indexes/datetimes.py | 95.68% <0%> (ø) ⬆️ |
| pandas/plotting/_converter.py | 65.25% <0%> (+1.81%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fdba133...6161f65.


# additional supported types for pyarrow
import pyarrow
if LooseVersion(pyarrow.__version__) >= LooseVersion('0.7.0'):
Contributor

We can remove this after we change the dependency (@dhirschfeld's PR).

Member Author

Yep. Any comments on the rest?
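
The version gate quoted in this review thread compares the installed pyarrow version against 0.7.0. A minimal standard-library sketch of that kind of gate follows; `parse_version` and `supports_extra_types` are hypothetical helpers for illustration, not pandas or pyarrow APIs, and the parsing assumes plain dotted numeric versions.

```python
import re


def parse_version(version):
    """Turn a dotted version string such as '0.7.1' into a tuple of ints.

    Assumption: only the leading numeric components matter; parsing stops
    at the first component that does not start with a digit (e.g. 'dev0').
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so '0.7' compares equal to '0.7.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_extra_types(installed, minimum="0.7.0"):
    """Return True when the installed version is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


print(supports_extra_types("0.7.1"))
print(supports_extra_types("0.6.0"))
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, "0.10.0" sorts before "0.7.0".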

jorisvandenbossche added this to the 0.21.1 milestone Dec 8, 2017
@jorisvandenbossche
Member Author

The CI failure is the unreliable parallel_coordinates test, so it is unrelated to this PR.

jorisvandenbossche merged commit 371649b into pandas-dev:master Dec 10, 2017
jorisvandenbossche deleted the parquet-test-tz branch December 10, 2017 14:41
Contributor

jreback commented Dec 10, 2017

Did this pass CI?

I get the failure below with pyarrow 0.7.1 and fastparquet 0.1.3 on macOS:

(pandas) bash-3.2$ pytest pandas/tests/io/test_parquet.py --tb=short
=========================================================================================== test session starts ===========================================================================================
platform darwin -- Python 3.6.1, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /Users/jreback/pandas, inifile: setup.cfg
plugins: xdist-1.16.0, cov-2.3.1
collected 38 items                                                                                                                                                                                         

pandas/tests/io/test_parquet.py .....F............s...s...x...s.s.....

================================================================================================ FAILURES =================================================================================================
_________________________________________________________________________________________ test_cross_engine_pa_fp _________________________________________________________________________________________
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/api.py:96: in __init__
    with open_with(fn2, 'rb') as f:
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/util.py:44: in default_open
    return open(f, mode)
E   NotADirectoryError: [Errno 20] Not a directory: '/var/folders/h3/mr_r3bkj5yg0pbx9fr3tk1r00000gp/T/tmpii71wdx8/_metadata'

During handling of the above exception, another exception occurred:
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/api.py:119: in _parse_header
    fmd = read_thrift(f, parquet_thrift.FileMetaData)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/thrift_structures.py:22: in read_thrift
    obj.read(pin)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1899: in read
    _elem53.read(iprot)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1742: in read
    _elem33.read(iprot)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1656: in read
    self.meta_data.read(iprot)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1487: in read
    self.statistics.read(iprot)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:298: in read
    iprot.skip(ftype)
../miniconda3/envs/pandas/lib/python3.6/site-packages/thrift/protocol/TProtocol.py:208: in skip
    self.readString()
../miniconda3/envs/pandas/lib/python3.6/site-packages/thrift/protocol/TProtocol.py:184: in readString
    return binary_to_str(self.readBinary())
../miniconda3/envs/pandas/lib/python3.6/site-packages/thrift/compat.py:37: in binary_to_str
    return bin_val.decode('utf8')
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 2: invalid start byte

During handling of the above exception, another exception occurred:
pandas/tests/io/test_parquet.py:186: in test_cross_engine_pa_fp
    result = read_parquet(path, engine=fp)
pandas/io/parquet.py:211: in read_parquet
    return impl.read(path, columns=columns, **kwargs)
pandas/io/parquet.py:123: in read
    return self.api.ParquetFile(path).to_pandas(columns=columns, **kwargs)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/api.py:102: in __init__
    self._parse_header(f, verify)
../miniconda3/envs/pandas/lib/python3.6/site-packages/fastparquet/api.py:122: in _parse_header
    self.fn)
E   fastparquet.util.ParquetException: Metadata parse failed: /var/folders/h3/mr_r3bkj5yg0pbx9fr3tk1r00000gp/T/tmpii71wdx8
======================================================================== 1 failed, 32 passed, 4 skipped, 1 xfailed in 2.93 seconds ========================================================================
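
The traceback above chains three exceptions. fastparquet first appears to try `<path>/_metadata`, as if the path were a dataset directory, then falls back to parsing the file itself, where a thrift string field is decoded as UTF-8 and fails; the final `ParquetException` wraps that failure. The sketch below reproduces only the first two error types with the standard library. It does not call fastparquet or pyarrow, the file name and bytes are arbitrary values chosen for illustration, and the exact `OSError` subclass in the first step is platform dependent (`NotADirectoryError` on POSIX systems).

```python
import os
import tempfile

first_error = None

# Step 1: appending "_metadata" to a regular file path gives a path that
# cannot be opened, because the parent is a file rather than a directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.parquet")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"PAR1")  # placeholder bytes, not a valid Parquet file

    try:
        open(os.path.join(path, "_metadata"), "rb")
    except OSError as exc:
        first_error = type(exc).__name__
print(first_error)

# Step 2: decoding arbitrary binary data as a UTF-8 string fails.
raw = b"\x00\x00\xb5"  # illustrative bytes; 0xb5 cannot start a UTF-8 sequence
try:
    raw.decode("utf8")
except UnicodeDecodeError as exc:
    second_error = str(exc)
print(second_error)
```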

@jorisvandenbossche
Member Author

It did, yes, but it may be that we don't have the right combination of versions on CI to catch that.

@jorisvandenbossche
Member Author

It seems we don't have pyarrow on the Mac build, so those tests are skipped there.
Do you know if that was on purpose, to have a build with only fastparquet?

Contributor

jreback commented Dec 10, 2017

@jorisvandenbossche no, that wasn't on purpose; we should add that.
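
Because the cross-engine tests need both engines, a build that lacks either one skips them silently, as happened on the Mac build discussed here. A minimal sketch of making that visible, using only `importlib` from the standard library; `missing_engines` and `cross_engine_skip_reason` are hypothetical helpers, not part of the pandas test suite.

```python
import importlib.util


def missing_engines(engines=("pyarrow", "fastparquet")):
    """Return the names of Parquet engines that cannot be imported."""
    return [name for name in engines if importlib.util.find_spec(name) is None]


def cross_engine_skip_reason(engines=("pyarrow", "fastparquet")):
    """Return a skip message if any engine is missing, otherwise None."""
    missing = missing_engines(engines)
    if missing:
        return "cross-engine tests skipped, not installed: " + ", ".join(missing)
    return None


print(cross_engine_skip_reason())
```

In a pytest suite the returned message could serve as the `reason` for `pytest.mark.skipif`, so the skip report names the engine that is missing.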

@jorisvandenbossche
Member Author

Can you also open an issue on the fastparquet tracker?

jreback added a commit to jreback/pandas that referenced this pull request Dec 10, 2017
Contributor

jreback commented Dec 10, 2017

yep

jreback added a commit to jreback/pandas that referenced this pull request Dec 10, 2017