Skip to content

ENH: Support fspath protocol #16301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 18, 2017
Merged

Conversation

TomAugspurger
Copy link
Contributor

Supports the fspath protocol in most readers and writers.

closes #13823
xref #14123

@TomAugspurger TomAugspurger added Compat pandas objects compatability with Numpy or Python functions IO Data IO issues that don't fit into a more specific label Python 3.6 labels May 9, 2017
@TomAugspurger
Copy link
Contributor Author

Wait till 0.21 on this? Technically this could be an API change for if people using ExcelFile directly, as I stringify self.io. I can probably work around this.

This will cause one error in the pytables tests, since closed HDFStore objects don't allow object access, so I can't check hasattr(obj, '__fspath__'). I think I'll change HDFStore.__getattr__, but have to run for now.

Also, any feedback on the testing method? Typically we have the tests split by the kind of reader / writer. But for things like these protocols, it's nice to test the interface against every reader / writer that should support it, so I put in in common.py.

@@ -89,6 +107,68 @@ def test_iterator(self):
tm.assert_frame_equal(first, expected.iloc[[0]])
tm.assert_frame_equal(concat(it), expected.iloc[1:])

@pytest.mark.parametrize('reader, module, path', [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are moving these here. then maybe do the same with Path/localpath tests (#16292). can do after of course.

though not sure whether I like better in each module, or centralized.

cc @chris-b1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though not sure whether I like better in each module, or centralized.

I like both :) Per-module for things specific to that, and centralized for "interface"-type things like this. The importskips are kind of annoying though. Happy to split this if you want.

I guess one alternative would be some kind of abstract base class for this set of tests, that would require methods to "implement" tests for test_fspath, test_compression, etc. That seems like a bit much to me though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think this is fine actually, though #16292 would need to change (you can rebase on top maybe)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, going to do it on top #16292. We'll still need it for pathlib on 3.5, which doesn't implement fspath.

@TomAugspurger
Copy link
Contributor Author

8c10784 has the API changes required on HDFStore. Relatively minor, so I'm ok with just changing it in 0.21, but we can do a deprecation if you want, or I can just code around it in _stringify_path and we never change it.

@jreback
Copy link
Contributor

jreback commented May 10, 2017

8c10784 has the API changes required on HDFStore. Relatively minor, so I'm ok with just changing it in 0.21, but we can do a deprecation if you want, or I can just code around it in _stringify_path and we never change it.

ok to just change this

@codecov
Copy link

codecov bot commented May 10, 2017

Codecov Report

Merging #16301 into master will increase coverage by <.01%.
The diff coverage is 91.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16301      +/-   ##
==========================================
+ Coverage   90.39%   90.39%   +<.01%     
==========================================
  Files         161      161              
  Lines       50863    50877      +14     
==========================================
+ Hits        45977    45990      +13     
- Misses       4886     4887       +1
Flag Coverage Δ
#multiple 88.22% <91.89%> (+0.06%) ⬆️
#single 40.33% <40.54%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/pytables.py 93.06% <ø> (-0.01%) ⬇️
pandas/io/packers.py 88.23% <100%> (+0.03%) ⬆️
pandas/io/feather_format.py 87.87% <100%> (+1.21%) ⬆️
pandas/io/formats/excel.py 74.33% <100%> (+1.17%) ⬆️
pandas/io/pickle.py 80.43% <100%> (+0.88%) ⬆️
pandas/io/stata.py 93.47% <100%> (ø) ⬆️
pandas/io/json/json.py 90.36% <100%> (+0.02%) ⬆️
pandas/io/formats/format.py 95.66% <100%> (ø) ⬆️
pandas/io/common.py 69.78% <100%> (-0.3%) ⬇️
pandas/io/sas/sasreader.py 86.2% <100%> (+1.02%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09d1c97...03d29ba. Read the comment docs.

@codecov
Copy link

codecov bot commented May 10, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@539de79). Click here to learn what that means.
The diff coverage is 93.87%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #16301   +/-   ##
=========================================
  Coverage          ?    90.4%           
=========================================
  Files             ?      161           
  Lines             ?    50987           
  Branches          ?        0           
=========================================
  Hits              ?    46096           
  Misses            ?     4891           
  Partials          ?        0
Flag Coverage Δ
#multiple 88.24% <83.67%> (?)
#single 40.19% <55.1%> (?)
Impacted Files Coverage Δ
pandas/io/formats/format.py 95.66% <100%> (ø)
pandas/io/json/json.py 90.36% <100%> (ø)
pandas/io/stata.py 93.47% <100%> (ø)
pandas/io/common.py 69.91% <100%> (ø)
pandas/io/pickle.py 81.25% <100%> (ø)
pandas/io/pytables.py 93.06% <100%> (ø)
pandas/io/feather_format.py 87.87% <100%> (ø)
pandas/io/formats/excel.py 74.24% <100%> (ø)
pandas/io/packers.py 88.23% <100%> (ø)
pandas/io/sas/sasreader.py 86.2% <100%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 539de79...ac37895. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

Rebased on top of #16292

I also added a third commit adding __fspath__ to our own path-like objects: HDFStore and ExcelFile. Do we have any others?

@chris-b1
Copy link
Contributor

We also have ExcelWriter

pytest.raises(NotImplementedError, read_hdf, store, 'df')

def test_read_hdf_generic_buffer_errors(self):
pytest.raises(NotImplementedError, read_hdf, BytesIO(b''), 'df')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I had to modify this test to hit the block here. I'm pretty sure you were just using store as an example of a buffer that isn't an HDFStore or filepath. But with these changes it'll be coerced to a path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@TomAugspurger
Copy link
Contributor Author

@jreback do we have any official policy on whether or not to include package data in the distribution? This test uses a pickle that's not included, so it was failing on the build test. e3b3fcb works around it.

Right now our wheel is ~38M unzipped, and about 15M of that are test files.

@jreback
Copy link
Contributor

jreback commented May 17, 2017

@jreback do we have any official policy on whether or not to include package data in the distribution? This test uses a pickle that's not included, so it was failing on the build test. e3b3fcb works around it.

we DO include things, but should only be smallish test files. Don't work around things like that.

@jreback
Copy link
Contributor

jreback commented May 17, 2017

I think your file is not included, because its not in the filter for package_data (for .mp)

Previously, we called _check_if_open, which would raise a ClosedFileError
whenever the desired attribute wasn't found. This prevented the check required
for PEP519 to work properly, since hasattr shouldn't raise that error.
Ensures that most of pandas readers and writers will honor the fspath
protocol, if an object defines it.

TST: remove old xfails
- HDFStore
- ExcelFile

(cherry picked from commit f9baec385e2513234a69a05031d2753899290d38)
@TomAugspurger
Copy link
Contributor Author

I think your file is not included, because its not in the filter for package_data (for .mp)

Completely forgot that I added the messagepack file :/ Fixed now.

All green, other than that matplotlib test, which I'm going to debug now.

@jreback
Copy link
Contributor

jreback commented May 18, 2017

@TomAugspurger lgtm. plotting occasional failures are unrelated and I think you have an issue for that anyhow.

@TomAugspurger TomAugspurger merged commit 7ea5f49 into pandas-dev:master May 18, 2017
@TomAugspurger
Copy link
Contributor Author

Thanks. I'm (trying to) debug the plotting stuff on my Travis branch, where it's currently not failing :/

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
* ENH: Support fspath protocol

Ensures that most of pandas readers and writers will honor the fspath
protocol, if an object defines it.

TST: remove old xfails

* API: Raise AttributeError on closed HDFStore

Previously, we called _check_if_open, which would raise a ClosedFileError
whenever the desired attribute wasn't found. This prevented the check required
for PEP519 to work properly, since hasattr shouldn't raise that error.

* ENH: add __fspath__ to pandas own file-like objects

- HDFStore
- ExcelFile
@TomAugspurger TomAugspurger deleted the fspath branch May 27, 2017 16:36
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
* ENH: Support fspath protocol

Ensures that most of pandas readers and writers will honor the fspath
protocol, if an object defines it.

TST: remove old xfails

* API: Raise AttributeError on closed HDFStore

Previously, we called _check_if_open, which would raise a ClosedFileError
whenever the desired attribute wasn't found. This prevented the check required
for PEP519 to work properly, since hasattr shouldn't raise that error.

* ENH: add __fspath__ to pandas own file-like objects

- HDFStore
- ExcelFile
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Add support for PEP 519
4 participants