-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PKG: Exclude data test files. #19535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tests/io/test_html.py
Outdated
@@ -65,9 +65,6 @@ def _skip_if_none_of(module_names): | |||
pytest.skip("Bad version of bs4: 4.2.0") | |||
|
|||
|
|||
DATA_PATH = tm.get_data_path() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This types of changes were because pytest.skip
can't be called outside of test methods.
setup.py
Outdated
@@ -722,11 +722,7 @@ def pxd(name): | |||
maintainer=AUTHOR, | |||
version=versioneer.get_version(), | |||
packages=find_packages(include=['pandas', 'pandas.*']), | |||
package_data={'': ['data/*', 'templates/*'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone know what the data/*
files were referencing? I think it was supposed to be pandas/tests/data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this looks for a data and template sub directory in any of the packages. It doesn’t refer to just one directory.
FWIW I think if those sub directories had an init they would automatically be included per the line above this, but given that’s not the case this helps include those folders relative to any of the packages that are found
And likewise we should also make sure to by accident not write tests that are skipped due to an error in the path? (maybe a grep for "Data files not included in pandas distribution." in the travis log could help for that?) |
Yes, that’s a good idea. Initially I wrote a decorator to explicitly mark tests using too much data to avoid that issue, but there were too many.
…________________________________
From: Joris Van den Bossche <[email protected]>
Sent: Sunday, February 4, 2018 3:17:47 PM
To: pandas-dev/pandas
Cc: Tom Augspurger; Author
Subject: Re: [pandas-dev/pandas] [WIP]PKG: Exclude data test files. (#19535)
And need to think about how to test this going forward to ensure we don't write tests that aren't skipped if a data file isn't present.
And likewise we should also make sure to by accident not write tests that are skipped due to an error in the path? (maybe a grep for "Data files not included in pandas distribution." in the travis log could help for that?)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#19535 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ABQHIqG83ONnJGZU-V_uZ-96YYvyWKqKks5tRh57gaJpZM4R4tLh>.
|
pandas/tests/io/conftest.py
Outdated
|
||
|
||
@pytest.fixture(scope='module') | ||
def jsonl_file(): | ||
"""Path a JSONL dataset""" | ||
return os.path.join(HERE, 'parser', 'data', 'items.jsonl') | ||
path = os.path.join(HERE, 'parser', 'data', 'items.jsonl') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should just make this a function (or maybe a decorator)
Codecov Report
@@ Coverage Diff @@
## master #19535 +/- ##
=========================================
Coverage ? 91.9%
=========================================
Files ? 153
Lines ? 49544
Branches ? 0
=========================================
Hits ? 45532
Misses ? 4012
Partials ? 0
Continue to review full report at Codecov.
|
Hello @TomAugspurger! Thanks for updating the PR.
Comment last updated on June 21, 2018 at 14:14 Hours UTC |
@@ -25,12 +25,12 @@ if [ "$DOC" ]; then | |||
echo "We are not running pytest as this is a doc-build" | |||
|
|||
elif [ "$COVERAGE" ]; then | |||
echo pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas | |||
pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas | |||
echo pytest -s -m "single" -r xXs --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe should make a variable that holds all of these options for both the echo and the run to avoid duplication
pandas/tests/conftest.py
Outdated
if request.config.getoption("--strict-data-files"): | ||
raise ValueError("Failed.") | ||
else: | ||
pytest.skip("Data files not included in pandas distribution.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add the path name in the message
pandas/tests/io/test_common.py
Outdated
@@ -170,6 +170,8 @@ def test_read_non_existant(self, reader, module, error_class, fn_ext): | |||
]) | |||
def test_read_fspath_all(self, reader, module, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can generally just autouse the datapath fixture
doc/source/whatsnew/v0.23.2.txt
Outdated
Build Changes | ||
------------- | ||
|
||
- The source and binary distributions no longer include test files, resulting in smaller download sizes. Tests relying on these files will be skipped when using ``pandas.test()``. (:issue:`19320`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test files -> "test data files" or "data files for testing"
That slightly changes the meaning :)
…On Wed, Jun 20, 2018 at 10:55 AM, Joris Van den Bossche < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In doc/source/whatsnew/v0.23.2.txt
<#19535 (comment)>:
> @@ -37,6 +37,11 @@ Documentation Changes
-
-
+Build Changes
+-------------
+
+- The source and binary distributions no longer include test files, resulting in smaller download sizes. Tests relying on these files will be skipped when using ``pandas.test()``. (:issue:`19320`)
test files -> "test data files" or "data files for testing"
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19535 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHInYwadTa-r5CTvUgr2OirWJuCimaks5t-nB8gaJpZM4R4tLh>
.
|
[ci skip]
But in the correct sense? From reading the current whatsnew, I understood that you removed all test |
Right, the old way sounding like everything was being removed.
I'll push an update once appveyor finishes.
…On Wed, Jun 20, 2018 at 11:02 AM, Joris Van den Bossche < ***@***.***> wrote:
That slightly changes the meaning :)
But in the correct sense?
From reading the current whatsnew, I understood that you removed all test
.py files, so could not run any test (which of course contradicts with
the rest of the sentence, but still ..)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#19535 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIhVtPjWliqcKG-rbYGqxFOY-kG8Lks5t-nICgaJpZM4R4tLh>
.
|
@@ -59,7 +60,8 @@ def setup_method(self, method): | |||
self.mixed_frame = _mixed_frame.copy() | |||
self.categorical = _cat_frame.copy() | |||
|
|||
def teardown_method(self, method): | |||
yield |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we should really not use this pattern, rather changing to all fixtures. as a temporary workaround this ok, can you create an issue to 'fix' this properly though.
pandas/tests/io/test_html.py
Outdated
@@ -25,8 +25,7 @@ | |||
import pandas.util._test_decorators as td | |||
from pandas.util.testing import makeCustomDataframe as mkdf, network | |||
|
|||
|
|||
DATA_PATH = tm.get_data_path() | |||
HERE = os.path.dirname(__file__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need this?
pandas/tests/io/test_packers.py
Outdated
# GH12142 0.17 files packed in P2 can't be read in P3 | ||
if (compat.PY3 and version.startswith('0.17.') and | ||
legacy_packer.split('.')[-4][-1] == '2'): | ||
pytest.skip("Files packed in Py2 can't be read in Py3.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the filename here
@@ -37,6 +37,11 @@ Documentation Changes | |||
- | |||
- | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a ref here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General note: IMO it is not needed to always ask this of contributors, as this ref is only needed when we actually want to make an explicit link to it from within the rst files (and chances are quite high we will never do this). The ref can always be added at the moment one adds a link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, but in general its a good practice
pandas/tests/io/test_html.py
Outdated
|
||
DATA_PATH = tm.get_data_path() | ||
|
||
def pytest_generate_tests(metafunc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do people thing about this approach?
This seems to be the recommended why to dynamically generate parametrized fixtures, based on some condition at runtime (the files present in this folder).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in this case I'm not sure it makes sense...
When the files aren't present, paths
will be empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this probably is what we want. When the files are present, we'll get one fixture per file, which is exactly what we want. When they aren't present, nothing is collected (no fixtures), which is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But will it then still fail in the strict case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The files won't be there, so no fixtures are generated. It won't "exist", so there's nothing to fail :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's impossible for dynamically generated fixtures like this (which may be your point).
The alternative is to explicitly list them, which isn't so difficult, as there are only 8. Then we'll have a fixture like html_files
. going forward, if we have to add an additional HTML file, we would add it there. I'd prefer being explicit in this case, since there are so few files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the advantage of explicitly listing them is also that we don't need to use this dark pytest magic :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the normal pytest magic :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late here but I don’t mind this approach (not that different from metaclasses in Python). Is it not theoretically possible though to just fail the generated tests if the strict parameter is supplied yet no files are found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I'd prefer to explicitly list them since there are so few.
pandas/tests/io/test_html.py
Outdated
os.path.join(DATA_PATH, 'html_encoding', '*.html'))) | ||
def test_encode(self, f): | ||
_, encoding = os.path.splitext(os.path.basename(f))[0].split('_') | ||
def test_encode(self, html_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is where html_file
is used.
test_foo.py
Outdated
@@ -0,0 +1,22 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😳
lgtm otherwise. merge when ready. |
Fixed a merge conflict. |
(cherry picked from commit 36422a8)
(cherry picked from commit 36422a8)
Source: 12M -> 9.5M
Binary: 10M -> 7.8M
Need to do a bit more testing to make sure I didn't break anything.
And need to think about how to test this going forward to ensure we don't write tests that aren't skipped if a data file isn't present.
Closes #19320
Closes #21436