-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: windows with TemporaryFile an read_csv #13398 #13481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mbrucher no, pls rebase on master, that test failure was fixed very recently. I don't like skipping ANYTHING if at all possible (even on windows), yes occasionally we do, but only if its not a simple fix. |
tests go here: https://github.com/pydata/pandas/blob/master/pandas/io/tests/parser/common.py these will be testsed on all platforms and parsers. Model after other tests.s @ unless @gfyoung has a better location. |
OK, thx. |
I'll add the test. |
If you have something failing, pls post; its clean now on appveyor and my manual runs on 3.5 (and 2.7) |
yeah I did end up skipping this one on windows. |
@jreback : yes, |
Oh... That may be because I have the anaconda version of pandas that is used instead of my develop :/ |
It may actually also fail on OS X and Linux, not only on Windows . Definitely compatibility issue, but also a mismatch in the implementation (testing the availablity of a function and using another one to do the actual work). |
yeah bug/compat doesn't matter much, except what I think I overuse the compat label :< either way if IOW if I have the same named |
I've added a test. I kept read_table so that I could use a regexp with the Python engine and use the path using the generator (sep!=None or len(sep) > 1) |
new_file.seek(0) | ||
|
||
dataframe = read_table(new_file, sep=r"\s+", header=None, engine="python") | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
No need to import
read_table
for this.read_csv
andread_table
are absolutely identical (except for the default delimiter), andread_csv
will also trigger this error. -
You should check the result of the
read_
operation to make sure you aren't getting garbage out. -
Make reference the issue (the one you raised) under the
def ...
statement
4) Do not specify engine='python'
- we want to test that both the C and Python engines both work with TemporaryFile
, even though the issue was specific to the Python engine. This is better coverage.
- Actually, since you do need that regex
sep
, move the test intopython_parser_only.py
- I forgot that initially when I said thatcommon.py
was the right place to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You meqn reference the issue in the test (comment)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly:
def test_temporary_file(self):
# see gh-13398
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those tests run? They all fail when called explicitely (and they should).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they do. If you look at test_parsers.py
, all of them get imported as test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the errors I get:
> nosetests -s pandas\io\tests\parser\python_parser_only.py:PythonParserTests
> EEEEEEEE
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_BytesIO_input
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
> self.test(*self.arg)
> File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 84, in test_BytesIO_input
> result = self.read_table(data, sep="::", encoding='cp1255')
> AttributeError: 'PythonParserTests' object has no attribute 'read_table'
>
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_decompression_regex_sep
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
> self.test(*self.arg)
> File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 133, in test_decompression_regex_sep
> data = open(self.csv1, 'rb').read()
> AttributeError: 'PythonParserTests' object has no attribute 'csv1'
>
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_negative_skipfooter_raises
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
> self.test(*self.arg)
> File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 34, in test_negative_skipfooter_raises
> self.read_csv(StringIO(text), skipfooter=-1)
> AttributeError: 'PythonParserTests' object has no attribute 'read_csv'
And they are proper issues in the test file.
Indeed, files work properly. So probably a Python default in the implementation. But the test in pandas was not the good one anyway (should have tested next instead of readline). |
Current coverage is 84.33%@@ master #13481 diff @@
==========================================
Files 138 138
Lines 51072 51083 +11
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43069 43083 +14
+ Misses 8003 8000 -3
Partials 0 0
|
…f (assuming that previous test is OK)
@jreback : not entirely sure why they changed the behavior of |
@@ -33,7 +33,8 @@ class TestCommonIOCapabilities(tm.TestCase): | |||
foo2,12,13,14,15 | |||
bar2,12,13,14,15 | |||
""" | |||
|
|||
data2 = data1.replace(",", " ") | |||
|
|||
def test_expand_user(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your original example was fine. No need to do this here unless data2
is intended to be used elsewhere. All you had to do was change read_table
to read_csv
in the initial version of your test.
Also, move this into python_parser_only.py
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to use the regular expression and make sure I'm going through the code path I modified (I need a non default sep).
Also see my current issue with the file (not run by nosetests by default?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, figured it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your original test was using regular expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! That's just to build a consistent DataFrame with two different mechanism, I'm assuming the one with StringIO + csv data will always be OK.
@gfyoung No, you can't specify a name indeed. It's if you want to open the file through another mechanism (which is what people use on forums, they don't pass the object, they open the file again, but it fails for Windows if yu do this). |
@mbrucher : ah, yes, that is indeed true - it fails on Windows but does work on Unix though. |
new_file.flush() | ||
new_file.seek(0) | ||
|
||
result = read_csv(new_file, sep=r"\s+", engine="python") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a corresponding test for the c engine that does the same? or is this specifically an issue with the python engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's specific to the Python engine as it has to do with the regex.
new_file.flush() | ||
new_file.seek(0) | ||
|
||
result = self.read_csv(new_file, sep=r"\s+", engine="python") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to specify the engine here because it's already specified as Python only. General rule of thumb for these tests: you shouldn't need to explicitly specify the engine.
ok, @gfyoung comment, and pls add a whatsnew note. ping when green. |
new_file.seek(0) | ||
|
||
result = self.read_csv(new_file, sep=r"\s+", engine="python") | ||
expected = self.read_csv(StringIO(data1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better practice is to explicitly define your expected output by constructing the DataFrame itself. If you want, you can simplify your data input to make this construction easier to write out (does your data need to be that long for the error to be triggered - why not that nice simple data you used in your initial commit?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to be too clever. I reused the data from test_common, should have reverted it back.
Should be good now.
thanks @mbrucher ping when green. |
@@ -493,6 +493,7 @@ Bug Fixes | |||
- Bug in ``pd.read_csv()`` in which the ``nrows`` argument was not properly validated for both engines (:issue:`10476`) | |||
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`) | |||
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing ``NaN`` values were not being parsed (:issue:`13320`) | |||
- Bug in ``pd.read_csv()`` with ``engine='python'`` when reading from a tempfile.TemporaryFile on Windows with Python 3 a file with the separator expressed as a regex (:issue:`13398`) | |||
- Bug in ``pd.read_csv()`` that prevents ``usecols`` kwarg from accepting single-byte unicode strings (:issue:`13219`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some grammar issues in that sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, changed
new_file.flush() | ||
new_file.seek(0) | ||
|
||
result = self.read_csv(new_file, sep=r"\s+", header=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you double check that this actually errors without your changes? I couldn't get an error on my end (I think you need to change the sep
argument to another multi-char regex).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the error when I revert the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...how odd. It's because we do special handling of \s+
, and you might be seeing it (while I am not) due to LOCALE differences. To be safe, I would recommend passing in sep="\s*"
for that test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I understand. I don't think it makes a difference as the test is not yet done there ont he regex, but I don't know the rest of pandas, so I trust you. I used @jreback regex, so using mine again in my last change.
Sorry for the earlier rants ;) |
@jreback Everything is green now, thx for the help! |
thanks! |
Follow-up to #13481 and partially fixes #13932. Regex mistakenly matches to the empty string, which will cause Python 3.x to issue a warning. In addition, not sure why this test was placed in `python_parsers_only.py`... Author: gfyoung <[email protected]> Closes #13943 from gfyoung/regex-empty-match and squashes the following commits: b93325e [gfyoung] TST: Removed regex warning for read_csv
git diff upstream/master | flake8 --diff
Change the way of reading back to readline (consistent with the test before entering the function)
One failure on Windows 10 (Python 3.5), but expected to fail actually (should probably tag it as well?)