BUG: windows with TemporaryFile an read_csv #13398 #13481

mbrucher · 2016-06-18T16:09:28Z

[ x] closes TemporaryFile as input to read_table raises TypeError: '_TemporaryFileWrapper' object is not an iterator #13398
[ x] no tests added -> could not find location for IO tests?
[ x] passes git diff upstream/master | flake8 --diff

Change the way of reading back to readline (consistent with the test before entering the function)

One failure on Windows 10 (Python 3.5), but expected to fail actually (should probably tag it as well?)

======================================================================
FAIL: test_next (pandas.io.tests.test_common.TestMMapWrapper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "G:\Informatique\pandas\pandas\io\tests\test_common.py", line 139, in test_next   
    self.assertEqual(next_line, line)
AssertionError: 'a,b,c\r\n' != 'a,b,c\n'
- a,b,c
?      -
+ a,b,c

Change the way of reading back to readline (consistent with the test before entering the function)

jreback · 2016-06-18T16:11:09Z

@mbrucher no, pls rebase on master, that test failure was fixed very recently.

I don't like skipping ANYTHING if at all possible (even on windows), yes occasionally we do, but only if its not a simple fix.

jreback · 2016-06-18T16:14:18Z

tests go here: https://github.com/pydata/pandas/blob/master/pandas/io/tests/parser/common.py

these will be testsed on all platforms and parsers. Model after other tests.s @

unless @gfyoung has a better location.

mbrucher · 2016-06-18T16:16:35Z

OK, thx.
Actually, I'm on master from yesterday (missed one patch, I don't know why). The test doesn't seem to be mixed on master :/

mbrucher · 2016-06-18T16:16:43Z

I'll add the test.

jreback · 2016-06-18T16:17:48Z

If you have something failing, pls post; its clean now on appveyor and my manual runs on 3.5 (and 2.7)

jreback · 2016-06-18T16:18:36Z

commit 9670b3139171b3b628cecb87a3d0b46cc0361eba
Author: Jeff Reback <[email protected]>
Date:   Tue Jun 14 17:44:13 2016 -0400

    TST: skip mmap error comparison on windows

yeah I did end up skipping this one on windows.

gfyoung · 2016-06-18T16:26:19Z

@jreback : yes, common.py is the right place for this to go since it doesn't fall into any nice category that was laid out previously. BTW, wouldn't consider this a bug but rather a compat issue because you can iterate over a TemporaryFilein Python 2.x but not in Python 3.x.

mbrucher · 2016-06-18T16:26:44Z

Oh... That may be because I have the anaconda version of pandas that is used instead of my develop :/
So I'll add the test and we'll see.

mbrucher · 2016-06-18T16:28:07Z

It may actually also fail on OS X and Linux, not only on Windows . Definitely compatibility issue, but also a mismatch in the implementation (testing the availablity of a function and using another one to do the actual work).

jreback · 2016-06-18T16:29:57Z

yeah bug/compat doesn't matter much, except what I think I overuse the compat label :<

either way if TemporaryFile is not behaving as 'regular' files that's an issue. More generally I would say that that is a python bug no? Or is that defined for some reason?

IOW if I have the same named TemporaryFile is it the SAME if I close and re-open it (SAME as is name, but prob not inode, so its not the 'same')? (I am guessing no).

mbrucher · 2016-06-18T16:33:25Z

I've added a test. I kept read_table so that I could use a regexp with the Python engine and use the path using the generator (sep!=None or len(sep) > 1)

gfyoung · 2016-06-18T16:34:43Z

pandas/io/tests/test_common.py

+        new_file.seek(0)
+
+        dataframe = read_table(new_file, sep=r"\s+", header=None, engine="python")
+



No need to import read_table for this. read_csv and read_table are absolutely identical (except for the default delimiter), and read_csv will also trigger this error.

You should check the result of the read_ operation to make sure you aren't getting garbage out.

Make reference the issue (the one you raised) under the def ... statement

4) Do not specify engine='python' - we want to test that both the C and Python engines both work with TemporaryFile, even though the issue was specific to the Python engine. This is better coverage.

Actually, since you do need that regex sep, move the test into python_parser_only.py- I forgot that initially when I said that common.py was the right place to go.

You meqn reference the issue in the test (comment)?

Exactly:

def test_temporary_file(self): # see gh-13398 ...

Are those tests run? They all fail when called explicitely (and they should).

Yes, they do. If you look at test_parsers.py, all of them get imported as test cases.

Some of the errors I get:

> nosetests -s pandas\io\tests\parser\python_parser_only.py:PythonParserTests > EEEEEEEE > ====================================================================== > ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_BytesIO_input > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest > self.test(*self.arg) > File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 84, in test_BytesIO_input > result = self.read_table(data, sep="::", encoding='cp1255') > AttributeError: 'PythonParserTests' object has no attribute 'read_table' > > ====================================================================== > ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_decompression_regex_sep > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest > self.test(*self.arg) > File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 133, in test_decompression_regex_sep > data = open(self.csv1, 'rb').read() > AttributeError: 'PythonParserTests' object has no attribute 'csv1' > > ====================================================================== > ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_negative_skipfooter_raises > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest > self.test(*self.arg) > File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 34, in test_negative_skipfooter_raises > self.read_csv(StringIO(text), skipfooter=-1) > AttributeError: 'PythonParserTests' object has no attribute 'read_csv'

And they are proper issues in the test file.

mbrucher · 2016-06-18T16:35:19Z

Indeed, files work properly. So probably a Python default in the implementation. But the test in pandas was not the good one anyway (should have tested next instead of readline).

codecov-io · 2016-06-18T16:38:28Z

Current coverage is 84.33%

Merging #13481 into master will increase coverage by <.01%

@@             master     #13481   diff @@
==========================================
  Files           138        138          
  Lines         51072      51083    +11   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43069      43083    +14   
+ Misses         8003       8000     -3   
  Partials          0          0

Powered by Codecov. Last updated by 35bb1a1...8b52631

…f (assuming that previous test is OK)

gfyoung · 2016-06-18T16:47:12Z

@jreback : not entirely sure why they changed the behavior of TemporaryFile - it's indeed an odd feature but it hasn't really reared its head until now. With regards to your question about opening and closing again, I'm not sure I follow you here. When you say re-open with the same name, how? I don't think you can specify a name with NamedTemporaryFile.

gfyoung · 2016-06-18T16:49:10Z

pandas/io/tests/test_common.py

@@ -33,7 +33,8 @@ class TestCommonIOCapabilities(tm.TestCase):
 foo2,12,13,14,15
 bar2,12,13,14,15
 """
-
+    data2 = data1.replace(",", " ")
+
    def test_expand_user(self):


Your original example was fine. No need to do this here unless data2 is intended to be used elsewhere. All you had to do was change read_table to read_csv in the initial version of your test.

Also, move this into python_parser_only.py!

It's to use the regular expression and make sure I'm going through the code path I modified (I need a non default sep).
Also see my current issue with the file (not run by nosetests by default?).

OK, figured it out.

Your original test was using regular expression.

Oh! That's just to build a consistent DataFrame with two different mechanism, I'm assuming the one with StringIO + csv data will always be OK.

mbrucher · 2016-06-18T16:50:49Z

@gfyoung No, you can't specify a name indeed. It's if you want to open the file through another mechanism (which is what people use on forums, they don't pass the object, they open the file again, but it fails for Windows if yu do this).

gfyoung · 2016-06-18T16:52:13Z

@mbrucher : ah, yes, that is indeed true - it fails on Windows but does work on Unix though.

jreback · 2016-06-18T21:38:44Z

pandas/io/tests/test_common.py

+        new_file.flush()
+        new_file.seek(0)
+
+        result = read_csv(new_file, sep=r"\s+", engine="python")


is there a corresponding test for the c engine that does the same? or is this specifically an issue with the python engine?

I think it's specific to the Python engine as it has to do with the regex.

gfyoung · 2016-06-18T23:11:13Z

pandas/io/tests/parser/python_parser_only.py

+        new_file.flush()
+        new_file.seek(0)
+
+        result = self.read_csv(new_file, sep=r"\s+", engine="python")


You don't need to specify the engine here because it's already specified as Python only. General rule of thumb for these tests: you shouldn't need to explicitly specify the engine.

jreback · 2016-06-19T19:43:20Z

ok, @gfyoung comment, and pls add a whatsnew note. ping when green.

gfyoung · 2016-06-19T19:50:11Z

pandas/io/tests/parser/python_parser_only.py

+        new_file.seek(0)
+
+        result = self.read_csv(new_file, sep=r"\s+", engine="python")
+        expected = self.read_csv(StringIO(data1))


Better practice is to explicitly define your expected output by constructing the DataFrame itself. If you want, you can simplify your data input to make this construction easier to write out (does your data need to be that long for the error to be triggered - why not that nice simple data you used in your initial commit?)

I was trying to be too clever. I reused the data from test_common, should have reverted it back.
Should be good now.

jreback · 2016-06-19T20:20:06Z

thanks @mbrucher ping when green.

gfyoung · 2016-06-19T21:16:02Z

doc/source/whatsnew/v0.18.2.txt

@@ -493,6 +493,7 @@ Bug Fixes
 - Bug in ``pd.read_csv()`` in which the ``nrows`` argument was not properly validated for both engines (:issue:`10476`)
 - Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`)
 - Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing ``NaN`` values were not being parsed (:issue:`13320`)
+- Bug in ``pd.read_csv()`` with ``engine='python'`` when reading from a tempfile.TemporaryFile on Windows with Python 3 a file with the separator expressed as a regex (:issue:`13398`)
 - Bug in ``pd.read_csv()`` that prevents ``usecols`` kwarg from accepting single-byte unicode strings (:issue:`13219`)


I think there are some grammar issues in that sentence.

OK, changed

gfyoung · 2016-06-19T21:22:28Z

pandas/io/tests/parser/python_parser_only.py

+        new_file.flush()
+        new_file.seek(0)
+
+        result = self.read_csv(new_file, sep=r"\s+", header=None)


Can you double check that this actually errors without your changes? I couldn't get an error on my end (I think you need to change the sep argument to another multi-char regex).

I have the error when I revert the change.

Hmm...how odd. It's because we do special handling of \s+, and you might be seeing it (while I am not) due to LOCALE differences. To be safe, I would recommend passing in sep="\s*" for that test.

OK, I understand. I don't think it makes a difference as the test is not yet done there ont he regex, but I don't know the rest of pandas, so I trust you. I used @jreback regex, so using mine again in my last change.

gfyoung · 2016-06-19T21:36:19Z

Ah, that's better! @jreback this LGTM and is ready to merge if there are no other concerns, and Travis is happy. Thanks for your patience @mbrucher !

mbrucher · 2016-06-19T21:38:30Z

Sorry for the earlier rants ;)
I fixed the last lint issues, hopefully the last commits won't break the fix I put in. We'll see tomorrow.

mbrucher · 2016-06-20T11:29:37Z

@jreback Everything is green now, thx for the help!

jreback · 2016-06-22T10:21:40Z

thanks!

Follow-up to #13481 and partially fixes #13932. Regex mistakenly matches to the empty string, which will cause Python 3.x to issue a warning. In addition, not sure why this test was placed in `python_parsers_only.py`... Author: gfyoung <[email protected]> Closes #13943 from gfyoung/regex-empty-match and squashes the following commits: b93325e [gfyoung] TST: Removed regex warning for read_csv

#13398

d8decae

Change the way of reading back to readline (consistent with the test before entering the function)

jreback changed the title ~~#13398~~ BUG: windows with TemporaryFile an read_csv #13398 Jun 18, 2016

jreback added Bug IO CSV read_csv, to_csv Windows Windows OS labels Jun 18, 2016

Adding a test with Python engine

5af8465

gfyoung reviewed Jun 18, 2016
View reviewed changes

mbrucher added 2 commits June 18, 2016 17:41

Added reference to original issue in the test + test the result itsel…

119fb65

…f (assuming that previous test is OK)

Using same way of referencing as just above, consistency.

98e476e

gfyoung reviewed Jun 18, 2016
View reviewed changes

Moved the test to the Python parser test file

fd20aaf

jreback reviewed Jun 18, 2016
View reviewed changes

lint changes

d8ceb57

gfyoung reviewed Jun 18, 2016
View reviewed changes

jreback added this to the 0.18.2 milestone Jun 19, 2016

gfyoung reviewed Jun 19, 2016
View reviewed changes

Simplified test and added what's new note.

1c33fb5

lint change

aa3f0aa

gfyoung reviewed Jun 19, 2016
View reviewed changes

mbrucher added 2 commits June 19, 2016 22:18

Grammar

5871625

Simplified

0d54151

gfyoung reviewed Jun 19, 2016
View reviewed changes

Yet another small update for more general regex

8b52631

jreback closed this in 30d710f Jun 22, 2016

gfyoung mentioned this pull request Aug 9, 2016

TST: Removed regex warning for read_csv #13943

Closed

		new_file.seek(0)

		dataframe = read_table(new_file, sep=r"\s+", header=None, engine="python")

Uh oh!

BUG: windows with TemporaryFile an read_csv #13398 #13481

BUG: windows with TemporaryFile an read_csv #13398 #13481

Uh oh!

Conversation

mbrucher commented Jun 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jreback commented Jun 18, 2016

Uh oh!

jreback commented Jun 18, 2016

Uh oh!

mbrucher commented Jun 18, 2016

Uh oh!

mbrucher commented Jun 18, 2016

Uh oh!

jreback commented Jun 18, 2016

Uh oh!

jreback commented Jun 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gfyoung commented Jun 18, 2016

Uh oh!

mbrucher commented Jun 18, 2016

Uh oh!

mbrucher commented Jun 18, 2016

Uh oh!

jreback commented Jun 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mbrucher commented Jun 18, 2016

Uh oh!

gfyoung Jun 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbrucher Jun 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbrucher commented Jun 18, 2016

Uh oh!

codecov-io commented Jun 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current coverage is 84.33%

Uh oh!

gfyoung commented Jun 18, 2016

Uh oh!

gfyoung Jun 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mbrucher commented Jun 18, 2016

Uh oh!

gfyoung commented Jun 18, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Jun 19, 2016

Uh oh!

mbrucher commented Jun 18, 2016 •

edited

Loading

jreback commented Jun 18, 2016 •

edited

Loading

jreback commented Jun 18, 2016 •

edited

Loading

gfyoung Jun 18, 2016 •

edited

Loading

mbrucher Jun 18, 2016 •

edited

Loading

codecov-io commented Jun 18, 2016 •

edited

Loading

gfyoung Jun 18, 2016 •

edited

Loading

gfyoung Jun 19, 2016 •

edited

Loading