Skip to content

BUG: windows with TemporaryFile an read_csv #13398 #13481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed

BUG: windows with TemporaryFile an read_csv #13398 #13481

wants to merge 11 commits into from

Conversation

mbrucher
Copy link
Contributor

@mbrucher mbrucher commented Jun 18, 2016

Change the way of reading back to readline (consistent with the test before entering the function)

One failure on Windows 10 (Python 3.5), but expected to fail actually (should probably tag it as well?)

======================================================================
FAIL: test_next (pandas.io.tests.test_common.TestMMapWrapper)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "G:\Informatique\pandas\pandas\io\tests\test_common.py", line 139, in test_next   
    self.assertEqual(next_line, line)
AssertionError: 'a,b,c\r\n' != 'a,b,c\n'
- a,b,c
?      -
+ a,b,c

Change the way of reading back to readline (consistent with the test before entering the function)
@jreback
Copy link
Contributor

jreback commented Jun 18, 2016

@mbrucher no, pls rebase on master, that test failure was fixed very recently.

I don't like skipping ANYTHING if at all possible (even on windows), yes occasionally we do, but only if its not a simple fix.

@jreback
Copy link
Contributor

jreback commented Jun 18, 2016

tests go here: https://github.com/pydata/pandas/blob/master/pandas/io/tests/parser/common.py

these will be testsed on all platforms and parsers. Model after other tests.s @

unless @gfyoung has a better location.

@jreback jreback changed the title #13398 BUG: windows with TemporaryFile an read_csv #13398 Jun 18, 2016
@jreback jreback added Bug IO CSV read_csv, to_csv Windows Windows OS labels Jun 18, 2016
@mbrucher
Copy link
Contributor Author

OK, thx.
Actually, I'm on master from yesterday (missed one patch, I don't know why). The test doesn't seem to be mixed on master :/

@mbrucher
Copy link
Contributor Author

I'll add the test.

@jreback
Copy link
Contributor

jreback commented Jun 18, 2016

If you have something failing, pls post; its clean now on appveyor and my manual runs on 3.5 (and 2.7)

@jreback
Copy link
Contributor

jreback commented Jun 18, 2016

commit 9670b3139171b3b628cecb87a3d0b46cc0361eba
Author: Jeff Reback <[email protected]>
Date:   Tue Jun 14 17:44:13 2016 -0400

    TST: skip mmap error comparison on windows

yeah I did end up skipping this one on windows.

@gfyoung
Copy link
Member

gfyoung commented Jun 18, 2016

@jreback : yes, common.py is the right place for this to go since it doesn't fall into any nice category that was laid out previously. BTW, wouldn't consider this a bug but rather a compat issue because you can iterate over a TemporaryFilein Python 2.x but not in Python 3.x.

@mbrucher
Copy link
Contributor Author

Oh... That may be because I have the anaconda version of pandas that is used instead of my develop :/
So I'll add the test and we'll see.

@mbrucher
Copy link
Contributor Author

It may actually also fail on OS X and Linux, not only on Windows . Definitely compatibility issue, but also a mismatch in the implementation (testing the availablity of a function and using another one to do the actual work).

@jreback
Copy link
Contributor

jreback commented Jun 18, 2016

yeah bug/compat doesn't matter much, except what I think I overuse the compat label :<

either way if TemporaryFile is not behaving as 'regular' files that's an issue. More generally I would say that that is a python bug no? Or is that defined for some reason?

IOW if I have the same named TemporaryFile is it the SAME if I close and re-open it (SAME as is name, but prob not inode, so its not the 'same')? (I am guessing no).

@mbrucher
Copy link
Contributor Author

I've added a test. I kept read_table so that I could use a regexp with the Python engine and use the path using the generator (sep!=None or len(sep) > 1)

new_file.seek(0)

dataframe = read_table(new_file, sep=r"\s+", header=None, engine="python")


Copy link
Member

@gfyoung gfyoung Jun 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. No need to import read_table for this. read_csv and read_table are absolutely identical (except for the default delimiter), and read_csv will also trigger this error.

  2. You should check the result of the read_ operation to make sure you aren't getting garbage out.

  3. Make reference the issue (the one you raised) under the def ... statement

4) Do not specify engine='python' - we want to test that both the C and Python engines both work with TemporaryFile, even though the issue was specific to the Python engine. This is better coverage.

  1. Actually, since you do need that regex sep, move the test into python_parser_only.py- I forgot that initially when I said that common.py was the right place to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meqn reference the issue in the test (comment)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly:

def test_temporary_file(self):
    # see gh-13398
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those tests run? They all fail when called explicitely (and they should).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they do. If you look at test_parsers.py, all of them get imported as test cases.

Copy link
Contributor Author

@mbrucher mbrucher Jun 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the errors I get:

> nosetests -s pandas\io\tests\parser\python_parser_only.py:PythonParserTests
> EEEEEEEE
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_BytesIO_input
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 84, in test_BytesIO_input
>     result = self.read_table(data, sep="::", encoding='cp1255')
> AttributeError: 'PythonParserTests' object has no attribute 'read_table'
> 
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_decompression_regex_sep
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 133, in test_decompression_regex_sep
>     data = open(self.csv1, 'rb').read()
> AttributeError: 'PythonParserTests' object has no attribute 'csv1'
> 
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_negative_skipfooter_raises
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 34, in test_negative_skipfooter_raises
>     self.read_csv(StringIO(text), skipfooter=-1)
> AttributeError: 'PythonParserTests' object has no attribute 'read_csv'

And they are proper issues in the test file.

@mbrucher
Copy link
Contributor Author

Indeed, files work properly. So probably a Python default in the implementation. But the test in pandas was not the good one anyway (should have tested next instead of readline).

@codecov-io
Copy link

codecov-io commented Jun 18, 2016

Current coverage is 84.33%

Merging #13481 into master will increase coverage by <.01%

@@             master     #13481   diff @@
==========================================
  Files           138        138          
  Lines         51072      51083    +11   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43069      43083    +14   
+ Misses         8003       8000     -3   
  Partials          0          0          

Powered by Codecov. Last updated by 35bb1a1...8b52631

@gfyoung
Copy link
Member

gfyoung commented Jun 18, 2016

@jreback : not entirely sure why they changed the behavior of TemporaryFile - it's indeed an odd feature but it hasn't really reared its head until now. With regards to your question about opening and closing again, I'm not sure I follow you here. When you say re-open with the same name, how? I don't think you can specify a name with NamedTemporaryFile.

@@ -33,7 +33,8 @@ class TestCommonIOCapabilities(tm.TestCase):
foo2,12,13,14,15
bar2,12,13,14,15
"""

data2 = data1.replace(",", " ")

def test_expand_user(self):
Copy link
Member

@gfyoung gfyoung Jun 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your original example was fine. No need to do this here unless data2 is intended to be used elsewhere. All you had to do was change read_table to read_csv in the initial version of your test.

Also, move this into python_parser_only.py!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to use the regular expression and make sure I'm going through the code path I modified (I need a non default sep).
Also see my current issue with the file (not run by nosetests by default?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, figured it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your original test was using regular expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! That's just to build a consistent DataFrame with two different mechanism, I'm assuming the one with StringIO + csv data will always be OK.

@mbrucher
Copy link
Contributor Author

@gfyoung No, you can't specify a name indeed. It's if you want to open the file through another mechanism (which is what people use on forums, they don't pass the object, they open the file again, but it fails for Windows if yu do this).

@gfyoung
Copy link
Member

gfyoung commented Jun 18, 2016

@mbrucher : ah, yes, that is indeed true - it fails on Windows but does work on Unix though.

new_file.flush()
new_file.seek(0)

result = read_csv(new_file, sep=r"\s+", engine="python")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a corresponding test for the c engine that does the same? or is this specifically an issue with the python engine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's specific to the Python engine as it has to do with the regex.

new_file.flush()
new_file.seek(0)

result = self.read_csv(new_file, sep=r"\s+", engine="python")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to specify the engine here because it's already specified as Python only. General rule of thumb for these tests: you shouldn't need to explicitly specify the engine.

@jreback jreback added this to the 0.18.2 milestone Jun 19, 2016
@jreback
Copy link
Contributor

jreback commented Jun 19, 2016

ok, @gfyoung comment, and pls add a whatsnew note. ping when green.

new_file.seek(0)

result = self.read_csv(new_file, sep=r"\s+", engine="python")
expected = self.read_csv(StringIO(data1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better practice is to explicitly define your expected output by constructing the DataFrame itself. If you want, you can simplify your data input to make this construction easier to write out (does your data need to be that long for the error to be triggered - why not that nice simple data you used in your initial commit?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to be too clever. I reused the data from test_common, should have reverted it back.
Should be good now.

@jreback
Copy link
Contributor

jreback commented Jun 19, 2016

thanks @mbrucher ping when green.

@@ -493,6 +493,7 @@ Bug Fixes
- Bug in ``pd.read_csv()`` in which the ``nrows`` argument was not properly validated for both engines (:issue:`10476`)
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`)
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing ``NaN`` values were not being parsed (:issue:`13320`)
- Bug in ``pd.read_csv()`` with ``engine='python'`` when reading from a tempfile.TemporaryFile on Windows with Python 3 a file with the separator expressed as a regex (:issue:`13398`)
- Bug in ``pd.read_csv()`` that prevents ``usecols`` kwarg from accepting single-byte unicode strings (:issue:`13219`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some grammar issues in that sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, changed

new_file.flush()
new_file.seek(0)

result = self.read_csv(new_file, sep=r"\s+", header=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check that this actually errors without your changes? I couldn't get an error on my end (I think you need to change the sep argument to another multi-char regex).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the error when I revert the change.

Copy link
Member

@gfyoung gfyoung Jun 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...how odd. It's because we do special handling of \s+, and you might be seeing it (while I am not) due to LOCALE differences. To be safe, I would recommend passing in sep="\s*" for that test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand. I don't think it makes a difference as the test is not yet done there ont he regex, but I don't know the rest of pandas, so I trust you. I used @jreback regex, so using mine again in my last change.

@gfyoung
Copy link
Member

gfyoung commented Jun 19, 2016

Ah, that's better! @jreback this LGTM and is ready to merge if there are no other concerns, and Travis is happy. Thanks for your patience @mbrucher !

@mbrucher
Copy link
Contributor Author

Sorry for the earlier rants ;)
I fixed the last lint issues, hopefully the last commits won't break the fix I put in. We'll see tomorrow.

@mbrucher
Copy link
Contributor Author

@jreback Everything is green now, thx for the help!

@jreback jreback closed this in 30d710f Jun 22, 2016
@jreback
Copy link
Contributor

jreback commented Jun 22, 2016

thanks!

jreback pushed a commit that referenced this pull request Aug 9, 2016
Follow-up to #13481 and partially fixes #13932.  Regex mistakenly
matches to the empty string, which will cause Python 3.x to issue a
warning.  In addition, not sure why this test was placed in
`python_parsers_only.py`...

Author: gfyoung <[email protected]>

Closes #13943 from gfyoung/regex-empty-match and squashes the following commits:

b93325e [gfyoung] TST: Removed regex warning for read_csv
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv Windows Windows OS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TemporaryFile as input to read_table raises TypeError: '_TemporaryFileWrapper' object is not an iterator
4 participants