BUG: windows with TemporaryFile and read_csv #13398 #13481

Closed
wants to merge 11 commits into from
2 changes: 1 addition & 1 deletion pandas/io/parsers.py
@@ -1868,7 +1868,7 @@ class MyDialect(csv.Dialect):

         else:
             def _read():
-                line = next(f)
+                line = f.readline()
                 pat = re.compile(sep)
                 yield pat.split(line.strip())
                 for line in f:
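
A minimal, standard-library-only sketch of the code path this hunk touches, assuming the regex-separator branch reads one header line and then splits every remaining line. `read_rows` is a hypothetical stand-in for the private `_read` helper, not the pandas implementation. It fetches the first line with `f.readline()`, as the patched version does, and is fed a `TemporaryFile` opened in "w+" mode as in the reported issue. The sample data is made up for illustration.

```python
import re
from tempfile import TemporaryFile


def read_rows(f, sep):
    """Hypothetical stand-in for the patched _read(): split lines on a regex."""
    pat = re.compile(sep)
    # Fetch the first line with readline() rather than next(f), mirroring the patch.
    line = f.readline()
    yield pat.split(line.strip())
    # The remaining lines are consumed by ordinary iteration.
    for line in f:
        yield pat.split(line.strip())


# Made-up whitespace-separated data, similar in shape to data2 in the test.
data = "index A B\nfoo 1 2\nbar 3 4\n"

with TemporaryFile("w+") as tmp:
    tmp.write(data)
    tmp.flush()
    tmp.seek(0)
    rows = list(read_rows(tmp, r"\s+"))

print(rows)
```

Each row comes back as a list of string fields. Type inference and header handling are left out of this sketch.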
15 changes: 14 additions & 1 deletion pandas/io/tests/test_common.py
@@ -33,7 +33,8 @@ class TestCommonIOCapabilities(tm.TestCase):
 foo2,12,13,14,15
 bar2,12,13,14,15
 """
-
+    data2 = data1.replace(",", " ")
+
     def test_expand_user(self):
Member @gfyoung, Jun 18, 2016

Your original example was fine. No need to do this here unless data2 is intended to be used elsewhere. All you had to do was change read_table to read_csv in the initial version of your test.

Also, move this into python_parser_only.py!

Contributor Author

It's there to exercise the regular expression and make sure I'm going through the code path I modified (I need a non-default sep).
Also see my current issue with that file (is it not run by nosetests by default?).

Contributor Author

OK, figured it out.

Member

Your original test was already using a regular expression.

Contributor Author

Oh! That's just to build a consistent DataFrame with two different mechanisms; I'm assuming the one with StringIO + CSV data will always be OK.

         filename = '~/sometest'
         expanded_name = common._expand_user(filename)
@@ -90,6 +91,18 @@ def test_iterator(self):
         expected.index = [0 for i in range(len(expected))]
         tm.assert_frame_equal(concat(it), expected.iloc[1:])
 
+    def test_temporary_file(self):
+        # GH13398
+        from tempfile import TemporaryFile
+        new_file = TemporaryFile("w+")
+        new_file.write(self.data2)
+        new_file.flush()
+        new_file.seek(0)
+
+        result = read_csv(new_file, sep=r"\s+", engine="python")
Contributor

Is there a corresponding test for the C engine that does the same? Or is this specifically an issue with the Python engine?

Contributor Author

I think it's specific to the Python engine as it has to do with the regex.

+        expected = read_csv(StringIO(self.data1))
+        tm.assert_frame_equal(result, expected)


Member @gfyoung, Jun 18, 2016

  1. No need to import read_table for this. read_csv and read_table are identical except for the default delimiter, and read_csv will also trigger this error.

  2. You should check the result of the read_ operation to make sure you aren't getting garbage out.

  3. Reference the issue (the one you raised) under the def ... statement.

  4. Do not specify engine='python': we want to test that both the C and Python engines work with TemporaryFile, even though the issue was specific to the Python engine. This is better coverage.

  5. Actually, since you do need that regex sep, move the test into python_parser_only.py. I forgot that initially when I said that common.py was the right place to go.

Contributor Author

You mean reference the issue in the test (as a comment)?

Member

Exactly:

def test_temporary_file(self):
    # see gh-13398
    ...

Contributor Author

Are those tests run? They all fail when called explicitly (and they should).

Member

Yes, they do. If you look at test_parsers.py, all of them get imported as test cases.

Contributor Author @mbrucher, Jun 18, 2016

Some of the errors I get:

> nosetests -s pandas\io\tests\parser\python_parser_only.py:PythonParserTests
> EEEEEEEE
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_BytesIO_input
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 84, in test_BytesIO_input
>     result = self.read_table(data, sep="::", encoding='cp1255')
> AttributeError: 'PythonParserTests' object has no attribute 'read_table'
> 
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_decompression_regex_sep
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 133, in test_decompression_regex_sep
>     data = open(self.csv1, 'rb').read()
> AttributeError: 'PythonParserTests' object has no attribute 'csv1'
> 
> ======================================================================
> ERROR: pandas.io.tests.parser.python_parser_only.PythonParserTests.test_negative_skipfooter_raises
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "G:\Anaconda3\lib\site-packages\nose\case.py", line 198, in runTest
>     self.test(*self.arg)
>   File "G:\Informatique\pandas\pandas\io\tests\parser\python_parser_only.py", line 34, in test_negative_skipfooter_raises
>     self.read_csv(StringIO(text), skipfooter=-1)
> AttributeError: 'PythonParserTests' object has no attribute 'read_csv'

And they are proper issues in the test file.
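
The AttributeErrors above are what one would expect if, as described earlier in the thread, the classes in python_parser_only.py only hold test bodies and the concrete test cases that import them supply `read_csv` and `read_table`. A minimal sketch of that layout, using only `unittest`; the class names and the toy `read_csv` are hypothetical and are not the pandas test suite:

```python
import unittest


class ParserTestsMixin(object):
    """Hypothetical mixin: test bodies only, no read_csv of its own."""

    def test_simple(self):
        # self.read_csv is expected to come from the concrete subclass.
        parsed = self.read_csv("a,b")
        assert parsed == ["a", "b"]


class TestConcreteParser(ParserTestsMixin, unittest.TestCase):
    """Hypothetical concrete test case that supplies read_csv."""

    def read_csv(self, text):
        # Toy stand-in for an engine-specific reader.
        return text.split(",")


# Collected through the concrete class, the test passes.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestConcreteParser)
result = unittest.TestResult()
suite.run(result)

# Called directly on the mixin, the same test body has no read_csv to call.
try:
    ParserTestsMixin().test_simple()
    direct_error = None
except AttributeError as exc:
    direct_error = exc

print(result.wasSuccessful(), type(direct_error).__name__)
```

Under this assumption, pointing the test runner at the mixin class itself fails by construction, while running the concrete class exercises the same test bodies.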

class TestMMapWrapper(tm.TestCase):
