
BUG: Windows with TemporaryFile and read_csv #13398 #13481

Closed
wants to merge 11 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.18.2.txt
@@ -493,6 +493,7 @@ Bug Fixes
- Bug in ``pd.read_csv()`` in which the ``nrows`` argument was not properly validated for both engines (:issue:`10476`)
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`)
- Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing ``NaN`` values were not being parsed (:issue:`13320`)
- Bug in ``pd.read_csv()`` with ``engine='python'`` when reading from a tempfile.TemporaryFile on Windows with Python 3 a file with the separator expressed as a regex (:issue:`13398`)
- Bug in ``pd.read_csv()`` that prevents ``usecols`` kwarg from accepting single-byte unicode strings (:issue:`13219`)
Member

I think there are some grammar issues in that sentence.

Contributor Author

OK, changed

- Bug in ``pd.read_csv()`` that prevents ``usecols`` from being an empty set (:issue:`13402`)
- Bug in ``pd.read_csv()`` with ``engine=='c'`` in which null ``quotechar`` was not accepted even though ``quoting`` was specified as ``None`` (:issue:`13411`)
2 changes: 1 addition & 1 deletion pandas/io/parsers.py
@@ -1868,7 +1868,7 @@ class MyDialect(csv.Dialect):

             else:
                 def _read():
-                    line = next(f)
+                    line = f.readline()
                     pat = re.compile(sep)
                     yield pat.split(line.strip())
                     for line in f:
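
One plausible reading of why this one-line change helps: the built-in next() only works on objects whose type defines __next__, whereas readline() is an ordinary method that a delegating wrapper can forward to the underlying file. The sketch below illustrates that difference. FileWrapper and read_rows are hypothetical names introduced for illustration, not pandas or tempfile code. The wrapper is assumed to resemble the object that TemporaryFile returns on Windows under Python 3, in that it is iterable but not itself an iterator.

```python
import io
import re


class FileWrapper:
    """Hypothetical wrapper: iterable, but not itself an iterator."""

    def __init__(self, file):
        self.file = file

    def __getattr__(self, name):
        # Forward ordinary attribute lookups (readline, read, ...) to the file.
        return getattr(self.file, name)

    def __iter__(self):
        # Iteration is supported, but there is deliberately no __next__.
        return iter(self.file)


def read_rows(f, sep):
    """Illustrative stand-in for _read(): split each line on the regex sep."""
    pat = re.compile(sep)
    line = f.readline()  # works on the wrapper, unlike next(f)
    yield pat.split(line.strip())
    for line in f:
        yield pat.split(line.strip())


wrapped = FileWrapper(io.StringIO("0 0\n1 2\n"))

try:
    next(wrapped)  # next() looks for __next__ on the type and finds none
    next_failed = False
except TypeError:
    next_failed = True

rows = list(read_rows(wrapped, r"\s+"))
print(next_failed, rows)
```

With a plain file object both spellings behave the same, so the readline() form should be the safer choice whenever the input may be a wrapper.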
15 changes: 15 additions & 0 deletions pandas/io/tests/parser/python_parser_only.py
@@ -171,3 +171,18 @@ def test_read_table_buglet_4x_multiindex(self):
columns=list('abcABC'), index=list('abc'))
actual = self.read_table(StringIO(data), sep='\s+')
tm.assert_frame_equal(actual, expected)

    def test_temporary_file(self):
        # GH13398
        data1 = "0,0"
        data2 = data1.replace(",", " ")

Member

Why do we need to do this? What's wrong with data = "0 0"?

Contributor Author

Nothing. It's late Sunday evening with a crappy game on tv :p

Member

Join the club! 😄

        from tempfile import TemporaryFile
        new_file = TemporaryFile("w+")
        new_file.write(data2)
        new_file.flush()
        new_file.seek(0)

        result = self.read_csv(new_file, sep=r"\s+", header=None)
Member

Can you double check that this actually errors without your changes? I couldn't get an error on my end (I think you need to change the sep argument to another multi-char regex).

Contributor Author

I get the error when I revert the change.

Member

@gfyoung gfyoung Jun 19, 2016

Hmm...how odd. It's because we do special handling of \s+, and you might be seeing it (while I am not) due to LOCALE differences. To be safe, I would recommend passing in sep="\s*" for that test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I understand. I don't think it makes a difference, as the test on the regex is not yet done at that point, but I don't know the rest of pandas, so I trust you. I used @jreback's regex, so I'm using mine again in my last change.

        expected = DataFrame([[0, 0]])
        tm.assert_frame_equal(result, expected)
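
The file handling in this test (write, flush, rewind, then read) can be tried without pandas. The sketch below uses only the standard library. split_lines is a hypothetical helper standing in for the parser, and the values it returns are strings rather than the integers that read_csv would infer.

```python
import re
from tempfile import TemporaryFile


def split_lines(f, sep):
    """Hypothetical helper: split every line of f on the regex sep."""
    pat = re.compile(sep)
    line = f.readline()
    while line:
        yield pat.split(line.strip())
        line = f.readline()


with TemporaryFile("w+") as tmp:
    tmp.write("0 0")
    tmp.flush()   # push buffered text out to the file
    tmp.seek(0)   # rewind so that reading starts at the beginning
    rows = list(split_lines(tmp, r"\s+"))

print(rows)
```

Only readline() is called on the temporary file here, so the same code is expected to work whether TemporaryFile returns a plain file object or a wrapper around one.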