TemporaryFile as input to read_table raises TypeError: '_TemporaryFileWrapper' object is not an iterator #13398
Comments
this is only with |
in the future, pls show the entire |
pull-requests are welcome |
Do you mean that if I used sep='\s+', there is no exception? |
Yes. If you were splitting on whitespace, it would use the C engine, which would give you an error that the data file is empty. Since you used a regex, it went to the Python engine, which gives that weird error (only on Windows). |
Oh, OK. The thing is that I may have several spaces between columns, so I have to use the regex :( |
\s+ is whitespace with at least a single space; having 0 spaces is very weird. |
Yes, agreed that 0 spaces is weird :) |
Oh, in the example above it IS empty. In any case, if you would like to debug, I think it's a simple fix. |
Oh yes, sorry. I forgot I had to remove the data as it is confidential! |
The issue is that you can't call next() on a file apparently. |
I should add that this advice also applies to normal file objects (i.e. those created by calling |
@gfyoung this repros exactly as above with an empty file |
I know, but I thought @mbrucher said the file contained data, and I was addressing that. In any case, unless a more convincing example can be provided, I think this is safe to close, as the function does work with tempfiles in the manner I described, data or no data. |
no it doesn't on Windows |
|
So if the file is populated, of course same issue:
Tested on OS X with Python 2.7 (brew version), works like a charm, so there must be a difference in the implementation. I don't have a 3.5 on my Mac, so can't try it to see if it's the OS or the Python version :/ @gfyoung I know perfectly well how files work, thank you very much. I've been writing Python for more than a decade now, I hit all these issues in the past and obviously I know how to avoid them. But I guess you haven't tried my code before posting your message. As @jreback said, it should be "easy" to fix, so I'll have a try when I have time. |
@mbrucher what do you mean a 'list of strings', do you mean? you can! The difference is that this is not very efficient as have to be introspected (to figure out what exactly you are passing, as there are many possibilities), and then converted to a storage format (e.g. numpy). These may not necessarily be cheap; hence from the parser has more info available (e.g. it already knows the layout and can infer dtypes directly).
|
Actually I was thinking of something like pd.read_table(["0 0", "1 1"], header=None, sep=r"\s+", engine="python") as the data is not yet parsed in my case (reading a report file that mixes lots of things together, only looking for specific tables that I then append to a list). |
Much more efficient to do this with the c-engine, you have whitespace separating. Introduce line separation and you are set.
|
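A minimal sketch of the "introduce line separation" suggestion above, assuming the table rows have already been collected as a list of strings. The helper name `lines_to_buffer` and the sample rows are hypothetical, and pandas is imported defensively so the buffer-building part runs on its own.

```python
import io


def lines_to_buffer(lines):
    """Join already-extracted rows into one newline-separated, file-like buffer."""
    return io.StringIO("\n".join(lines) + "\n")


# Hypothetical rows pulled out of a larger report; columns may be
# separated by more than one space.
rows = ["0 0", "1   1"]
buf = lines_to_buffer(rows)

try:
    import pandas as pd
except ImportError:  # pandas not installed: only the buffer is built
    pd = None

if pd is not None:
    # sep=r"\s+" splits on runs of whitespace.
    df = pd.read_table(buf, header=None, sep=r"\s+")
    print(df)
```

Building one buffer per table keeps the parsing inside the parser instead of handing it pre-split Python objects, which is the efficiency point made in the comment above.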
OK, thanks. It seems that file-like objects don't implement next(). The issue comes from the fact that, to select the type of reader, we check for the attribute readline, which is used for separators of length 1, but pandas uses next() for the other separators. |
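The mismatch described above can be reproduced without pandas. The sketch below uses a hypothetical `FileWrapper` class as a stand-in for a wrapper that forwards attribute access to an underlying file; it is an assumption about the general shape of such wrappers, not a copy of the tempfile implementation. In Python 3, `next()` looks up `__next__` on the object's type, so instance-level forwarding is not enough.

```python
import io


class FileWrapper:
    """Hypothetical wrapper that forwards unknown attributes to a real file."""

    def __init__(self, file):
        self.file = file

    def __getattr__(self, name):
        # Called only when normal lookup fails; forwards e.g. readline().
        return getattr(self.file, name)

    def __iter__(self):
        # Iterable, but the wrapper type itself defines no __next__.
        for line in self.file:
            yield line


wrapped = FileWrapper(io.StringIO("0 0\n1 1\n"))

# The readline check passes thanks to attribute forwarding.
has_readline = hasattr(wrapped, "readline")

# next() needs __next__ on the type, so this raises TypeError.
try:
    next(wrapped)
    next_error = None
except TypeError as exc:
    next_error = exc

# Both of these work: readline() is forwarded, iter() uses __iter__.
first_line = wrapped.readline()
second_line = next(iter(wrapped))
```

Reading through `readline()` (or through `iter(obj)`) therefore stays consistent with a check that only tests for a `readline` attribute.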
@mbrucher : Whoa, slow down there, aren't we letting our ego get a bit in the way of rational conversation? First of all, your code gave no indication that you were aware of this, so if you would like to update your code example in the initial post, go right ahead and do so. Second, I did in fact try it out on a newly-acquired Windows 7 machine using Python 2.7.11 using |
@gfyoung Which is why I specified the Python version, as AFAIK there is a change in the API on the behavior of next. Anyway, the pull request fixes it and I'm adding a test as we speak. |
@mbrucher : fair enough - but it's worthwhile to note since this issue you raise isn't then a general Windows bug but rather a change in the way |
They must have forgotten when they changed the next API :( |
closes #13398
Author: Matthieu Brucher <[email protected]>
Closes #13481 from mbrucher/issue-13398 and squashes the following commits:
8b52631 [Matthieu Brucher] Yet another small update for more general regex
0d54151 [Matthieu Brucher] Simplified
5871625 [Matthieu Brucher] Grammar
aa3f0aa [Matthieu Brucher] lint change
1c33fb5 [Matthieu Brucher] Simplified test and added what's new note.
d8ceb57 [Matthieu Brucher] lint changes
fd20aaf [Matthieu Brucher] Moved the test to the Python parser test file
98e476e [Matthieu Brucher] Using same way of referencing as just above, consistency.
119fb65 [Matthieu Brucher] Added reference to original issue in the test + test the result itself (assuming that previous test is OK)
5af8465 [Matthieu Brucher] Adding a test with Python engine
d8decae [Matthieu Brucher] #13398 Change the way of reading back to readline (consistent with the test before entering the function)
Although the documentation says that the input can be a file-like object, it doesn't work with objects from tempfile. On Windows, they can't be reopened, so I need to pass the object itself.
Code Sample, a copy-pastable example if possible
Expected Output
Not an exception!
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
pandas: 0.18.0