BUG: read_csv with date_parser lock file open on failure #15302
Comments
We had a very similar issue, but I can't seem to find it now. On Windows, I think this is fixed in 0.19.2, or maybe in 0.20.0 (still in dev). Can you try? |
I tried with 0.19.2 and the problem remains. When 0.20.0 becomes available on one of my systems I will give that a try. |
I thought this was solved, but I was able to reproduce it on Windows. So it seems that we are not closing on calling a […] @rsheftel, would you like to do a pull request to fix this? |
I don't think I am enough of an expert in the pandas code to submit a change that fixes this. Do you want me to just create a blank pull-request? (Sorry I'm new to the GitHub / collaborative world and how exactly it works) |
No, this would be a regular pull request that has tests and makes the change. See http://pandas.pydata.org/pandas-docs/stable/contributing.html |
If I gain expertise in the code base and feel I can confidently contribute I will. Thanks. |
I looked into this with the help of @faizanv and finally got to the bottom of what's happening in pandas, but I'm still trying to figure out what's happening in the C API.

The proximate cause of this issue is that when we're reading a standard CSV file from a user-provided path, we open a file handle in C (using […]). If there's no exception (or we catch the exception), the file handle is closed by a call to […]. It's not clear to me why it gets called when we catch the exception versus when we don't. |
@jreback The issue is that Python knows nothing about any calls to […] |
We do something like this:

    try:
        rows = parser.read()
    finally:
        parser.close()

but we don't track the file handle of files opened by the […] |
You could try explicitly closing in the actual parser when the .close method is called (i.e. do cleanup), but you would probably have to set a flag so that you don't do it again in dealloc. |
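The close-with-a-flag idea above can be sketched in pure Python. This is only an illustration of the pattern, not the pandas parser: `HandleOwningParser` and `read_then_release` are hypothetical names, and the real cleanup would live in the C/Cython layer.

```python
class HandleOwningParser:
    """Hypothetical stand-in for a parser that opens its own file handle."""

    def __init__(self, path):
        self._closed = True  # nothing to release until open() succeeds
        self._handle = open(path, "r")
        self._closed = False

    def read(self):
        # Simulate a failure part-way through parsing (e.g. in a date parser).
        self._handle.readline()
        raise ValueError("simulated date_parser failure")

    def close(self):
        # The flag makes close() safe to call more than once.
        if not self._closed:
            self._handle.close()
            self._closed = True

    def __del__(self):
        # Fallback cleanup; does nothing if close() already ran.
        self.close()


def read_then_release(path):
    """Mirror the try/finally pattern: the handle is released even on failure."""
    parser = HandleOwningParser(path)
    try:
        return parser.read()
    finally:
        parser.close()
```

With this shape, a caller that catches the `ValueError` from `read_then_release` can move or delete the file straight away, because the handle is already closed by the time the exception propagates.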
:) Funny you mention that. That is my current working solution |
I'm quite sure this should work with 1.3.2, but I'm not sure whether the following is a sufficient test for this:

    PYTHONWARNINGS="default" python test.py

test.py:

    from io import StringIO
    import pandas as pd

    def strict_parser(dates):
        assert False

    with StringIO("a,b,c,datetime") as file:
        pd.read_csv(
            file,
            parse_dates=["datetime"],
            index_col=["datetime"],
            date_parser=strict_parser,
        )

This should print a […] |
Cannot reproduce anymore with v1.5.0.dev0 on Linux.
|
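For anyone re-testing this, one way to check for a leaked handle without relying on Windows file locking is to look for `ResourceWarning`, which CPython emits when a file object is finalized without being closed and which is ignored unless warnings are enabled (for example via `PYTHONWARNINGS="default"`). The sketch below uses only the standard library; the helper names are hypothetical, and it assumes CPython's reference-counting behaviour.

```python
import gc
import warnings


def leaks_file_handle(action):
    """Return True if running action() triggers a ResourceWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
        gc.collect()  # make sure dropped file objects are finalized
    return any(issubclass(w.category, ResourceWarning) for w in caught)


def read_without_closing(path):
    handle = open(path)
    handle.readline()
    # handle goes out of scope here without close()


def read_with_context_manager(path):
    with open(path) as handle:
        handle.readline()
```

Wrapping a call that reads from a real path (rather than a `StringIO`) in `leaks_file_handle` would exercise the code path that opens the file internally, which is the one this issue is about.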
take |
take |
Issue Description: What I observed is that when we are using the […]

Proposed Solution: To handle this issue gracefully, we can modify the […]

    import datetime

    import pandas as pd

    def strict_parser(dates):
        try:
            # Attempt to parse with timezone offset
            fmt = '%Y-%m-%d %H:%M:%S%z'
            # strptime returns tz-aware values here, so convert instead of passing tz=
            datetimes = [pd.Timestamp(datetime.datetime.strptime(date, fmt)).tz_convert('UTC') for date in dates]
        except ValueError:
            # If parsing with timezone offset fails, try parsing without timezone offset
            fmt = '%Y-%m-%d %H:%M:%S'
            datetimes = [pd.Timestamp(datetime.datetime.strptime(date, fmt), tz='UTC') for date in dates]
        return pd.DatetimeIndex(datetimes) |
|
Problem description
When using the date_parser functionality of read_csv(), if the file read fails, the file is left locked open. My use case is that I am trying to enforce a strict datetime format that must include the time zone offset. Another thread stated that the way to accomplish this is with date_parser. My issue is that I would like to move a file that fails to load to another directory, but I cannot, because the failure in date_parser keeps the file open until the Python session is terminated.
Code Sample, a copy-pastable example if possible
The file data.csv is:
datetime,data
2010-05-05 09:30:00-0500,10
2010-05-05 09:35:00-0500,20
2010-05-05 09:40:00,30
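As a possible workaround for the use case described above, the caller can own the file handle so that it is closed before the failed file is moved. The sketch below is an assumption-laden illustration, not the original code sample: it swaps read_csv for the standard-library `csv` module so that it is self-contained, and `parse_strict`, `load_or_quarantine` and `rejected_dir` are hypothetical names. The same `with` block could wrap a read_csv call, since read_csv also accepts an open file object.

```python
import csv
import shutil
from datetime import datetime
from pathlib import Path

STRICT_FORMAT = "%Y-%m-%d %H:%M:%S%z"  # an offset such as -0500 is required


def parse_strict(value):
    """Parse one timestamp, raising ValueError if the UTC offset is missing."""
    return datetime.strptime(value, STRICT_FORMAT)


def load_or_quarantine(path, rejected_dir):
    """Return parsed rows, or move the file to rejected_dir and return None."""
    path = Path(path)
    try:
        # The with block closes the handle before the except clause runs.
        with path.open(newline="") as handle:
            rows = [
                (parse_strict(row["datetime"]), int(row["data"]))
                for row in csv.DictReader(handle)
            ]
    except ValueError:
        Path(rejected_dir).mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(Path(rejected_dir) / path.name))
        return None
    return rows
```

With the data.csv shown above, the third row has no offset, so `parse_strict` raises and the file is moved rather than loaded.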
Output of pd.show_versions()
pandas: 0.19.1