Skip to content

BUG: unclosed file warning when passing invalid encoding #30034

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 5, 2019

Conversation

jbrockmendel
Copy link
Member

This isn't directly related to #30031 but might solve it if the problem there is unclosed files piling up.

TL;DR: the actual problem is in the stdlib codecs.open:

def open(filename, mode='r', encoding=None, errors='strict', buffering=1):
    if encoding is not None and 'b' not in mode:
        mode = mode + 'b'
    file = builtins.open(filename, mode, buffering)
    if encoding is None:
        return file
    info = lookup(encoding)
    srw = StreamReaderWriter(file, info.streamreader, info.streamwriter, errors)
    # Add attributes to simplify introspection
    srw.encoding = encoding
    return srw

When we pass a bogus encoding "foo" in test_format, the lookup call raises, and leaves behind the file opened a few lines earlier.

@WillAyd
Copy link
Member

WillAyd commented Dec 4, 2019

Pretty interesting - is there something on bpo for this?

@simonjayhawkins
Copy link
Member

should we consider replacing all uses of codecs. used in a few other places?

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Dec 4, 2019
@jreback jreback added this to the 1.0 milestone Dec 4, 2019
@jreback
Copy link
Contributor

jreback commented Dec 4, 2019

seems the windows builds failed with this change for some reason.

@jbrockmendel
Copy link
Member Author

is there something on bpo for this?

Just opened https://bugs.python.org/issue38971

should we consider replacing all uses of codecs. used in a few other places?

Maybe? I'd want to track down why we're using codecs.open in the first place (best guess is a py2/py3 compat legacy)

seems the windows builds failed with this change for some reason.

Yah, looks like we're getting '\n' instead of '\r\n' somewhere along the way. Shouldn't be too hard to troubleshoot.

@jbrockmendel
Copy link
Member Author

windows problem was fixed by passing newline=""

@jreback jreback merged commit 773d4f8 into pandas-dev:master Dec 5, 2019
@jreback
Copy link
Contributor

jreback commented Dec 5, 2019

thanks

@jbrockmendel jbrockmendel deleted the ci-resource branch December 5, 2019 16:08
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants