Skip to content

REF: Split up parsers into smaller files #39217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 21, 2021
Merged

Conversation

phofl
Copy link
Member

@phofl phofl commented Jan 17, 2021

Refactored them into 4 files, similar to described in the issue. If this is accepted I will try to clean this up a bit more through unifying the calls a bit

@phofl phofl added IO CSV read_csv, to_csv Refactor Internal refactoring of code labels Jan 17, 2021
@jreback jreback added this to the 1.3 milestone Jan 19, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would update pandas/io/parsers/__init__.py to import read_csv, read_fwf, read_table, and other public functions here (if any others).

you can leave .readers just import them to __init__.py (then you don't need to change the other modules that import).

otherwise this is a straight move?

@phofl
Copy link
Member Author

phofl commented Jan 19, 2021

Thanks, will do that.

Yes, just moved them around. Will go to refactoring after this

@phofl
Copy link
Member Author

phofl commented Jan 19, 2021

I moved the relevant functions up, I changed the import paths for the helper functions in the test functions

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ping on green

@jreback
Copy link
Contributor

jreback commented Jan 19, 2021

@phofl
Copy link
Member Author

phofl commented Jan 19, 2021

Have to check, this one worked 3 days ago in the pipelines

@phofl
Copy link
Member Author

phofl commented Jan 19, 2021

Hm merged master, let's check if this persists. Can not reproduce this locally

@phofl
Copy link
Member Author

phofl commented Jan 19, 2021

Additionally this worked the commit before when I forgot a few imports. At least a place to start pinning if it persists

@phofl
Copy link
Member Author

phofl commented Jan 19, 2021

Green now

@twoertwein I think you worked with unclosed handles the last few days/weeks? Could you maybe have a look and see if this is familiar?

@phofl
Copy link
Member Author

phofl commented Jan 19, 2021

@twoertwein
Copy link
Member

@twoertwein I think you worked with unclosed handles the last few days/weeks? Could you maybe have a look and see if this is familiar?

I assume that most of the ssl warnings come from boto3 (I honestly gave up on them for now in #39047). Wasn't a test like that recently xfailed (reason s3/boto3 being updated)?

@phofl
Copy link
Member Author

phofl commented Jan 20, 2021

Yes, but this test really failed not only raised a warning,

@twoertwein
Copy link
Member

do you mean the OSError: [Errno 98] Address already in use in test_server_and_default_headers test? Is it possible that multiple instances of pytest run on the same machine but they like to use the same port?

or test_warn_if_chunks_have_mismatched_type. I think I have looked at this (or a similar) test when working on #39047. As far as I remember, there shouldn't be a need for any "ssl-stuff" in this test. The only resource that might not be closed is StringIO. Locally (on linux) I can not re-create most of the ResourceWanrings.

@phofl
Copy link
Member Author

phofl commented Jan 20, 2021

Sorry this was not clear enough by myself. I was talking about test_warn_if_chunks_have_mismatched_type. Have the same issue, can not reproduce locally.

return TextFileReader(*args, **kwds)


def _clean_na_values(na_values, keep_default_na=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth moving from here down to another module (validation.py?) in a followup, e.g. keep readers.py for the actual readers

@jreback jreback merged commit 5d65e0a into pandas-dev:master Jan 21, 2021
@jreback
Copy link
Contributor

jreback commented Jan 21, 2021

thanks @phofl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST/REF: splitting pandas/io/parsers.py into multiple files
3 participants