Skip to content

REF: consolidate in CSV parser module #39345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
arw2019 opened this issue Jan 23, 2021 · 3 comments
Open

REF: consolidate in CSV parser module #39345

arw2019 opened this issue Jan 23, 2021 · 3 comments
Labels
IO CSV read_csv, to_csv Refactor Internal refactoring of code

Comments

@arw2019
Copy link
Member

arw2019 commented Jan 23, 2021

Follow-up to #38930, mentioned in #39217 and #38370

We'd like to consolidate the parser classes (C, python and pyarrow). One way to go would be to move the logic common to all parsers into ParserBase and delegate parser-specific logic into methods on the subclasses that ParserBase calls

IMO this is best done after the initial pyarrow engine PR (#38370) is merged (so as to to kill all the birds with one stone) but ofc completely up to whoever is doing the work. Creating this issue to track the discussion(s) in either case

cc @jreback @phofl

@arw2019 arw2019 added Bug Needs Triage Issue that has not been reviewed by a pandas team member Refactor Internal refactoring of code IO CSV read_csv, to_csv and removed Bug labels Jan 23, 2021
@phofl
Copy link
Member

phofl commented Jan 23, 2021

Thanks for opening. I wanted to start with a few smaller steps working on single functions, If this has a big impact on the pyarrow case we could wait of course.

@jreback
Copy link
Contributor

jreback commented Jan 23, 2021

we should do this now before the iyar row change (which certainly can inspire)

@lithomas1 lithomas1 removed the Needs Triage Issue that has not been reviewed by a pandas team member label Feb 2, 2021
@phofl
Copy link
Member

phofl commented May 1, 2021

@arw2019 I came back to this today and looked into it a bit more. Since most of the C-stuff lives in the cdef class TextReader, we can not directly share with PythonParser. Have you already put any thought in this how to reorganize here to share more of the duplicated stuff? Like building a equivalent TextReader class for the python parser handling most of the stuff for the PythonParser as it is done in TextReader?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Refactor Internal refactoring of code
Projects
None yet
Development

No branches or pull requests

4 participants