
REF: Unify _set_noconvert_dtype_columns for parsers #39365


Merged
merged 6 commits into from
Jan 27, 2021


phofl
Member

@phofl phofl commented Jan 24, 2021

I unified the code, which led to a bug fix: the test previously failed for the Python parser.

@phofl phofl added IO CSV read_csv, to_csv Refactor Internal refactoring of code labels Jan 24, 2021
@@ -546,6 +548,65 @@ def _convert_to_ndarrays(
print(f"Filled {na_count} NA values in column {c!s}")
return result

def _set_noconvert_dtype_columns(self, col_indices, names):
Contributor

Can you type this in any way (especially the return value) and add a docstring?

Member Author

Done
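For readers following the request above, here is a minimal standalone sketch of the kind of typed, documented helper being discussed. It is not the pandas implementation: the function name `noconvert_columns`, its parameters, and the rule that integer `parse_dates` entries are positions relative to `usecols` are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Set, Union


def noconvert_columns(
    parse_dates: Sequence[Union[int, str]],
    col_indices: List[int],
    names: List[str],
    usecols: Optional[List[int]] = None,
) -> Set[int]:
    """Hypothetical sketch: return the absolute indices of columns that
    should skip dtype conversion because they are parsed as dates.

    Integer entries in ``parse_dates`` are assumed to be positions
    relative to ``usecols`` when it is given; string entries are looked
    up in ``names`` and mapped through ``col_indices``.
    """
    result: Set[int] = set()
    for key in parse_dates:
        if isinstance(key, int):
            # Translate a usecols-relative position into an absolute index.
            result.add(usecols[key] if usecols is not None else key)
        else:
            # Translate a column name into its absolute index.
            result.add(col_indices[names.index(key)])
    return result
```

The explicit `Set[int]` return type is the kind of annotation the reviewer asked for, since callers only need membership checks.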

if self.usecols_dtype == "integer":
# A set of integers will be converted to a list in
# the correct order every single time.
usecols = list(self.usecols)
Contributor

-> use sorted here

Member Author

I'll use a safe sort, since the values could be of mixed types.

Member Author

No, they could not be mixed, sorry. Used sorted.
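The `sorted` vs. `list` point can be illustrated in plain Python. `list()` on a set follows the set's internal hash-table layout rather than numeric order, while `sorted()` always returns ascending order. On the other hand, `sorted()` raises `TypeError` for mixed `int`/`str` values, which is the case a safe sort would guard against. Both helper names below are hypothetical.

```python
from typing import Any, Iterable, List, Optional, Set


def ordered_usecols(usecols: Set[int]) -> List[int]:
    # sorted() gives a deterministic ascending order regardless of how
    # the set happens to store its elements internally.
    return sorted(usecols)


def sort_if_comparable(values: Iterable[Any]) -> Optional[List[Any]]:
    # Python 3 refuses to order mixed types such as int and str, so a
    # plain sorted() call raises TypeError; return None in that case.
    try:
        return sorted(values)
    except TypeError:
        return None
```

Since integer `usecols` cannot contain strings, plain `sorted` is sufficient there, which matches the conclusion reached in the thread.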

Contributor

@jreback jreback left a comment

LGTM. If you can, try that suggestion and ping me if it works (or comment if it's a no-go).


# pandas\io\parsers.py:2030: error: Incompatible types in
# assignment (expression has type "None", variable has type
# "List[Any]") [assignment]
Contributor

I think if you predeclare it at the top,

usecols: Optional[List[Any]] = []

you can remove the "# type: ignore".

Member Author

Yep, this works; I typed it a bit more specifically.
I've got #39342 open for the mypy errors in there and will rebase once this is merged.
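The typing suggestion relies on declaring the variable once with an `Optional` type before the branches, so that assigning `None` in one branch no longer conflicts with the `List` type mypy would otherwise infer from the first assignment. A minimal sketch, assuming a hypothetical function whose branch conditions are only loosely modeled on the snippet above (not the pandas code):

```python
from typing import List, Optional, Sequence


def resolve_usecols(
    usecols_dtype: Optional[str],
    raw_usecols: Sequence[int],
    col_indices: List[int],
) -> Optional[List[int]]:
    # Declared up front: every branch below may assign a list or None,
    # so mypy does not need a "# type: ignore" on the None assignment.
    usecols: Optional[List[int]]
    if usecols_dtype == "integer":
        usecols = sorted(raw_usecols)
    elif usecols_dtype is not None:
        # Assumed: non-integer usecols have already been resolved to positions.
        usecols = col_indices
    else:
        # No usecols given.
        usecols = None
    return usecols
```

A bare annotation is shown here; the reviewer's variant with an `= []` initializer type-checks the same way.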

parse_dates=[1],
usecols=[1, 2],
thousands="-",
)
Contributor

This is a bug, yes? Can you add a whatsnew note?

Member Author

Yep, added
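The test above combines `parse_dates`, `usecols`, and `thousands="-"`. Why a date column must be excluded from numeric conversion is easy to see in isolation: if the `-` thousands separator were stripped from a date string, the date would collapse into a plausible integer. The hypothetical helper below (not a pandas API) strips the separator and casts to `int` only for columns outside a `noconvert` set.

```python
from typing import List, Set, Union

Cell = Union[int, str]


def strip_thousands(row: List[str], thousands: str, noconvert: Set[int]) -> List[Cell]:
    """Hypothetical helper: remove the thousands separator and cast to int,
    except for columns listed in ``noconvert`` (e.g. date columns)."""
    out: List[Cell] = []
    for i, cell in enumerate(row):
        if i in noconvert:
            # Leave date-like text untouched for the date parser.
            out.append(cell)
        else:
            cleaned = cell.replace(thousands, "")
            out.append(int(cleaned) if cleaned.isdigit() else cell)
    return out
```

With column 1 in `noconvert`, `"2020-01-01"` survives for date parsing; without it, the same cell becomes the integer `20200101`.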

@phofl
Member Author

phofl commented Jan 26, 2021

@jreback CI is green.

@jreback jreback added this to the 1.3 milestone Jan 27, 2021
@jreback jreback merged commit bc3adf2 into pandas-dev:master Jan 27, 2021
@jreback
Contributor

jreback commented Jan 27, 2021

thanks!

@phofl phofl deleted the ref_noconvert branch January 27, 2021 18:36

2 participants