-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: read_csv adding additional columns as integers instead of strings #47137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1304,8 +1304,10 @@ cdef class TextReader: | |
if self.header is not None: | ||
j = i - self.leading_cols | ||
# generate extra (bogus) headers if there are more columns than headers | ||
# These should be strings, not integers, because otherwise we might get | ||
# issues with callables as usecols GH#46997 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if these are truely bogus, why are they even passed to the usecols callable? There are no circumstances where these "auto-generated" column names are/should be in the result? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. see #47138, this is the bogus case You can access them when using an index-based usecols setting, but the column name is shifted afterwards. Edit: If we want to change this (I think I would be in favor of that), we have to deprecate first There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
right, so if they don't currently make their way into the result, we are not at risk of changing existing behavior with this PR? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. At least I can not see a way in how this would impact anything. You can not select these columns by name via usecols. If you select them by position, an available name is used, not the bogus name. The only impact this should have is consistensy when feeding them into the usecols callable There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
cool. |
||
if j >= len(self.header[0]): | ||
return j | ||
return str(j) | ||
elif self.has_mi_columns: | ||
return tuple(header_row[j] for header_row in self.header) | ||
else: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you reword so others are not confused like I was. "adding columns as integers" is misleading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but this is tricky. Technically this is an implementation detail that should not leak into the outside world...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm