BUG: usecols kwarg accepts string when it should only allow list-like or callable. #20558
Changes from 6 commits
b232fa7
1e23cb5
b74422b
14b0ac0
23f3e8e
2de72b1
8362885
b177e93
86cb16b
dafc442
@@ -97,11 +97,11 @@
     MultiIndex is used. If you have a malformed file with delimiters at the end
     of each line, you might consider index_col=False to force pandas to _not_
     use the first column as the index (row names)
-usecols : array-like or callable, default None
-    Return a subset of the columns. If array-like, all elements must either
+usecols : list-like or callable, default None

Review thread:
Leave this alone actually.
Sure, but `is_array_like` below will evaluate `['foo', 'bar']` to False, so it's really a list-like IMHO.
pandas/pandas/core/dtypes/inference.py, line 287 in 0a00365
Hmmm... that's a fair point. Actually, leave that alone then.
But if you're going to change it here, make sure to do it in
Sure, it's done.

+    Return a subset of the columns. If list-like, all elements must either
     be positional (i.e. integer indices into the document columns) or strings
     that correspond to column names provided either by the user in `names` or
-    inferred from the document header row(s). For example, a valid array-like
+    inferred from the document header row(s). For example, a valid list-like
     `usecols` parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element
     order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.
     To instantiate a DataFrame from ``data`` with element order preserved use
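The review thread hinges on the difference between "list-like" and "array-like" in pandas. A minimal sketch, assuming `is_array_like` reduces to "list-like and has a `dtype` attribute" (a simplification that ignores edge cases such as zero-dimensional NumPy arrays). `is_list_like_sketch`, `is_array_like_sketch`, and `FakeArray` are hypothetical names for illustration, not pandas APIs.

```python
from collections.abc import Iterable


def is_list_like_sketch(obj) -> bool:
    """Rough stand-in for pandas' is_list_like: iterable, but not str or bytes."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def is_array_like_sketch(obj) -> bool:
    """Rough stand-in for pandas' is_array_like: list-like and has a ``dtype``."""
    return is_list_like_sketch(obj) and hasattr(obj, "dtype")


class FakeArray:
    """Hypothetical container with a ``dtype``, standing in for an ndarray or Series."""

    dtype = "object"

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


print(is_list_like_sketch(["foo", "bar"]))              # True
print(is_array_like_sketch(["foo", "bar"]))             # False: a plain list has no dtype
print(is_array_like_sketch(FakeArray(["foo", "bar"])))  # True
print(is_list_like_sketch("foo"))                       # False: strings are excluded

# usecols ends up as a set after validation, so element order carries no meaning.
print(set([0, 1]) == set([1, 0]))                       # True
```

Under this approximation, an ordinary Python list such as `['foo', 'bar']` is list-like but not array-like, which is why "list-like" is the more accurate wording for the docstring.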
@@ -1177,7 +1177,7 @@ def _validate_usecols_arg(usecols):
     Parameters
     ----------
-    usecols : array-like, callable, or None
+    usecols : list-like, callable, or None
         List of columns to use when parsing or a callable that can be used
         to filter a list of table columns.
@@ -1192,17 +1192,19 @@ def _validate_usecols_arg(usecols):
         'usecols_dtype` is the inferred dtype of 'usecols' if an array-like
         is passed in or None if a callable or None is passed in.
     """
-    msg = ("'usecols' must either be all strings, all unicode, "
-           "all integers or a callable")
-
+    msg = ("'usecols' must either be list-like of all strings, all unicode, "
+           "all integers or a callable.")
     if usecols is not None:
         if callable(usecols):
             return usecols, None
-        usecols_dtype = lib.infer_dtype(usecols)
-        if usecols_dtype not in ('empty', 'integer',
-                                 'string', 'unicode'):
+        # GH20529, ensure is iterable container but not string.
+        elif not is_list_like(usecols):
             raise ValueError(msg)
-
+        else:
+            usecols_dtype = lib.infer_dtype(usecols)
+            if usecols_dtype not in ('empty', 'integer',
+                                     'string', 'unicode'):
+                raise ValueError(msg)
         return set(usecols), usecols_dtype
     return usecols, None
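A plain-Python sketch of the branch order the patch gives `_validate_usecols_arg`: callables pass through, anything that is not list-like (including a bare string) is rejected, and list-likes must be homogeneous. `validate_usecols_sketch` and `infer_usecols_dtype` are hypothetical names; the latter is a much simpler stand-in for `lib.infer_dtype` that only distinguishes empty, integer, and string inputs, and the Python 2 `'unicode'` case is omitted.

```python
from collections.abc import Iterable

MSG = ("'usecols' must either be list-like of all strings, all unicode, "
       "all integers or a callable.")


def is_list_like_sketch(obj) -> bool:
    # Approximation of pandas' is_list_like: iterable but not str/bytes.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def infer_usecols_dtype(values) -> str:
    """Hypothetical stand-in for lib.infer_dtype, covering only the cases used here."""
    values = list(values)
    if not values:
        return "empty"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    if all(isinstance(v, str) for v in values):
        return "string"
    return "mixed"


def validate_usecols_sketch(usecols):
    """Return (usecols, dtype) following the same branch order as the patch."""
    if usecols is None:
        return None, None
    if callable(usecols):
        # Callables are applied to column names later; there is no dtype to infer.
        return usecols, None
    if not is_list_like_sketch(usecols):
        # GH20529: a bare string is iterable but must not be accepted.
        raise ValueError(MSG)
    usecols_dtype = infer_usecols_dtype(usecols)
    if usecols_dtype not in ("empty", "integer", "string"):
        raise ValueError(MSG)
    return set(usecols), usecols_dtype


cols, dtype = validate_usecols_sketch(["foo", "bar"])
print(sorted(cols), dtype)  # ['bar', 'foo'] string
try:
    validate_usecols_sketch("foo")
except ValueError as err:
    print("rejected:", err)
```

Checking `is_list_like` before inferring a dtype is the key point: a string would otherwise be iterated character by character and could look like a valid "all strings" input.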
@@ -1697,11 +1699,12 @@ def __init__(self, src, **kwds):
         # #2442
         kwds['allow_leading_cols'] = self.index_col is not False

-        self._reader = parsers.TextReader(src, **kwds)
-
-        # XXX
+        # GH20529, validate usecol arg before TextReader
         self.usecols, self.usecols_dtype = _validate_usecols_arg(
-            self._reader.usecols)
+            kwds['usecols'])
+        kwds['usecols'] = self.usecols

+        self._reader = parsers.TextReader(src, **kwds)
+
         passed_names = self.names is None
Review thread:
Let's expand the comment to this:
done.
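With this change, `usecols` is validated from `kwds['usecols']` before `parsers.TextReader` is constructed, and the cleaned value is written back into `kwds`. The sketch below illustrates why that order matters, under a stated assumption: `ReaderSketch` is a hypothetical reader that copies `usecols` into a set, not `TextReader`'s actual code. If validation only inspects what such a reader stored, a bare string like `"foo"` has already become `{"f", "o"}` and looks like a valid list of strings. `validate_usecols_sketch`, `build_reader_old`, and `build_reader_new` are likewise hypothetical.

```python
from collections.abc import Iterable

MSG = ("'usecols' must either be list-like of all strings, all unicode, "
       "all integers or a callable.")


def validate_usecols_sketch(usecols):
    """Hypothetical, simplified validator: None and callables pass, non-list-likes fail."""
    if usecols is None or callable(usecols):
        return usecols
    if not isinstance(usecols, Iterable) or isinstance(usecols, (str, bytes)):
        raise ValueError(MSG)
    return set(usecols)


class ReaderSketch:
    """Hypothetical reader that stores whatever usecols it is handed as a set."""

    def __init__(self, usecols=None):
        if usecols is None or callable(usecols):
            self.usecols = usecols
        else:
            # set("foo") == {"f", "o"}: a bare string is silently split into characters.
            self.usecols = set(usecols)


def build_reader_old(kwds):
    # Old order: build the reader first, then validate the value it stored.
    reader = ReaderSketch(kwds.get("usecols"))
    validate_usecols_sketch(reader.usecols)  # a set of characters looks valid here
    return reader


def build_reader_new(kwds):
    # New order (GH20529): validate the user's argument first, then hand the
    # cleaned value to the reader, mirroring kwds['usecols'] = self.usecols.
    kwds["usecols"] = validate_usecols_sketch(kwds.get("usecols"))
    return ReaderSketch(kwds["usecols"])


print(sorted(build_reader_old({"usecols": "foo"}).usecols))  # ['f', 'o']
try:
    build_reader_new({"usecols": "foo"})
except ValueError as err:
    print("rejected:", err)
```

Writing the validated value back into `kwds` also means the downstream reader only ever sees an argument that has already passed the list-like check.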