ERR: improves message raised for bad names in usecols (GH14154) #14163

Closed
wants to merge 1 commit into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.0.txt
@@ -471,6 +471,7 @@ Other enhancements
 - :meth:`~DataFrame.to_html` now has a ``border`` argument to control the value in the opening ``<table>`` tag. The default is the value of the ``html.border`` option, which defaults to 1. This also affects the notebook HTML repr, but since Jupyter's CSS includes a border-width attribute, the visual effect is the same. (:issue:`11563`).
 - Raise ``ImportError`` in the sql functions when ``sqlalchemy`` is not installed and a connection string is used (:issue:`11920`).
 - Compatibility with matplotlib 2.0. Older versions of pandas should also work with matplotlib 2.0 (:issue:`13333`)
+- When using the ``usecols`` argument in the ``read`` functions, specifying a column name that isn't found now generates a more helpful error message (:issue:`14154`)
Contributor

read -> pd.read_csv()

Member

Can you also move this to 0.20.0.txt?


.. _whatsnew_0190.api:

26 changes: 18 additions & 8 deletions pandas/io/parsers.py
@@ -981,8 +981,7 @@ def _validate_usecols_arg(usecols):

     if usecols is not None:
         usecols_dtype = lib.infer_dtype(usecols)
-        if usecols_dtype not in ('empty', 'integer',
-                                 'string', 'unicode'):
+        if usecols_dtype not in ('empty', 'integer', 'string', 'unicode'):
             raise ValueError(msg)

         return set(usecols)
@@ -1424,7 +1423,13 @@ def __init__(self, src, **kwds):
                               if (i in self.usecols or n in self.usecols)]

             if len(self.names) < len(self.usecols):
-                raise ValueError("Usecols do not match names.")
+                bad_cols = [n for n in self.usecols if n not in self.names]
+                if len(bad_cols) > 0:
+                    raise ValueError(("%s specified in usecols but not found "
Contributor

use .format()

Contributor

you prob also need a ','.join(bad_cols), need a test for this

Contributor Author

@jreback I'm trying to get the following test to work:

msg = ("c, d specified in usecols but not found in names.")
with tm.assertRaisesRegexp(ValueError, msg):
    self.read_csv(StringIO(data), names=['a', 'b'], usecols=['c', 'd'], header=None)

Currently this test fails because two exceptions are raised: one by parsers.py, which produces the expected message, and one by parser.pyx, which produces a different message. But when I try this in my interpreter, I only get the first, expected ValueError message. Where does the parser.pyx exception get raised? Is it necessary?

Contributor

it's possible we need to put this checking earlier

+                                      "in names.") % bad_cols)
+                else:
+                    raise ValueError(("Number of usecols is greater than "
+                                      "number of names."))

         self._set_noconvert_columns()
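
The review comments on this hunk ask for `.format()` and a `','.join(bad_cols)` so that several missing names read cleanly in the message. A minimal sketch of that suggestion follows. It assumes a hypothetical helper, `check_usecols_in_names`, which is not part of pandas, and plain lists for `usecols` and `names`. In the PR, `self.usecols` is a set (`_validate_usecols_arg` returns `set(usecols)`), so a real implementation would probably need to sort `bad_cols` to get a stable message.

```python
def check_usecols_in_names(usecols, names):
    """Hypothetical helper (not pandas API): validate usecols against names."""
    # Only string entries are compared with names; integer positions are
    # left to the length check below. This is an assumption of the sketch.
    bad_cols = [c for c in usecols if isinstance(c, str) and c not in names]
    if bad_cols:
        # Join the names so the message reads "c, d ..." rather than
        # showing a Python list repr.
        raise ValueError(
            "{} specified in usecols but not found in names."
            .format(", ".join(bad_cols))
        )
    if len(names) < len(usecols):
        raise ValueError("Number of usecols is greater than number of names.")
```

Calling `check_usecols_in_names(['c', 'd'], ['a', 'b'])` raises a `ValueError` whose text matches the message used in the test attempt quoted in the thread above.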

@@ -2185,16 +2190,21 @@ def _handle_usecols(self, columns, usecols_key):
         usecols_key is used if there are string usecols.
         """
         if self.usecols is not None:
-            if any([isinstance(u, string_types) for u in self.usecols]):
+            if any([isinstance(c, string_types) for c in self.usecols]):
                 if len(columns) > 1:
                     raise ValueError("If using multiple headers, usecols must "
                                      "be integers.")
+                bad_cols = [n for n in self.usecols if n not in usecols_key]
+                if len(bad_cols) > 0:
+                    raise ValueError(("%s specified in usecols but not found "
+                                      "in names.") % bad_cols)
+
                 col_indices = []
-                for u in self.usecols:
-                    if isinstance(u, string_types):
-                        col_indices.append(usecols_key.index(u))
+                for c in self.usecols:
+                    if isinstance(c, string_types):
+                        col_indices.append(usecols_key.index(c))
                     else:
-                        col_indices.append(u)
+                        col_indices.append(c)
             else:
                 col_indices = self.usecols
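
The hunk above checks string `usecols` against the header before translating them to positions. Without that check, a missing name would only surface as the generic `ValueError` raised by `list.index`. A standalone sketch of the same flow, assuming a hypothetical function `resolve_usecols` and a flat list of header names standing in for `usecols_key`:

```python
def resolve_usecols(usecols, header):
    """Hypothetical sketch: map usecols (names or positions) to positions."""
    if usecols is None:
        return None
    # Validate names first so the error lists every missing column at once.
    # Integer entries are treated as positions and are not looked up.
    missing = [c for c in usecols if isinstance(c, str) and c not in header]
    if missing:
        raise ValueError(
            "{} specified in usecols but not found in names."
            .format(", ".join(missing))
        )
    # Every remaining string is guaranteed to be present in header.
    return [header.index(c) if isinstance(c, str) else c for c in usecols]
```

Unlike the diff, this sketch skips integer entries when building the list of missing names, so a mixed list such as `[2, 'a']` is not reported as containing a bad column. That is a design choice of the example, not something the PR states.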

2 changes: 2 additions & 0 deletions pandas/io/tests/parser/usecols.py
@@ -83,6 +83,8 @@ def test_usecols(self):
         # length conflict, passed names and usecols disagree
         self.assertRaises(ValueError, self.read_csv, StringIO(data),
                           names=['a', 'b'], usecols=[1], header=None)
+        self.assertRaises(ValueError, self.read_csv, StringIO(data),
+                          names=['a', 'b'], usecols=['A'], header=None)
Contributor @jreback, Sep 6, 2016

test with multiple bad ones (it will fail with a formatting error currently)


     def test_usecols_index_col_False(self):
         # see gh-9082
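
The review on this hunk asks for a test with more than one bad column name. A minimal sketch of such a test follows, written against a hypothetical stand-in function, `raise_for_missing`, rather than the real parser, so it runs without pandas. It uses the standard-library `unittest.TestCase.assertRaisesRegex` in place of the `tm.assertRaisesRegexp` helper quoted in the thread.

```python
import re
import unittest


def raise_for_missing(usecols, names):
    # Hypothetical stand-in for the parser check under test.
    missing = [c for c in usecols if c not in names]
    if missing:
        raise ValueError(
            "{} specified in usecols but not found in names."
            .format(", ".join(missing))
        )


class TestMissingUsecols(unittest.TestCase):
    def test_multiple_missing_names(self):
        # The expected text is treated as a regular expression, so escape
        # it to make the trailing "." match a literal full stop.
        msg = re.escape("c, d specified in usecols but not found in names.")
        with self.assertRaisesRegex(ValueError, msg):
            raise_for_missing(['c', 'd'], ['a', 'b'])

    def test_all_names_present(self):
        # No exception is expected when every name exists.
        raise_for_missing(['a'], ['a', 'b'])
```

Asserting on the message, not just the exception type, is what would catch a formatting regression of the kind the reviewer is worried about.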