Skip to content

REGR: ValueError raised when both prefix and names are set to None #42690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 30, 2021
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.3.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ Fixed regressions
- Bug in :class:`Series` constructor not accepting a ``dask.Array`` (:issue:`38645`)
- Fixed regression for ``SettingWithCopyWarning`` displaying incorrect stacklevel (:issue:`42570`)
- Fixed regression in :func:`to_datetime` returning pd.NaT for inputs that produce duplicated values, when ``cache=True`` (:issue:`42259`)

- Fixed regression where :meth:`pandas.read_csv` raised a ``ValueError`` when parameters ``names`` and ``prefix`` were both set to None (:issue:`42387`)

.. ---------------------------------------------------------------------------

Expand Down
18 changes: 9 additions & 9 deletions pandas/io/parsers/readers.py
Original file line number Diff line number Diff line change
Expand Up @@ -505,11 +505,11 @@ def read_csv(
delimiter=None,
# Column and Index Locations and Names
header="infer",
names=lib.no_default,
names=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would rather leave these as no_default and fix the check itself

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only have one way of representing none/null, so we should either use None or lib.no_default, not have both. I think I prefer None given that lib.no_default is private.

If we want to keep using lib.no_default then, we should probably close OP as wontfix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i think you can simply revert these back to lib.no_default and fix the check on L1305. the point is that these are still invalid if actual None is passed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure iiuc, but pd.read_csv(xxx, names=None, prefix=None) should fail?

Copy link
Member Author

@lithomas1 lithomas1 Jul 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should work. It worked at least in previous versions. If None is passed for both, it would just get the column names from header, or use the "0,1,2,..." naming scheme, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right i agree this should work, but i also think you can leave the no_default and it will work (just odify the comparison)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thx for clarifying, update incoming soon.

index_col=None,
usecols=None,
squeeze=False,
prefix=lib.no_default,
prefix=None,
mangle_dupe_cols=True,
# General Parsing Configuration
dtype: DtypeArg | None = None,
Expand Down Expand Up @@ -603,11 +603,11 @@ def read_table(
delimiter=None,
# Column and Index Locations and Names
header="infer",
names=lib.no_default,
names=None,
index_col=None,
usecols=None,
squeeze=False,
prefix=lib.no_default,
prefix=None,
mangle_dupe_cols=True,
# General Parsing Configuration
dtype: DtypeArg | None = None,
Expand Down Expand Up @@ -1224,8 +1224,8 @@ def _refine_defaults_read(
error_bad_lines: bool | None,
warn_bad_lines: bool | None,
on_bad_lines: str | None,
names: ArrayLike | None | object,
prefix: str | None | object,
names: ArrayLike | None,
prefix: str | None,
defaults: dict[str, Any],
):
"""Validate/refine default values of input parameters of read_csv, read_table.
Expand Down Expand Up @@ -1302,11 +1302,11 @@ def _refine_defaults_read(
if delimiter and (sep is not lib.no_default):
raise ValueError("Specified a sep and a delimiter; you can only specify one.")

if names is not lib.no_default and prefix is not lib.no_default:
if names is not None and prefix is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g here

raise ValueError("Specified named and prefix; you can only specify one.")

kwds["names"] = None if names is lib.no_default else names
kwds["prefix"] = None if prefix is lib.no_default else prefix
kwds["names"] = names
kwds["prefix"] = prefix

# Alias sep -> delimiter.
if delimiter is None:
Expand Down
17 changes: 13 additions & 4 deletions pandas/tests/io/parser/common/test_common_basic.py
Original file line number Diff line number Diff line change
Expand Up @@ -764,15 +764,24 @@ def test_read_table_delim_whitespace_non_default_sep(all_parsers, delimiter):


@pytest.mark.parametrize("func", ["read_csv", "read_table"])
@pytest.mark.parametrize("prefix", [None, "x"])
@pytest.mark.parametrize("names", [None, ["a"]])
def test_names_and_prefix_not_lib_no_default(all_parsers, names, prefix, func):
def test_names_and_prefix_not_None_raises(all_parsers, func):
# GH#39123
f = StringIO("a,b\n1,2")
parser = all_parsers
msg = "Specified named and prefix; you can only specify one."
with pytest.raises(ValueError, match=msg):
getattr(parser, func)(f, names=names, prefix=prefix)
getattr(parser, func)(f, names=["a", "b"], prefix="x")


@pytest.mark.parametrize("func", ["read_csv", "read_table"])
@pytest.mark.parametrize("prefix, names", [(None, ["x0", "x1"]), ("x", None)])
def test_names_and_prefix_explicit_None(all_parsers, names, prefix, func):
# GH42387
f = StringIO("a,b\n1,2")
expected = DataFrame({"x0": ["a", "1"], "x1": ["b", "2"]})
parser = all_parsers
result = getattr(parser, func)(f, names=names, sep=",", prefix=prefix, header=None)
tm.assert_frame_equal(result, expected)


def test_dict_keys_as_names(all_parsers):
Expand Down