DOC: Add to the io documentation of on_bad_lines to alert users of silently skipped lines. #50311

Merged
merged 8 commits into from
Jan 3, 2023
15 changes: 15 additions & 0 deletions doc/source/user_guide/io.rst
@@ -1255,6 +1255,21 @@ The bad line will be a list of strings that was split by the ``sep``:

.. versionadded:: 1.4.0

Note that the callable is invoked only for lines with too many fields.
Bad lines caused by other errors are skipped silently.

For example:

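.. code-block:: ipython

    from io import StringIO

    def bad_lines_func(line):
        print(line)

    data = 'name,type\nname a,a is of type a\nname b,"b\" is of type b"'
    data
    # read_csv expects a path or file-like object, so wrap the string
    pd.read_csv(StringIO(data), on_bad_lines=bad_lines_func, engine="python")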

The callable is not invoked in this case, because the bad line here is caused by an escape character rather than by too many fields.
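
For contrast, here is a minimal sketch of the case the callable does handle: a row with more fields than the header. The sample data and the names ``seen`` and ``keep_first_three`` are hypothetical, chosen only for illustration. The callable records each bad line it receives and returns a trimmed list so the row can be kept.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the second data row has four fields, the header has three.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

seen = []  # collects every bad line passed to the callable


def keep_first_three(bad_line):
    # bad_line is a list of strings already split on the separator
    seen.append(bad_line)
    # Returning a list keeps the row; returning None would drop it
    return bad_line[:3]


df = pd.read_csv(StringIO(data), on_bad_lines=keep_first_three, engine="python")
print(seen)
print(df)
```

Recording the lines in a list, rather than only printing them, makes it possible to check afterwards which rows reached the callable. Rows that are bad for other reasons will not appear there.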

You can also use the ``usecols`` parameter to eliminate extraneous column
data that appear in some lines but not others: