DOC: Improved the docstring of errors.ParserWarning #20076

Merged (8 commits) on Mar 15, 2018

@joaoavf (Contributor) commented Mar 9, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################### Docstring (pandas.errors.ParserWarning)  ###################
################################################################################

Warning raised when reading a file that doesn't use the default parser.

Thrown by `pd.read_csv` and `pd.read_table` when it is necessary to
change parsers, generally from 'c' to 'python'.

It happens due to lack of support or functionality for parsing
particular attributes of a CSV file with the requested engine.

Currently, C-unsupported options include the following parameters:

1. `sep` other than a single character (e.g. regex separators)
2. `skipfooter` higher than 0
3. `sep=None` with `delim_whitespace=False`

The warning can be avoided by adding `engine='python'` as a parameter
in `pd.read_csv` and `pd.read_table` methods.

See Also
--------
pd.read_csv : Read CSV (comma-separated) file into DataFrame.
pd.read_table : Read general delimited file into DataFrame.

Examples
--------
Using a `sep` in `pd.read_csv` other than a single character:

>>> import io
>>> csv = u'''a;b;c
...           1;1,8
...           1;2,1'''
>>> df = pd.read_csv(io.StringIO(csv), sep='[;,]')
Traceback (most recent call last):
...
ParserWarning: Falling back to the 'python' engine...

Adding `engine='python'` to `pd.read_csv` removes the Warning:

>>> df = pd.read_csv(io.StringIO(csv), sep='[;,]', engine='python')
scripts/validate_docstrings.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  #!/usr/bin/env python

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        No returns section found
        Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 32, in pandas.errors.ParserWarning
Failed example:
    df = pd.read_csv(io.StringIO(csv), sep='[;,]')
Expected:
    Traceback (most recent call last):
    ...
    ParserWarning: Falling back to the 'python' engine...
Got nothing

I am documenting a Warning and I could not find a better way to display the warning in the html example other than using a "Traceback (most recent call last):" followed by "ParserWarning: Falling back to the 'python' engine..." in the docstring.

The validation script also reports "No returns section found". As far as I understand, this is not relevant to the docstring at hand.
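
For context on the "Got nothing" result above: a warning issued through Python's `warnings` module is reported on stderr by default and the call then returns normally, so doctest has no exception to match against a `Traceback` block. The sketch below reproduces this with the standard library only. `ParserWarning` and `read_with_fallback` are hypothetical stand-ins defined here for illustration, not the pandas objects.

```python
import doctest
import warnings


class ParserWarning(Warning):
    """Stand-in for pandas.errors.ParserWarning (hypothetical, stdlib only)."""


def read_with_fallback():
    """Hypothetical reader that warns and then returns normally."""
    warnings.warn("Falling back to the 'python' engine", ParserWarning)
    return "parsed"


# A doctest that expects a traceback, mirroring the docstring example.
EXAMPLE = """
>>> result = read_with_fallback()
Traceback (most recent call last):
...
ParserWarning: Falling back to the 'python' engine
"""


def run_example(source):
    """Run one doctest string; return (TestResults, recorded warnings)."""
    globs = {"read_with_fallback": read_with_fallback}
    test = doctest.DocTestParser().get_doctest(source, globs, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning that is issued
        # Discard doctest's failure report; only the counts matter here.
        results = runner.run(test, out=lambda text: None)
    return results, caught


results, caught = run_example(EXAMPLE)
print(results.failed, [w.category.__name__ for w in caught])
```

In this sketch the example is counted as failed even though the warning was issued, because nothing was raised and nothing was written to stdout.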

@pep8speaks
pep8speaks commented Mar 9, 2018

Hello @joaoavf! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 15, 2018 at 19:28 Hours UTC

@datapythonista (Member) left a comment

Looks great, added a couple of comments.

>>> df = pd.read_csv(io.StringIO(csv), sep='[;,]')
Traceback (most recent call last):
...
ParserWarning: Falling back to the 'python' engine...

Member

Did you check why the validation says that this test didn't pass, and that read_csv returned nothing?

@joaoavf (Contributor, Author) commented Mar 9, 2018

When I ran the code in my console, this warning was displayed: "ParserWarning: Falling back to the 'python' engine..."

I thought the failure might be because this is a warning and not an error: the output generated by an error can be matched by "Traceback", but the output of a warning cannot.

Any ideas on how to fix and approach this?
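
One way to check for a warning outside of doctest, sketched here with the standard library only, is to record it with `warnings.catch_warnings(record=True)` and inspect what was collected. `ParserWarning`, `read_with_fallback`, and `collect_warnings` are hypothetical names defined for this illustration; the `engine` argument only imitates the behaviour described in the docstring.

```python
import warnings


class ParserWarning(Warning):
    """Stand-in for pandas.errors.ParserWarning (hypothetical, stdlib only)."""


def read_with_fallback(engine=None):
    """Hypothetical reader: warns only when no engine was requested."""
    if engine is None:
        warnings.warn("Falling back to the 'python' engine", ParserWarning)
    return "parsed"


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning categories issued)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide the warning
        result = func(*args, **kwargs)
    return result, [w.category for w in caught]


# Without an explicit engine the stand-in warns; with one it stays silent.
print(collect_warnings(read_with_fallback))
print(collect_warnings(read_with_fallback, engine="python"))
```

The call still returns its normal result in both cases, which is the difference from an exception that the discussion above is circling around.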

parsing particular attributes of a CSV file with the requested engine.
Warning raised in `pd.read_csv` and `pd.read_table` when it is
necessary to change parsers, generally from 'c' to 'python'.

Member

The summary needs to fit on a single line. Can you write something more concise, please? This paragraph is really useful and surely needs to be in the description, but the first line is used in some summaries, so it should be shorter. Something like "Warning raised when reading a table does not use the default parser." Not sure if it's accurate or fits on one line, but it should give you an idea.

Contributor Author

Great, Mark! Thanks for the suggestion. I have already committed my version of it.


The warning can be avoided by adding `engine='python'` as a parameter
in `pd.read_csv` and `pd.read_table` methods.

Member

I think read_csv and read_table are good candidates for a See Also section, as you're mentioning them.

Contributor Author

Added a See Also section with read_csv and read_table.

@codecov
codecov bot commented Mar 9, 2018

Codecov Report

Merging #20076 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20076      +/-   ##
==========================================
+ Coverage    91.7%    91.7%   +<.01%     
==========================================
  Files         150      150              
  Lines       49122    49152      +30     
==========================================
+ Hits        45045    45074      +29     
- Misses       4077     4078       +1

Flag        Coverage Δ
#multiple   90.08% <ø> (ø) ⬆️
#single     41.84% <ø> (-0.02%) ⬇️

Impacted Files                      Coverage Δ
pandas/errors/__init__.py           92.3% <ø> (ø) ⬆️
pandas/core/base.py                 96.78% <0%> (-0.02%) ⬇️
pandas/core/indexes/datetimes.py    95.64% <0%> (-0.01%) ⬇️
pandas/core/series.py               93.85% <0%> (-0.01%) ⬇️
pandas/core/groupby.py              92.14% <0%> (-0.01%) ⬇️
pandas/core/indexes/base.py         96.66% <0%> (-0.01%) ⬇️
pandas/core/generic.py              95.84% <0%> (ø) ⬆️
pandas/core/indexes/multi.py        95.06% <0%> (ø) ⬆️
pandas/core/strings.py              98.32% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py   91.03% <0%> (ø) ⬆️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 731d971...17687b5. Read the comment docs.

parsing particular attributes of a CSV file with the requested engine.
Warning raised when reading a file that doesn't use the default parser.

Thrown by `pd.read_csv` and `pd.read_table` when it is necessary to

Contributor

Thrown -> Raised

Contributor Author

Jeff, I made all the changes you requested. Thank you for the feedback.

to change parsers (generally from 'c' to 'python') contrary to the
one specified by the user due to lack of support or functionality for
parsing particular attributes of a CSV file with the requested engine.
Warning raised when reading a file that doesn't use the default parser.

Contributor

Say that the default is the 'c' parser.

Thrown by `pd.read_csv` and `pd.read_table` when it is necessary to
change parsers, generally from 'c' to 'python'.

It happens due to lack of support or functionality for parsing

Contributor

due to a lack

for parsing a particular attribute

It happens due to lack of support or functionality for parsing
particular attributes of a CSV file with the requested engine.

Currently, C-unsupported options include the following parameters:

Contributor

'c' unsupported options

@jreback added the Error Reporting (Incorrect or improved errors from pandas) and IO CSV (read_csv, to_csv) labels on Mar 10, 2018
@jreback added this to the 0.23.0 milestone on Mar 10, 2018
@TomAugspurger (Contributor)

#20309 (comment) for doctesting warnings. Thanks @joaoavf
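
A possible way to keep a warning-emitting example in a docstring without having doctest check it is the `+SKIP` directive, with the expected warning shown as a comment. This is offered as an assumption about one workable approach, not as a summary of the linked comment. The sketch uses the standard library only, and `hypothetical_read` is deliberately left undefined to show that the skipped line is never executed.

```python
import doctest

# The call is never executed because of +SKIP, so the name need not exist.
SKIPPED_EXAMPLE = """
>>> df = hypothetical_read(sep='[;,]')  # doctest: +SKIP
... # ParserWarning: Falling back to the 'python' engine...
"""


def run_doctest_source(source):
    """Run a doctest source string and return doctest's TestResults."""
    test = doctest.DocTestParser().get_doctest(source, {}, "skipped", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard any report text; only the failure count is of interest.
    return runner.run(test, out=lambda text: None)


results = run_doctest_source(SKIPPED_EXAMPLE)
print(results.failed)
```

The trade-off is that a skipped example is rendered in the documentation but is no longer verified, so it can drift from the real behaviour.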

@TomAugspurger merged commit 30e0006 into pandas-dev:master on Mar 15, 2018
Labels: Docs, Error Reporting (Incorrect or improved errors from pandas), IO CSV (read_csv, to_csv)
6 participants