DOC: Updating the docstring of read_csv and related functions #23517

Merged
merged 32 commits into from
Nov 21, 2018

Conversation

Contributor

@thoo thoo commented Nov 5, 2018

Make the docstrings of the following functions comply with the pandas docstring guide:

  • pandas.read_table
  • pandas.read_csv
  • pandas.read_fwf

thoo added 3 commits November 4, 2018 22:36
…fixed

* upstream/master:
  Run Isort on tests-> util,sereis,arrays (pandas-dev#23501)
  DOC: Fix syntax error in groupby docs (pandas-dev#23498)
  DOC: Fix DataFrame.nlargest and DataFrame.nsmallest doctests (pandas-dev#23202)
  DOC: Remove dead link and update links to https (pandas-dev#23476)
@pep8speaks

Hello @thoo! Thanks for submitting the PR.

  • There are no PEP8 issues in the file pandas/io/parsers.py !

  • Complete extra results for this file :

file_to_check.py:310:-6532: W605 invalid escape sequence '\s'
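
The W605 warning above is raised when a backslash sequence such as `\s` appears inside a normal (non-raw) string literal. A minimal sketch of the two usual remedies follows; the variable names are illustrative and are not taken from `pandas/io/parsers.py`.

```python
import re

# A raw string keeps the backslash literally, so '\s' is not parsed as an
# (invalid) string escape and W605 is not triggered.
raw_pattern = r"\s+"

# Doubling the backslash in a normal string yields the same characters.
escaped_pattern = "\\s+"

assert raw_pattern == escaped_pattern

# Both spellings behave identically as a whitespace-splitting regex.
fields = re.split(raw_pattern, "a  b\tc")
print(fields)
```

For a docstring that contains such sequences, the same idea applies: either make the docstring a raw string or double the backslashes.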


@gfyoung gfyoung added Docs IO CSV read_csv, to_csv labels Nov 5, 2018
Member

@datapythonista datapythonista left a comment

Nice fixes. Besides fixing the parameters, which look great, I think we need Examples and See Also sections for these docstrings, right?

Sorry it's a bit tricky to find out myself the way those docstrings are created.

If a filepath is provided for `filepath_or_buffer`, map the file object
directly onto memory and access the data directly from there. Using this
option can improve performance because there is no longer any I/O overhead.
float_precision : str, default None
Member

Suggested change
float_precision : str, default None
float_precision : str, optional

Contributor Author

@datapythonista Should I write a file-like object to memory using io.StringIO() for the examples?

Member

The best option we found for showing functions that save to disk is df.to_csv('/tmp/data.csv') # doctest: +SKIP.

I'm not sure about reading; I think there is a directory with some files that are used for that. Can you take a look? Or maybe @TomAugspurger can help.

Contributor

Not sure what's best here. We could make an HTTP request to https://github.com/pandas-dev/pandas/blob/master/doc/data/tips.csv, but I'd rather avoid that on every test run. I'm fine with just skipping.
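
To illustrate the skipping approach discussed in this thread, here is a small sketch using only the standard-library `doctest` module. The function `save_example` is hypothetical and exists only to carry a docstring; `df` is deliberately undefined, so the example would fail with a `NameError` if the `+SKIP` directive were not honoured.

```python
import doctest


def save_example():
    """
    Hypothetical function used only to hold docstring examples.

    Examples
    --------
    >>> 1 + 1
    2
    >>> df.to_csv('/tmp/data.csv')  # doctest: +SKIP
    """


# Parse the docstring into a DocTest and run it without touching the disk.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    save_example.__doc__, globs={}, name="save_example", filename=None, lineno=0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# The skipped example is neither executed nor counted as attempted.
print(results.failed, results.attempted)
```

The skipped line is still rendered in the documentation, so readers see a realistic call while the test run stays free of file or network access.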


Returns
-------
result : DataFrame or TextParser
"""
result : DataFrame or TextParser"""
Member

Suggested change
result : DataFrame or TextParser"""
DataFrame or TextParser"""

delimiter : str, default ``None``
Alternative argument name for sep."""
Alternative argument name for sep.
Member

Suggested change
Alternative argument name for sep.
Alias for sep.

@datapythonista datapythonista changed the title Io csv docstring fixed DOC: Updating the docstring of read_csv and related functions Nov 5, 2018
thoo added 2 commits November 9, 2018 21:21
…fixed

* upstream/master: (47 commits)
  CLN: remove values attribute from datetimelike EAs (pandas-dev#23603)
  DOC/CI: Add linting to rst files, and fix issues (pandas-dev#23381)
  PERF: Speeds up creation of Period, PeriodArray, with Offset freq (pandas-dev#23589)
  PERF: define is_all_dates to shortcut inadvertent copy when slicing an IntervalIndex (pandas-dev#23591)
  TST: Tests and Helpers for Datetime/Period Arrays (pandas-dev#23502)
  Update description of Index._values/values/ndarray_values (pandas-dev#23507)
  Fixes to make validate_docstrings.py not generate warnings or unwanted output (pandas-dev#23552)
  DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953)
  ENH: Support writing timestamps with timezones with to_sql (pandas-dev#22654)
  CI: Auto-cancel redundant builds (pandas-dev#23523)
  Preserve EA dtype in DataFrame.stack (pandas-dev#23285)
  TST: Fix dtype mismatch on 32bit in IntervalTree get_indexer test (pandas-dev#23468)
  BUG: raise if invalid freq is passed (pandas-dev#23546)
  remove uses of (ts)?lib.(NaT|iNaT|Timestamp) (pandas-dev#23562)
  BUG: Fix error message for invalid HTML flavor (pandas-dev#23550)
  ENH: Support EAs in Series.unstack (pandas-dev#23284)
  DOC: Updating DataFrame.join docstring (pandas-dev#23471)
  TST: coverage for skipped tests in io/formats/test_to_html.py (pandas-dev#22888)
  BUG: Return KeyError for invalid string key (pandas-dev#23540)
  BUG: DatetimeIndex slicing with boolean Index raises TypeError (pandas-dev#22852)
  ...
@codecov

codecov bot commented Nov 10, 2018

Codecov Report

Merging #23517 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23517      +/-   ##
==========================================
- Coverage   92.29%   92.29%   -0.01%     
==========================================
  Files         161      161              
  Lines       51493    51486       -7     
==========================================
- Hits        47524    47517       -7     
  Misses       3969     3969
Flag        Coverage Δ
#multiple   90.68% <100%> (-0.01%) ⬇️
#single     42.31% <100%> (-0.01%) ⬇️

Impacted Files         Coverage Δ
pandas/io/parsers.py   95.55% <100%> (-0.02%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 99df7da...766d73f.

@thoo
Contributor Author

thoo commented Nov 10, 2018

@datapythonista Could you re-review this? Should I leave the {**kwds} errors from pandas.read_fwf for now? Thanks.

Member

@datapythonista datapythonista left a comment

Looks good, just added a couple of things I saw, not sure if they are right.

"""

_see_also = ("to_csv : Write DataFrame to "
"a comma-separated values (csv) file.")
Member

If I'm not missing something, in the 3 functions reusing _parser_params this same _see_also is used.

Can you check if that's the case, and if it is move the content inside _parser_params? Also, it'd be nice to have the 3 functions self-reference themselves, and also the equivalent to_* functions.

Contributor Author

I added more entries to the See Also section so the functions cross-reference each other, except read_table, because it is deprecated in 0.24.0. I don't think we have to_table or to_fwf, only to_csv.

thoo added 2 commits November 10, 2018 22:07
…fixed

* upstream/master:
  DOC: Fixes to docstring to add validation to CI (pandas-dev#23560)
  DOC: Remove incorrect periods at the end of parameter types (pandas-dev#23600)
  MAINT: tm.assert_raises_regex --> pytest.raises (pandas-dev#23592)
  DOC: Updating Series.resample and DataFrame.resample docstrings (pandas-dev#23197)
  ENH: Support for partition_cols in to_parquet (pandas-dev#23321)
  TST: Use intp as expected dtype in IntervalIndex indexing tests (pandas-dev#23609)
Member

@datapythonista datapythonista left a comment

Added some ideas to make the rendering of the docstrings simpler. I think it was extremely complex for what it could be.

See Also
--------
to_csv : Write DataFrame to a comma-separated values (csv) file.
%s
Member

Sorry I wasn't clear before. What I'd do is add read_csv and read_fwf here, so we remove a bit of the complexity in the rendering. Having a function reference itself is not ideal, but I think it'd be better than having too much complexity in the code.


Examples
--------
%s # doctest: +SKI"""
Member

Maybe I'm missing something, but I think we should be able to do:

Suggested change
%s # doctest: +SKI"""
>>> pd.{func_name}('data.csv') # doctest: +SKIP"""

Besides removing the need to _example_doc, note also the >>>, the extra space before #, the missing P in SKIP, and I think I suggested myself using /tmp/, but as we're skipping it anyway, it's probably clearer for the user just using data.csv.

@@ -71,14 +72,6 @@
By file-like object, we refer to objects with a ``read()`` method, such as
a file handler (e.g. via builtin ``open`` function) or ``StringIO``.
%s
Member

I think it'd be better to name all the %s. Having them positional makes things a bit difficult to follow, at least for me.


%s
""" % (_parser_params % (_fwf_widths, ''))
""" % (_parser_params % (_fwf_widths, '',
Member

I think it would make more sense to have {summary} in the first line of _parser_params than this.

Sorry to be picky with all these, but I'd really like to simplify all this. The code of these functions is already very complex, and having all this extra complexity for the docstrings makes things much worse IMO. I know it's from the original code, thanks for the help cleaning them.

%s
""" % (_parser_params % (_sep_doc.format(default="\\t (tab-stop)"),
_engine_doc))
Contributor

Yeah, changing these to use named parameters with .format() would be good.
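
The difference between positional `%s` slots and named `.format()` slots can be sketched as follows. The template text and the values passed in are simplified placeholders, not the actual contents of `parsers.py`; only the slot names `summary` and `sep_doc` echo names used in this review.

```python
# Positional placeholders: the reader has to count the %s markers to know
# which argument ends up where.
positional_template = """%s

Parameters
----------
%s
"""
positional_doc = positional_template % ("Read a file.", "sep : str")

# Named placeholders: each slot describes itself. Literal braces must be
# doubled ({{ and }}) so that str.format leaves them in the output.
named_template = """{summary}

Parameters
----------
{sep_doc}
engine : {{'c', 'python'}}, optional
"""
named_doc = named_template.format(summary="Read a file.", sep_doc="sep : str")

print(named_doc)
```

One cost of switching to `.format()` is the brace doubling, which is why the `engine` line in the diff further down is written as `{{'c', 'python'}}`.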

thoo added 3 commits November 11, 2018 22:15
…fixed

* upstream/master:
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
Member

@datapythonista datapythonista left a comment

Really nice work, much much clearer now.

Just a couple of small things that IMO would still improve the docstring. The first is renaming _parser_params to _parser_doc, as I think it better represents the content now.

Then, in the line 711 and around, the docstring is injected to the functions. I'd do the rendering of the docstring there, instead of creating intermediate variables _summary_read_csv and _read_csv_doc:

read_csv = _make_parser_function('read_csv', default_sep=',')
read_csv = Appender(_parser_doc.format(
    func_name='read_csv',
    summary='Read a comma-separated values (csv) file into DataFrame.',
    sep_doc=_sep_doc.format(default="','"),
    engine_doc=_engine_doc)
)(read_csv)

Feel free to disagree if you don't think that's an improvement, but I think it makes the code neater, and things easier to find.

Thanks!
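
For readers unfamiliar with the pattern in the snippet above, the following is a self-contained sketch of rendering a shared template at the point where it is attached to a function. `append_doc` is a hypothetical stand-in written for this illustration and is not the pandas `Appender` implementation; the template is likewise a shortened placeholder.

```python
def append_doc(addendum):
    """Return a decorator that appends ``addendum`` to a function's docstring.

    Hypothetical helper for illustration only.
    """
    def decorator(func):
        func.__doc__ = (func.__doc__ or "") + addendum
        return func
    return decorator


# Shared template with named slots (slot names mirror the review comment).
_parser_doc = """
{summary}

Examples
--------
>>> pd.{func_name}('data.csv')  # doctest: +SKIP
"""


def read_csv(filepath_or_buffer):
    # Placeholder body; only the docstring handling matters here.
    raise NotImplementedError


# Render the template right where it is injected, with no intermediate
# module-level variables for the summary or the rendered docstring.
read_csv = append_doc(_parser_doc.format(
    func_name="read_csv",
    summary="Read a comma-separated values (csv) file into DataFrame.",
))(read_csv)

print(read_csv.__doc__)
```

Keeping the `.format()` call next to the function it documents means a reader only has to look in one place to see what each function's docstring will contain.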

thoo added 2 commits November 12, 2018 13:44
…fixed

* upstream/master:
  DOC: avoid SparseArray.take error (pandas-dev#23637)
  CLN: remove incorrect usages of com.AbstractMethodError (pandas-dev#23625)
  DOC: Adding validation of the section order in docstrings (pandas-dev#23607)
  BUG: Don't over-optimize memory with jagged CSV (pandas-dev#23527)
  DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)
  More helpful Stata string length error. (pandas-dev#23629)
  BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)
  CLN: datetimelike arrays: isort, small reorg (pandas-dev#23587)
  CI: Check in the CI that assert_raises_regex is not being used (pandas-dev#23627)
  CLN:Remove unused **kwargs from user facing methods (pandas-dev#23249)
""" % (_parser_params % (_sep_doc.format(default="\\t (tab-stop)"),
_engine_doc))
Alias for sep.
"""

_fwf_widths = """\
Member

Changes look great.

I just think this is not being used anymore. Can you check please? And also run the validate_docstrings.py on the 3 docstrings edited here, to see that everything is ok? I guess the script should be reporting that these params are missing in the docstring.

Contributor Author

Yes, it is not used anymore. I ran both validate_docstrings.py and python make.py --single, and they all look fine except for the {**kwds} error from read_fwf.

Member

That sounds good. There is something that just by the code I don't understand. I see the signature of read_fwf is def read_fwf(filepath_or_buffer, colspecs='infer', widths=None, **kwds):, but we're using the docstring in _parser_doc that has lots of other parameters like header... That doesn't seem correct. Is the script not complaining of unknown parameters?

Or am I missing something?

Contributor Author

I forgot to add _fwf_widths back to read_fwf. But the script is still complaining about unknown parameters.

Parameters {**kwds} not documented
        Unknown parameters {decimal, float_precision, ...

I thought there would be a fix in the test to accommodate {**kwds}.

thoo added 2 commits November 12, 2018 22:28
…fixed

* upstream/master:
  CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655)
  DOC: Fix name of the See Also section titles in docstrings (pandas-dev#23653)
  DOC: clean-up recent doc errors/warnings (pandas-dev#23636)
@datapythonista
Member

**kwds should also be documented, explaining why they are there (they are usually passed to another function). Any reason for not documenting them?

We can leave that for another PR if it's not trivial to add.

thoo added 2 commits November 14, 2018 20:47
…fixed

* upstream/master:
  DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651)
  DOC: Change release and whatsnew (pandas-dev#21599)
  DOC: Fix format of the See Also descriptions (pandas-dev#23654)
  DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374)
  ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692)
  CLN: Remove unnecessary code (pandas-dev#23696)
  Pin flake8-rst version (pandas-dev#23699)
  Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643)
  CI: raise clone depth limit on CI
  BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688)
  REF: Move Excel names parameter handling to CSV (pandas-dev#23690)
  DOC: Accessing files from a S3 bucket. (pandas-dev#23639)
  Fix errorbar visualization (pandas-dev#23674)
  DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)
  DOC: Update is_sparse docstring (pandas-dev#19983)
  BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661)
  Add to_flat_index method to MultiIndex (pandas-dev#22866)
  CLN: Move to_excel to generic.py (pandas-dev#23656)
  TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)
@thoo
Contributor Author

thoo commented Nov 15, 2018

I added the doc for **kwds.
The only error left is from read_fwf:

1 Errors found:
        Unknown parameters {usecols, index_col, warn_bad_lines, chunksize, float_precision, false_values, na_values, skiprows, names, infer_datetime_format, parse_dates, delimiter, nrows, thousands, compression, escapechar, dialect, error_bad_lines, dayfirst, low_memory, header, skipfooter, quotechar, verbose, keep_date_col, converters, prefix, lineterminator, doublequote, skipinitialspace, mangle_dupe_cols, true_values, memory_map, quoting, decimal, comment, dtype, encoding, skip_blank_lines, iterator, delim_whitespace, tupleize_cols, na_filter, date_parser, keep_default_na, squeeze}

Is there anything I could do?

@datapythonista
Member

read_fwf signature doesn't have all the parameters in the shared docstring, see its signature: read_fwf(filepath_or_buffer, colspecs='infer', widths=None, **kwds):

I don't think we should be reusing the docstring of the others for it, or have all those parameters in a separate variable that is only added in the other docstrings.

The point here is that the docstring for every function/method has to have the right information. You can render the documentation of read_fwf with ./doc/make.py html --single=pandas.read_fwf. If you check the html, you should see that we're documenting parameters that the function doesn't have, and that obviously doesn't make sense.
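
The kind of mismatch described here can be sketched with the standard library alone. This is not how `validate_docstrings.py` is implemented; it is a simplified illustration that compares the names in a function's signature with the `name : type` lines of its docstring. The toy `read_fwf` below copies only the signature quoted above, and the parsing regex is an assumption that works for this small example.

```python
import inspect
import re


def read_fwf(filepath_or_buffer, colspecs="infer", widths=None, **kwds):
    """
    Toy function with the same signature as the one quoted above.

    Parameters
    ----------
    filepath_or_buffer : str
        Path or buffer.
    colspecs : list of tuple (int, int) or 'infer'
        Column extents.
    header : int
        Documented here but absent from the signature.
    """


def documented_params(func):
    """Collect names from unindented 'name : type' lines of the docstring."""
    doc = inspect.getdoc(func) or ""
    return set(re.findall(r"^(\*{0,2}\w+) : ", doc, flags=re.M))


def signature_params(func):
    """Collect parameter names, prefixing *args / **kwargs style names."""
    names = set()
    for name, param in inspect.signature(func).parameters.items():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            name = "**" + name
        elif param.kind is inspect.Parameter.VAR_POSITIONAL:
            name = "*" + name
        names.add(name)
    return names


documented = documented_params(read_fwf)
actual = signature_params(read_fwf)

unknown = sorted(documented - actual)        # documented, not in signature
undocumented = sorted(actual - documented)   # in signature, not documented
print("Unknown parameters:", unknown)
print("Parameters not documented:", undocumented)
```

In this toy case `header` is reported as unknown and `widths` and `**kwds` as undocumented, which mirrors the two kinds of errors quoted earlier in the thread.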

thoo added 4 commits November 19, 2018 12:48
…fixed

* upstream/master: (46 commits)
  DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774)
  BUG: Don't warn if default conflicts with dialect (pandas-dev#23775)
  BUG: Fixing memory leaks in read_csv (pandas-dev#23072)
  TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771)
  STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370)
  ENH: between_time, at_time accept axis parameter (pandas-dev#21799)
  PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772)
  CLN: io/formats/html.py: refactor (pandas-dev#22726)
  API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466)
  TST: Add test case for GH14080 for overflow exception (pandas-dev#23762)
  BUG: Don't extract header names if none specified (pandas-dev#23703)
  BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618)
  DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621)
  PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634)
  TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761)
  DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650)
  ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662)
  DOC: Improve error message to show correct order (pandas-dev#23652)
  ENH: Improve error message for empty object array (pandas-dev#23718)
  ...
.. deprecated:: 0.24.0
Use :func:`pandas.read_csv` instead, passing ``sep='\\t'`` if necessary.""",
sep_doc=_sep_doc.format(default="\\t (tab-stop)"),
engine_doc=_engine_doc)
Member

I think only these two functions read_csv and read_table use _parser_params now. So, if that's the case, and both have the same value for _engine_doc, I think it makes more sense to have the content directly inside _parser_params. And not have this variable anymore, right?

Other than that looks good to me. I think now the validation should be happy with the 3 functions, right?

Contributor Author

I also removed the _sep_doc variable and merged it into _parser_params. Let me know if you want it the other way. Now the validation is fine with all three 👍.

Member

@datapythonista datapythonista left a comment

Moving the common sep_doc stuff into _parser_params sounds like a good idea.

I just realized while writing this that _parser_params is not a good name anymore. Do you mind renaming it to _doc_read_csv_and_table or something like that?

Other than that looks good to me.

thoo added 3 commits November 19, 2018 19:30
…fixed

* upstream/master:
  DOC: more consistent flake8-commands in contributing.rst (pandas-dev#23724)
  DOC: Fixed the doctsring for _set_axis_name (GH 22895) (pandas-dev#22969)
  DOC: Improve GL03 message re: blank lines at end of docstrings. (pandas-dev#23649)
  TST: add tests for keeping dtype in Series.update (pandas-dev#23604)
  TST: For GH4861, Period and datetime in multiindex (pandas-dev#23776)
  TST: move .str-test to strings.py & parametrize it; precursor to pandas-dev#23582 (pandas-dev#23777)
  STY: isort tests/scalar, tests/tslibs, import libwindow instead of _window (pandas-dev#23787)
  BUG: fixed .str.contains(..., na=False) for categorical series (pandas-dev#22170)
  BUG: Maintain column order with groupby.nth (pandas-dev#22811)
  API/DEPR: replace kwarg "pat" with "sep" in str.[r]partition (pandas-dev#23767)
  CLN: Finish isort core (pandas-dev#23765)
  TST: Mark test_pct_max_many_rows with pytest.mark.single (pandas-dev#23799)
Member

@datapythonista datapythonista left a comment

Looks good. I made suggestions for changes to the types, as we'll start validating them more strictly soon, but other than that it looks good. Can you also remove one unnecessary item from the See Also? Let's merge afterwards.

--------
to_csv : Write DataFrame to a comma-separated values (csv) file.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_fwf : Read a table of fixed-width formatted lines into DataFrame.
Member

Didn't see that before, can you remove this line? As this docstring is only used for read_fwf, there is no value in self-referencing itself.

@@ -270,99 +273,70 @@
encoding : str, default None
Encoding to use for UTF when reading/writing (ex. 'utf-8'). `List of Python
standard encodings
<https://docs.python.org/3/library/codecs.html#standard-encodings>`_
<https://docs.python.org/3/library/codecs.html#standard-encodings>`_ .
dialect : str or csv.Dialect instance, default None
Member

Suggested change
dialect : str or csv.Dialect instance, default None
dialect : str or csv.Dialect, optional

@@ -237,14 +245,9 @@
.. versionadded:: 0.18.1 support for 'zip' and 'xz' compression.

thousands : str, default None
Member

Suggested change
thousands : str, default None
thousands : str, optional

iterator : boolean, default False
dayfirst : bool, default False
DD/MM format dates, international and European format.
iterator : bool, default False
Return TextFileReader object for iteration or getting chunks with
``get_chunk()``.
chunksize : int, default None
Member

Suggested change
chunksize : int, default None
chunksize : int, optional

Skip spaces after delimiter.
skiprows : list-like or integer or callable, default None
skiprows : list-like or int or callable, default None
Member

Suggested change
skiprows : list-like or int or callable, default None
skiprows : list-like, int or callable, optional

%s
engine : {{'c', 'python'}}, optional
Parser engine to use. The C engine is faster while the python engine is
currently more feature-complete.
converters : dict, default None
Member

Suggested change
converters : dict, default None
converters : dict, optional

data rather than the first line of the file.
names : array-like, default None
List of column names to use. If file contains no header row, then you
should explicitly pass header=None. Duplicates in this list will cause
should explicitly pass ``header=None``. Duplicates in this list will cause
a ``UserWarning`` to be issued.
index_col : int or sequence or False, default None
Member

Suggested change
index_col : int or sequence or False, default None
index_col : int, sequence or bool, default None

will also force the use of the Python parsing engine. Note that regex
delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``.
delimiter : str, default ``None``
Alias for sep.
header : int or list of ints, default 'infer'
Member

Suggested change
header : int or list of ints, default 'infer'
header : int, str or list of int, default 'infer'

String value 'infer' can be used to instruct the parser to try
detecting the column specifications from the first 100 rows of
the data which are not being skipped via skiprows (default='infer').
widths : list of ints. optional
Member

Suggested change
widths : list of ints. optional
widths : list of int, optional

By file-like object, we refer to objects with a ``read()`` method,
such as a file handler (e.g. via builtin ``open`` function)
or ``StringIO``.
colspecs : list of pairs (int, int) or 'infer'. optional
Member

Suggested change
colspecs : list of pairs (int, int) or 'infer'. optional
colspecs : list of tuple (int, int) or 'infer'. optional

thoo added 2 commits November 20, 2018 11:31
…fixed

* upstream/master:
  DOC: Removing rpy2 dependencies, and converting examples using it to regular code blocks (pandas-dev#23737)
  BUG: Fix dtype=str converts NaN to 'n' (pandas-dev#22564)
  DOC: update pandas.core.resample.Resampler.nearest docstring (pandas-dev#20381)
  REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23810)
  Added support for Fraction and Number (PEP 3141) to pandas.api.types.is_scalar (pandas-dev#22952)
  DOC: Updating to_timedelta docstring (pandas-dev#23259)
Member

@datapythonista datapythonista left a comment

Thanks for the update @thoo, looks perfect now.

@jreback jreback added this to the 0.24.0 milestone Nov 21, 2018
@jreback jreback merged commit f2ff633 into pandas-dev:master Nov 21, 2018
@jreback
Contributor

jreback commented Nov 21, 2018

Thanks @thoo, please check the rendered dev docs for correctness when they are built.

@thoo thoo deleted the io_csv_docstring_fixed branch January 2, 2019 20:26
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
Docs IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Fix docstring of read_csv and related methods
6 participants