Commit a64ff48

DOC: Enforce Numpy Docstring Validation for pandas.ExcelFile, pandas.ExcelFile.parse and pandas.ExcelWriter (#58235)
* fixed docstring for pandas.ExcelFile
* fixed docstring for pandas.ExcelFile.parse
* fixed docstring for pandas.ExcelWriter
* removed methods pandas.ExcelFile, pandas.ExcelFile.parse and pandas.ExcelWriter
* fixed E501 Line too long for pandas.ExcelFile.parse
* used storage_options definition from _shared_docs[storage_options]
1 parent d8c7e85 commit a64ff48

2 files changed (+142 -7 lines)

ci/code_checks.sh (+0 -3)

@@ -153,9 +153,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
  -i "pandas.DatetimeTZDtype SA01" \
  -i "pandas.DatetimeTZDtype.tz SA01" \
  -i "pandas.DatetimeTZDtype.unit SA01" \
- -i "pandas.ExcelFile PR01,SA01" \
- -i "pandas.ExcelFile.parse PR01,SA01" \
- -i "pandas.ExcelWriter SA01" \
  -i "pandas.Float32Dtype SA01" \
  -i "pandas.Float64Dtype SA01" \
  -i "pandas.Grouper PR02,SA01" \
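
The three removed `-i` entries had exempted these objects from two numpydoc validation codes: PR01 (parameters not documented) and SA01 (See Also section not found). As a rough illustration of what those codes look for, here is a minimal sketch in plain Python. `has_section`, `undocumented_params` and the toy `parse` function are hypothetical names written for this note; this is not the numpydoc validator that the CI script runs.

```python
import inspect
import re


def has_section(docstring: str, title: str) -> bool:
    """Return True if a numpydoc-style header (title line, then dashes) exists."""
    lines = [line.strip() for line in inspect.cleandoc(docstring).splitlines()]
    for current, following in zip(lines, lines[1:]):
        if current == title and following and set(following) == {"-"}:
            return True
    return False


def undocumented_params(func) -> list[str]:
    """List signature parameters with no 'name : type' entry in the docstring."""
    doc = inspect.cleandoc(func.__doc__ or "")
    # Entries start at column 0 after cleandoc; allow a leading * or ** prefix.
    documented = set(re.findall(r"^\*{0,2}(\w+) :", doc, flags=re.MULTILINE))
    names = [n for n in inspect.signature(func).parameters if n != "self"]
    return [n for n in names if n not in documented]


def parse(sheet_name=0, header=0):
    """
    Toy stand-in for a partially documented function (hypothetical).

    Parameters
    ----------
    sheet_name : str or int, default 0
        Sheet to read.

    Returns
    -------
    None
    """


print(has_section(parse.__doc__, "See Also"))  # SA01-style check
print(undocumented_params(parse))              # PR01-style check
```

In this toy case the docstring has no See Also section and leaves `header` undocumented, which are the kinds of gaps the commit closes for the three pandas objects.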

pandas/io/excel/_base.py (+142 -4)

@@ -979,6 +979,12 @@ class ExcelWriter(Generic[_WorkbookT]):

  .. versionadded:: 1.3.0

+ See Also
+ --------
+ read_excel : Read an Excel sheet values (xlsx) file into DataFrame.
+ read_csv : Read a comma-separated values (csv) file into DataFrame.
+ read_fwf : Read a table of fixed-width formatted lines into DataFrame.
+
  Notes
  -----
  For compatibility with CSV writers, ExcelWriter serializes lists
@@ -1434,6 +1440,7 @@ def inspect_excel_format(
  return "zip"


+ @doc(storage_options=_shared_docs["storage_options"])
  class ExcelFile:
  """
  Class for parsing tabular Excel sheets into DataFrame objects.
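
The `@doc(...)` decorator added above supplies the text for the `{storage_options}` placeholder that this commit adds to the `ExcelFile` docstring. The general idea can be sketched with `str.format`. This is a simplified illustration under stated assumptions, not pandas' own decorator: `fill_doc`, `SHARED` and `open_workbook` are hypothetical names, and the shared snippet text is invented for the example.

```python
from typing import Callable


def fill_doc(**params: str) -> Callable:
    """Return a decorator that substitutes ``{name}`` placeholders in a docstring."""

    def decorator(obj):
        if obj.__doc__:
            # str.format replaces {name} fields and turns {{ }} into literal braces.
            obj.__doc__ = obj.__doc__.format(**params)
        return obj

    return decorator


# Hypothetical shared snippet; the real text lives in pandas' _shared_docs.
SHARED = {"storage_options": "storage_options : dict, optional"}


@fill_doc(storage_options=SHARED["storage_options"])
def open_workbook(path, storage_options=None):
    """
    Open a workbook (toy example).

    Parameters
    ----------
    path : str
        File location.
    {storage_options}
    mapping : dict
        Literal braces must be doubled, e.g. {{'foo': [1, 3]}}.
    """


print(open_workbook.__doc__)
```

One consequence of this style of templating is that literal braces in a templated docstring have to be written doubled. The `parse` docstring later in this diff contains doubled braces (for example around `'foo' : [1, 3]`); whether `parse` itself is passed through a template is not visible in this diff.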
@@ -1472,19 +1479,27 @@ class ExcelFile:
  - Otherwise if ``path_or_buffer`` is in xlsb format,
  `pyxlsb <https://pypi.org/project/pyxlsb/>`_ will be used.

- .. versionadded:: 1.3.0
+ .. versionadded:: 1.3.0

  - Otherwise if `openpyxl <https://pypi.org/project/openpyxl/>`_ is installed,
  then ``openpyxl`` will be used.
  - Otherwise if ``xlrd >= 2.0`` is installed, a ``ValueError`` will be raised.

- .. warning::
+ .. warning::

- Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.
- This is not supported, switch to using ``openpyxl`` instead.
+ Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.
+ This is not supported, switch to using ``openpyxl`` instead.
+ {storage_options}
  engine_kwargs : dict, optional
  Arbitrary keyword arguments passed to excel engine.

+ See Also
+ --------
+ DataFrame.to_excel : Write DataFrame to an Excel file.
+ DataFrame.to_csv : Write DataFrame to a comma-separated values (csv) file.
+ read_csv : Read a comma-separated values (csv) file into DataFrame.
+ read_fwf : Read a table of fixed-width formatted lines into DataFrame.
+
  Examples
  --------
  >>> file = pd.ExcelFile("myfile.xlsx") # doctest: +SKIP
@@ -1595,11 +1610,134 @@ def parse(
  Equivalent to read_excel(ExcelFile, ...) See the read_excel
  docstring for more info on accepted parameters.

+ Parameters
+ ----------
+ sheet_name : str, int, list, or None, default 0
+ Strings are used for sheet names. Integers are used in zero-indexed
+ sheet positions (chart sheets do not count as a sheet position).
+ Lists of strings/integers are used to request multiple sheets.
+ Specify ``None`` to get all worksheets.
+ header : int, list of int, default 0
+ Row (0-indexed) to use for the column labels of the parsed
+ DataFrame. If a list of integers is passed those row positions will
+ be combined into a ``MultiIndex``. Use None if there is no header.
+ names : array-like, default None
+ List of column names to use. If file contains no header row,
+ then you should explicitly pass header=None.
+ index_col : int, str, list of int, default None
+ Column (0-indexed) to use as the row labels of the DataFrame.
+ Pass None if there is no such column. If a list is passed,
+ those columns will be combined into a ``MultiIndex``. If a
+ subset of data is selected with ``usecols``, index_col
+ is based on the subset.
+
+ Missing values will be forward filled to allow roundtripping with
+ ``to_excel`` for ``merged_cells=True``. To avoid forward filling the
+ missing values use ``set_index`` after reading the data instead of
+ ``index_col``.
+ usecols : str, list-like, or callable, default None
+ * If None, then parse all columns.
+ * If str, then indicates comma separated list of Excel column letters
+ and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
+ both sides.
+ * If list of int, then indicates list of column numbers to be parsed
+ (0-indexed).
+ * If list of string, then indicates list of column names to be parsed.
+ * If callable, then evaluate each column name against it and parse the
+ column if the callable returns ``True``.
+
+ Returns a subset of the columns according to behavior above.
+ converters : dict, default None
+ Dict of functions for converting values in certain columns. Keys can
+ either be integers or column labels, values are functions that take one
+ input argument, the Excel cell content, and return the transformed
+ content.
+ true_values : list, default None
+ Values to consider as True.
+ false_values : list, default None
+ Values to consider as False.
+ skiprows : list-like, int, or callable, optional
+ Line numbers to skip (0-indexed) or number of lines to skip (int) at the
+ start of the file. If callable, the callable function will be evaluated
+ against the row indices, returning True if the row should be skipped and
+ False otherwise. An example of a valid callable argument would be ``lambda
+ x: x in [0, 2]``.
+ nrows : int, default None
+ Number of rows to parse.
+ na_values : scalar, str, list-like, or dict, default None
+ Additional strings to recognize as NA/NaN. If dict passed, specific
+ per-column NA values.
+ parse_dates : bool, list-like, or dict, default False
+ The behavior is as follows:
+
+ * ``bool``. If True -> try parsing the index.
+ * ``list`` of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3
+ each as a separate date column.
+ * ``list`` of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and
+ parse as a single date column.
+ * ``dict``, e.g. {{'foo' : [1, 3]}} -> parse columns 1, 3 as date and call
+ result 'foo'
+
+ If a column or index contains an unparsable date, the entire column or
+ index will be returned unaltered as an object data type. If you
+ don`t want to parse some cells as date just change their type
+ in Excel to "Text".For non-standard datetime parsing, use
+ ``pd.to_datetime`` after ``pd.read_excel``.
+
+ Note: A fast-path exists for iso8601-formatted dates.
+ date_parser : function, optional
+ Function to use for converting a sequence of string columns to an array of
+ datetime instances. The default uses ``dateutil.parser.parser`` to do the
+ conversion. Pandas will try to call `date_parser` in three different ways,
+ advancing to the next if an exception occurs: 1) Pass one or more arrays
+ (as defined by `parse_dates`) as arguments; 2) concatenate (row-wise) the
+ string values from the columns defined by `parse_dates` into a single array
+ and pass that; and 3) call `date_parser` once for each row using one or
+ more strings (corresponding to the columns defined by `parse_dates`) as
+ arguments.
+
+ .. deprecated:: 2.0.0
+ Use ``date_format`` instead, or read in as ``object`` and then apply
+ :func:`to_datetime` as-needed.
+ date_format : str or dict of column -> format, default ``None``
+ If used in conjunction with ``parse_dates``, will parse dates
+ according to this format. For anything more complex,
+ please read in as ``object`` and then apply :func:`to_datetime` as-needed.
+ thousands : str, default None
+ Thousands separator for parsing string columns to numeric. Note that
+ this parameter is only necessary for columns stored as TEXT in Excel,
+ any numeric columns will automatically be parsed, regardless of display
+ format.
+ comment : str, default None
+ Comments out remainder of line. Pass a character or characters to this
+ argument to indicate comments in the input file. Any data between the
+ comment string and the end of the current line is ignored.
+ skipfooter : int, default 0
+ Rows at the end to skip (0-indexed).
+ dtype_backend : {{'numpy_nullable', 'pyarrow'}}, default 'numpy_nullable'
+ Back-end data type applied to the resultant :class:`DataFrame`
+ (still experimental). Behaviour is as follows:
+
+ * ``"numpy_nullable"``: returns nullable-dtype-backed :class:`DataFrame`
+ (default).
+ * ``"pyarrow"``: returns pyarrow-backed nullable :class:`ArrowDtype`
+ DataFrame.
+
+ .. versionadded:: 2.0
+ **kwds : dict, optional
+ Arbitrary keyword arguments passed to excel engine.
+
  Returns
  -------
  DataFrame or dict of DataFrames
  DataFrame from the passed in Excel file.

+ See Also
+ --------
+ read_excel : Read an Excel sheet values (xlsx) file into DataFrame.
+ read_csv : Read a comma-separated values (csv) file into DataFrame.
+ read_fwf : Read a table of fixed-width formatted lines into DataFrame.
+
  Examples
  --------
  >>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

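Two of the parameters documented in the `parse` docstring describe small selection rules that are easy to misread: the string form of `usecols` ("A:E" or "A,C,E:F", ranges inclusive on both sides) and the callable form of `skiprows` (called with each row index, returning True to skip). The sketch below works through both rules in plain Python. It is only an illustration of the documented behaviour, not pandas' implementation; `excel_col_to_index`, `expand_usecols` and `rows_kept` are hypothetical helper names.

```python
def excel_col_to_index(letters: str) -> int:
    """Convert an Excel column label such as 'A' or 'AA' to a 0-based index."""
    index = 0
    for char in letters.strip().upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def expand_usecols(spec: str) -> list[int]:
    """Expand a spec like 'A,C,E:F' into 0-based column positions."""
    cols: list[int] = []
    for part in spec.split(","):
        if ":" in part:
            start, end = (excel_col_to_index(p) for p in part.split(":"))
            cols.extend(range(start, end + 1))  # inclusive of both sides
        else:
            cols.append(excel_col_to_index(part))
    return cols


def rows_kept(n_rows: int, skiprows) -> list[int]:
    """Return the row indices for which the skiprows callable returns False."""
    return [i for i in range(n_rows) if not skiprows(i)]


print(expand_usecols("A:E"))                # [0, 1, 2, 3, 4]
print(expand_usecols("A,C,E:F"))            # [0, 2, 4, 5]
print(rows_kept(5, lambda x: x in [0, 2]))  # [1, 3, 4]
```

The last call uses the same `lambda x: x in [0, 2]` given as an example in the docstring: rows 0 and 2 are skipped and the rest are kept.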