Commit c4f2be6

Merge pull request #9450 from jnmclarty/multixlsheet
ENH Read mutiple excel sheets in single API call
2 parents 1050adb + d8a2893 commit c4f2be6

5 files changed: +300 -112 lines

doc/source/io.rst

+86 -32
@@ -1949,56 +1949,106 @@ module and use the same parsing code as the above to convert tabular data into
 a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
 advanced strategies

-Besides ``read_excel`` you can also read Excel files using the ``ExcelFile``
-class. The following two commands are equivalent:
+Reading Excel Files
+~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.16
+
+``read_excel`` can read more than one sheet by setting ``sheetname`` to either
+a list of sheet names, a list of sheet positions, or ``None`` to read all sheets.
+
+.. versionadded:: 0.13
+
+Sheets can be specified by sheet index or sheet name, using an integer or string,
+respectively.
+
+.. versionadded:: 0.12
+
+``ExcelFile`` has been moved to the top level namespace.
+
+There are two approaches to reading an Excel file: the ``read_excel`` function
+and the ``ExcelFile`` class. ``read_excel`` is for reading one file
+with file-specific arguments (i.e., identical data formats across sheets).
+``ExcelFile`` is for reading one file with sheet-specific arguments (i.e., different
+data formats across sheets). Choosing an approach is largely a question of
+code readability and execution speed.
+
+Equivalent class and function approaches to read a single sheet:

 .. code-block:: python

    # using the ExcelFile class
    xls = pd.ExcelFile('path_to_file.xls')
-   xls.parse('Sheet1', index_col=None, na_values=['NA'])
+   data = xls.parse('Sheet1', index_col=None, na_values=['NA'])

    # using the read_excel function
-   read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
+   data = read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

-The class based approach can be used to read multiple sheets or to introspect
-the sheet names using the ``sheet_names`` attribute.
+Equivalent class and function approaches to read multiple sheets:

-.. note::
+.. code-block:: python

-   The prior method of accessing ``ExcelFile`` has been moved from
-   ``pandas.io.parsers`` to the top level namespace starting from pandas
-   0.12.0.
+   data = {}
+   # For when Sheet1's format differs from Sheet2
+   xls = pd.ExcelFile('path_to_file.xls')
+   data['Sheet1'] = xls.parse('Sheet1', index_col=None, na_values=['NA'])
+   data['Sheet2'] = xls.parse('Sheet2', index_col=1)
+
+   # For when Sheet1's format is identical to Sheet2
+   data = read_excel('path_to_file.xls', ['Sheet1', 'Sheet2'], index_col=None, na_values=['NA'])
+
+Specifying Sheets
++++++++++++++++++
+.. _io.specifying_sheets:

-.. versionadded:: 0.13
+.. note:: The second argument is ``sheetname``, not to be confused with ``ExcelFile.sheet_names``.

-There are now two ways to read in sheets from an Excel file. You can provide
-either the index of a sheet or its name to by passing different values for
-``sheet_name``.
+.. note:: The ``sheet_names`` attribute of an ``ExcelFile`` provides the list of sheet names.

+- The argument ``sheetname`` specifies the sheet or sheets to read.
+- The default value for ``sheetname`` is 0, which reads the first sheet.
 - Pass a string to refer to the name of a particular sheet in the workbook.
 - Pass an integer to refer to the index of a sheet. Indices follow Python
   convention, beginning at 0.
-- The default value is ``sheet_name=0``. This reads the first sheet.
-
-Using the sheet name:
+- Pass a list of strings, integers, or both to return a dictionary of the specified sheets.
+- Pass ``None`` to return a dictionary of all available sheets.

 .. code-block:: python

+   # Returns a DataFrame
    read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

 Using the sheet index:

 .. code-block:: python

-   read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])
+   # Returns a DataFrame
+   read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])

 Using all default values:

 .. code-block:: python

+   # Returns a DataFrame
    read_excel('path_to_file.xls')

+Using ``None`` to get all sheets:
+
+.. code-block:: python
+
+   # Returns a dictionary of DataFrames
+   read_excel('path_to_file.xls', sheetname=None)
+
+Using a list to get multiple sheets:
+
+.. code-block:: python
+
+   # Returns the 1st and 4th sheets, as a dictionary of DataFrames.
+   read_excel('path_to_file.xls', sheetname=['Sheet1', 3])
+
+Parsing Specific Columns
+++++++++++++++++++++++++
+
 It is often the case that users will insert columns to do temporary computations
 in Excel and you may not want to read in those columns. `read_excel` takes
 a `parse_cols` keyword to allow you to specify a subset of columns to parse.
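The ``sheetname`` rules in the hunk above (an integer or string selects one sheet; a list or ``None`` returns a dictionary) can be modelled in plain Python. This is a rough sketch, not the pandas implementation: ``select_sheets`` and the dict-based ``workbook`` are hypothetical stand-ins for an Excel file, and keying the returned dictionary by each requested item exactly as given (so ``3`` stays ``3``) is an assumption.

```python
from typing import Any, Dict, List, Union

SheetKey = Union[int, str]


def select_sheets(workbook: Dict[str, Any],
                  sheetname: Union[SheetKey, List[SheetKey], None] = 0):
    """Hypothetical model of the ``sheetname`` selection rules.

    ``workbook`` stands in for an Excel file: an insertion-ordered dict
    mapping sheet name -> sheet contents.
    """
    names = list(workbook)  # sheet names in workbook order

    def lookup(key: SheetKey) -> Any:
        # Integers are 0-based sheet positions; strings are sheet names.
        if isinstance(key, int):
            return workbook[names[key]]
        return workbook[key]

    if sheetname is None:
        # None -> dictionary of every sheet, keyed by sheet name
        return {name: workbook[name] for name in names}
    if isinstance(sheetname, list):
        # list -> dictionary keyed by each requested item as given (assumption)
        return {key: lookup(key) for key in sheetname}
    # a single int or str -> just that sheet's contents
    return lookup(sheetname)


book = {'Sheet1': 'a', 'Sheet2': 'b', 'Sheet3': 'c', 'Sheet4': 'd'}
print(select_sheets(book))                 # a  (default: first sheet)
print(select_sheets(book, 'Sheet2'))       # b
print(select_sheets(book, ['Sheet1', 3]))  # {'Sheet1': 'a', 3: 'd'}
print(select_sheets(book, None) == book)   # True
```

The list case mirrors the ``['Sheet1', 3]`` example: the first and fourth sheets come back together in one dictionary.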
@@ -2017,26 +2067,30 @@ indices to be parsed.

    read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3])

-.. note::
+Cell Converters
++++++++++++++++

-   It is possible to transform the contents of Excel cells via the `converters`
-   option. For instance, to convert a column to boolean:
+It is possible to transform the contents of Excel cells via the `converters`
+option. For instance, to convert a column to boolean:

-   .. code-block:: python
+.. code-block:: python

-      read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})
+   read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})

-   This options handles missing values and treats exceptions in the converters
-   as missing data. Transformations are applied cell by cell rather than to the
-   column as a whole, so the array dtype is not guaranteed. For instance, a
-   column of integers with missing values cannot be transformed to an array
-   with integer dtype, because NaN is strictly a float. You can manually mask
-   missing data to recover integer dtype:
+This option handles missing values and treats exceptions in the converters
+as missing data. Transformations are applied cell by cell rather than to the
+column as a whole, so the array dtype is not guaranteed. For instance, a
+column of integers with missing values cannot be transformed to an array
+with integer dtype, because NaN is strictly a float. You can manually mask
+missing data to recover integer dtype:

-   .. code-block:: python
+.. code-block:: python

-      cfun = lambda x: int(x) if x else -1
-      read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
+   cfun = lambda x: int(x) if x else -1
+   read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
+
+Writing Excel Files
+~~~~~~~~~~~~~~~~~~~

 To write a DataFrame object to a sheet of an Excel file, you can use the
 ``to_excel`` instance method. The arguments are largely the same as ``to_csv``
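
The converter behaviour described in this hunk (cell-by-cell conversion, exceptions treated as missing data, and masking to keep integers) can be illustrated without pandas or an Excel file. In this sketch ``convert_cells`` is a hypothetical helper, cells are modelled as strings with ``''`` for an empty cell, and ``float('nan')`` stands in for missing data; real cell values and pandas' internal handling may differ.

```python
def convert_cells(cells, converter):
    """Hypothetical model of a per-column converter applied cell by cell.

    A converter that raises is treated as missing data, represented
    here by NaN.
    """
    out = []
    for cell in cells:
        try:
            out.append(converter(cell))
        except Exception:
            out.append(float('nan'))  # exception -> missing value
    return out


# An integer column with one empty cell (modelled as '').
column = ['1', '', '3']

# int('') raises ValueError, so the middle cell becomes NaN and the
# column now mixes int and float values.
plain = convert_cells(column, int)
print(plain)   # [1, nan, 3]

# Masking empty cells with -1 keeps every value an int.
cfun = lambda x: int(x) if x else -1
masked = convert_cells(column, cfun)
print(masked)  # [1, -1, 3]
```

A single missing cell forces a float into the column, which is why integer dtype cannot be guaranteed. The masked version swaps the missing marker for a sentinel (``-1``), which is only safe if ``-1`` cannot occur as real data.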

doc/source/whatsnew/v0.16.0.txt

+8
@@ -190,6 +190,14 @@ Enhancements
 - Added ``StringMethods.find()`` and ``rfind()`` which behave the same as standard ``str`` (:issue:`9386`)

 - Added ``StringMethods.isnumeric`` and ``isdecimal`` which behave the same as standard ``str`` (:issue:`9439`)
+- The ``read_excel()`` function's :ref:`sheetname <io.specifying_sheets>` argument now accepts a list or ``None`` to get multiple or all sheets, respectively. If more than one sheet is specified, a dictionary is returned. (:issue:`9450`)
+
+  .. code-block:: python
+
+     # Returns the 1st and 4th sheets, as a dictionary of DataFrames.
+     pd.read_excel('path_to_file.xls', sheetname=['Sheet1', 3])
+
+- A ``verbose`` argument has been added to ``io.read_excel()``; it defaults to ``False``. Set it to ``True`` to print sheet names as they are parsed. (:issue:`9450`)
 - Added ``StringMethods.ljust()`` and ``rjust()`` which behave the same as standard ``str`` (:issue:`9352`)
 - ``StringMethods.pad()`` and ``center()`` now accept a ``fillchar`` option to specify the filling character (:issue:`9352`)
 - Added ``StringMethods.zfill()`` which behaves the same as standard ``str`` (:issue:`9387`)
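
This entry pairs the new list and ``None`` forms with a ``verbose`` flag. The following is a rough plain-Python sketch of that combination, not pandas code: ``read_sheets``, ``drop_header``, and the printed message are all hypothetical, and the same keyword arguments are passed to every sheet, as in the ``read_excel`` "identical data formats across sheets" case.

```python
import io
from contextlib import redirect_stdout


def read_sheets(workbook, sheetnames, parse, verbose=False, **kwargs):
    """Hypothetical multi-sheet read.

    ``workbook`` maps sheet name -> raw rows, ``parse`` converts one
    sheet's rows, and the same ``**kwargs`` are applied to every sheet.
    Returns a dict keyed by sheet name.
    """
    results = {}
    for name in sheetnames:
        if verbose:
            # Progress message; the wording is illustrative only.
            print(f"Reading sheet {name}")
        results[name] = parse(workbook[name], **kwargs)
    return results


def drop_header(rows, header=True):
    """Toy parser: drop the first row when ``header`` is true."""
    return rows[1:] if header else rows


book = {'Sheet1': ['h', 1, 2], 'Sheet2': ['h', 3]}

# Capture the verbose output so it can be inspected.
buf = io.StringIO()
with redirect_stdout(buf):
    data = read_sheets(book, ['Sheet1', 'Sheet2'], drop_header, verbose=True)

print(data)                       # {'Sheet1': [1, 2], 'Sheet2': [3]}
print(buf.getvalue().splitlines())  # ['Reading sheet Sheet1', 'Reading sheet Sheet2']
```

When sheets need different arguments, the per-sheet ``ExcelFile.parse`` calls shown in the io.rst hunk are the closer analogue than a single shared ``**kwargs``.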
