@@ -1949,56 +1949,106 @@ module and use the same parsing code as the above to convert tabular data into
  a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
  advanced strategies

- Besides ``read_excel`` you can also read Excel files using the ``ExcelFile``
- class. The following two commands are equivalent:
+ Reading Excel Files
+ ~~~~~~~~~~~~~~~~~~~
+
+ .. versionadded:: 0.16
+
+ ``read_excel`` can read more than one sheet, by setting ``sheetname`` to either
+ a list of sheet names, a list of sheet positions, or ``None`` to read all sheets.
+
+ .. versionadded:: 0.13
+
+ Sheets can be specified by sheet index or sheet name, using an integer or string,
+ respectively.
+
+ .. versionadded:: 0.12
+
+ ``ExcelFile`` has been moved to the top level namespace.
+
+ There are two approaches to reading an Excel file: the ``read_excel`` function
+ and the ``ExcelFile`` class. ``read_excel`` is for reading one file
+ with file-specific arguments (i.e. identical data formats across sheets).
+ ``ExcelFile`` is for reading one file with sheet-specific arguments (i.e. various data
+ formats across sheets). Choosing the approach is largely a question of
+ code readability and execution speed.
+
+ Equivalent class and function approaches to read a single sheet:

  .. code-block:: python

     # using the ExcelFile class
     xls = pd.ExcelFile('path_to_file.xls')
-    xls.parse('Sheet1', index_col=None, na_values=['NA'])
+    data = xls.parse('Sheet1', index_col=None, na_values=['NA'])

     # using the read_excel function
-    read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
+    data = read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

- The class based approach can be used to read multiple sheets or to introspect
- the sheet names using the ``sheet_names`` attribute.
+ Equivalent class and function approaches to read multiple sheets:

- .. note::
+ .. code-block:: python

-    The prior method of accessing ``ExcelFile`` has been moved from
-    ``pandas.io.parsers`` to the top level namespace starting from pandas
-    0.12.0.
+    data = {}
+    # For when Sheet1's format differs from Sheet2
+    xls = pd.ExcelFile('path_to_file.xls')
+    data['Sheet1'] = xls.parse('Sheet1', index_col=None, na_values=['NA'])
+    data['Sheet2'] = xls.parse('Sheet2', index_col=1)
+
+    # For when Sheet1's format is identical to Sheet2
+    data = read_excel('path_to_file.xls', ['Sheet1', 'Sheet2'], index_col=None, na_values=['NA'])
+
+ Specifying Sheets
+ +++++++++++++++++
+ .. _io.specifying_sheets:

- .. versionadded:: 0.13
+ .. note:: The second argument is ``sheetname``, not to be confused with ``ExcelFile.sheet_names``.

- There are now two ways to read in sheets from an Excel file. You can provide
- either the index of a sheet or its name to by passing different values for
- ``sheet_name``.
+ .. note:: An ExcelFile's attribute ``sheet_names`` provides access to the list of sheet names.

+ - The argument ``sheetname`` allows specifying the sheet or sheets to read.
+ - The default value for ``sheetname`` is 0, indicating to read the first sheet.
  - Pass a string to refer to the name of a particular sheet in the workbook.
  - Pass an integer to refer to the index of a sheet. Indices follow Python
    convention, beginning at 0.
- - The default value is ``sheet_name=0``. This reads the first sheet.
-
- Using the sheet name:
+ - Pass a list of either strings or integers to return a dictionary of the specified sheets.
+ - Pass ``None`` to return a dictionary of all available sheets.

  .. code-block:: python

+    # Returns a DataFrame
     read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

  Using the sheet index:

  .. code-block:: python

-    read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])
+    # Returns a DataFrame
+    read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])

  Using all default values:

  .. code-block:: python

+    # Returns a DataFrame
     read_excel('path_to_file.xls')

+ Using None to get all sheets:
+
+ .. code-block:: python
+
+    # Returns a dictionary of DataFrames
+    read_excel('path_to_file.xls', sheetname=None)
+
+ Using a list to get multiple sheets:
+
+ .. code-block:: python
+
+    # Returns the 1st and 4th sheet, as a dictionary of DataFrames.
+    read_excel('path_to_file.xls', sheetname=['Sheet1', 3])
+
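Because the return type depends on ``sheetname`` (a single DataFrame for a string or integer, a dictionary for a list or ``None``), code that accepts any of these forms may want to normalize the result first. A minimal sketch follows, using plain Python objects as stand-ins for DataFrames so that it runs without an Excel file. The helper name ``as_sheet_dict`` and its ``default_key`` parameter are hypothetical and not part of pandas.

```python
def as_sheet_dict(result, default_key=0):
    """Return a dict of sheets whether `result` is one sheet or many.

    `result` stands for whatever read_excel returned: a dict when
    sheetname was a list or None, otherwise a single object.
    """
    if isinstance(result, dict):
        # Multiple sheets: return a shallow copy of the existing mapping.
        return dict(result)
    # Single sheet: wrap it under the key the caller supplies.
    return {default_key: result}


# Stand-ins for DataFrames (plain lists of rows) keep the sketch self-contained.
single = [["a", 1], ["b", 2]]
multiple = {"Sheet1": [["a", 1]], 3: [["z", 26]]}

sheets_from_single = as_sheet_dict(single, default_key="Sheet1")
sheets_from_multiple = as_sheet_dict(multiple)
print(sorted(sheets_from_single))   # ['Sheet1']
print(len(sheets_from_multiple))    # 2
```

Downstream code can then iterate over the dictionary in both cases instead of branching on the type at every call site.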
+ Parsing Specific Columns
+ ++++++++++++++++++++++++
+
  It is often the case that users will insert columns to do temporary computations
  in Excel and you may not want to read in those columns. `read_excel` takes
  a `parse_cols` keyword to allow you to specify a subset of columns to parse.
@@ -2017,26 +2067,30 @@ indices to be parsed.

     read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3])

- .. note::
+ Cell Converters
+ +++++++++++++++

-    It is possible to transform the contents of Excel cells via the `converters`
-    option. For instance, to convert a column to boolean:
+ It is possible to transform the contents of Excel cells via the `converters`
+ option. For instance, to convert a column to boolean:

-    .. code-block:: python
+ .. code-block:: python

-       read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})
+    read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool})

-    This options handles missing values and treats exceptions in the converters
-    as missing data. Transformations are applied cell by cell rather than to the
-    column as a whole, so the array dtype is not guaranteed. For instance, a
-    column of integers with missing values cannot be transformed to an array
-    with integer dtype, because NaN is strictly a float. You can manually mask
-    missing data to recover integer dtype:
+ This option handles missing values and treats exceptions in the converters
+ as missing data. Transformations are applied cell by cell rather than to the
+ column as a whole, so the array dtype is not guaranteed. For instance, a
+ column of integers with missing values cannot be transformed to an array
+ with integer dtype, because NaN is strictly a float. You can manually mask
+ missing data to recover integer dtype:

-    .. code-block:: python
+ .. code-block:: python

-       cfun = lambda x: int(x) if x else -1
-       read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
+    cfun = lambda x: int(x) if x else -1
+    read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun})
+
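One caveat with the ``cfun`` shown above: it treats every falsy cell as missing, so a genuine ``0`` would also become ``-1``. A minimal sketch of a stricter converter follows, applied to a plain list standing in for one column's cells. The name ``to_int_or_sentinel`` is hypothetical, and the assumption that empty cells arrive as ``''``, ``None``, or NaN is illustrative rather than taken from pandas.

```python
import math


def to_int_or_sentinel(cell, sentinel=-1):
    """Map missing cells to `sentinel` and everything else to int."""
    # Assumed representations of a missing cell: None, '' or NaN.
    if cell is None or cell == "":
        return sentinel
    if isinstance(cell, float) and math.isnan(cell):
        return sentinel
    # A real 0 reaches this line and is preserved.
    return int(cell)


# The converter from the text, for comparison.
cfun = lambda x: int(x) if x else -1

cells = [1.0, "", 3.0, 0.0, None, float("nan")]
converted = [to_int_or_sentinel(c) for c in cells]
print(converted)    # [1, -1, 3, 0, -1, -1]
print(cfun(0.0))    # -1, the zero is lost
```

Such a function could be passed in the same way as ``cfun``, for example ``converters={'MyInts': to_int_or_sentinel}``.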
+ Writing Excel Files
+ ~~~~~~~~~~~~~~~~~~~

  To write a DataFrame object to a sheet of an Excel file, you can use the
  ``to_excel`` instance method. The arguments are largely the same as ``to_csv``