Commit 445d94a

BUG: read_excel return empty dataframe when using usecols
- [x] closes #18273
- [x] tests added / passed
- [x] passes git diff master --name-only -- "*.py" | grep "pandas/" | xargs -r flake8
- [x] whatsnew entry

As mentioned, read_excel returned an empty DataFrame when the usecols argument was a list of strings. Lists of strings are now interpreted correctly by the read_excel function.
1 parent 0c9192d commit 445d94a

3 files changed

+32
-2
lines changed


doc/source/whatsnew/v0.23.0.txt

+1
@@ -979,6 +979,7 @@ I/O
 - Bug in :func:`DataFrame.to_latex()` where pairs of braces meant to serve as invisible placeholders were escaped (:issue:`18667`)
 - Bug in :func:`read_json` where large numeric values were causing an ``OverflowError`` (:issue:`18842`)
 - Bug in :func:`DataFrame.to_parquet` where an exception was raised if the write destination is S3 (:issue:`19134`)
+- Bug in :func:`read_excel` where `usecols` named argument as a list of strings were returning a empty DataFrame (:issue:`18273`)
 - :class:`Interval` now supported in :func:`DataFrame.to_excel` for all Excel file types (:issue:`19242`)
 - :class:`Timedelta` now supported in :func:`DataFrame.to_excel` for all Excel file types (:issue:`19242`, :issue:`9155`, :issue:`19900`)
 - Bug in :meth:`pandas.io.stata.StataReader.value_labels` raising an ``AttributeError`` when called on very old files. Now returns an empty dict (:issue:`19417`)

pandas/io/excel.py

+5-2
@@ -96,8 +96,11 @@
     * If int then indicates last column to be parsed
     * If list of ints then indicates list of column numbers to be parsed
     * If string then indicates comma separated list of Excel column letters and
-      column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
-      both sides.
+      column ranges (e.g. "A:E" or "A,C,E:F") to be parsed. Ranges are inclusive
+      of both sides.
+    * If list of strings each string shall be a Excel column letter or column
+      range (e.g. "A:E" or "A,C,E:F") to be parsed. Ranges are inclusive of both
+      sides.
 squeeze : boolean, default False
     If the parsed data only contains one column then return a Series
 dtype : Type name or dict of column -> type, default None
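
To illustrate the docstring change above, here is a minimal sketch of how a list of Excel column letters and ranges could be expanded into 0-based column positions. This is not the code from this commit or from pandas; the helper names `excel_col_to_index` and `expand_usecols` are hypothetical and exist only for this illustration. It accepts comma-separated entries as well, since the docstring lists "A,C,E:F" among its examples.

```python
from typing import List


def excel_col_to_index(letters: str) -> int:
    """Convert an Excel column label such as 'A' or 'AB' to a 0-based index."""
    label = letters.strip().upper()
    if not label:
        raise ValueError("empty Excel column label")
    index = 0
    for ch in label:
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid Excel column label: {letters!r}")
        # Column labels are a base-26 number with digits A=1 .. Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def expand_usecols(usecols: List[str]) -> List[int]:
    """Expand entries like ['A', 'C:D'] into sorted 0-based column indices.

    Hypothetical helper: ranges are inclusive on both sides, as the
    docstring in this commit describes.
    """
    indices: List[int] = []
    for item in usecols:
        for part in item.split(","):
            if ":" in part:
                start, end = part.split(":")
                lo, hi = excel_col_to_index(start), excel_col_to_index(end)
                indices.extend(range(lo, hi + 1))
            else:
                indices.append(excel_col_to_index(part))
    return sorted(set(indices))
```

Under these assumptions, `expand_usecols(['A', 'C:D'])` selects the first, third and fourth sheet columns. The interpretation shown follows the docstring in this commit; other pandas releases may treat a list of strings differently, so check the documentation for the version in use.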

pandas/tests/io/test_excel.py

+26
@@ -179,6 +179,32 @@ def test_usecols_str(self, ext):
         tm.assert_frame_equal(df2, df1, check_names=False)
         tm.assert_frame_equal(df3, df1, check_names=False)
 
+    @pytest.mark.parametrize("columns,usecols,parse_cols", [
+        (['A', 'B', 'C'], ['A:D'], ['A:D']),
+        (['B', 'C'], ['A', 'C', 'D'], ['A', 'C', 'D']),
+        (['B', 'C'], ['A', 'C:D'], ['A', 'C:D'])
+    ])
+    # GH18273 - read_excel return empty dataframe when using usecols as a list
+    # of strings
+    def test_usecols_str_list(self, ext, columns, usecols, parse_cols):
+
+        dfref = self.get_csv_refdf('test1')
+
+        df1 = dfref.reindex(columns=columns)
+        df2 = self.get_exceldf('test1', ext, 'Sheet1', index_col=0,
+                               usecols=usecols)
+        df3 = self.get_exceldf('test1', ext, 'Sheet2', skiprows=[1],
+                               index_col=0, usecols=usecols)
+
+        with tm.assert_produces_warning(FutureWarning):
+            df4 = self.get_exceldf('test1', ext, 'Sheet2', skiprows=[1],
+                                   index_col=0, parse_cols=parse_cols)
+
+        # TODO add index to xls, read xls ignores index name ?
+        tm.assert_frame_equal(df2, df1, check_names=False)
+        tm.assert_frame_equal(df3, df1, check_names=False)
+        tm.assert_frame_equal(df4, df1, check_names=False)
+
     def test_excel_stop_iterator(self, ext):
 
         parsed = self.get_exceldf('test2', ext, 'Sheet1')
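
The parametrized cases in `test_usecols_str_list` can look surprising at first: `usecols=['A:D']` is expected to yield the data columns `['A', 'B', 'C']`. The sketch below shows one way to read that table. It assumes the `test1` sheet stores its index in Excel column A and data columns named A to D in Excel columns B to E; that layout is an assumption for illustration and is not stated in the diff. With `index_col=0`, the first selected column becomes the index and the rest become the frame's columns. The names `SHEET_HEADER`, `col_index` and `selected_data_columns` are hypothetical.

```python
from typing import List, Sequence

# Assumed header layout of the sheet (not taken from the diff):
# Excel column A holds the index, Excel columns B..E hold data named A..D.
SHEET_HEADER = ["index", "A", "B", "C", "D"]


def col_index(label: str) -> int:
    """0-based position of an Excel column label such as 'A' or 'AB'."""
    index = 0
    for ch in label.strip().upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def selected_data_columns(header: Sequence[str], usecols: List[str]) -> List[str]:
    """Names of the data columns left after applying usecols and index_col=0."""
    picked = set()
    for item in usecols:
        if ":" in item:
            start, end = item.split(":")
            # Ranges are inclusive on both sides.
            picked.update(range(col_index(start), col_index(end) + 1))
        else:
            picked.add(col_index(item))
    ordered = sorted(picked)
    # The first selected column plays the role of index_col=0.
    return [header[i] for i in ordered[1:]]
```

Under the assumed layout, each `usecols` value in the parametrization maps to the `columns` value listed beside it, which is what `dfref.reindex(columns=columns)` relies on when building the reference frame.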
