Commit fe52d9f

BUG: Fix read_excel w/parse_cols & empty dataset (#23661)
Closes gh-9208.
1 parent 454ecfc commit fe52d9f

6 files changed: +24 -11 lines changed

doc/source/whatsnew/v0.24.0.txt

+3 -3
@@ -1264,9 +1264,6 @@ MultiIndex
 I/O
 ^^^
 
-- Bug in :meth:`to_sql` when writing timezone aware data (``datetime64[ns, tz]`` dtype) would raise a ``TypeError`` (:issue:`9086`)
-- Bug in :meth:`to_sql` where a naive DatetimeIndex would be written as ``TIMESTAMP WITH TIMEZONE`` type in supported databases, e.g. PostgreSQL (:issue:`23510`)
-
 .. _whatsnew_0240.bug_fixes.nan_with_str_dtype:
 
 Proper handling of `np.NaN` in a string data-typed column with the Python engine
@@ -1302,6 +1299,9 @@ Current Behavior:
 
 Notice how we now instead output ``np.nan`` itself instead of a stringified form of it.
 
+- Bug in :meth:`to_sql` when writing timezone aware data (``datetime64[ns, tz]`` dtype) would raise a ``TypeError`` (:issue:`9086`)
+- Bug in :meth:`to_sql` where a naive DatetimeIndex would be written as ``TIMESTAMP WITH TIMEZONE`` type in supported databases, e.g. PostgreSQL (:issue:`23510`)
+- Bug in :meth:`read_excel()` when ``parse_cols`` is specified with an empty dataset (:issue:`9208`)
 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
 - :func:`read_csv()` and func:`read_table()` will throw ``UnicodeError`` and not coredump on badly encoded strings (:issue:`22748`)

pandas/io/excel.py

+11 -8
@@ -634,14 +634,17 @@ def _parse_cell(cell_contents, cell_typ):
                     else:
                         offset = 1 + max(header)
 
-                    for col in index_col:
-                        last = data[offset][col]
-
-                        for row in range(offset + 1, len(data)):
-                            if data[row][col] == '' or data[row][col] is None:
-                                data[row][col] = last
-                            else:
-                                last = data[row][col]
+                    # Check if we have an empty dataset
+                    # before trying to collect data.
+                    if offset < len(data):
+                        for col in index_col:
+                            last = data[offset][col]
+
+                            for row in range(offset + 1, len(data)):
+                                if data[row][col] == '' or data[row][col] is None:
+                                    data[row][col] = last
+                                else:
+                                    last = data[row][col]
 
             has_index_names = is_list_like(header) and len(header) > 1
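
The guarded forward fill in this hunk can be tried outside pandas. The sketch below is an illustration only: `forward_fill_index_cols` is a hypothetical helper name, not a pandas function, and it assumes `data` is a plain list of row lists whose first `offset` rows are header rows. It shows why the `offset < len(data)` check matters: on a header-only sheet, `data[offset]` does not exist, so the unguarded loop would raise an `IndexError`.

```python
def forward_fill_index_cols(data, index_col, offset):
    """Forward-fill blank cells in the index columns of `data`, in place.

    Hypothetical stand-alone version of the loop in the diff above.
    `data` is a list of rows; rows before `offset` are header rows.
    """
    # Empty dataset: there is no row at `offset`, so nothing to fill.
    if offset >= len(data):
        return data

    for col in index_col:
        last = data[offset][col]
        for row in range(offset + 1, len(data)):
            if data[row][col] == '' or data[row][col] is None:
                # Blank cell: reuse the most recent non-blank value.
                data[row][col] = last
            else:
                last = data[row][col]
    return data


# Header-only sheet: the guard makes this a no-op instead of an IndexError.
header_only = [["A", "B", "C", "D"]]
forward_fill_index_cols(header_only, index_col=[0, 1], offset=1)

# Sheet whose repeated index values were left blank.
rows = [
    ["A", "B", "val"],
    ["x", "p", 1],
    ["", "q", 2],
    [None, "", 3],
]
forward_fill_index_cols(rows, index_col=[0, 1], offset=1)
```

After the second call, the blank cells in columns 0 and 1 carry the last non-blank value above them, while the header-only input is left unchanged.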

pandas/tests/io/data/test1.xls

-2.5 KB
Binary file not shown.

pandas/tests/io/data/test1.xlsm

-31.2 KB
Binary file not shown.

pandas/tests/io/data/test1.xlsx

-31.2 KB
Binary file not shown.

pandas/tests/io/test_excel.py

+10
@@ -235,6 +235,16 @@ def test_index_col_label_error(self, ext):
             self.get_exceldf("test1", ext, "Sheet1", index_col=["A"],
                              usecols=["A", "C"])
 
+    def test_index_col_empty(self, ext):
+        # see gh-9208
+        result = self.get_exceldf("test1", ext, "Sheet3",
+                                  index_col=["A", "B", "C"])
+        expected = DataFrame(columns=["D", "E", "F"],
+                             index=MultiIndex(levels=[[]] * 3,
+                                              labels=[[]] * 3,
+                                              names=["A", "B", "C"]))
+        tm.assert_frame_equal(result, expected)
+
     def test_usecols_pass_non_existent_column(self, ext):
         msg = ("Usecols do not match columns, "
                "columns expected but not found: " + r"\['E'\]")
