Commit 004b4c5

BUG: error in read_excel with some ods files pandas-dev#45598 (pandas-dev#46050)
* BUG: error in read_excel with some ods files pandas-dev#45598
* BUG: use hasattr instead of dir
* DOC: add issue number in new test case
* DOC: remove comment

Co-authored-by: Dimitra Karadima <[email protected]>
1 parent e4162cd commit 004b4c5

4 files changed, +19 -2 lines changed

doc/source/whatsnew/v1.5.0.rst

+1
@@ -406,6 +406,7 @@ I/O
 - Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`)
 - Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`)
 - Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`)
+- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements(:issue:`45598`)
 
 Period
 ^^^^^^

pandas/io/excel/_odfreader.py

+6 -2
@@ -112,7 +112,11 @@ def get_sheet_data(
         table: list[list[Scalar | NaTType]] = []
 
         for sheet_row in sheet_rows:
-            sheet_cells = [x for x in sheet_row.childNodes if x.qname in cell_names]
+            sheet_cells = [
+                x
+                for x in sheet_row.childNodes
+                if hasattr(x, "qname") and x.qname in cell_names
+            ]
             empty_cells = 0
             table_row: list[Scalar | NaTType] = []
 
@@ -243,5 +247,5 @@ def _get_cell_string_value(self, cell) -> str:
                     # https://github.com/pandas-dev/pandas/pull/36175#discussion_r484639704
                     value.append(self._get_cell_string_value(fragment))
             else:
-                value.append(str(fragment))
+                value.append(str(fragment).strip("\n"))
         return "".join(value)
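
The two changes above can be illustrated with a small standalone sketch. It is not the pandas or odfpy code: it uses the standard library's `xml.dom.minidom` as a stand-in, with `tagName` playing the role of odfpy's `qname`, and `ROW_XML` and `cell_elements` are hypothetical names chosen for the example. The assumption is that newlines between XML elements show up as text nodes that lack the attribute being tested, which is presumably what the `hasattr` guard protects against.

```python
from xml.dom import minidom

# A row whose cells are separated by newlines, similar in spirit to the
# files described in the issue (hypothetical sample data).
ROW_XML = "<row>\n<cell>1</cell>\n<cell>2</cell>\n</row>"


def cell_elements(row, cell_names):
    """Keep only child elements whose tag is in cell_names.

    Whitespace between elements is parsed as Text nodes, which have no
    tagName attribute, so the hasattr guard skips them instead of raising.
    """
    return [
        x
        for x in row.childNodes
        if hasattr(x, "tagName") and x.tagName in cell_names
    ]


row = minidom.parseString(ROW_XML).documentElement
cells = cell_elements(row, {"cell"})
cell_values = [c.firstChild.data for c in cells]

# Stripping "\n" removes only leading and trailing newlines; spaces stay.
cleaned = "\n hello world \n".strip("\n")
```

Without the guard, the same comprehension would raise `AttributeError` on the first text node. Note that `strip("\n")` leaves interior newlines and surrounding spaces untouched, so only the newline padding around a fragment is dropped.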
2.21 KB
Binary file not shown.

pandas/tests/io/excel/test_odf.py

+12
@@ -36,3 +36,15 @@ def test_read_writer_table():
     result = pd.read_excel("writertable.odt", sheet_name="Table1", index_col=0)
 
     tm.assert_frame_equal(result, expected)
+
+
+def test_read_newlines_between_xml_elements_table():
+    # GH#45598
+    expected = pd.DataFrame(
+        [[1.0, 4.0, 7], [np.nan, np.nan, 8], [3.0, 6.0, 9]],
+        columns=["Column 1", "Column 2", "Column 3"],
+    )
+
+    result = pd.read_excel("test_newlines.ods")
+
+    tm.assert_frame_equal(result, expected)
