-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: read_excel for ods files raising UnboundLocalError in certain cases #36175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
3fcbc01
3738fd1
c4edbd2
bfd6069
9f77b56
29b8cda
334a528
21cc458
0e6b2e1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -197,22 +197,24 @@ def _get_cell_string_value(self, cell) -> str: | |
Find and decode OpenDocument text:s tags that represent | ||
a run length encoded sequence of space characters. | ||
""" | ||
from odf.element import Element, Text | ||
from odf.element import Element | ||
from odf.namespaces import TEXTNS | ||
from odf.text import P, S | ||
from odf.text import S | ||
|
||
text_p = P().qname | ||
text_s = S().qname | ||
|
||
p = cell.childNodes[0] | ||
|
||
value = [] | ||
if p.qname == text_p: | ||
for k, fragment in enumerate(p.childNodes): | ||
if isinstance(fragment, Text): | ||
value.append(fragment.data) | ||
elif isinstance(fragment, Element): | ||
if fragment.qname == text_s: | ||
spaces = int(fragment.attributes.get((TEXTNS, "c"), 1)) | ||
|
||
for fragment in cell.childNodes: | ||
if isinstance(fragment, Element): | ||
if fragment.qname == text_s: | ||
spaces = int(fragment.attributes.get((TEXTNS, "c"), 1)) | ||
value.append(" " * spaces) | ||
else: | ||
# recursive impl needed in case of nested fragments | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a reader it isn't clear to me what this means; is this a bug that needs to be fixed upstream? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. before #33233, the value was #33233 attempted to fix cases where multiple spaces was getting skipped in the output. That is because odfpy's str implementations for Nodes/Elements that are of that specific type ( #33233 also introduced the bug that is causing the current UnboundLocalError as one line is misaligned. But fixing the misalignment adds in more bugs. With just the indentation fix, the output will still be missing certain fragments from the cell (for the file in #36122). This was my original reason for doing a monkeypatch that patched Sorry for the wall of text. |
||
# with multiple spaces | ||
# https://github.com/pandas-dev/pandas/pull/36175#discussion_r484639704 | ||
value.append(self._get_cell_string_value(fragment)) | ||
else: | ||
value.append(str(fragment)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any reason to call str(fragment) instead of always just calling There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. then we'd need to handle the |
||
return "".join(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use double back ticks on UnboundLocalError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done