BUG: Was trying to read an ods file and ran into UnboundLocalError in odfreader.py #35802
Comments
Your proposed fix seems reasonable. Want to push a PR with a test case?
I believe I can do that for you. Assuming I can sleuth out what exactly the file is crashing on, would a test case similar to this test work?
Hard to say what the best test would be without a reproducible failing code sample. It may be that a simple roundtrip test could suffice.
Find a minimal .ods file here that has one column with a header and one data cell with content.
I think the failure is related to the type of the data cell, where I could not reproduce creating a cell of type … Sorry for linking to the .ods file, but I was not able to upload it here because .ods is not a supported file type.
I was working with some bad CSV data and using LibreOffice to deal with it, so that could be how a span or other element (I think it might be a line break) got in there. I got around the issue by converting the data back to a CSV file and continuing as normal. Now that my work is done, I'll look into implementing this test in the coming days. Judging by the spec, there are quite a few elements that aren't being checked for. I'm not sure what guarantees pandas makes when reading data, so could I get some feedback on whether we should handle all these cases, or just set a default for spaces so that it doesn't throw?
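On the "handle all cases vs. set a default" question, one middle ground is to handle the space-run element explicitly and recurse into any other element, so unknown markup contributes its text instead of raising. The sketch below uses a hypothetical `Element` dataclass and string qnames as stand-ins for odfpy nodes; it is not the odfpy API or the actual pandas patch, and mapping `text:line-break` to `"\n"` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """Hypothetical stand-in for an odfpy element node (not the real odfpy API)."""

    qname: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# A fragment inside a cell paragraph is either plain text or a child element.
Node = Union[str, Element]


def cell_text(fragments: List[Node]) -> str:
    """Flatten a cell paragraph's fragments into one string without raising."""
    value = []
    for fragment in fragments:
        if isinstance(fragment, str):
            value.append(fragment)
        elif fragment.qname == "text:s":
            # Run of spaces; text:c holds the count and defaults to 1.
            value.append(" " * int(fragment.attributes.get("c", 1)))
        elif fragment.qname == "text:line-break":
            # Assumption: represent an in-cell line break as "\n".
            value.append("\n")
        else:
            # Any other element (text:span, text:a, ...): keep its text by recursing.
            value.append(cell_text(fragment.children))
    return "".join(value)
```

Recursing rather than special-casing every element in the spec means a nested span that itself contains a space run is still decoded correctly.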
Hi. New to coding, but not new to problem solving. I have also had this problem since this week, when I upgraded my system from Kubuntu 18.04 LTS to 20.04 LTS. With that, the Python version changed from 3.6 to 3.8; the pandas version before and after the update was 1.1.x. So I will describe my experience and what I have done to troubleshoot, in the hope that it helps with a fix: pandas read_excel with the odf engine gives me exactly the same errors (line numbers and messages) as originally posted here, plus a different error if the read_excel code is inside a try/except block. Adding to my problem, I do the same thing in 2 different Python 3 scripts with the same read_excel code block and get different results, so there is little consistency in error reproduction other than this: pre-update, everything worked fine; post-update, everything fell apart and nothing works now.

The project graphs financials and daily trading data for a number of companies. Financial data for each company is entered by hand into spreadsheets saved as .ods files using LibreOffice 6. One program reads and graphs the various financial information I want to compare, and another graphs the daily trading close price and volume. The financial information is also used to create a graph of the various financials on each trading day, to see how they change over time as the trading price varies. This is the code used to read in the various sheets, which are then combined into one large pandas DataFrame used for all the graphs. I have only included the essential lines:
Of the 100+ securities being read and graphed, some are read in correctly, but most are not.

Error #1:
Traceback (most recent call last):

I see 2 trends in the spreadsheets that cause problems.
It seems that mixed formatting within cells causes this error. However, some spreadsheets without data (just templates containing column headers and formulas) also cause this error, while other 'empty' spreadsheets are read in properly. So there is no real consistency that I can see.

The other problem I encounter is when the spreadsheet is read inside the try/except block, because the file may have a sheet with one of 2 names. In this case I get the following errors:

Error #2:
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):

I do not see any similar trend in this spreadsheet: no mixed fonts, etc. The same code worked fine in Python 3.6 with the same (latest) version of pandas, 1.1.2. Or maybe not, since I get different pandas versions when I try to list/upgrade the module. According to 'pip3 freeze', I am using these versions of the imported modules:
numpy==1.17.4

If I try to install and upgrade pandas using pip, I get the following results:

As with the original post, I can't attach an offending spreadsheet for inspection since GitHub does not support the .ods file type.
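Wrapping read_excel in try/except to handle two possible sheet names means any real read failure gets chained onto the first exception, as in Error #2. A minimal sketch of an alternative, assuming only the sheet name differs between files: look up the available names first and pick whichever candidate exists. The helper `pick_sheet` and the sheet names in the test are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def pick_sheet(available: Sequence[str], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate sheet name present in the workbook, else None.

    Checking names up front avoids a try/except around the read, so a genuine
    read error surfaces on its own instead of being chained to a lookup failure.
    """
    present = set(available)
    for name in candidates:
        if name in present:
            return name
    return None
```

With pandas, `available` could come from `pd.ExcelFile(path, engine="odf").sheet_names`, and the chosen name could then be passed to `ExcelFile.parse` or to `read_excel(..., sheet_name=...)`.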
@thisnamenotavailable this is fixed in pandas v1.1.3. If it still happens, I believe you can upload a zip that contains the .ods file.
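Since the fix is reported here as landing in v1.1.3, and the earlier comment mentions pip and Python disagreeing about the installed pandas version, it can help to check the version from inside the interpreter that actually runs the script. A minimal sketch; the helper name `version_at_least` is hypothetical, and the `packaging` library's `Version` class is a more complete alternative.

```python
import re
from typing import Tuple


def version_at_least(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare the leading numeric parts of a version string such as "1.1.3".

    Non-numeric suffixes like "rc0" are ignored, so "1.1.3rc0" counts as 1.1.3.
    Missing parts are treated as 0, so "1.1" means 1.1.0.
    """
    parts = []
    for piece in version.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions with zeros so "1.1" compares as (1, 1, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= minimum
```

Usage: `version_at_least(pandas.__version__, (1, 1, 3))`. Upgrading with `python3 -m pip install --upgrade pandas` makes pip target that same interpreter.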
Upgraded pandas to 1.1.3 and ran the program again.
No UnboundLocalError, so it seems to be working again.
Thanks for the update.
On Wed, Oct 7, 2020 at 7:21 AM Asish Mahapatra wrote:
@thisnamenotavailable this is fixed in pandas v1.1.3. Can you check after upgrading?
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Sorry, I don't have a minimal data example at this time.
Problem description
I was trying to test pandas reading a collection of .ods files and ran into this error.
I took a look at the code in question, and it seems the line may be at the wrong indentation level.
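To illustrate how a mis-indented line produces an UnboundLocalError, here is a minimal pure-Python model. `Element`, `buggy_cell_text`, and the string qnames are hypothetical stand-ins for odfpy nodes inside a cell's paragraph, not the actual pandas source; the point is only the control-flow pattern of a variable assigned in one branch but used outside it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """Hypothetical stand-in for an odfpy element node inside a cell paragraph."""

    qname: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# A fragment is either plain text or a child element.
Node = Union[str, Element]


def buggy_cell_text(fragments: List[Node]) -> str:
    value = []
    for fragment in fragments:
        if isinstance(fragment, str):
            value.append(fragment)
        elif isinstance(fragment, Element):
            if fragment.qname == "text:s":
                # text:s encodes a run of spaces; the count lives in text:c.
                spaces = int(fragment.attributes.get("c", 1))
            # Mis-indented: this runs for every element, so for a text:span
            # with no earlier text:s, `spaces` was never assigned.
            value.append(" " * spaces)
    return "".join(value)
```

With a span and no preceding space run, the call raises UnboundLocalError. If a space run does come first, the stale `spaces` value is silently reused and the span's own text is dropped, which is arguably worse because nothing raises.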
Expected Output
The usual dataframes 👍
Output of pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None