BUG: read_excel not accepting encoding on 1.1.0 #35753
Thanks @staticdev for the report. This is documented in the release notes: https://pandas.pydata.org/docs/whatsnew/v1.1.0.html
Were the constructed DataFrames different with and without the encoding parameter in 1.0.5? Updated: incorrect link; xref #34464.
There have been similar requests for an encoding parameter: #25523 and #23444. Might this be a temporary workaround?

    with open(path_to_excel_file, mode="r", encoding="iso-8859-1") as file:
        df = pd.read_excel(file, **other_kwargs)

I haven't tested whether that works.
Yes, and I was using 1.0.5 before 1.1.0, and it was working fine.
Another odd thing about this change: there is another place in my code that hasn't broken, where I also use kwargs with encoding:
I also tried that, but it created many side effects in my code. I have a test with a fake file that checks that an empty file should raise
I just noticed that my suggestion doesn't make much sense. The Excel file needs to be opened in binary mode, but binary files don't take an encoding argument.
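To illustrate why text mode cannot work: both Excel formats are binary containers, recognizable by their leading magic bytes (a ZIP archive for .xlsx, an OLE2 compound file for legacy .xls). The helper below is a hypothetical sketch, not a pandas API. It reads the header from a handle opened in binary mode, which is also how a file object should be handed to `read_excel`.

```python
import io
from typing import BinaryIO

# Magic numbers of the two Excel container formats.
XLSX_MAGIC = b"PK\x03\x04"  # ZIP archive (Office Open XML, .xlsx)
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (.xls)


def sniff_excel_format(fileobj: BinaryIO) -> str:
    """Return 'xlsx', 'xls' or 'unknown' from the first bytes of a binary stream.

    Hypothetical helper for illustration. The stream position is restored,
    so the same handle could then be passed on to pd.read_excel().
    """
    start = fileobj.tell()
    header = fileobj.read(8)
    fileobj.seek(start)
    if header.startswith(XLSX_MAGIC):
        return "xlsx"
    if header.startswith(XLS_MAGIC):
        return "xls"
    return "unknown"


# Usage with a real file (binary mode, no encoding argument):
#     with open("data.xlsx", "rb") as f:
#         print(sniff_excel_format(f))
#         df = pd.read_excel(f)
```

Decoding these bytes as ISO-8859-1 or UTF-8 text would only scramble the container; the character encoding of cell text is a property of the format's internals, not of how the file handle is opened.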
Does anyone know how to create an Excel file with a specific encoding using either LibreOffice or Gnumeric? There are no Excel files needing a non-default encoding in pandas/tests/io/data/excel/. We need a test for the encoding argument.
@twoertwein what I can do is send you one with this "iso-8859-1" encoding. Would that suffice?
@staticdev thank you, that would probably be best. Ideally, just one sheet with only one cell filled with a bunch of special characters :)
Please rename it to .xls, since GitHub complains about the extension.
Thank you! I seem to be able to open this Excel file in pandas without specifying an encoding. Can you confirm that you can open it as well? Let me know if you have a file that needs an encoding argument! Please try to create a minimal Excel file, ideally with only one cell having content.
@twoertwein you are correct. I tried removing
Does it solve #23444?
I saw it in your PR, so I think the most important question here is why the behavior changed. I didn't find a clear explanation in https://pandas.pydata.org/docs/whatsnew/v1.1.0.html#deprecations. The only related entry is this one: "Passing any arguments but the first two to read_excel() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH27573)." When I click it, it is not even related to
Do you mind trying the old pandas version to see whether you actually needed the encoding argument? IMHO,
@twoertwein I can try, but I really doubt it. I would never have added this argument if it weren't necessary, and the code for this component was 100% written by me. I remember encoding was one of the first problems I had to solve.
So we assume encoding is not needed anymore? I am OK with that if other people are not facing issues with it.
@twoertwein In my case, the files ended up being read successfully.
Moved off the 1.1.4 milestone (scheduled for release tomorrow).
Was encoding ever part of the documented API? We could either close this issue or update read_excel to accept the encoding parameter with a FutureWarning.
At least since Excel 97, text in Excel documents has a well-defined encoding (see https://xlrd.readthedocs.io/en/latest/unicode.html?highlight=encoding_override#handling-of-unicode). I don't think it is worth supporting Excel 95 and earlier file formats at this point.
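A minimal stdlib sketch of why an encoding argument has nothing to act on for .xlsx files: cell strings are stored in XML parts inside the ZIP archive (commonly `xl/sharedStrings.xml`), and the XML parser decodes them from the declared encoding (UTF-8 here), independent of the operating system's locale. The archive built below contains only that one part, an assumption made to keep the example self-contained; it is not a complete workbook that Excel or pandas would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SPREADSHEET_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def build_fake_shared_strings(texts):
    """Return a BytesIO ZIP holding only xl/sharedStrings.xml (not a full workbook)."""
    items = "".join(f"<si><t>{escape(t)}</t></si>" for t in texts)
    xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<sst xmlns="{SPREADSHEET_NS}">{items}</sst>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Store the XML as UTF-8 bytes, matching its declaration.
        zf.writestr("xl/sharedStrings.xml", xml.encode("utf-8"))
    buf.seek(0)
    return buf


def read_shared_strings(fileobj):
    """Parse the shared strings; the XML parser honours the encoding declaration."""
    with zipfile.ZipFile(fileobj) as zf:
        raw = zf.read("xl/sharedStrings.xml")  # raw bytes, no platform decoding
    root = ET.fromstring(raw)
    return [t.text for t in root.iter(f"{{{SPREADSHEET_NS}}}t")]


texts = ["café", "Ñandú", "日本語", "\U0001f3c0", "a & b"]
print(read_shared_strings(build_fake_shared_strings(texts)))
```

Because decoding is driven by the file's own declaration, the same bytes yield the same strings on Windows and macOS.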
Closing as a non-issue. Providing encoding as a keyword argument is unnecessary and only "worked" before because we silently accepted kwargs and discarded them; it never had any effect and should be removed from your code.
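The direct fix is to delete `encoding=` from the `read_excel` calls. If such calls are scattered across a codebase that must run on both 1.0.x and 1.1+ during a migration, a small hypothetical helper (not part of pandas) can drop keyword arguments the target function does not declare. This sketch assumes `inspect.signature` can see the function's real parameters. Silently discarding arguments can also hide genuine typos, so it emits a warning for anything it drops.

```python
import inspect
import warnings


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments it does not declare.

    Hypothetical migration helper. If func accepts **kwargs itself
    (as read_excel did before 1.1.0), everything is passed through.
    """
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    dropped = sorted(k for k in kwargs if k not in params)
    if dropped:
        warnings.warn(f"dropping unsupported keyword(s) {dropped} for {func.__name__}")
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **accepted)


# Hypothetical usage:
#     df = call_with_supported_kwargs(pd.read_excel, "data.xlsx",
#                                     sheet_name=0, encoding="iso-8859-1")
```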
I find I need the encoding argument to keep my files in UTF-8 when other people on different platforms open them. Special characters were being mangled on Windows machines, but not on Macs, because the files were changing encodings. Forcing UTF-8 when opening and saving documents meant my files worked on all platforms. Without the encoding argument I could run into these problems again. Is there another workaround if this is being removed from pandas?
I have accented characters in my Excel file; when reading from CSV I can pass encoding='utf-8'. Why would pandas remove this ability when reading Excel files? This is now a blocker for me: all of my accented words are garbled when read from Excel. Then, after my wrangling, I write to CSV and get UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3c0' in position 69: character maps to <undefined>. Note that my Excel file is an .xlsx export from Google Sheets. I am disappointed that nobody responded to Matthew over 2 months ago; what hope is there for me? Since this is closed, should I open a new issue? I will wait a couple of hours.
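The UnicodeEncodeError quoted above is raised while writing, not while reading the Excel file. On Windows, text written without an explicit encoding typically uses the locale's code page (often cp1252), which has no mapping for characters such as U+1F3C0. A minimal stdlib sketch reproducing the error and the usual fix, naming UTF-8 explicitly; with pandas, the analogous knob is the `encoding` argument of `DataFrame.to_csv`, or of `open()` if the CSV is written through a file handle.

```python
import csv
import io

rows = [["name", "sport"], ["José", "basketball \U0001f3c0"]]


def write_csv_bytes(rows, encoding):
    """Serialize rows as CSV text, then encode them with the given codec."""
    text_buf = io.StringIO(newline="")
    csv.writer(text_buf).writerows(rows)
    return text_buf.getvalue().encode(encoding)


# cp1252 (a common Windows default) cannot represent the emoji.
try:
    write_csv_bytes(rows, "cp1252")
except UnicodeEncodeError as exc:
    print(exc)  # a 'charmap' codec error naming '\U0001f3c0'

# An explicit UTF-8 encoding keeps every character intact.
data = write_csv_bytes(rows, "utf-8")
print(data.decode("utf-8"))
```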
As far as I remember, it was removed because it was never actually used internally: you were able to specify it, but it didn't make a difference. Do you have an Excel file for which you needed the encoding argument on 1.0.x (and can you verify that it didn't work without the argument)? Please create a new issue for the CSV problem you are encountering. [If you have this issue with 1.2.0, please upgrade to 1.2.1 first, as your problem might have been fixed; it sounds similar to #38989.]
Thanks for the reply! I found https://github.com/python/cpython but could not figure out how to open an issue against core Python.
If you believe there is a bug in the Python stdlib modules, the bug tracker is at https://bugs.python.org/
There is still no workaround for this issue. This means that if I switch from a Mac to a PC, my file fails to run due to the 'charmap' codec issue, even with 1.2.3. How has this not been fixed yet?
@matthewahillman before this issue can be fixed, it needs to be reproduced. Do you have an Excel file that you can't open without the
Hello all, I'm new to pandas and I can't solve these errors I'm getting. Any assistance would be appreciated.
Traceback (most recent call last):
C:\Users\HA_Report_Generation\Report Generator Legacy>python3 report_generator_1_5_3.py
Traceback (most recent call last):
I don't think your error is related to encoding. Please have a look at the possible arguments for
Turns out instead of 'index=False' I needed to change it to 'index_cols'=False. That seemed to work. Many thanks! |
Hi all, |
The encoding argument is very important, and I would like "encoding=" to come back. The Windows edition of Excel uses Shift-JIS (cp932) as its default encoding. Exported .xlsx files end up with corrupted characters when processed with pandas on macOS, because the OS default encoding there is UTF-8.
The workaround suggested here seems to work.
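A caveat on the Shift-JIS report, stated as an assumption about that setup: text inside an .xlsx file is stored as XML with a declared encoding, so the OS default should not affect it. This kind of garbling usually comes from plain-text CSV exports, where the code page really matters, and there `pd.read_csv(path, encoding="cp932")` applies. A small stdlib sketch of the difference, using an invented two-row CSV:

```python
import csv
import io

original = "氏名,住所\n山田太郎,東京都\n"

# Bytes as a Japanese-locale Windows program would typically write them.
cp932_bytes = original.encode("cp932")


def read_rows(raw, encoding, errors="strict"):
    """Decode CSV bytes with an explicit encoding, then parse them."""
    text = raw.decode(encoding, errors=errors)
    return list(csv.reader(io.StringIO(text, newline="")))


# Decoding with the right code page recovers the text exactly.
good = read_rows(cp932_bytes, "cp932")

# Assuming UTF-8 (the macOS default) garbles it; errors="replace" exposes the damage.
bad = read_rows(cp932_bytes, "utf-8", errors="replace")
print(good)
print(bad)
```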
I am not sure whether the developers see these comments, since this issue is already closed. The encoding is very important for the program to interpret special characters, so I would be pleased if there were a solution to this problem (either the encoding argument or any other alternative would be fine for me).
The encoding parameter is needed!
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
TypeError: read_excel() got an unexpected keyword argument 'encoding'.
Expected Output
No error, as in previous versions.
Output of pd.show_versions()