Skip to content

BUG: read_excel not accepting encoding on 1.1.0 #35753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 3 tasks
staticdev opened this issue Aug 16, 2020 · 45 comments
Closed
2 of 3 tasks

BUG: read_excel not accepting encoding on 1.1.0 #35753

staticdev opened this issue Aug 16, 2020 · 45 comments
Labels
IO Excel read_excel, to_excel

Comments

@staticdev
Copy link
Contributor

staticdev commented Aug 16, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# Your code here
def date_parse(value: str) -> datetime.datetime:
    return datetime.datetime.strptime(value.strip(), "%d/%m/%y")

df = pd.read_excel(
    filepath,
    encoding="iso-8859-1",
    usecols="B",
    date_parser=date_parse,
    skiprows=4,
)

Problem description

TypeError: read_excel() got an unexpected keyword argument 'encoding'.

Expected Output

No error, as in previous versions.

Output of pd.show_versions()

------------------
commit           : d9fff2792bf16178d4e450fe7384244e50635733
python           : 3.8.2.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-42-generic
Version          : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : pt_BR.UTF-8

pandas           : 1.1.0
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1
setuptools       : 41.2.0
Cython           : None
pytest           : 6.0.1
hypothesis       : None
sphinx           : 3.2.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
numba            : None```

</details>
@staticdev staticdev added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 16, 2020
@simonjayhawkins
Copy link
Member

simonjayhawkins commented Aug 16, 2020

Thanks @staticdev for the report. This is documented in the release notes.

from https://pandas.pydata.org/docs/whatsnew/v1.1.0.html

read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH34464)

were the constructed DataFrames different with and without the encoding parameter in 1.0.5?

updated: incorrect link

xref #34464

@simonjayhawkins simonjayhawkins added Needs Info Clarification about behavior needed to assess issue Usage Question and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 16, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.1 milestone Aug 16, 2020
@twoertwein
Copy link
Member

There have been similar requests about an encoding parameter: #25523 and #23444.

I haven't used read_excel but the documentation says that you can also provide a file-object instead of a path. Something like

with open(path_to_excel_file, mode="r", encoding="iso-8859-1") as file:
    df = pd.read_excel(file, **other_kwargs)

might be a temporary work around? I haven't tested whether that works.

@staticdev
Copy link
Contributor Author

staticdev commented Aug 16, 2020

Thanks @staticdev for the report. This is documented in the release notes.

from https://pandas.pydata.org/docs/whatsnew/v1.1.0.html

read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH34464)

were the constructed DataFrames different with and without the encoding parameter in 1.0.5?

Yes, and I was using 1.0.5 before 1.1.0 working fine.

updated: incorrect link

xref #34464

Another odd thing to add about this change is that I have another point in my code that hasn't broken and I also using kwargs and also encoding:

df = pd.read_excel(
    filename,
    encoding="iso-8859-1",
    usecols="B:K",
    parse_dates=["Date"],
    date_parser=date_parse,
    skipfooter=4,
    skiprows=10,
)

@staticdev
Copy link
Contributor Author

staticdev commented Aug 16, 2020

There have been similar requests about an encoding parameter: #25523 and #23444.

I haven't used read_excel but the documentation says that you can also provide a file-object instead of a path. Something like

with open(path_to_excel_file, mode="r", encoding="iso-8859-1") as file:
    df = pd.read_excel(file, **other_kwargs)

might be a temporary work around? I haven't tested whether that works.

I also tried that, but created many side-effects to my code. I have a test with a file-fake that checks if it is an empty file, it should raise pd.errors.EmptyDataError, now it is giving me TypeError('expected str, bytes or os.PathLike object, not NoneType').

@twoertwein
Copy link
Member

There have been similar requests about an encoding parameter: #25523 and #23444.
I haven't used read_excel but the documentation says that you can also provide a file-object instead of a path. Something like

with open(path_to_excel_file, mode="r", encoding="iso-8859-1") as file:
    df = pd.read_excel(file, **other_kwargs)

might be a temporary work around? I haven't tested whether that works.

I also tried that, but created many side-effects to my code. I have a test with a file-fake that checks if it is an empty file, it should raise pd.errors.EmptyDataError, now it is giving me TypeError('expected str, bytes or os.PathLike object, not NoneType').

I just noticed that my suggestion doesn't make much sense. The excel file needs to be opened in binary mode but binary files don't take an encoding argument.

@twoertwein
Copy link
Member

Does someone know how to create an excel file with a specific encoding using either LibreOffice or gnumeric? There are no excel files needing a non-default encoding in pandas/tests/io/data/excel/. Need to have a test for the encoding argument.

@simonjayhawkins simonjayhawkins added Bug IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version and removed Needs Info Clarification about behavior needed to assess issue Usage Question labels Aug 17, 2020
@staticdev
Copy link
Contributor Author

@twoertwein what I can do is send you one with this "iso-8859-1" encoding. Would it suffice?

@twoertwein
Copy link
Member

@staticdev thank you, that would probably be the best. Ideally, just one sheet with only one cell filled with a bunch of special characters :)

@staticdev
Copy link
Contributor Author

Please rename it to xls, since github complains about extension.

InfoCEI.txt

@twoertwein
Copy link
Member

thank you! I seem to be able to open this Excel file in pandas without specifying an encoding. Can you confirm that you can open it as well?

Let me know if you have a file that needs an encoding argument! Please try to create a minimal Excel file, ideally only one cell having content.

@staticdev
Copy link
Contributor Author

@twoertwein you are correct. I tried removing encoding arg, and I SEEMS to be working fine. I am not very confident though. This code was running for the past versions of pandas even before 1.0.0. But I didn't see on the 1.1.0 changelog that it automatically detects the encoding or something alike. I really don't know if just removing the encoding really solves the problem.

@staticdev
Copy link
Contributor Author

Does it solve for #23444?

@twoertwein
Copy link
Member

Does it solve for #23444?

I thought these issues were similar but I think they aren't. imho #23444 tries to allow read_excel to work with text file-objects.

@staticdev
Copy link
Contributor Author

staticdev commented Aug 17, 2020

I saw it in your PR.. so I think the most important question here is why the behavior changed. I didn't find a clear explanation from the https://pandas.pydata.org/docs/whatsnew/v1.1.0.html#deprecations. The only related thing is this one: Passing any arguments but the first two to read_excel() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH27573). When I click it, it is not even related to read_excel. So I conclude that something IS wrong (maybe the documentation). Maybe @alexitkes could help us figure that out.

@twoertwein
Copy link
Member

Do you mind trying the old pandas version to see whether you actually needed the encoding argument?

imho, encoding probably never worked: it was passed to TextFileReader but I think it isn't used there.

@staticdev
Copy link
Contributor Author

staticdev commented Aug 17, 2020

@twoertwein I can try, but I really doubt that one. I would never add this argument if it wasn't necessary and the code for this component was 100% written by me. I remember encoding was one of the first problems I had to solve.

@staticdev
Copy link
Contributor Author

staticdev commented Oct 6, 2020

So we assume encoded is not needed anymore? I am ok with that if other people are not facing issues with that.

@nahuelalmeira
Copy link

@twoertwein In my case, the files ended up being successfully read.

@simonjayhawkins simonjayhawkins added this to the 1.1.4 milestone Oct 7, 2020
@simonjayhawkins simonjayhawkins added the Closing Candidate May be closeable, needs more eyeballs label Oct 29, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020
@simonjayhawkins
Copy link
Member

moved off 1.1.4 milestone (scheduled for release tomorrow)

@simonjayhawkins
Copy link
Member

Removing regression tag. Providing encoding here has always been a no-op you would just now getting an explicit error about that

was encoding ever part of the documented api.

we could either close this issue or update read_excel to accept the encoding parameter with a FutureWarning.

@WillAyd
Copy link
Member

WillAyd commented Oct 29, 2020

At least since excel 97 text in excel documents has a well defined encoding.

https://xlrd.readthedocs.io/en/latest/unicode.html?highlight=encoding_override#handling-of-unicode

I don't think it is worth supporting Excel 95 and earlier file formats at this point

@WillAyd
Copy link
Member

WillAyd commented Oct 29, 2020

Closing as a non-issue. Providing encoding as a keyword argument is unnecessary and only "worked" before because we silently accepted kwargs and discard it; it never had any functionality and should be removed from code

@WillAyd WillAyd closed this as completed Oct 29, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.5, No action Oct 29, 2020
@simonjayhawkins simonjayhawkins removed Closing Candidate May be closeable, needs more eyeballs Needs Discussion Requires discussion from core team before further action labels Oct 29, 2020
@torncheesecake
Copy link

I find I need to use the encoding for keeping the UTF-8 encoding on my files when other people on different platforms open them. I was having issues with the formatting of special characters on Windows machines but not on Macs due to the file changing encodings. Forcing the UTF-8 encodings on opening and saving documents meant my files work on all platforms. Without this encoding working I could run into problems again. Is there another workaround if this is being removed from the pandas code??

@boardtc
Copy link

boardtc commented Jan 20, 2021

I have language accents in my excel, when reading from csv I can pass encoding='utf-8'. Why would Pandas remove this ability from reading excel files? This is a blocker for me now, all of my accented words are getting garbled when read in from excel. The after my wrangling I am writing to csv and get UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3c0' in position 69: character maps to
What is the workaround that allowed you to remove this functionality?

Note my excel is a .xlsx export from Google sheets.

I am disappointed nobody responded to Matthew over 2 months ago, what hope is there for me? Since this is closed should I open a new issue? I will wait a couple of hours.

@twoertwein
Copy link
Member

Why would Pandas remove this ability from reading excel files?

As far as I remember: It was removed because it was never actually used internally. You were able to specify it but it didn't make a difference. Do you have an excel file for which you needed the encoding argument on 1.0.x (and you verfiy that it didn't work without the argument)?

Please create a new issue for the CSV problem you are encountering. [If you have this issue with 1.2.0, please upgrade first to 1.2.1 as your problem might have been fixed - sound similar to #38989]

@boardtc
Copy link

boardtc commented Jan 20, 2021

Thanks for the reply!
I'm unsure how to answer your question, my .xlsx has UTF-8 accents and it's a save from Google sheets.
I am using the latest Anaconda which has Pandas 1.1.3
Sorry you are right, the garbles are not causing the error so that still an issue but the error is a CSV issue, U+1F3C0 & U+1F355 are not being encoded but I was using csv,writer. not pandas as each row in my 400 row dataframe became 6 rows in the CSV and I could not figure out how to do that using pandas.
(conda install pandas=1.2.0 failed due to a savage amount of conflicts)

I found https://github.com/python/cpython but could not figure out how to open an issue against core python

@asishm
Copy link
Contributor

asishm commented Feb 15, 2021

If you believe there is a bug with the python stdlib modules - the bug tracker is at https://bugs.python.org/

@torncheesecake
Copy link

I have language accents in my excel, when reading from csv I can pass encoding='utf-8'. Why would Pandas remove this ability from reading excel files? This is a blocker for me now, all of my accented words are getting garbled when read in from excel. The after my wrangling I am writing to csv and get UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3c0' in position 69: character maps to
What is the workaround that allowed you to remove this functionality?

Note my excel is a .xlsx export from Google sheets.

I am disappointed nobody responded to Matthew over 2 months ago, what hope is there for me? Since this is closed should I open a new issue? I will wait a couple of hours.

Still no work around for this issue. Means if I switch from Mac to a PC my file is unable to run due to the 'charmap' code issue even with 1.2.3

How has this not been fixed yet??

@twoertwein
Copy link
Member

twoertwein commented Mar 4, 2021

@matthewahillman before this issue can be fixed, it needs to be re-produced. Do you have an excel file that you can't open without the encoding option in 1.1.* but you can open it with the encoding option in 1.1.*?

@rdancy01
Copy link

Hello all,

I'm new to Pandas and I can't solve these errors I'm getting. Any assistance would be appreciated.

Traceback (most recent call last):
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 515, in
main()
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 471, in main
[all_data,suite_list] = openSpreadsheet()
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 91, in openSpreadsheet
all_data = pd.read_excel(workbook_name, 1 ,usecols=['Name','Test Stand','Status',
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\util_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
TypeError: read_excel() got an unexpected keyword argument 'index'

C:\Users\HA_Report_Generation\Report Generator Legacy>python3 report_generator_1_5_3.py
Opening spreadsheet from .\Regression\CVB Logix Firmware-Test Run-20210411.xls...

Traceback (most recent call last):
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 515, in
main()
File "C:\Users\HA_Report_Generation\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 471, in main
[all_data,suite_list] = openSpreadsheet()
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 91, in openSpreadsheet
all_data = pd.read_excel(workbook_name, 1 ,usecols=['Name','Test Stand','Status',
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\util_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
TypeError: read_excel() got an unexpected keyword argument 'index'

@twoertwein
Copy link
Member

TypeError: read_excel() got an unexpected keyword argument 'index'

I don't think your error is related to encoding. Please have a look at the possible arguments for read_excel (they are not the same as for read_csv): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
If that doesn't solve your error, feel free to create a new issue!

@rdancy01
Copy link

Turns out instead of 'index=False' I needed to change it to 'index_cols'=False. That seemed to work. Many thanks!

@IloBe
Copy link

IloBe commented Jul 8, 2021

Hi all,
I would like to ask again, how to solve the encoding topic with Pandas read_excel() of version>1.1? My xlxs Excel files including German text strings worked fine with encoding="ISO-8859-2", but now all specific characters are mapped to signs which are not acceptable for common German users of the final reports. First converting to .csv is no workaround for my files. and sorry, I cannot deliver such files. But perhaps you can try files with text strings including ä, ü, ö or ß.
I tried the open file code snippet from the beginning, but had other issues with it.
If there is another solution I would appreciate if someone could share it. Thanks in advance.

@bsnresearcher
Copy link

The encoding argument is very important. I want to come back "encoding=".

Excel application windows edition's default encoding ShiftJIS(cp932). The export file(.xlsx) causes character corruption in dealing with pandas in MacOS because the os default encoding UTF8.

@kapilhp
Copy link

kapilhp commented May 3, 2022

The workaround suggested here seems to work.

wb = xlrd.open_workbook(path, encoding_override='CORRECT_ENCODING')
df = pd.read_excel(wb)

@Nacnano
Copy link

Nacnano commented Jun 23, 2022

I am not sure whether the developers see these comments since this issue is already closed. The encoding is very important for the program to interpret special characters. So, I would be pleased if there would be a solution to this problem (either the encoding or any other alternatives should be fine for me)
Thank you

@hunter2009pf
Copy link

encoding parameter is needed !!!
Year 2023 updated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.