BUG: read_excel not accepting encoding on 1.1.0 #35753

staticdev · 2020-08-16T17:22:10Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# Your code here
def date_parse(value: str) -> datetime.datetime:
    return datetime.datetime.strptime(value.strip(), "%d/%m/%y")

df = pd.read_excel(
    filepath,
    encoding="iso-8859-1",
    usecols="B",
    date_parser=date_parse,
    skiprows=4,
)

Problem description

TypeError: read_excel() got an unexpected keyword argument 'encoding'.

Expected Output

No error, as in previous versions.

Output of `pd.show_versions()`

------------------
commit           : d9fff2792bf16178d4e450fe7384244e50635733
python           : 3.8.2.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-42-generic
Version          : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : pt_BR.UTF-8

pandas           : 1.1.0
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1
setuptools       : 41.2.0
Cython           : None
pytest           : 6.0.1
hypothesis       : None
sphinx           : 3.2.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
numba            : None```

</details>

The text was updated successfully, but these errors were encountered:

simonjayhawkins · 2020-08-16T17:59:47Z

Thanks @staticdev for the report. This is documented in the release notes.

from https://pandas.pydata.org/docs/whatsnew/v1.1.0.html

read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH34464)

were the constructed DataFrames different with and without the encoding parameter in 1.0.5?

updated: incorrect link

xref #34464

twoertwein · 2020-08-16T18:05:23Z

There have been similar requests about an encoding parameter: #25523 and #23444.

I haven't used read_excel but the documentation says that you can also provide a file-object instead of a path. Something like

with open(path_to_excel_file, mode="r", encoding="iso-8859-1") as file:
    df = pd.read_excel(file, **other_kwargs)

might be a temporary work around? I haven't tested whether that works.

staticdev · 2020-08-16T18:20:48Z

Thanks @staticdev for the report. This is documented in the release notes.

from https://pandas.pydata.org/docs/whatsnew/v1.1.0.html

read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH34464)

were the constructed DataFrames different with and without the encoding parameter in 1.0.5?

Yes, and I was using 1.0.5 before 1.1.0 working fine.

updated: incorrect link

xref #34464

Another odd thing to add about this change is that I have another point in my code that hasn't broken and I also using kwargs and also encoding:

df = pd.read_excel(
    filename,
    encoding="iso-8859-1",
    usecols="B:K",
    parse_dates=["Date"],
    date_parser=date_parse,
    skipfooter=4,
    skiprows=10,
)

staticdev · 2020-08-16T18:51:06Z

There have been similar requests about an encoding parameter: #25523 and #23444.

I haven't used read_excel but the documentation says that you can also provide a file-object instead of a path. Something like
with open(path_to_excel_file, mode="r", encoding="iso-8859-1") as file:
    df = pd.read_excel(file, **other_kwargs)
might be a temporary work around? I haven't tested whether that works.

I also tried that, but created many side-effects to my code. I have a test with a file-fake that checks if it is an empty file, it should raise pd.errors.EmptyDataError, now it is giving me TypeError('expected str, bytes or os.PathLike object, not NoneType').

twoertwein · 2020-08-16T19:58:40Z

There have been similar requests about an encoding parameter: #25523 and #23444.
I haven't used read_excel but the documentation says that you can also provide a file-object instead of a path. Something like
with open(path_to_excel_file, mode="r", encoding="iso-8859-1") as file:
    df = pd.read_excel(file, **other_kwargs)
might be a temporary work around? I haven't tested whether that works.
I also tried that, but created many side-effects to my code. I have a test with a file-fake that checks if it is an empty file, it should raise pd.errors.EmptyDataError, now it is giving me TypeError('expected str, bytes or os.PathLike object, not NoneType').

I just noticed that my suggestion doesn't make much sense. The excel file needs to be opened in binary mode but binary files don't take an encoding argument.

twoertwein · 2020-08-17T03:37:02Z

Does someone know how to create an excel file with a specific encoding using either LibreOffice or gnumeric? There are no excel files needing a non-default encoding in pandas/tests/io/data/excel/. Need to have a test for the encoding argument.

staticdev · 2020-08-17T10:12:10Z

@twoertwein what I can do is send you one with this "iso-8859-1" encoding. Would it suffice?

twoertwein · 2020-08-17T13:00:22Z

@staticdev thank you, that would probably be the best. Ideally, just one sheet with only one cell filled with a bunch of special characters :)

staticdev · 2020-08-17T14:19:36Z

Please rename it to xls, since github complains about extension.

InfoCEI.txt

twoertwein · 2020-08-17T16:25:44Z

thank you! I seem to be able to open this Excel file in pandas without specifying an encoding. Can you confirm that you can open it as well?

Let me know if you have a file that needs an encoding argument! Please try to create a minimal Excel file, ideally only one cell having content.

staticdev · 2020-08-17T19:02:25Z

@twoertwein you are correct. I tried removing encoding arg, and I SEEMS to be working fine. I am not very confident though. This code was running for the past versions of pandas even before 1.0.0. But I didn't see on the 1.1.0 changelog that it automatically detects the encoding or something alike. I really don't know if just removing the encoding really solves the problem.

staticdev · 2020-08-17T19:03:36Z

Does it solve for #23444?

twoertwein · 2020-08-17T21:18:39Z

Does it solve for #23444?

I thought these issues were similar but I think they aren't. imho #23444 tries to allow read_excel to work with text file-objects.

staticdev · 2020-08-17T22:43:21Z

I saw it in your PR.. so I think the most important question here is why the behavior changed. I didn't find a clear explanation from the https://pandas.pydata.org/docs/whatsnew/v1.1.0.html#deprecations. The only related thing is this one: Passing any arguments but the first two to read_excel() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH27573). When I click it, it is not even related to read_excel. So I conclude that something IS wrong (maybe the documentation). Maybe @alexitkes could help us figure that out.

twoertwein · 2020-08-17T23:13:05Z

Do you mind trying the old pandas version to see whether you actually needed the encoding argument?

imho, encoding probably never worked: it was passed to TextFileReader but I think it isn't used there.

staticdev · 2020-08-17T23:24:17Z

@twoertwein I can try, but I really doubt that one. I would never add this argument if it wasn't necessary and the code for this component was 100% written by me. I remember encoding was one of the first problems I had to solve.

staticdev · 2020-10-06T23:32:00Z

So we assume encoded is not needed anymore? I am ok with that if other people are not facing issues with that.

nahuelalmeira · 2020-10-07T10:57:46Z

@twoertwein In my case, the files ended up being successfully read.

simonjayhawkins · 2020-10-29T15:41:18Z

moved off 1.1.4 milestone (scheduled for release tomorrow)

simonjayhawkins · 2020-10-29T15:46:36Z

Removing regression tag. Providing encoding here has always been a no-op you would just now getting an explicit error about that

was encoding ever part of the documented api.

we could either close this issue or update read_excel to accept the encoding parameter with a FutureWarning.

WillAyd · 2020-10-29T15:48:16Z

At least since excel 97 text in excel documents has a well defined encoding.

https://xlrd.readthedocs.io/en/latest/unicode.html?highlight=encoding_override#handling-of-unicode

I don't think it is worth supporting Excel 95 and earlier file formats at this point

WillAyd · 2020-10-29T15:49:36Z

Closing as a non-issue. Providing encoding as a keyword argument is unnecessary and only "worked" before because we silently accepted kwargs and discard it; it never had any functionality and should be removed from code

torncheesecake · 2020-11-02T11:32:07Z

I find I need to use the encoding for keeping the UTF-8 encoding on my files when other people on different platforms open them. I was having issues with the formatting of special characters on Windows machines but not on Macs due to the file changing encodings. Forcing the UTF-8 encodings on opening and saving documents meant my files work on all platforms. Without this encoding working I could run into problems again. Is there another workaround if this is being removed from the pandas code??

boardtc · 2021-01-20T19:49:14Z

I have language accents in my excel, when reading from csv I can pass encoding='utf-8'. Why would Pandas remove this ability from reading excel files? This is a blocker for me now, all of my accented words are getting garbled when read in from excel. The after my wrangling I am writing to csv and get UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3c0' in position 69: character maps to
What is the workaround that allowed you to remove this functionality?

Note my excel is a .xlsx export from Google sheets.

I am disappointed nobody responded to Matthew over 2 months ago, what hope is there for me? Since this is closed should I open a new issue? I will wait a couple of hours.

twoertwein · 2021-01-20T20:01:30Z

Why would Pandas remove this ability from reading excel files?

As far as I remember: It was removed because it was never actually used internally. You were able to specify it but it didn't make a difference. Do you have an excel file for which you needed the encoding argument on 1.0.x (and you verfiy that it didn't work without the argument)?

Please create a new issue for the CSV problem you are encountering. [If you have this issue with 1.2.0, please upgrade first to 1.2.1 as your problem might have been fixed - sound similar to #38989]

boardtc · 2021-01-20T20:16:50Z

Thanks for the reply!
I'm unsure how to answer your question, my .xlsx has UTF-8 accents and it's a save from Google sheets.
I am using the latest Anaconda which has Pandas 1.1.3
Sorry you are right, the garbles are not causing the error so that still an issue but the error is a CSV issue, U+1F3C0 & U+1F355 are not being encoded but I was using csv,writer. not pandas as each row in my 400 row dataframe became 6 rows in the CSV and I could not figure out how to do that using pandas.
(conda install pandas=1.2.0 failed due to a savage amount of conflicts)

I found https://github.com/python/cpython but could not figure out how to open an issue against core python

asishm · 2021-02-15T05:10:55Z

If you believe there is a bug with the python stdlib modules - the bug tracker is at https://bugs.python.org/

torncheesecake · 2021-03-04T11:55:45Z

I have language accents in my excel, when reading from csv I can pass encoding='utf-8'. Why would Pandas remove this ability from reading excel files? This is a blocker for me now, all of my accented words are getting garbled when read in from excel. The after my wrangling I am writing to csv and get UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3c0' in position 69: character maps to
What is the workaround that allowed you to remove this functionality?

Note my excel is a .xlsx export from Google sheets.

I am disappointed nobody responded to Matthew over 2 months ago, what hope is there for me? Since this is closed should I open a new issue? I will wait a couple of hours.

Still no work around for this issue. Means if I switch from Mac to a PC my file is unable to run due to the 'charmap' code issue even with 1.2.3

How has this not been fixed yet??

twoertwein · 2021-03-04T15:16:37Z

@matthewahillman before this issue can be fixed, it needs to be re-produced. Do you have an excel file that you can't open without the encoding option in 1.1.* but you can open it with the encoding option in 1.1.*?

rdancy01 · 2021-04-21T20:35:15Z

Hello all,

I'm new to Pandas and I can't solve these errors I'm getting. Any assistance would be appreciated.

Traceback (most recent call last):
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 515, in
main()
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 471, in main
[all_data,suite_list] = openSpreadsheet()
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 91, in openSpreadsheet
all_data = pd.read_excel(workbook_name, 1 ,usecols=['Name','Test Stand','Status',
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\util_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
TypeError: read_excel() got an unexpected keyword argument 'index'

C:\Users\HA_Report_Generation\Report Generator Legacy>python3 report_generator_1_5_3.py
Opening spreadsheet from .\Regression\CVB Logix Firmware-Test Run-20210411.xls...

Traceback (most recent call last):
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 515, in
main()
File "C:\Users\HA_Report_Generation\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 471, in main
[all_data,suite_list] = openSpreadsheet()
File "C:\Users\HA_Report_Generation\Report Generator Legacy\report_generator_1_5_3.py", line 91, in openSpreadsheet
all_data = pd.read_excel(workbook_name, 1 ,usecols=['Name','Test Stand','Status',
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\util_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
TypeError: read_excel() got an unexpected keyword argument 'index'

twoertwein · 2021-04-21T21:35:32Z

TypeError: read_excel() got an unexpected keyword argument 'index'

I don't think your error is related to encoding. Please have a look at the possible arguments for read_excel (they are not the same as for read_csv): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
If that doesn't solve your error, feel free to create a new issue!

rdancy01 · 2021-04-23T14:23:26Z

Turns out instead of 'index=False' I needed to change it to 'index_cols'=False. That seemed to work. Many thanks!

IloBe · 2021-07-08T17:55:24Z

Hi all,
I would like to ask again, how to solve the encoding topic with Pandas read_excel() of version>1.1? My xlxs Excel files including German text strings worked fine with encoding="ISO-8859-2", but now all specific characters are mapped to signs which are not acceptable for common German users of the final reports. First converting to .csv is no workaround for my files. and sorry, I cannot deliver such files. But perhaps you can try files with text strings including ä, ü, ö or ß.
I tried the open file code snippet from the beginning, but had other issues with it.
If there is another solution I would appreciate if someone could share it. Thanks in advance.

bsnresearcher · 2021-07-11T04:45:11Z

The encoding argument is very important. I want to come back "encoding=".

Excel application windows edition's default encoding ShiftJIS(cp932). The export file(.xlsx) causes character corruption in dealing with pandas in MacOS because the os default encoding UTF8.

kapilhp · 2022-05-03T13:07:46Z

The workaround suggested here seems to work.

wb = xlrd.open_workbook(path, encoding_override='CORRECT_ENCODING')
df = pd.read_excel(wb)

Nacnano · 2022-06-23T10:54:31Z

I am not sure whether the developers see these comments since this issue is already closed. The encoding is very important for the program to interpret special characters. So, I would be pleased if there would be a solution to this problem (either the encoding or any other alternatives should be fine for me)
Thank you

hunter2009pf · 2023-08-03T05:56:15Z

encoding parameter is needed !!!
Year 2023 updated!

staticdev added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 16, 2020

staticdev mentioned this issue Aug 16, 2020

Fix read_xls arguments with pandas 1.1+ staticdev/irpf-cei#183

Closed

simonjayhawkins added Needs Info Clarification about behavior needed to assess issue Usage Question and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 16, 2020

simonjayhawkins added this to the 1.1.1 milestone Aug 16, 2020

twoertwein mentioned this issue Aug 17, 2020

REGR: re-add encoding for read_excel #35758

Closed

5 tasks

simonjayhawkins added Bug IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version and removed Needs Info Clarification about behavior needed to assess issue Usage Question labels Aug 17, 2020

simonjayhawkins added this to the 1.1.4 milestone Oct 7, 2020

simonjayhawkins added the Closing Candidate May be closeable, needs more eyeballs label Oct 29, 2020

simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020

WillAyd closed this as completed Oct 29, 2020

simonjayhawkins modified the milestones: 1.1.5, No action Oct 29, 2020

simonjayhawkins removed Closing Candidate May be closeable, needs more eyeballs Needs Discussion Requires discussion from core team before further action labels Oct 29, 2020

BUG: read_excel not accepting encoding on 1.1.0 #35753

BUG: read_excel not accepting encoding on 1.1.0 #35753

Comments

staticdev commented Aug 16, 2020 • edited Loading

Code Sample, a copy-pastable example

Problem description

Expected Output

Output of pd.show_versions()

simonjayhawkins commented Aug 16, 2020 • edited Loading

twoertwein commented Aug 16, 2020

staticdev commented Aug 16, 2020 • edited Loading

staticdev commented Aug 16, 2020 • edited Loading

twoertwein commented Aug 16, 2020

twoertwein commented Aug 17, 2020

staticdev commented Aug 17, 2020

twoertwein commented Aug 17, 2020

staticdev commented Aug 17, 2020

twoertwein commented Aug 17, 2020

staticdev commented Aug 17, 2020

staticdev commented Aug 17, 2020

twoertwein commented Aug 17, 2020

staticdev commented Aug 17, 2020 • edited Loading

twoertwein commented Aug 17, 2020

staticdev commented Aug 17, 2020 • edited Loading

staticdev commented Oct 6, 2020 • edited Loading

nahuelalmeira commented Oct 7, 2020

simonjayhawkins commented Oct 29, 2020

simonjayhawkins commented Oct 29, 2020

WillAyd commented Oct 29, 2020 • edited Loading

WillAyd commented Oct 29, 2020

torncheesecake commented Nov 2, 2020

boardtc commented Jan 20, 2021 • edited Loading

twoertwein commented Jan 20, 2021

boardtc commented Jan 20, 2021 • edited Loading

asishm commented Feb 15, 2021

torncheesecake commented Mar 4, 2021

twoertwein commented Mar 4, 2021 • edited Loading

rdancy01 commented Apr 21, 2021

twoertwein commented Apr 21, 2021

rdancy01 commented Apr 23, 2021

IloBe commented Jul 8, 2021

bsnresearcher commented Jul 11, 2021

kapilhp commented May 3, 2022

Nacnano commented Jun 23, 2022

hunter2009pf commented Aug 3, 2023

staticdev commented Aug 16, 2020 •

edited

Loading

Output of `pd.show_versions()`

simonjayhawkins commented Aug 16, 2020 •

edited

Loading

staticdev commented Aug 16, 2020 •

edited

Loading

staticdev commented Aug 16, 2020 •

edited

Loading

staticdev commented Aug 17, 2020 •

edited

Loading

staticdev commented Aug 17, 2020 •

edited

Loading

staticdev commented Oct 6, 2020 •

edited

Loading

WillAyd commented Oct 29, 2020 •

edited

Loading

boardtc commented Jan 20, 2021 •

edited

Loading

boardtc commented Jan 20, 2021 •

edited

Loading

twoertwein commented Mar 4, 2021 •

edited

Loading