Commit 7861850

BUG: Patch Python 2.x non-conversion of Unicode
For column names only.
1 parent 7153e5f commit 7861850

2 files changed: +43 -5 lines changed

doc/source/whatsnew/v0.24.0.rst

+1
@@ -1402,6 +1402,7 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
 - Bug in :meth:`read_csv()` in which unnecessary warnings were being raised when the dialect's values conflicted with the default arguments (:issue:`23761`)
 - Bug in :meth:`read_html()` in which the error message was not displaying the valid flavors when an invalid one was provided (:issue:`23549`)
 - Bug in :meth:`read_excel()` in which extraneous header names were extracted, even though none were specified (:issue:`11733`)
+- Bug in :meth:`read_excel()` in which column names were not being properly converted to string sometimes in Python 2.x (:issue:`23874`)
 - Bug in :meth:`read_excel()` in which ``index_col=None`` was not being respected and parsing index columns anyway (:issue:`18792`, :issue:`20480`)
 - Bug in :meth:`read_excel()` in which ``usecols`` was not being validated for proper column names when passed in as a string (:issue:`20480`)
 - :func:`DataFrame.to_string()`, :func:`DataFrame.to_html()`, :func:`DataFrame.to_latex()` will correctly format output when a string is passed as the ``float_format`` argument (:issue:`21625`, :issue:`22270`)

pandas/io/excel.py

+42 -5
@@ -662,10 +662,14 @@ def _parse_cell(cell_contents, cell_typ):
 
                 output[asheetname] = parser.read(nrows=nrows)
 
-                if ((not squeeze or isinstance(output[asheetname], DataFrame))
-                        and header_names):
-                    output[asheetname].columns = output[
-                        asheetname].columns.set_names(header_names)
+                if not squeeze or isinstance(output[asheetname], DataFrame):
+                    if header_names:
+                        output[asheetname].columns = output[
+                            asheetname].columns.set_names(header_names)
+                    elif compat.PY2:
+                        output[asheetname].columns = _maybe_convert_to_string(
+                            output[asheetname].columns)
+
             except EmptyDataError:
                 # No Data, return an empty DataFrame
                 output[asheetname] = DataFrame()
@@ -810,6 +814,39 @@ def _trim_excel_header(row):
     return row
 
 
+def _maybe_convert_to_string(row):
+    """
+    Convert elements in a row to string from Unicode.
+
+    This is purely a Python 2.x patch and is performed ONLY when all
+    elements of the row are string-like.
+
+    Parameters
+    ----------
+    row : array-like
+        The row of data to convert.
+
+    Returns
+    -------
+    converted : array-like
+    """
+    if compat.PY2:
+        converted = []
+
+        for i in range(len(row)):
+            if isinstance(row[i], compat.string_types):
+                try:
+                    converted.append(str(row[i]))
+                except UnicodeEncodeError:
+                    break
+            else:
+                break
+        else:
+            row = converted
+
+    return row
+
+
 def _fill_mi_header(row, control_row):
     """Forward fills blank entries in row, but only inside the same parent index
 
@@ -838,7 +875,7 @@ def _fill_mi_header(row, control_row):
             control_row[i] = False
             last = row[i]
 
-    return row, control_row
+    return _maybe_convert_to_string(row), control_row
 
 # fill blank if index_col not None
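
The all-or-nothing conversion in `_maybe_convert_to_string` relies on Python's `for ... else`: the `else` branch runs only when the loop finishes without a `break`, so the row is replaced only if every element converted. Below is a minimal Python 3 sketch of that pattern, not the pandas code itself. The names `maybe_convert_all` and `to_ascii_bytes` are hypothetical, and `text.encode("ascii")` is only a stand-in for Python 2's `str(unicode_value)`, chosen because it also raises `UnicodeEncodeError` on non-ASCII input.

```python
def maybe_convert_all(row, convert):
    """Return a converted copy of row only if every element converts.

    Hypothetical helper illustrating the for/else pattern; it is not
    part of pandas.
    """
    converted = []

    for item in row:
        if not isinstance(item, str):
            # A non-string element aborts the whole conversion.
            break
        try:
            converted.append(convert(item))
        except UnicodeEncodeError:
            # One unconvertible element aborts the whole conversion.
            break
    else:
        # Reached only when the loop ended without a break.
        return converted

    # At least one element failed, so the original row is returned as-is.
    return row


def to_ascii_bytes(text):
    # Assumed stand-in for Python 2's str(): fails on non-ASCII text.
    return text.encode("ascii")


print(maybe_convert_all(["a", "b"], to_ascii_bytes))       # every element converts
print(maybe_convert_all(["a", "\u00e9"], to_ascii_bytes))  # non-ASCII: row unchanged
print(maybe_convert_all(["a", 1], to_ascii_bytes))         # mixed types: row unchanged
```

Returning the original row on any failure avoids a header in which some names are converted and others are not, which appears to be the reason the patch breaks out of the loop rather than skipping individual elements.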
