Fix Issue 34748 - read in datetime as MultiIndex for column headers #34954

Merged
merged 10 commits into from
Jul 8, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -1051,6 +1051,7 @@ I/O
- Bug in :meth:`~HDFStore.create_table` now raises an error when `column` argument was not specified in `data_columns` on input (:issue:`28156`)
- :meth:`read_json` now could read line-delimited json file from a file url while `lines` and `chunksize` are set.
- Bug in :meth:`DataFrame.to_sql` when reading DataFrames with ``-np.inf`` entries with MySQL now has a more explicit ``ValueError`` (:issue:`34431`)
- Bug in :meth:`read_excel` where datetime values are used in the header in a `MultiIndex` (:issue:`34748`)

Plotting
^^^^^^^^
2 changes: 1 addition & 1 deletion pandas/io/parsers.py
@@ -1614,7 +1614,7 @@ def extract(r):
# Clean the column names (if we have an index_col).
if len(ic):
col_names = [
- r[0] if (len(r[0]) and r[0] not in self.unnamed_cols) else None
+ r[0] if ((r[0] is not None) and r[0] not in self.unnamed_cols) else None
for r in header
]
else:
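The one-line change above in pandas/io/parsers.py can be illustrated in isolation. The following is a minimal sketch, not the parser's actual code path: `unnamed_cols`, `clean_old`, and `clean_new` are hypothetical stand-ins for `self.unnamed_cols` and the two versions of the condition. It shows that calling `len()` on a `datetime.datetime` header value raises `TypeError`, while the `is not None` check accepts it.

```python
from datetime import datetime

# Hypothetical stand-in for the parser's `self.unnamed_cols` attribute.
unnamed_cols = {"Unnamed: 0_level_0"}


def clean_old(first):
    # Pre-change condition: calls len() on the first header entry.
    return first if (len(first) and first not in unnamed_cols) else None


def clean_new(first):
    # Post-change condition: only checks for None, so non-string values work.
    return first if ((first is not None) and first not in unnamed_cols) else None


stamp = datetime(2020, 2, 29)

try:
    clean_old(stamp)
    old_failed = False
except TypeError:
    # datetime objects do not define __len__, so len() raises TypeError.
    old_failed = True

print(old_failed)
print(clean_new(stamp))
print(clean_new("Unnamed: 0_level_0"))
```

In this sketch the two conditions also differ for an empty string: `clean_old("")` returns `None`, whereas `clean_new("")` returns `""`. Whether an empty string can reach this point in the real parser is not something the diff shows.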
Binary file added pandas/tests/io/data/excel/test_datetime_mi.ods
Binary file not shown.
Binary file not shown.
Binary file added pandas/tests/io/data/excel/test_datetime_mi.xlsb
Binary file not shown.
Binary file added pandas/tests/io/data/excel/test_datetime_mi.xlsm
Binary file not shown.
Binary file added pandas/tests/io/data/excel/test_datetime_mi.xlsx
Binary file not shown.
19 changes: 19 additions & 0 deletions pandas/tests/io/excel/test_readers.py
@@ -1143,3 +1143,22 @@ def test_header_with_index_col(self, engine, filename):
filename, sheet_name="Sheet1", index_col=0, header=[0, 1]
)
tm.assert_frame_equal(expected, result)

def test_read_datetime_multiindex(self, engine, read_ext):
# GH 34748
if engine == "pyxlsb":
pytest.xfail("Sheets containing datetimes not supported by pyxlsb")

f = "test_datetime_mi" + read_ext
with pd.ExcelFile(f) as excel:
actual = pd.read_excel(excel, header=[0, 1], index_col=0, engine=engine)
expected_column_index = pd.MultiIndex.from_tuples(
[(pd.to_datetime("02/29/2020"), pd.to_datetime("03/01/2020"))],
names=[
pd.to_datetime("02/29/2020").to_pydatetime(),
Member

Are the to_pydatetime calls required?

Contributor Author

Yes, because when the Excel reader creates the index names, they are of type dt.datetime, not the pandas Timestamp.

Contributor

hmm, where is this done? this is unfortunate, as these should actually be Timestamp

Contributor Author

> hmm, where is this done? this is unfortunate, as these should actually be Timestamp

So is this a separate issue, namely that we don't want the names to be dt.datetime? If so, I will create an issue for that.

pd.to_datetime("03/01/2020").to_pydatetime(),
],
)
expected = pd.DataFrame([], columns=expected_column_index)

tm.assert_frame_equal(expected, actual)
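
For readers following the review thread on `to_pydatetime`, the distinction under discussion can be checked directly. This is a small sketch using only the two calls that appear in the test, `pd.to_datetime` and `Timestamp.to_pydatetime`. It shows the types involved and does not reproduce how the Excel reader builds the index names.

```python
from datetime import datetime

import pandas as pd

ts = pd.to_datetime("02/29/2020")  # a pandas Timestamp
py_dt = ts.to_pydatetime()         # a plain datetime.datetime

print(type(ts).__name__)     # Timestamp
print(type(py_dt).__name__)  # datetime

# Timestamp subclasses datetime.datetime, and the two values compare equal,
# but their concrete types differ.
print(isinstance(ts, datetime))
print(ts == py_dt)
print(type(ts) is type(py_dt))
```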