read_excel() modifies provided types dict when accessing file with duplicate column #42508
@@ -1278,6 +1278,24 @@ def test_ignore_chartsheets_by_int(self, request, read_ext):
        ):
            pd.read_excel("chartsheet" + read_ext, sheet_name=1)

    def test_dtype_dict_unchanged_with_duplicate_columns(self, read_ext):
        # GH 42462

        filename = "test_common_headers" + read_ext
Review comment: Can you not use existing data, or simply do a round trip? I don't want to add even more files like this.
        dtype_dict = {"a": str, "b": str, "c": str}
        dtype_dict_copy = dtype_dict.copy()
        result = pd.read_excel(filename, dtype=dtype_dict)
        expected = DataFrame(
            {
                "a": ["1", "2", "3"],
                "a.1": ["1", "2", "3"],
                "b": ["b1", "b2", "b3"],
                "c": ["c1", "c2", "c3"],
            }
        )
        assert dtype_dict == dtype_dict_copy, "dtype dict changed"
Review comment: Can we also check that the resulting frame is as expected? (I know this is focusing on the dtypes dict, but may as well also test the reading portion here since unlikely we have great coverage for
        tm.assert_frame_equal(result, expected)


class TestExcelFileRead:
    @pytest.fixture(autouse=True)
Review comment: Can you please add a small comment on why this is necessary (even just pointing back to the issue)?
@mzeitlin11 added comment pointing back to issue
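For readers wondering why a defensive copy matters here, the following is a minimal, standard-library-only sketch of the general pattern behind this kind of bug. It is not pandas' actual implementation: the helper names `dedup_names`, `resolve_dtypes_mutating`, and `resolve_dtypes_copying` are hypothetical, and the renaming rule is a simplified version of the `a`, `a.1` scheme seen in the test above (it ignores collisions with names that already exist).

```python
from collections import Counter


def dedup_names(names):
    """Rename repeated column names as a, a.1, a.2, ... (simplified, hypothetical helper)."""
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        result.append(f"{name}.{count}" if count else name)
        seen[name] += 1
    return result


def resolve_dtypes_mutating(names, dtype):
    """Problematic pattern: adds entries for renamed columns to the caller's dict."""
    for original, renamed in zip(names, dedup_names(names)):
        if original in dtype:
            dtype[renamed] = dtype[original]
    return dtype


def resolve_dtypes_copying(names, dtype):
    """Safer pattern: work on a shallow copy so the caller's dict is untouched."""
    dtype = dict(dtype)
    for original, renamed in zip(names, dedup_names(names)):
        if original in dtype:
            dtype[renamed] = dtype[original]
    return dtype


names = ["a", "a", "b", "c"]

caller_dict = {"a": str, "b": str, "c": str}
resolved = resolve_dtypes_copying(names, caller_dict)
print(sorted(resolved))      # includes the renamed key "a.1"
print(sorted(caller_dict))   # still only the keys the caller supplied

leaky_dict = {"a": str, "b": str, "c": str}
resolve_dtypes_mutating(names, leaky_dict)
print(sorted(leaky_dict))    # the caller's dict has gained "a.1"
```

A shallow copy is enough in this sketch because only keys are added; the dtype values themselves are never modified.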