-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
read_excel() modifies provided types dict when accessing file with duplicate column #42508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
cf40fb3
a45535c
6c98fa3
e79e9a1
c10b931
4419146
b63aef2
f0f3022
cf27280
52c9bd5
0f78c9f
cb369bb
7bcb504
ffb5852
72d50f4
57c65e5
3df5cf3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1278,6 +1278,13 @@ def test_ignore_chartsheets_by_int(self, request, read_ext): | |
): | ||
pd.read_excel("chartsheet" + read_ext, sheet_name=1) | ||
|
||
def test_dtype_dict(self, read_ext): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you please leave a link to the relevant github issue here? And maybe also make the name of this test more specific, eg in this care we about about |
||
filename = "test_common_headers" + read_ext | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you not use existing data? or simply do a round trip; i don't want to add even more files like this |
||
dtype_dict = {"a": str, "b": str, "c": str} | ||
dtype_dict_copy = dtype_dict.copy() | ||
pd.read_excel(filename, dtype=dtype_dict) | ||
assert dtype_dict == dtype_dict_copy, "dtype dict changed" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we also check that the resulting frame is as expected? (I know this is focusing on the dtypes dict, but may as well also test the reading portion here since unlikely we have great coverage for |
||
|
||
|
||
class TestExcelFileRead: | ||
@pytest.fixture(autouse=True) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is essentially trying to do a deepcopy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mzeitlin11 . Yes it is. I tried to use deepcopy first, but it was failing multiple tests.
I got the idea of creating a copy of kwargs from read_csv (which as you mentioned doesn't produce this unnecessary side effect). So I thought it would be a good idea to keep it consistent.
pandas/pandas/io/parsers/readers.py
Lines 566 to 586 in d60e687
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. That case is a bit different though because
kwargs
itself was being modified. Here the issue is limited to thedtypes
dict. If there's a clean way to copy (or not modify in place) the dtypes dict, I think that would be cleaner.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will work along those lines. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you run into any issues!