I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from tempfile import NamedTemporaryFile
import pandas as pd

XML = '''<issue>
  <type>BUG</type>
  <category>BUG</category>
</issue>'''.encode('utf-8')

with NamedTemporaryFile() as tempfile:
    tempfile.write(XML)
    tempfile.flush()
    df = pd.read_xml(tempfile.name, iterparse={'issue': ['type', 'category']})
    assert df.iloc[0]['category'] == 'BUG'  # works fine
    df = pd.read_xml(tempfile.name, iterparse={'issue': ['type', 'category']}, names=['type', 'cat'])
    assert df.iloc[0]['cat'] == 'BUG'  # cat is never set because its value is duplicated with type
Issue Description
When the names argument is used to rename columns, any element whose value duplicates an earlier value in the same row is silently dropped, so the renamed column is never set.
Note this issue is about a feature that is not released yet (planned for 1.5.0). #47414
Expected Behavior
type and cat should both be set to "BUG". Even though the value is duplicated, it is a separate piece of information in the XML.
I suspect the bug is on line 345 of pandas/io/xml.py. Changing "if elem_val not in row.values() and nm not in row:" to "if nm not in row:" seems to be the fix, though I do not have time to finish writing test cases (I suddenly picked up a new job) and am not entirely sure it is correct. Sorry for holding up the issue; hopefully this helps.
Thanks @haydenw2005! You are very close. Line 345 should check the key and the value together, "if row.get(nm) != elem_val and nm not in row", and not just the values, as in the fix for the recent issue #47343 raised by the OP.
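To make the difference between the three conditions concrete, here is a minimal sketch that replays the reproducer's two elements through a simplified row-building loop. It is not the actual pandas source: build_row and the three predicate functions are hypothetical names introduced for illustration, and only the boolean expressions are taken from the comments above.

```python
# Simplified model of the row-building step discussed above.
# NOT the pandas implementation: build_row and the predicate names are hypothetical.

def build_row(elements, names, should_set):
    """Assign each element's text to its renamed column when should_set allows it."""
    row = {}
    for (_tag, elem_val), nm in zip(elements, names):
        if should_set(row, nm, elem_val):
            row[nm] = elem_val
    return row


def current_check(row, nm, elem_val):
    # Condition quoted in the thread as the one on line 345.
    return elem_val not in row.values() and nm not in row


def key_only_check(row, nm, elem_val):
    # Condition suggested in the first comment.
    return nm not in row


def key_and_value_check(row, nm, elem_val):
    # Condition suggested in the reply.
    return row.get(nm) != elem_val and nm not in row


# The two child elements of <issue> from the reproducer, and the requested names.
elements = [("type", "BUG"), ("category", "BUG")]
names = ["type", "cat"]

row_current = build_row(elements, names, current_check)
row_key_only = build_row(elements, names, key_only_check)
row_key_and_value = build_row(elements, names, key_and_value_check)

print(row_current)
print(row_key_only)
print(row_key_and_value)
```

In this model the quoted condition skips the second element because "BUG" is already among the row's values, so cat is never set; both suggested conditions set it. Whether the two suggestions diverge on other inputs (for example repeated element names) depends on details of the real parser that this sketch does not model.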
Installed Versions
d43d6e2