BUG: Error in to_stata when DataFrame contains non-string column names #6622
There was a unicode/string issue in my first attempt. I looked in compat but didn't see a good way to safely test whether a column label was a string or unicode in Python 2, or just a string in Python 3, so I added a simple one in |
I found the already provided method - |
can you add a test to assert that this warning is raised |
This has created one other problem - since the writer expects columns to have string names, I am now converting columns to have unique string names, no problem. However, if someone tries to use The solution as it now stands would be to check for this too and to convert to I can fix the date conversion issue too, but am starting to wonder if it might be more appropriate to raise an |
@bashtage all for raising when it gets ambiguous / too tricky. This is an export format that has certain requirements; if it's difficult to meet those, then simply defer back to the user. |
My preference would be to kick any non-trivial variable name change, that is something other than Currently the conversion handles lots of other invalid names, such as unicode characters, reserved words, or variable names that start with a number, by either replacement with |
you could have an argument, say I am not sure how common this is, maybe best simply to kick it back to the user at this point. |
I have decided to simply consolidate the magic into one place, which makes it seem somewhat less magical. It also means no API changes. If this passes on 3.x and 2.6 then I'll do the doc update and it should be ready to close. |
@jreback Rebased on master, added issue, improved the commit message a bit, and so hopefully ready to close. |
ideally can you put in a test for |
Made this change. |
From the 2.7 build. This is ok, but maybe you want to catch this too? (ideally the nosetests won't produce warnings). So maybe need another assert_produces? (or can simply filter around the other ones)
|
to_stata does not work correctly when used with non-string names. Since Stata requires string names, the proposed fix attempts to rename columns using the string representation of the column name used. The main method that reformats column names was refactored to handle this case. Patch includes additional fixes for detecting invalid names. Patch includes some minor documentation fixes.
Trapped that warning, so looks clean now. |
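For readers following the exchange above, trapping a warning inside a test so it does not leak into the test output could look roughly like this. It is a sketch using only the standard library `warnings` module; `noisy_export` and `run_and_collect_warnings` are hypothetical names for illustration, not helpers from pandas or from this patch.

```python
import warnings


def noisy_export():
    """Hypothetical stand-in for a call that warns when it renames columns."""
    warnings.warn("column names were changed", UserWarning)
    return "done"


def run_and_collect_warnings(func):
    """Call func, returning its result and the categories of any warnings.

    Warnings raised inside the block are recorded instead of being printed,
    so the surrounding test run stays clean.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Make sure every warning is recorded, even if already seen once.
        warnings.simplefilter("always")
        result = func()
    return result, [w.category for w in caught]


result, categories = run_and_collect_warnings(noisy_export)
```

The test can then assert on `categories` to confirm that the expected warning, and only that warning, was raised.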
BUG: Error in to_stata when DataFrame contains non-string column names
thanks! you are getting really fast / good with these! |
closes #4558
to_stata does not work correctly when used with non-string names. Since
Stata requires string names, the proposed fix attempts to rename columns using
the string representation of the column name used. A warning is raised if
the column name is changed.
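
The renaming described above can be illustrated with a minimal sketch. This is not the code from the patch: `stringify_column_names` is a hypothetical helper, and the underscore prefix for names starting with a digit and the numeric suffix for clashes are assumptions chosen for the example.

```python
import warnings


def stringify_column_names(columns):
    """Return unique string names for the given column labels.

    Hypothetical helper: non-string labels are replaced by their str()
    representation, and a warning is issued if any label had to change.
    """
    seen = set()
    result = []
    renamed = []
    for col in columns:
        name = col if isinstance(col, str) else str(col)
        # Assumption for this sketch: prefix an underscore when the name
        # starts with a digit, since Stata names cannot begin with one.
        if name[:1].isdigit():
            name = "_" + name
        # Assumption for this sketch: resolve clashes (e.g. 0 and "0")
        # by appending a numeric suffix until the name is unused.
        candidate = name
        n = 0
        while candidate in seen:
            candidate = "{0}_{1}".format(name, n)
            n += 1
        seen.add(candidate)
        result.append(candidate)
        if not isinstance(col, str) or candidate != col:
            renamed.append((col, candidate))
    if renamed:
        warnings.warn(
            "Column names were changed: {0!r}".format(renamed), UserWarning
        )
    return result
```

With a DataFrame `df`, a caller could apply this as `df.columns = stringify_column_names(df.columns)` before exporting; that usage is likewise only illustrative.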