astype(unicode) does not work as expected #7758
Comments
you can do: what are you doing with this? pandas keeps all string-likes as
I have a method that detects whether a column should be considered a category based on its type and cardinality. Columns that are considered categories are cast to unicode objects. I know how to work around this issue, but I thought I should report what I thought was a bug. Let me know if you need more information.
ok, this could be more informative, but it's fundamentally an issue. This would return a numpy array (and NOT a series, and that would simply recast, and lose the cast to unicode). I think that is a bit odd though. What do you think should happen?
Ideally, I would have wanted the cast to work like Python's unicode() function.
Does that make sense in pandas?
@fulmicoton Why do you need to convert to unicode? Do you have things that are convertible to unicode but aren't already converted? Can you give a more detailed example that illustrates why you need to do this? I think I'm just missing something.
This could all be done I think (may need to allow an here's a picture of the internal structure:
@fulmicoton interested in doing a pull-request for this?
@cpcloud I just have a piece of code that tries to coerce a bunch of columns marked as categorical into unicode strings. Some of them are already unicode; some have been detected as int but have such low cardinality that I want to handle them as categories.
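To make the use case above concrete, here is a minimal sketch, assuming Python 3 (where `str` is the unicode type) and a recent pandas. The helper name `coerce_categorical_columns` and the `max_cardinality` threshold are hypothetical and are not taken from the reporter's code; the sketch converts values element by element with `Series.map(str)` rather than relying on `astype`.

```python
import pandas as pd


def coerce_categorical_columns(df, max_cardinality=10):
    """Return a copy of df where category-like columns hold text values.

    Hypothetical helper: a column counts as category-like if it already
    holds strings, or if it is an integer column with few distinct values.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        is_text = (pd.api.types.is_object_dtype(col)
                   or pd.api.types.is_string_dtype(col))
        is_low_card_int = (pd.api.types.is_integer_dtype(col)
                           and col.nunique() <= max_cardinality)
        if is_text or is_low_card_int:
            # Element-wise conversion; str is unicode on Python 3.
            out[name] = col.map(str)
    return out


df = pd.DataFrame({
    "label": ["a", "\u00e9", "a", "b"],  # already text, includes non-ASCII
    "code": [1, 2, 1, 2],                # integer, low cardinality
    "id": [10, 11, 12, 13],              # integer, high cardinality
})
result = coerce_categorical_columns(df, max_cardinality=2)
```

With `max_cardinality=2`, the `code` column is converted to text while `id` keeps its integer dtype.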
@jreback I'll take a look at that tonight.
@fulmicoton you might want to explore this as well (just merged in): http://pandas-docs.github.io/pandas-docs-travis/categorical.html. Prob not a lot of tests for unicode (but it should work)
Just calls numpy.unicode on all the values. Seems to work alright on python2 and python3.
Here is the pull request. I didn't have to use infer_dtype, so I hope I didn't do anything wrong.
astype unicode seems to call str, so that the following code throws
raises:
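For illustration only, and not the reporter's original snippet: on Python 2, `str()` applied to a non-ASCII unicode value implicitly encodes it with the ASCII codec, which is the kind of error an `astype(unicode)` routed through `str` would hit. The sketch below emulates that implicit encode step on Python 3; `py2_style_str` is a hypothetical helper, not a pandas or NumPy function.

```python
def py2_style_str(value):
    """Mimic Python 2's str(): text must survive an ASCII encode."""
    text = str(value)
    # Python 2's str() on a unicode object encoded it with the default
    # (ASCII) codec, so non-ASCII characters raised UnicodeEncodeError.
    text.encode("ascii")
    return text


values = [1, 2, "caf\u00e9"]
failures = []
for v in values:
    try:
        py2_style_str(v)
    except UnicodeEncodeError:
        failures.append(v)

print(failures)
```

Only the non-ASCII value ends up in `failures`; the integers convert cleanly, which matches a column that mixes numbers with accented text.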