unicode encode error on index joins #3875

Closed
cpcloud opened this issue Jun 13, 2013 · 17 comments · Fixed by #3900

Labels: Datetime, Error Reporting, Indexing, Unicode

Comments

cpcloud (Member) commented Jun 13, 2013

The following throws a UnicodeDecodeError.

In [8]: df = mkdf(10, 10, data_gen_f=lambda r,c:randn(), r_idx_type='dt', c_idx_type='u')

In [9]: s = Series(randn(5,), df.index[:5])

In [10]: s.index.join(df.columns, how='outer').join(s.index)

cpcloud (Member, Author) commented Jun 13, 2013

admittedly, having mixed unicode and datetime indices is probably not the most common use case

jreback (Contributor) commented Jun 13, 2013

problem is you are mixing strings and Unicode
so that error is legit
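
For context on the Python 2 behaviour referred to here: combining a byte `str` with a `unicode` object triggered an implicit decode using the ASCII codec, which raises `UnicodeDecodeError` for non-ASCII bytes. A minimal Python 3 sketch of that coercion follows. The helper names `py2_style_concat` and `try_concat` are hypothetical and only mimic the old behaviour; whether this is what actually happens inside the index join is not established at this point in the thread.

```python
def py2_style_concat(byte_string: bytes, text: str) -> str:
    """Mimic Python 2's `str + unicode`: the bytes are decoded implicitly as ASCII."""
    # Python 2 used the default (ASCII) codec for this implicit coercion.
    return byte_string.decode("ascii") + text


def try_concat(byte_string: bytes, text: str) -> str:
    """Return the concatenation, or the name of the exception that was raised."""
    try:
        return py2_style_concat(byte_string, text)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Pure-ASCII bytes coerce cleanly.
ascii_result = try_concat(b"abc", "def")

# Non-ASCII bytes (UTF-8 encoded "é") fail during the implicit decode.
non_ascii_result = try_concat("é".encode("utf-8"), "def")
```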

cpcloud (Member, Author) commented Jun 13, 2013

where are the strings coming in? datetimes are their own object...should raise a TypeError according to tslib.pyx

jreback (Contributor) commented Jun 13, 2013

when you join different dtypes they are cast to object,
which turns the datetimes into strings (the Unicode stays that way, though)

cpcloud (Member, Author) commented Jun 13, 2013

they are timestamp objects though...
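
One way to check what an object cast does to the elements, as a sketch assuming a current pandas release (the 2013 code path may have differed): `DatetimeIndex.astype(object)` yields an object-dtype index whose elements are still `Timestamp` instances rather than strings. The `dates` index below is a stand-in for `df.index` from the report, not the original data.

```python
import pandas as pd

# A small DatetimeIndex standing in for df.index from the report.
dates = pd.date_range("2013-06-13", periods=3)

# Casting to object changes the index dtype but keeps Timestamp elements.
as_object = dates.astype(object)

# Collect the element type names to see what the object index holds.
element_types = {type(value).__name__ for value in as_object}
```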

jreback (Contributor) commented Jun 13, 2013

I think the index joining ultimately hashes strings/Unicode or something

cpcloud (Member, Author) commented Jun 13, 2013

hm i will change to error reporting then and see what i can do about it.

jreback (Contributor) commented Jun 13, 2013

isn't this related to #3878? (though resolution is to raise, I think)

cpcloud (Member, Author) commented Jun 13, 2013

sort of but slightly different problem

cpcloud (Member, Author) commented Jun 13, 2013

the other is in python space, this is raising all the way down in the base class of Timestamp

cpcloud (Member, Author) commented Jun 13, 2013

@jreback is there a sane way to debug cython? i feel like i'm using php right now

jreback (Contributor) commented Jun 13, 2013

print statements! or gdb I think

cpcloud (Member, Author) commented Jun 13, 2013

never mind, found the docs

cpcloud (Member, Author) commented Jun 13, 2013

this is insane i'm blowing the stack by calling isinstance(other, unicode) in _Timestamp...what?!

cpcloud (Member, Author) commented Jun 13, 2013

there's a cycle in the type graph somewhere...

cpcloud (Member, Author) commented Jun 13, 2013

and it's not object

cpcloud (Member, Author) commented Jun 13, 2013

it's somewhere around PeriodIndex construction...tracking it down now...
