ERR: validate encoding on to_stata #15723
I would say this is technically correct: passing a unicode encoding is invalid. But I think we should simply reject these, rather than actually write an invalid format. Want to do a PR to do this? (I am not sure which encodings Stata can actually support, any idea?) |
I am a bit confused now. From the docs it should fail in both
generates the correct file. So, I would think it may be better to just ignore the option in I think |
cc @bashtage any ideas here? |
There is no UTF-8 in Stata, only ASCII and the simple 8-bit encoding Latin-1. |
In Stata I mean in |
@bashtage so we should then validate |
So why doesn't it generate a corrupt file in |
Because in PY3 it is actually encoding with the passed-in encoding (utf8), rather than the default of latin1. |
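To illustrate the point above, here is a minimal sketch using only the Python 3 standard library (it does not touch pandas or write a real .dta file). It shows that the same character takes one byte in Latin-1 but two bytes in UTF-8, so a reader that assumes Latin-1 will see the wrong text. The variable names are illustrative only.

```python
# Sketch: why UTF-8 bytes look wrong to a reader that expects Latin-1.
text = u'á'

latin1_bytes = text.encode('latin-1')  # one byte per character
utf8_bytes = text.encode('utf-8')      # two bytes for this character

print(latin1_bytes)  # b'\xe1'
print(utf8_bytes)    # b'\xc3\xa1'

# Decoding the UTF-8 bytes as Latin-1 yields two wrong characters
# instead of the original one.
misread = utf8_bytes.decode('latin-1')
print(misread)  # Ã¡
```

The byte counts also differ, which matters for a format with fixed-width string fields.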
So in PY2 it is not encoded as UTF8 even when giving the option? How is it saved then? What in |
Just noticed one more thing:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.array([u'á', u'Ö']), columns=['var1'])
df1.to_stata('not-corrupt.dta', write_index=False, encoding='utf8')
df = pd.read_stata('corrupt3.dta')
df == df1
This generates a usable file in both PY2 and PY3. Still, the data is wrong, as seen in the example. |
Not really sure it is actually encoding as utf8 internally. Lots of things in PY2 are wonky; it probably just happens to work. |
In any event, it seems you have a bunch of test cases! It seems easy enough to simply validate the encoding that is passed and raise if it's not valid. |
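The validation suggested here could look roughly like the sketch below. This is an assumption-laden illustration, not the code that pandas ended up with: `validate_stata_encoding` and `_ALLOWED` are hypothetical names, and the allowed set is taken from the comment above that Stata supports only ASCII and Latin-1. It relies on `codecs.lookup` from the standard library to normalize aliases such as `latin1` and `iso-8859-1`.

```python
import codecs

# Hypothetical allow-list, based on the statement in this thread that
# Stata supports only ASCII and Latin-1. Names are normalized by codecs.
_ALLOWED = {codecs.lookup(name).name for name in ("ascii", "latin-1")}


def validate_stata_encoding(encoding):
    """Return the canonical codec name, or raise ValueError if unsupported.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    try:
        canonical = codecs.lookup(encoding).name
    except LookupError:
        raise ValueError("Unknown encoding: %r" % (encoding,))
    if canonical not in _ALLOWED:
        raise ValueError(
            "Encoding %r is not supported; use ASCII or Latin-1" % (encoding,)
        )
    return canonical
```

With a check like this, `validate_stata_encoding('latin1')` is accepted, while `validate_stata_encoding('utf8')` raises before any file is written.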
Indeed. I guess my example shows that PY2 is not encoding at all. |
Ensure StataReader and StataWriter have the correct encoding. Standardized default encoding to 'latin-1' closes pandas-dev#15723
Ensure StataReader and StataWriter have the correct encoding. Standardized default encoding to 'latin-1' closes pandas-dev#15723 Author: Kevin Sheppard <[email protected]> Closes pandas-dev#15768 from bashtage/limit-stata-encoding and squashes the following commits: 8278be7 [Kevin Sheppard] BUG: Fix limited key range on 32-bit platofrms 2f02697 [Kevin Sheppard] BUG: Enforce correct encoding in stata
I would like to report that with the current encoding options it is not possible for Chinese users to interact with Pandas due to their Stata files containing Chinese characters. This has been reported by the user @lianyonghui123 on my ipystata package: TiesdeKok/ipystata#28 |
There is no Unicode support in the pandas stata functions. They only
support the fixed encoding latin1. It would be a relatively large rewrite
to add Unicode support.
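As a small illustration of why a fixed Latin-1 encoding cannot hold such data, the sketch below checks whether a string is representable in Latin-1. It uses only the standard library; `fits_latin1` is a hypothetical helper name, not a pandas function.

```python
def fits_latin1(text):
    """Return True if every character of `text` can be encoded as Latin-1.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode('latin-1')
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1(u'áÖ'))    # True: both characters exist in Latin-1
print(fits_latin1(u'中文'))  # False: Chinese characters are outside Latin-1
```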
On Tue, Sep 5, 2017, 13:14 Ties de Kok wrote:
I would like to report that with the current encoding options it is not possible for Chinese users to interact with Pandas due to their Stata files containing Chinese characters. This has been reported by the user @lianyonghui123 on my ipystata package: TiesdeKok/ipystata#28
|
It seems pandas in python 3.5 causes issues due to encoding. For example the following generates a corrupt output file while generates a correct file. I imagine this may be due to use of encoding and the difference in the treatment between python 2 and python 3, which breaks compatibility of scripts across python versions. I guess it would be nice if it does not take this option into account on python 3, unless the error is caused by something else.