Easiest to wrap your bytes-mode file in a TextIOWrapper.
In [14]: file=open('sample-utf-8-sig.txt', 'rb')
In [15]: file2=io.TextIOWrapper(file, 'utf-8-sig')
In [16]: df=pd.read_json(file2, lines=True)
In [17]: df
Out[17]:
       Id       MItemId
0  273780  M1001906-001
1  273781  M1002085-001
2  273782  M1002086-001
Want to take a look at our read_json stuff to see if we're failing to do that in pandas?
Thanks Tom, Jeff for looking into that. The use of io.TextIOWrapper makes sense.
It still feels like read_json is ignoring the encoding parameter. Wouldn't this be something that pd.read_json should do by itself, something like: if 'b' in file.mode: file = io.TextIOWrapper(file, encoding)
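The wrapping idea suggested above could look roughly like the sketch below. The helper name `ensure_text` is hypothetical and not part of pandas. It checks the stream type instead of `file.mode`, on the assumption that some byte streams (for example `io.BytesIO`) have no `mode` attribute. It uses only the standard library, with sample records taken from the output shown earlier in the thread.

```python
import io
import json


def ensure_text(handle, encoding="utf-8-sig"):
    """Return a text-mode view of `handle` (hypothetical helper, not a pandas API).

    Text streams are returned unchanged; anything else is assumed to be a
    binary stream and is wrapped so its bytes are decoded with `encoding`.
    """
    if isinstance(handle, io.TextIOBase):
        return handle
    return io.TextIOWrapper(handle, encoding=encoding)


# A small line-delimited JSON payload that starts with a UTF-8 byte order mark.
raw = io.BytesIO(
    b'\xef\xbb\xbf{"Id": 273780, "MItemId": "M1001906-001"}\n'
    b'{"Id": 273781, "MItemId": "M1002085-001"}\n'
)

text = ensure_text(raw)

# Parse each non-empty line; the BOM has already been stripped by the decoder.
records = [json.loads(line) for line in text if line.strip()]
```

The object returned by `ensure_text` is an ordinary text stream, so it can be passed to `pd.read_json(..., lines=True)` in the same way as `file2` in the session above.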
I work with a byte stream (from Azure DataLake https://pypi.python.org/pypi/azure-datalake-store/0.0.19, which only supports byte streams) that has a UTF-8 byte order mark, and I want to read it into a data frame.
pandas.read_json fails.
For comparison, pd.read_csv(file, encoding='utf-8-sig') works fine with a similar file.
Problem description
pd.read_json does not seem to honor the encoding='utf-8-sig' parameter.
The expected behavior is that it can read byte streams that begin with a UTF-8 byte order mark.
sample-utf-8-sig.txt
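To illustrate why the byte order mark matters, here is a minimal standard-library sketch of the difference between the 'utf-8' and 'utf-8-sig' codecs. It uses Python's built-in `json` module as a stand-in. pandas uses its own JSON parser, so the exact error there may differ. The payload is made up for the example.

```python
import codecs
import json

# A JSON document preceded by the UTF-8 byte order mark (EF BB BF).
payload = codecs.BOM_UTF8 + b'{"Id": 273780}'

# Plain 'utf-8' keeps the BOM as a leading U+FEFF character.
as_utf8 = payload.decode("utf-8")

# 'utf-8-sig' strips the BOM while decoding.
as_sig = payload.decode("utf-8-sig")

try:
    json.loads(as_utf8)
    plain_utf8_ok = True
except json.JSONDecodeError:
    # The stdlib parser rejects text that still starts with U+FEFF.
    plain_utf8_ok = False

record = json.loads(as_sig)
```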