json encoding for python 2 #15715

Closed
mbochk opened this issue Mar 17, 2017 · 4 comments
Labels
Compat: pandas objects compatibility with Numpy or Python functions
IO JSON: read_json, to_json, json_normalize
Unicode: Unicode strings

Comments


mbochk commented Mar 17, 2017

Code Sample, a copy-pastable example if possible

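import json
import pandas as pd

# Fails on Python 2: the encoding argument is ignored
pd.read_json(path, encoding='cp1251')

# Works on Python 2: json.load decodes the bytes with the given encoding
with open(path, 'r') as f:
    js = json.load(f, encoding='cp1251')
pd.DataFrame(js)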

Problem description

The docstring does not state explicitly that the encoding option is used on py3 only.

Currently pd.read_json mostly ignores the encoding= option on Python 2.
The function pd.common._get_handle warns about using encoding together with compression, but otherwise silently continues without actually using the encoding.

It looks like the work is split between functions in a way that makes it awkward to pass encoding through to the json.loads call.
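As a stopgap, the file can be decoded to text before the JSON parser sees it. The following is only a minimal sketch of that idea: `load_json_with_encoding` and `write_sample` are hypothetical helpers, not part of pandas, and the sample record is made up for the demonstration. The helper relies on `io.open`, which accepts an `encoding` argument on both Python 2.6+ and Python 3; the demo around it is written for Python 3.

```python
import io
import json
import os
import tempfile


def load_json_with_encoding(path, encoding):
    """Decode the file to text first, then parse the JSON (hypothetical helper)."""
    with io.open(path, "r", encoding=encoding) as handle:
        return json.load(handle)


def write_sample(encoding="cp1251"):
    """Write a one-record JSON file in the given encoding and return its path."""
    sample = [{"ADRES": u"Бесединское шоссе", "DMT": 17}]
    # ensure_ascii=False keeps the Cyrillic characters as real non-ASCII bytes
    payload = json.dumps(sample, ensure_ascii=False).encode(encoding)
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    return path


sample_path = write_sample()
records = load_json_with_encoding(sample_path, "cp1251")
os.remove(sample_path)
```

The resulting `records` list can then be handed to `pd.DataFrame(records)`, which mirrors the working `json.load` path from the code sample.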

Expected Output

One might expect pandas to use the encoding, to make life easier (as pandas usually does ;) ).
Or at least to warn properly that the option is ignored.
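The "at least warn" behaviour could look roughly like the sketch below. This is an illustration of the request, not pandas code: `check_encoding_supported` is a hypothetical function, and the `py_major` parameter exists only so the check can be exercised on any interpreter.

```python
import sys
import warnings


def check_encoding_supported(encoding, py_major=sys.version_info[0]):
    """Warn when an encoding is given but would be ignored (hypothetical check).

    Returns True if the encoding would be honoured, False otherwise.
    """
    if encoding is not None and py_major < 3:
        warnings.warn(
            "encoding=%r is ignored on Python 2" % (encoding,),
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

A reader function could call such a check before opening the file, so the caller sees a warning instead of a later decoding error.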

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 28.8.0.post20161110
Cython: 0.24.1
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.6.None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

jreback (Contributor) commented Mar 17, 2017

This is currently an open issue; xref #13774.

We have tests, but it's not implemented on the writer side; the reader side should work.

Can you provide a reproducible example showing that this does not work? We can add that as a test.

jreback added the IO JSON and Unicode labels Mar 17, 2017
mbochk (Author) commented Mar 17, 2017

import json
import pandas as pd

# path = "path_to/example.txt"

try:
    # Fails on Python 2
    df1 = pd.read_json(path, encoding='cp1251')
except Exception:
    print("pd read failed")
else:
    print("pd read complete")

try:
    # Works on Python 2, where json.load accepts an encoding argument
    with open(path, 'r') as f:
        js = json.load(f, encoding='cp1251')
    df2 = pd.DataFrame(js)
    assert df2.shape == (1, 19)
except Exception:
    print("json read failed")
else:
    print("json read complete")

example.txt

With the attached 'example.txt' I get "pd read failed" and "json read complete".
I had to rename the extension, but it should be valid JSON in 'cp1251'
(Notepad++ reports 'windows-1251', which is a synonym and gives the same results).

jreback (Contributor) commented Mar 17, 2017

Yep, I agree. Something is not getting decoded properly (works on py3, but not on py2). Want to have a look?

In [1]: pd.read_json('/Users/jreback/Downloads/example.txt', encoding='cp1251')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-4b2729700154> in <module>()
----> 1 pd.read_json('/Users/jreback/Downloads/example.txt', encoding='cp1251')

/Users/jreback/miniconda3/envs/py2.7/pandas/pandas/io/json/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines)
    347         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    348                           keep_default_dates, numpy, precise_float,
--> 349                           date_unit).parse()
    350 
    351     if typ == 'series' or obj is None:

/Users/jreback/miniconda3/envs/py2.7/pandas/pandas/io/json/json.pyc in parse(self)
    415 
    416         else:
--> 417             self._parse_no_numpy()
    418 
    419         if self.obj is None:

/Users/jreback/miniconda3/envs/py2.7/pandas/pandas/io/json/json.pyc in _parse_no_numpy(self)
    632         if orient == "columns":
    633             self.obj = DataFrame(
--> 634                 loads(json, precise_float=self.precise_float), dtype=None)
    635         elif orient == "split":
    636             decoded = dict((str(k), v)

ValueError: Invalid octet in UTF-8 sequence when decoding 'string'

On Python 3.5:

In [1]: pd.read_json('/Users/jreback/Downloads/example.txt', encoding='cp1251')
Out[1]: 
                                    ADRES AdmArea        DDOC  DMT        DREG  KAD_KV  KAD_RN  KAD_ZU       NDOC     NREG      SOOR  STRT                                      TDOC     UNOM  VLD  \
0  Бесединское шоссе, дом 17, строение 10      []  17.07.2015   17  22.07.2015       0       0       0  01-41-321  5015930  Строение    10  Распоряжение префектуры АО города Москвы  3811559  Дом   

                                         VYVAD                                            geoData  global_id  system_object_id  
0  адрес утвержден распорядительным документом  {'center': [[37.7690069572664, 55.623022198294...  163879706           3811559  
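The "Invalid octet in UTF-8 sequence" error is consistent with cp1251 bytes reaching a parser that expects UTF-8. A small stdlib-only sketch of that mismatch follows; it uses a string taken from the output above and does not reproduce the pandas code path itself.

```python
text = u"Бесединское шоссе"
raw = text.encode("cp1251")  # the bytes as they would be stored in the file

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # cp1251 encodes Cyrillic letters as single bytes that do not form valid UTF-8
    utf8_ok = False

# Decoding with the encoding the file was written in recovers the text
roundtrip = raw.decode("cp1251")
```

Here `utf8_ok` ends up `False` while `roundtrip` equals the original text, which is why the bytes need to be decoded with the requested encoding before parsing.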

jreback added the Compat and Difficulty Intermediate labels Mar 17, 2017
jreback added this to the Next Major Release milestone Mar 17, 2017
jreback (Contributor) commented Apr 4, 2018

duplicate of #13774

jreback closed this as completed Apr 4, 2018