BUG: AttributeError: 'BlockManager' object has no attribute 'is_mixed_type' when trying to jsonify dataframe. #39837

Closed
2 of 3 tasks
yuvalmarciano opened this issue Feb 16, 2021 · 6 comments · Fixed by #40525
Labels: IO JSON (read_json, to_json, json_normalize); Regression (functionality that used to work in a prior pandas version)
Comments

yuvalmarciano commented Feb 16, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

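import pandas

# csv_file: path to any small CSV file (the reporter's file is not attached)
csv_reader = pandas.read_csv(
    csv_file, iterator=True, chunksize=2500, sep=',', encoding=None, error_bad_lines=False
)
for i, chunk in enumerate(csv_reader):
    rows = chunk.to_json(orient="records", lines=True)  # raises AttributeError on PyPy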

Problem description

On PyPy 7.3.3 (Python 3.7.9), calling to_json raises an exception:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/pypy/site-packages/pandas/core/generic.py", line 2478, in to_json
    storage_options=storage_options,
  File "/opt/pypy/site-packages/pandas/io/json/_json.py", line 94, in to_json
    indent=indent,
  File "/opt/pypy/site-packages/pandas/io/json/_json.py", line 155, in write
    indent=self.indent,
AttributeError: 'BlockManager' object has no attribute 'is_mixed_type'

I suspect this PR: https://github.com/pandas-dev/pandas/pull/36873/files
@jbrockmendel do you mind explaining why the BlockManager no longer has the is_mixed_type property?

Expected Output

When I manually set is_mixed_type on the chunk._mgr object (a BlockManager), restoring the attribute that was dropped in the aforementioned PR, to_json seems to work fine:

for i, chunk in enumerate(csv_reader):
    chunk._mgr._consolidate_inplace()
    chunk._mgr.is_mixed_type = len(chunk._mgr.blocks) > 1
    print(chunk.to_json(orient="records", lines=True))

{"@timestamp":10000000,"test_1":"something","test_2":"something2"}
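
The workaround above can be wrapped in a small guard so that it only touches managers that actually lack the attribute. This is a minimal sketch, not a recommended fix: `ensure_is_mixed_type` is a hypothetical helper name, it relies on the same private internals used in the snippet above (`_mgr`, `_consolidate_inplace`, `blocks`), and `FakeManager` / `FakeFrame` are stand-ins introduced here so the sketch runs without pandas.

```python
class FakeManager:
    """Stand-in for pandas' BlockManager (assumption: only the parts used below)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.consolidated = False

    def _consolidate_inplace(self):
        # The real method merges same-dtype blocks; here we only record the call.
        self.consolidated = True


class FakeFrame:
    """Stand-in for a DataFrame that exposes the private `_mgr` attribute."""

    def __init__(self, blocks):
        self._mgr = FakeManager(blocks)


def ensure_is_mixed_type(frame):
    """Set `_mgr.is_mixed_type` as the workaround does, but only if it is missing."""
    mgr = frame._mgr
    if not hasattr(mgr, "is_mixed_type"):
        mgr._consolidate_inplace()
        mgr.is_mixed_type = len(mgr.blocks) > 1
    return frame


single = ensure_is_mixed_type(FakeFrame(["object"]))
mixed = ensure_is_mixed_type(FakeFrame(["int64", "object"]))
print(single._mgr.is_mixed_type, mixed._mgr.is_mixed_type)  # prints: False True
```

With a real chunk, the call would look like `ensure_is_mixed_type(chunk).to_json(orient="records", lines=True)`. Because it patches private attributes, it is only a stopgap until the library itself is fixed.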

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.121-linuxkit
Version : #1 SMP Tue Dec 1 17:50:32 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.2
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.4
setuptools : 53.0.0
Cython : None
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.23
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

@yuvalmarciano added the Bug and Needs Triage labels Feb 16, 2021
@jorisvandenbossche added the IO JSON and Regression labels and removed the Bug and Needs Triage labels Feb 16, 2021
@jorisvandenbossche added this to the 1.2.3 milestone Feb 16, 2021
@jorisvandenbossche
Member

@yuvalmarciano Thanks for the report!
I suppose this was just an oversight in the PR that removed it (a search for where is_mixed_type is used can easily miss the C code for JSON, since it is accessed there by string name rather than as a normal attribute). So it should also be an easy fix to add it back or use an alternative check:

static int is_simple_frame(PyObject *obj) {
    PyObject *check = get_sub_attr(obj, "_mgr", "is_mixed_type");
    int ret = (check == Py_False);

Now, would you be able to provide a reproducible example that triggers the error? It will be needed to write a test for the fix. With a small dummy example, I don't get the error:

In [11]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

In [12]: df.to_json(orient="records", lines=True)
Out[12]: '{"a":1,"b":4}\n{"a":2,"b":5}\n{"a":3,"b":6}\n'
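
To make the quoted C check easier to follow, here is a rough Python model of it next to one possible alternative. This is a sketch under stated assumptions: the function names are hypothetical, `None` stands in for a failed C attribute lookup, the alternative is derived only from the `len(chunk._mgr.blocks) > 1` expression in the reporter's workaround, and the `SimpleNamespace` objects are stand-ins for real frames. The model covers only the return value, not how a failed lookup is reported.

```python
from types import SimpleNamespace


def is_simple_frame_old(frame):
    """Model of the quoted check: simple only if _mgr.is_mixed_type is exactly False."""
    mgr = getattr(frame, "_mgr", None)
    check = getattr(mgr, "is_mixed_type", None)  # None models a failed lookup
    return check is False


def is_simple_frame_alt(frame):
    """Hypothetical alternative: simple when the manager holds at most one block."""
    mgr = getattr(frame, "_mgr", None)
    blocks = getattr(mgr, "blocks", None)
    return blocks is not None and len(blocks) <= 1


# Stand-in frames: a manager that still has the attribute, and two that do not.
with_attr = SimpleNamespace(_mgr=SimpleNamespace(is_mixed_type=False, blocks=["int64"]))
without_attr = SimpleNamespace(_mgr=SimpleNamespace(blocks=["int64"]))
two_blocks = SimpleNamespace(_mgr=SimpleNamespace(blocks=["int64", "object"]))

print(is_simple_frame_old(with_attr))     # True
print(is_simple_frame_old(without_attr))  # False: the attribute is gone
print(is_simple_frame_alt(without_attr))  # True
print(is_simple_frame_alt(two_blocks))    # False
```

Note that the reporter's workaround consolidates the manager before counting blocks; the alternative sketched here does not, so the two could disagree on an unconsolidated frame.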

@yuvalmarciano
Author

@jorisvandenbossche Thanks!!
I managed to reproduce the error, but please note that it happens on PyPy 7.3.3 (Python 3.7.9):

csv_reader = pandas.read_csv(
    csv_file, iterator=True, chunksize=2500, sep=',', encoding=None, error_bad_lines=False
)
for i, chunk in enumerate(csv_reader):
    rows = chunk.to_json(orient="records", lines=True)

Let me know if there's anything else missing.

@jorisvandenbossche
Member

@yuvalmarciano by a "reproducible" example, I meant a code snippet that someone else can run as well. Your example code reads a CSV file that we don't have. Also, I suppose that the actual CSV reading / iterator is not needed to reproduce the issue. See https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports for some more explanation.

@jorisvandenbossche
Member

But my small example is apparently enough, if it is indeed run on PyPy:

$ pypy
Python 3.7.9 (7e6e2bb30ac5fbdbd443619cae28c51d5c162a02, Jan 12 2021, 06:52:08)
[PyPy 7.3.3-beta0 with GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``PyPy 2.0 released''
>>>> import pandas as pd
>>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>>> df
   a  b
0  1  4
1  2  5
2  3  6
>>>> df.to_json(orient="records", lines=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/joris/miniconda3/envs/pypy/site-packages/pandas/core/generic.py", line 2478, in to_json
    storage_options=storage_options,
  File "/home/joris/miniconda3/envs/pypy/site-packages/pandas/io/json/_json.py", line 94, in to_json
    indent=indent,
  File "/home/joris/miniconda3/envs/pypy/site-packages/pandas/io/json/_json.py", line 155, in write
    indent=self.indent,
AttributeError: 'BlockManager' object has no attribute 'is_mixed_type'

It might be that PyObject_GetAttrString error reporting behaves differently on PyPy.
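
Since the small two-column frame is enough to trigger the error, a regression test could compare the `to_json` output for that frame against the records it should contain. The following is a minimal sketch of the comparison side only, using the standard `json` module so that it runs without pandas. The helper names are hypothetical, and the sample string is the CPython output reported earlier in this thread.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON into a list of Python objects."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def assert_records_lines(output, expected_records):
    """Fail if the line-delimited JSON `output` does not match the expected records."""
    parsed = parse_json_lines(output)
    assert parsed == expected_records, f"expected {expected_records!r}, got {parsed!r}"


# Output shown earlier in the thread for df.to_json(orient="records", lines=True).
reported_output = '{"a":1,"b":4}\n{"a":2,"b":5}\n{"a":3,"b":6}\n'
expected = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]
assert_records_lines(reported_output, expected)
```

In an actual test, `reported_output` would be replaced by the return value of `df.to_json(orient="records", lines=True)`. Parsing line by line keeps the comparison independent of whether the output ends with a trailing newline.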

@yuvalmarciano
Author

yuvalmarciano commented Feb 16, 2021

Yeah, I meant that any CSV file (mine is a minimal one with three text columns) would cause this error. Sorry for not being clear enough.
Is there anything else needed from me?

@jbrockmendel
Member

I suppose this was just an oversight in the PR that removed it (a search for where is_mixed_type is used can easily miss the C code for JSON, since it is accessed there by string name rather than as a normal attribute). So it should also be an easy fix to add it back or use an alternative check:

Sounds right to me

4 participants