BUG: DataFrame.consolidate throws TypeError with bytes blocks #15482

Closed
adbull opened this issue Feb 23, 2017 · 4 comments
Labels: Bug; Dtype Conversions (Unexpected or buggy dtype conversions); Duplicate Report (Duplicate issue or pull request)

@adbull
Contributor

adbull commented Feb 23, 2017

Code Sample, a copy-pastable example if possible

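import pandas as pd

x = pd.DataFrame([['a']]).astype('S1')  # one cell, cast to NumPy 'S1' (fixed-width bytes)
y = pd.concat([x] * 2, axis=1)          # concatenate column-wise
y.consolidate()                         # raises the TypeError below on pandas 0.19.2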

Problem description

This throws the following error:

  File "bug.py", line 3, in <module>
    y.consolidate()
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2822, in consolidate
    cons_data = self._protect_consolidate(f)
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2790, in _protect_consolidate
    result = f()
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2821, in <lambda>
    f = lambda: self._data.consolidate()
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 3526, in consolidate
    bm._consolidate_inplace()
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 3531, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4523, in _consolidate
    _can_consolidate=_can_consolidate)
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4543, in _merge_blocks
    new_values = _vstack([b.values for b in blocks], dtype)
  File "/home/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4584, in _vstack
    if dtype == _NS_DTYPE or dtype == _TD_DTYPE:
TypeError: data type "bytes8" not understood

Expected Output

No error should be thrown, y should be consolidated.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.8-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.8.0
xarray: 0.9.1
IPython: 4.2.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@adbull adbull changed the title DataFrame.consolidate throws TypeError with bytes blocks BUG: DataFrame.consolidate throws TypeError with bytes blocks Feb 23, 2017
@jreback
Contributor

jreback commented Feb 23, 2017

I suppose. A couple of things:

  • .consolidate() should not be publicly exposed (will deprecate this); I'll create an issue
  • this is a numpy bug:
In [13]: np.dtype('datetime64[ns]') == 'bytes8'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-d89e7e99d99f> in <module>()
----> 1 np.dtype('datetime64[ns]') == 'bytes8'

TypeError: data type "bytes8" not understood

but we should fix this anyhow (the code should be using the type-safe is_datetime64_dtype and is_timedelta64_dtype rather than these direct comparisons).

want to do a PR?
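
A minimal sketch of the type-safe checks mentioned above, assuming a pandas version that exposes is_datetime64_dtype and is_timedelta64_dtype through pandas.api.types. The helper name is_datetimelike is hypothetical and only for illustration; it is not the actual pandas internals code.

```python
import numpy as np
from pandas.api.types import is_datetime64_dtype, is_timedelta64_dtype


def is_datetimelike(dtype):
    """Hypothetical helper: True for datetime64 or timedelta64 dtypes.

    Uses the is_*_dtype predicates instead of comparing dtypes with ==.
    """
    return is_datetime64_dtype(dtype) or is_timedelta64_dtype(dtype)


bytes_dtype = np.dtype('S1')  # the fixed-width bytes dtype from the report
datetime_dtype = np.dtype('datetime64[ns]')
timedelta_dtype = np.dtype('timedelta64[ns]')

print(is_datetimelike(bytes_dtype))
print(is_datetimelike(datetime_dtype))
print(is_datetimelike(timedelta_dtype))
```

The predicates take the dtype object itself, so no string form of the dtype has to be parsed back by NumPy along the way.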

@jreback jreback added 2/3 Compat Bug Dtype Conversions Unexpected or buggy dtype conversions labels Feb 23, 2017
@jreback jreback added this to the 0.20.0 milestone Feb 23, 2017
@adbull
Contributor Author

adbull commented Feb 23, 2017

  • yeah, I'm calling .consolidate() here for clarity, but the error would be thrown e.g. when repr-ing a long DataFrame of bytes
  • I don't think it's a numpy bug, as 'bytes8' is a pandas-only dtype; the numpy equivalent 'S1' works fine
  • anyway, agreed this should use safe type-checks; I think this affects various other points in pandas.core.common and pandas.core.internals also?
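
For context, a small sketch of where a string like 'bytes8' can come from on the NumPy side: under Python 3, the .name attribute of the 'S1' dtype is reported as 'bytes8', yet that name is not accepted back by the np.dtype constructor. Whether pandas internals pass exactly this name around is an assumption here, not something confirmed in this thread.

```python
import numpy as np

s1 = np.dtype('S1')          # the NumPy spelling that works fine
print(s1.kind, s1.itemsize)  # kind code and width in bytes

name = s1.name               # human-readable name of the same dtype

# Try to build a dtype back from that name.
try:
    np.dtype(name)
    round_trips = True
except TypeError:
    round_trips = False

print(name, round_trips)
```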

@jreback
Contributor

jreback commented Feb 23, 2017

See numpy/numpy#5329; I had this argument long ago with numpy. They have lots of bugs / API issues w.r.t. dtypes, but can't / won't fix them. So be it.

yes, any equality checks against dtypes need to use either the is_*_dtype methods or is_dtype_equal

a lot have been changed, but obviously not all.
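
As a rough illustration of the is_dtype_equal route, assuming it is imported from pandas.api.types: the sketch below compares a datetime64[ns] dtype against an equivalent dtype, a bytes dtype, and a string that NumPy cannot parse. The variable names are illustrative only.

```python
import numpy as np
from pandas.api.types import is_dtype_equal

ns_dtype = np.dtype('datetime64[ns]')

# Same dtype written with NumPy's short code.
same = is_dtype_equal(ns_dtype, np.dtype('M8[ns]'))

# A valid but different dtype.
different = is_dtype_equal(ns_dtype, np.dtype('S1'))

# A string that np.dtype() does not understand; is_dtype_equal
# treats an invalid comparison as "not equal".
invalid = is_dtype_equal(ns_dtype, 'bytes8')

print(same, different, invalid)
```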

@jreback
Contributor

jreback commented Mar 5, 2017

this is a dupe of #12857

@jreback jreback closed this as completed Mar 5, 2017
@jreback jreback added the Duplicate Report Duplicate issue or pull request label Mar 5, 2017