ERR: concat of non-unique join axes should have better error #13084

Closed
gregsifr opened this issue May 4, 2016 · 6 comments
Labels: Error Reporting (incorrect or improved errors from pandas), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

gregsifr commented May 4, 2016

You get a confusing error message when trying to concat along axis=1 on indices that are non-unique and also not exactly equal to each other. Small example:

In [57]: df1 = pd.DataFrame({'col1': [1, 2, 3]}, index=[0, 0, 1])
    ...: df2 = pd.DataFrame({'col2': [1, 2, 3]}, index=[0, 1, 2])

In [59]: pd.concat([df1, df2], axis=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-59-756087e4d415> in <module>()
----> 1 pd.concat([df1, df2], axis=1)

/home/joris/scipy/pandas/pandas/tools/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    205                        verify_integrity=verify_integrity,
    206                        copy=copy)
--> 207     return op.get_result()
    208 
    209 

/home/joris/scipy/pandas/pandas/tools/concat.py in get_result(self)
    405             new_data = concatenate_block_managers(
    406                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 407                 copy=self.copy)
    408             if not self.copy:
    409                 new_data._consolidate_inplace()

/home/joris/scipy/pandas/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4849         placement=placement) for placement, join_units in concat_plan]
   4850 
-> 4851     return BlockManager(blocks, axes)
   4852 
   4853 

/home/joris/scipy/pandas/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2784 
   2785         if do_integrity_check:
-> 2786             self._verify_integrity()
   2787 
   2788         self._consolidate_check()

/home/joris/scipy/pandas/pandas/core/internals.py in _verify_integrity(self)
   2994         for block in self.blocks:
   2995             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 2996                 construction_error(tot_items, block.shape[1:], self.axes)
   2997         if len(self.items) != tot_items:
   2998             raise AssertionError('Number of manager items must equal union of '

/home/joris/scipy/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4258         raise ValueError("Empty data passed with indices specified.")
   4259     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4260         passed, implied))
   4261 
   4262 

ValueError: Shape of passed values is (2, 6), indices imply (2, 4)
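
The shape mismatch in the message does not say which input is at fault. As a rough diagnostic sketch (the helper name `duplicated_labels` is made up for illustration and is not part of pandas), the duplicated labels can be listed with `Index.is_unique` and `Index.duplicated` before calling concat:

```python
import pandas as pd

# Same frames as in the small example above.
df1 = pd.DataFrame({'col1': [1, 2, 3]}, index=[0, 0, 1])
df2 = pd.DataFrame({'col2': [1, 2, 3]}, index=[0, 1, 2])


def duplicated_labels(df):
    """Hypothetical helper: return the index labels that occur more than once."""
    # Index.duplicated() marks every repeat after the first occurrence.
    return df.index[df.index.duplicated()].unique().tolist()


for name, df in [('df1', df1), ('df2', df2)]:
    print(name, 'is_unique:', df.index.is_unique,
          'duplicated labels:', duplicated_labels(df))
```

Here only df1 reports a duplicated label (0), which is the frame that trips up the concat.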



Original issue as reported by @gregsifr:

I am working with a large dataframe of customers which I was unable to concat. After spending some time I narrowed the problem area down to the below (pickled) dataframes and code.

When trying to concat the dataframes using the following code the error shown below is returned:

import pandas as pd
import pickle

df1 = pickle.loads('ccopy_reg\n_reconstructor\np1\n(cpandas.core.frame\nDataFrame\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS\'_metadata\'\np6\n(lp7\nsS\'_typ\'\np8\nS\'dataframe\'\np9\nsS\'_data\'\np10\ng1\n(cpandas.core.internals\nBlockManager\np11\ng3\nNtRp12\n((lp13\ncpandas.core.index\n_new_Index\np14\n(cpandas.core.index\nMultiIndex\np15\n(dp16\nS\'labels\'\np17\n(lp18\ncnumpy.core.multiarray\n_reconstruct\np19\n(cpandas.core.base\nFrozenNDArray\np20\n(I0\ntS\'b\'\ntRp21\n(I1\n(L2L\ntcnumpy\ndtype\np22\n(S\'i1\'\nI0\nI1\ntRp23\n(I3\nS\'|\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp24\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasS\'names\'\np25\n(lp26\nNaNasS\'levels\'\np27\n(lp28\ng14\n(cpandas.core.index\nIndex\np29\n(dp30\nS\'data\'\np31\ng19\n(cnumpy\nndarray\np32\n(I0\ntS\'b\'\ntRp33\n(I1\n(L1L\ntg22\n(S\'O8\'\nI0\nI1\ntRp34\n(I3\nS\'|\'\nNNNI-1\nI-1\nI63\ntbI00\n(lp35\nVCUSTOMER_A\np36\natbsS\'name\'\np37\nNstRp38\nag14\n(g29\n(dp39\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp40\n(I1\n(L2L\ntg34\nI00\n(lp41\nVVISIT_DT\np42\naVPURCHASE\np43\natbsg37\nNstRp44\nasS\'sortorder\'\np45\nNstRp46\nacpandas.tseries.index\n_new_DatetimeIndex\np47\n(cpandas.tseries.index\nDatetimeIndex\np48\n(dp49\nS\'tz\'\np50\nNsS\'freq\'\np51\nNsg31\ng19\n(g32\n(I0\ntS\'b\'\ntRp52\n(I1\n(L22L\ntg22\n(S\'M8\'\nI0\nI1\ntRp53\n(I4\nS\'<\'\nNNNI-1\nI-1\nI0\n((d(S\'ns\'\nI1\nI1\nI1\ntttbI00\nS\'\\x00\\x00\\x1c\\xca\\xf9\\xceO\\x10\\x00\\x00k[\\x8e\\x1dP\\x10\\x00\\x00\\xba\\xec"lP\\x10\\x00\\x00\\t~\\xb7\\xbaP\\x10\\x00\\x00X\\x0fL\\tQ\\x10\\x00\\x00\\xa7\\xa0\\xe0WQ\\x10\\x00\\x00\\x94T\\x9eCR\\x10\\x00\\x00\\xe3\\xe52\\x92R\\x10\\x00\\x002w\\xc7\\xe0R\\x10\\x00\\x00\\x81\\x08\\\\/S\\x10\\x00\\x00\\xd0\\x99\\xf0}S\\x10\\x00\\x00\\xbdM\\xaeiT\\x10\\x00\\x00\\x0c\\xdfB\\xb8T\\x10\\x00\\x00[p\\xd7\\x06U\\x10\\x00\\x00\\xaa\\x01lUU\\x10\\x00\\x00\\xf9\\x92\\x00\\xa4U\\x10\\x00\\x00\\xe6F\\xbe\\x8fV\\x10\\x00\\x005\\xd8R\\xdeV\\x10\\x00\\x00\\x84i\\xe7,W\\x10\\x00\\x
00\\xd3\\xfa{{W\\x10\\x00\\x00"\\x8c\\x10\\xcaW\\x10\\x00\\x00\\x0f@\\xce\\xb5X\\x10\'\ntbsg37\nS\'date\'\np54\nstRp55\na(lp56\ng19\n(g32\n(I0\ntS\'b\'\ntRp57\n(I1\n(L2L\nL22L\ntg22\n(S\'f8\'\nI0\nI1\ntRp58\n(I3\nS\'<\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\
xf8\\x7f\'\ntba(lp59\ng14\n(g15\n(dp60\ng17\n(lp61\ng19\n(g20\n(I0\ntS\'b\'\ntRp62\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp63\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasg25\n(lp64\nNaNasg27\n(lp65\ng14\n(g29\n(dp66\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp67\n(I1\n(L1L\ntg34\nI00\n(lp68\ng36\natbsg37\nNstRp69\nag14\n(g29\n(dp70\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp71\n(I1\n(L2L\ntg34\nI00\n(lp72\ng42\nag43\natbsg37\nNstRp73\nasg45\nNstRp74\na(dp75\nS\'0.14.1\'\np76\n(dp77\nS\'axes\'\np78\ng13\nsS\'blocks\'\np79\n(lp80\n(dp81\nS\'mgr_locs\'\np82\nc__builtin__\nslice\np83\n(I0\nI2\nL1L\ntRp84\nsS\'values\'\np85\ng57\nsasstbsb.')
df2 = pickle.loads('ccopy_reg\n_reconstructor\np1\n(cpandas.core.frame\nDataFrame\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS\'_metadata\'\np6\n(lp7\nsS\'_typ\'\np8\nS\'dataframe\'\np9\nsS\'_data\'\np10\ng1\n(cpandas.core.internals\nBlockManager\np11\ng3\nNtRp12\n((lp13\ncpandas.core.index\n_new_Index\np14\n(cpandas.core.index\nMultiIndex\np15\n(dp16\nS\'labels\'\np17\n(lp18\ncnumpy.core.multiarray\n_reconstruct\np19\n(cpandas.core.base\nFrozenNDArray\np20\n(I0\ntS\'b\'\ntRp21\n(I1\n(L2L\ntcnumpy\ndtype\np22\n(S\'i1\'\nI0\nI1\ntRp23\n(I3\nS\'|\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp24\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasS\'names\'\np25\n(lp26\nNaNasS\'levels\'\np27\n(lp28\ng14\n(cpandas.core.index\nIndex\np29\n(dp30\nS\'data\'\np31\ng19\n(cnumpy\nndarray\np32\n(I0\ntS\'b\'\ntRp33\n(I1\n(L1L\ntg22\n(S\'O8\'\nI0\nI1\ntRp34\n(I3\nS\'|\'\nNNNI-1\nI-1\nI63\ntbI00\n(lp35\nVCUSTOMER_B\np36\natbsS\'name\'\np37\nNstRp38\nag14\n(g29\n(dp39\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp40\n(I1\n(L2L\ntg34\nI00\n(lp41\nVVISIT_DT\np42\naVPURCHASE\np43\natbsg37\nNstRp44\nasS\'sortorder\'\np45\nNstRp46\nacpandas.tseries.index\n_new_DatetimeIndex\np47\n(cpandas.tseries.index\nDatetimeIndex\np48\n(dp49\nS\'tz\'\np50\nNsS\'freq\'\np51\nNsg31\ng19\n(g32\n(I0\ntS\'b\'\ntRp52\n(I1\n(L24L\ntg22\n(S\'M8\'\nI0\nI1\ntRp53\n(I4\nS\'<\'\nNNNI-1\nI-1\nI0\n((d(S\'ns\'\nI1\nI1\nI1\ntttbI00\nS\'\\x00\\x00k[\\x8e\\x1dP\\x10\\x00\\x00\\xba\\xec"lP\\x10\\x00\\x00\\t~\\xb7\\xbaP\\x10\\x00\\x00X\\x0fL\\tQ\\x10\\x00\\x00\\xa7\\xa0\\xe0WQ\\x10\\x00\\x00\\x94T\\x9eCR\\x10\\x00\\x00\\xe3\\xe52\\x92R\\x10\\x00\\x002w\\xc7\\xe0R\\x10\\x00\\x00\\x81\\x08\\\\/S\\x10\\x00\\x00\\xd0\\x99\\xf0}S\\x10\\x00\\x00\\xbdM\\xaeiT\\x10\\x00\\x00\\x0c\\xdfB\\xb8T\\x10\\x00\\x00[p\\xd7\\x06U\\x10\\x00\\x00\\xaa\\x01lUU\\x10\\x00\\x00\\xf9\\x92\\x00\\xa4U\\x10\\x00\\x00\\xe6F\\xbe\\x8fV\\x10\\x00\\x005\\xd8R\\xdeV\\x10\\x00\\x00\\x84i\\xe7,W\\x10\\x00\\x00\\xd3\\xfa{{W\\x10\\x00\\x00"\\x8c
\\x10\\xcaW\\x10\\x00\\x00\\xc0\\xae9gX\\x10\\x00\\x00\\xc0\\xae9gX\\x10\\x00\\x00\\xc0\\xae9gX\\x10\\x00\\x00\\x0f@\\xce\\xb5X\\x10\'\ntbsg37\nS\'date\'\np54\nstRp55\na(lp56\ng19\n(g32\n(I0\ntS\'b\'\ntRp57\n(I1\n(L2L\nL24L\ntg22\n(S\'f8\'\nI0\nI1\ntRp58\n(I3\nS\'<\'\nNNNI-1\nI-1\nI0\ntbI00\nS\'\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\xf0\\x0c$sA\\x00\\x00\\x00\\xf0\\x0c$sA\\x00\\x00\\x00\\xf0\\x0c$sA\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00
\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\\x00\\x00\\x00\\x00\\x00\\x00\\xf8\\x7f\'\ntba(lp59\ng14\n(g15\n(dp60\ng17\n(lp61\ng19\n(g20\n(I0\ntS\'b\'\ntRp62\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x00\'\ntbag19\n(g20\n(I0\ntS\'b\'\ntRp63\n(I1\n(L2L\ntg23\nI00\nS\'\\x00\\x01\'\ntbasg25\n(lp64\nNaNasg27\n(lp65\ng14\n(g29\n(dp66\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp67\n(I1\n(L1L\ntg34\nI00\n(lp68\ng36\natbsg37\nNstRp69\nag14\n(g29\n(dp70\ng31\ng19\n(g32\n(I0\ntS\'b\'\ntRp71\n(I1\n(L2L\ntg34\nI00\n(lp72\ng42\nag43\natbsg37\nNstRp73\nasg45\nNstRp74\na(dp75\nS\'0.14.1\'\np76\n(dp77\nS\'axes\'\np78\ng13\nsS\'blocks\'\np79\n(lp80\n(dp81\nS\'mgr_locs\'\np82\nc__builtin__\nslice\np83\n(I0\nI2\nL1L\ntRp84\nsS\'values\'\np85\ng57\nsasstbsb.')

customers, tables = ['CUSTOMER_A', 'CUSTOMER_B'], [df1.iloc[:], df2.iloc[:]]
tables = pd.concat(tables, keys=customers, axis=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-8096f8962dec> in <module>()
      6 
      7 customers, tables = ['CUSTOMER_A', 'CUSTOMER_B'], [df1.iloc[:], df2.iloc[:]]
----> 8 tables = pd.concat(tables, keys=customers, axis=1)

/home/code/anaconda2/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    833                        verify_integrity=verify_integrity,
    834                        copy=copy)
--> 835     return op.get_result()
    836 
    837 

/home/code/anaconda2/lib/python2.7/site-packages/pandas/tools/merge.pyc in get_result(self)
   1023             new_data = concatenate_block_managers(
   1024                 mgrs_indexers, self.new_axes,
-> 1025                 concat_axis=self.axis, copy=self.copy)
   1026             if not self.copy:
   1027                 new_data._consolidate_inplace()

/home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4474               for placement, join_units in concat_plan]
   4475 
-> 4476     return BlockManager(blocks, axes)
   4477 
   4478 

/home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2535 
   2536         if do_integrity_check:
-> 2537             self._verify_integrity()
   2538 
   2539         self._consolidate_check()

/home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in _verify_integrity(self)
   2745         for block in self.blocks:
   2746             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 2747                 construction_error(tot_items, block.shape[1:], self.axes)
   2748         if len(self.items) != tot_items:
   2749             raise AssertionError('Number of manager items must equal union of '

/home/code/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e)
   3897         raise ValueError("Empty data passed with indices specified.")
   3898     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3899         passed, implied))
   3900 
   3901 

ValueError: Shape of passed values is (4, 31), indices imply (4, 25)

However, if the dataframes are sliced first, e.g. [:10] or [:20] as below, the concat works:

customers, tables = ['CUSTOMER_A', 'CUSTOMER_B'], [df1.iloc[:20], df2.iloc[:20]]
tables = pd.concat(tables, keys=customers, axis=1)
tables

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.6.7
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.39.0

jreback commented May 4, 2016

Show df.info() for the input frames. A copy-pastable example would be helpful.

jreback added the Reshaping and Usage Question labels May 4, 2016

gregsifr commented May 4, 2016

The example provided is copy-pastable (it includes the pickled dataframes).

Below is the output from df.info():

df1.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 22 entries, 2007-04-01 to 2007-04-30
Data columns (total 2 columns):
(CUSTOMER_A, VISIT_DT)    0 non-null float64
(CUSTOMER_A, PURCHASE)    0 non-null float64
dtypes: float64(2)
memory usage: 528.0 bytes

df2.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 24 entries, 2007-04-02 to 2007-04-30
Data columns (total 2 columns):
(CUSTOMER_B, VISIT_DT)    3 non-null float64
(CUSTOMER_B, PURCHASE)    0 non-null float64
dtypes: float64(2)
memory usage: 576.0 bytes

jreback commented May 4, 2016

You can't concat on non-unique indexes. It doesn't make sense, as concat is essentially gluing blocks together. An outer merge on the index does work:

In [37]: pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
Out[37]: 
           CUSTOMER_A           CUSTOMER_B         
             VISIT_DT PURCHASE    VISIT_DT PURCHASE
date                                               
2007-04-01        NaN      NaN         NaN      NaN
2007-04-02        NaN      NaN         NaN      NaN
2007-04-03        NaN      NaN         NaN      NaN
2007-04-04        NaN      NaN         NaN      NaN
2007-04-05        NaN      NaN         NaN      NaN
2007-04-06        NaN      NaN         NaN      NaN
2007-04-09        NaN      NaN         NaN      NaN
2007-04-10        NaN      NaN         NaN      NaN
2007-04-11        NaN      NaN         NaN      NaN
2007-04-12        NaN      NaN         NaN      NaN
2007-04-13        NaN      NaN         NaN      NaN
2007-04-16        NaN      NaN         NaN      NaN
2007-04-17        NaN      NaN         NaN      NaN
2007-04-18        NaN      NaN         NaN      NaN
2007-04-19        NaN      NaN         NaN      NaN
2007-04-20        NaN      NaN         NaN      NaN
2007-04-23        NaN      NaN         NaN      NaN
2007-04-24        NaN      NaN         NaN      NaN
2007-04-25        NaN      NaN         NaN      NaN
2007-04-26        NaN      NaN         NaN      NaN
2007-04-27        NaN      NaN         NaN      NaN
2007-04-29        NaN      NaN  20070607.0      NaN
2007-04-29        NaN      NaN  20070607.0      NaN
2007-04-29        NaN      NaN  20070607.0      NaN
2007-04-30        NaN      NaN         NaN      NaN
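
The same outer merge can be tried on the small frames from the top of this issue, without the pickles. This is only a sketch of the workaround shown above; the variable name `merged` is arbitrary:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3]}, index=[0, 0, 1])
df2 = pd.DataFrame({'col2': [1, 2, 3]}, index=[0, 1, 2])

# Join on the index of both frames. Each duplicated label in df1 is
# paired with the matching row of df2, and labels missing on one side
# are filled with NaN.
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
print(merged)
```

The duplicated label 0 from df1 appears twice in the result, each time paired with the single df2 row for that label, and label 2 (present only in df2) gets NaN in col1.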

I think we could give a better error message here. Want to take a crack at it?
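
To make the idea concrete, here is a minimal sketch of the kind of check that could produce a clearer message. It is a user-level wrapper, not the change that went into pandas; the function name `concat_columns_checked` and the wording of the message are assumptions for illustration. It only raises when the indexes differ, since the issue description suggests the confusing error appears for non-unique indexes that are not exactly equal:

```python
import pandas as pd


def concat_columns_checked(frames):
    """Hypothetical wrapper around pd.concat(axis=1).

    Raises a descriptive ValueError up front when the indexes differ
    and at least one of them contains duplicated labels.
    """
    first = frames[0].index
    all_equal = all(first.equals(frame.index) for frame in frames[1:])
    if not all_equal:
        for pos, frame in enumerate(frames):
            if not frame.index.is_unique:
                dupes = frame.index[frame.index.duplicated()].unique().tolist()
                raise ValueError(
                    "cannot concat along axis=1: object at position {0} has "
                    "a non-unique index (duplicated labels: {1})".format(
                        pos, dupes))
    return pd.concat(frames, axis=1)


df1 = pd.DataFrame({'col1': [1, 2, 3]}, index=[0, 0, 1])
df2 = pd.DataFrame({'col2': [1, 2, 3]}, index=[0, 1, 2])

try:
    concat_columns_checked([df1, df2])
except ValueError as exc:
    print(exc)
```

With the frames above, the message names position 0 and label 0 instead of reporting a shape mismatch.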

jreback added the Error Reporting and Difficulty Intermediate labels and removed the Usage Question label May 4, 2016
jreback added this to the 0.18.2 milestone May 4, 2016
jreback changed the title from "pd.concat results in ValueError: Shape of passed values is (4, 31), indices imply (4, 25)" to "ERR: concat of non-unique join axes should have better error" May 4, 2016

gregsifr commented May 4, 2016

I'd be happy to take a crack at it.

jreback commented May 4, 2016

gr8!

jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 21, 2016

TomAugspurger commented

Duplicate of #6963.
