Allow broadcasting horizontally a 1-dim input to pd.DataFrame(), - and document #20837

Open
toobaz opened this issue Apr 27, 2018 · 0 comments
Labels
DataFrame DataFrame data structure Enhancement

toobaz commented Apr 27, 2018

Code Sample, a copy-pastable example if possible

In [2]: pd.DataFrame([1, 2], columns=range(3))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4844                 blocks = [make_block(values=blocks[0],
-> 4845                                      placement=slice(0, len(axes[0])))]
   4846 

/home/nobackup/repo/pandas/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   3192 
-> 3193     return klass(values, ndim=ndim, placement=placement)
   3194 

/home/nobackup/repo/pandas/pandas/core/internals.py in __init__(self, values, placement, ndim)
    124                 'Wrong number of items passed {val}, placement implies '
--> 125                 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
    126 

ValueError: Wrong number of items passed 1, placement implies 3

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-2-4ad51ebcfae4> in <module>()
----> 1 pd.DataFrame([1, 2], columns=range(3))

/home/nobackup/repo/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    403                 else:
    404                     mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 405                                              copy=copy)
    406             else:
    407                 mgr = self._init_dict({}, index, columns, dtype=dtype)

/home/nobackup/repo/pandas/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    536             values = maybe_infer_to_datetimelike(values)
    537 
--> 538         return create_block_manager_from_blocks([values], [columns, index])
    539 
    540     @property

/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4852         blocks = [getattr(b, 'values', b) for b in blocks]
   4853         tot_items = sum(b.shape[0] for b in blocks)
-> 4854         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   4855 
   4856 

/home/nobackup/repo/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4829         raise ValueError("Empty data passed with indices specified.")
   4830     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4831         passed, implied))
   4832 
   4833 

ValueError: Shape of passed values is (1, 2), indices imply (3, 2)

Problem description

(From #18626 (comment))

#18819 (now fixed) disabled calls such as pd.Series([1], index=range(3)); the same result can be obtained with pd.Series(1, index=range(3)), which is less ambiguous.
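A minimal sketch of the distinction, assuming a pandas version in which the change from #18819 is present. The scalar form is the unambiguous way to ask for broadcasting; the length-1 list form is wrapped in try/except because its outcome depends on the installed version, and the name list_form_accepted is only illustrative.

```python
import pandas as pd

# A scalar is broadcast along the index: every label gets the same value.
s = pd.Series(1, index=range(3))
print(s.tolist())

# A length-1 list is a different request: it is a 1d object whose length
# is expected to match the index. Whether this call is accepted depends on
# the pandas version, so the outcome is recorded rather than assumed.
try:
    pd.Series([1], index=range(3))
    list_form_accepted = True
except ValueError:
    list_form_accepted = False
print("length-1 list accepted:", list_form_accepted)
```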

In principle, the same reasoning should lead us to disable pd.DataFrame([[1, 2]], index=range(3)). But that can't be replaced as comfortably, because pd.DataFrame([1, 2], index=range(3)) aligns vertically. It could not be otherwise, since 1d objects are treated as Series, and Series in DataFrames are mainly columns, not rows. Moreover, the 2d form is probably widely used in existing code, and also in tests:

expected = DataFrame([self.frame.mean()], index=self.frame.index)

df0 = pd.DataFrame([[1, 2]], index=idx0)

df = DataFrame([[10, 11]], index=midx)
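The orientation rule described above can be checked without relying on any broadcasting. The sketch below only illustrates how 1d and 2d inputs are laid out; the last call shows one explicit spelling (a dict of scalars plus an index) that repeats a row along the index. It is offered as a possible alternative, not as the replacement the issue settles on.

```python
import pandas as pd

# A 1d input is treated like a Series, so it becomes a single column.
col = pd.DataFrame([1, 2])
print(col.shape)

# A 2d input of length 1 is a single row with one value per column.
row = pd.DataFrame([[1, 2]])
print(row.shape)

# Scalars in a dict are broadcast along an explicit index, which repeats
# the row (1, 2) once per index label.
explicit = pd.DataFrame({0: 1, 1: 2}, index=range(3))
print(explicit)
```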

So I think the best way to proceed is:

  • allow 1d objects to be broadcast horizontally (not just aligned vertically)
  • clearly document the above, and the fact that 2d objects of length 1 are broadcast vertically instead
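Until the constructor supports this, the first point can be approximated by hand. The sketch below uses a hypothetical helper, broadcast_across_columns, which is not part of pandas; it repeats a 1d input once per requested column with NumPy before building the frame.

```python
import numpy as np
import pandas as pd


def broadcast_across_columns(values, columns, index=None):
    """Hypothetical helper (not a pandas API): repeat a 1d input per column."""
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError("expected a 1d input")
    columns = list(columns)
    # Turn the 1d input into a column vector, then repeat it horizontally.
    tiled = np.repeat(arr[:, np.newaxis], len(columns), axis=1)
    return pd.DataFrame(tiled, index=index, columns=columns)


df = broadcast_across_columns([1, 2], columns=range(3))
print(df)
```

np.repeat is used instead of np.broadcast_to so that the frame is built from a writable array rather than a read-only view.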

Expected Output

In [3]: pd.DataFrame([[1]*3, [2]*3], columns=range(3))
Out[3]: 
   0  1  2
0  1  1  1
1  2  2  2

Output of pd.show_versions()

In [3]: pd.show_versions()

INSTALLED VERSIONS

commit: 7ec74e5
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.23.0.dev0+798.g7ec74e5f7
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
