Setting DataFrame of Python objects and MultiIndex columns with single-element NDFrame inserts list #14592

Closed
toobaz opened this issue Nov 5, 2016 · 2 comments · Fixed by #31161
Labels: good first issue, Needs Tests

@toobaz
Member

toobaz commented Nov 5, 2016

A small, complete example of the issue

In [2]: t = pd.DataFrame('a', index=range(2),
                              columns=pd.MultiIndex.from_product([range(2), range(2)]))

In [3]: t.loc[0, [(0,1)]] = t.loc[0, [(0,1)]]

In [4]: t
Out[4]: 
   0       1   
   0    1  0  1
0  a  [a]  a  a
1  a    a  a  a

The same happens when passing, instead of a list of labels, a boolean mask with only one True value.
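
The boolean-mask variant can be reproduced with a sketch like the one below. It assumes a pandas version that includes the fix, and building the mask with a list comprehension over the column tuples is just an illustrative choice. On a fixed version the frame should come back unchanged.

```python
import pandas as pd

# Same frame as in the report: object dtype, 2x2 MultiIndex columns.
t = pd.DataFrame('a', index=range(2),
                 columns=pd.MultiIndex.from_product([range(2), range(2)]))
original = t.copy()

# Boolean mask over the columns with a single True value, at (0, 1).
mask = [col == (0, 1) for col in t.columns]

# No-op self-assignment through the mask; on an affected version the
# cell ended up holding a one-element list instead of the scalar 'a'.
t.loc[0, mask] = t.loc[0, mask]

print(t.loc[0, (0, 1)])
```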

The assignment above is a useless example in itself, but the same bug affects in-place operations.

I suspect the fix should not be very complicated, given that everything works smoothly when the frame holds numbers rather than Python objects. The bug does arise, however, both when assigning a cell of a numbers-only DataFrame to a cell of a DataFrame of objects and when doing the opposite.
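
The cross-dtype case can be sketched as follows, assuming a pandas version that includes the fix; the frame names `objs` and `nums` are illustrative. Only the numbers-to-objects direction is shown: the reverse writes a string into an integer column, changing that column's dtype, which recent pandas releases flag as deprecated.

```python
import pandas as pd

cols = pd.MultiIndex.from_product([range(2), range(2)])
objs = pd.DataFrame('a', index=range(2), columns=cols)  # object dtype
nums = pd.DataFrame(3, index=range(2), columns=cols)    # int64 dtype

# Copy one cell from the numbers-only frame into the object frame,
# using a one-element list of column labels on both sides.
objs.loc[0, [(0, 1)]] = nums.loc[0, [(0, 1)]]

# On a fixed version this is the scalar 3, not a list containing 3.
print(objs.loc[0, (0, 1)])
```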

Expected Output

The original t, unchanged.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 7a2bcb6
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8

pandas: 0.19.0+67.g7a2bcb6.dirty
nose: 1.3.7
pip: 8.1.2
setuptools: 28.0.0
Cython: 0.23.4
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.8.0.dev0+f80669e
xarray: None
IPython: 5.1.0.dev
sphinx: 1.4.8
patsy: 0.3.0-dev
dateutil: 2.5.3
pytz: 2015.7
blosc: None
bottleneck: 1.2.0dev
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: 1.5.2
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1

@toobaz
Member Author

toobaz commented Nov 5, 2016

Actually, I think the "numbers only" case has problems too. Take

t = pd.DataFrame(3, index=range(2), columns=pd.MultiIndex.from_product([range(2), range(2)]))

Then the following works fine:

In [3]: t.loc[0, [(0,1), (1,1)]] = [5,6]

while...

In [4]: t.loc[0, [(0,0)]] = [7]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-57ac0089af6c> in <module>()
----> 1 t.loc[0, [(0,0)]] = [7]

/home/nobackup/repo/pandas/pandas/core/indexing.py in __setitem__(self, key, value)
    138             key = com._apply_if_callable(key, self.obj)
    139         indexer = self._get_setitem_indexer(key)
--> 140         self._setitem_with_indexer(indexer, value)
    141 
    142     def _has_valid_type(self, k, axis):

/home/nobackup/repo/pandas/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    530                 # we have an equal len list/ndarray
    531                 elif can_do_equal_len():
--> 532                     setter(labels[0], value)
    533 
    534                 # per label values

/home/nobackup/repo/pandas/pandas/core/indexing.py in setter(item, v)
    470                     s._consolidate_inplace()
    471                     s = s.copy()
--> 472                     s._data = s._data.setitem(indexer=pi, value=v)
    473                     s._maybe_update_cacher(clear=True)
    474 

/home/nobackup/repo/pandas/pandas/core/internals.py in setitem(self, **kwargs)
   3167 
   3168     def setitem(self, **kwargs):
-> 3169         return self.apply('setitem', **kwargs)
   3170 
   3171     def putmask(self, **kwargs):

/home/nobackup/repo/pandas/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3055 
   3056             kwargs['mgr'] = self
-> 3057             applied = getattr(b, f)(**kwargs)
   3058             result_blocks = _extend_blocks(applied, result_blocks)
   3059 

/home/nobackup/repo/pandas/pandas/core/internals.py in setitem(self, indexer, value, mgr)
    727             # GH 6043
    728             elif _is_scalar_indexer(indexer):
--> 729                 values[indexer] = value
    730 
    731             # if we are an exact match (ex-broadcasting),

ValueError: setting an array element with a sequence.
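
Both numeric assignments can be checked side by side with a short sketch, assuming a pandas version that includes the fix. The one-element case should then behave like the two-element one instead of raising.

```python
import pandas as pd

t = pd.DataFrame(3, index=range(2),
                 columns=pd.MultiIndex.from_product([range(2), range(2)]))

# Two labels, two values: this already worked on the reported version.
t.loc[0, [(0, 1), (1, 1)]] = [5, 6]

# One label, one-element list: this raised "setting an array element
# with a sequence" on the reported version.
t.loc[0, [(0, 0)]] = [7]

# Row 0 in column order (0, 0), (0, 1), (1, 0), (1, 1).
print(t.loc[0].tolist())
```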

@jreback added the Bug, Indexing, and MultiIndex labels on Feb 8, 2018
@jreback added this to the Next Major Release milestone on Feb 8, 2018
@mroeschke
Member

This appears to give the correct result on master. It could use a test.
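A minimal sketch of such a regression test, written as a pytest-style function that also runs as plain Python. The test name is hypothetical, and this is not necessarily the test that was merged in #31161.

```python
import pandas as pd


def test_setitem_single_label_multiindex_object_frame():
    # Hypothetical test name: object-dtype frame with MultiIndex columns.
    cols = pd.MultiIndex.from_product([range(2), range(2)])
    t = pd.DataFrame('a', index=range(2), columns=cols)
    expected = t.copy()

    # No-op self-assignment through a one-element list of column labels.
    t.loc[0, [(0, 1)]] = t.loc[0, [(0, 1)]]

    # Every cell must still hold the scalar 'a', never ['a'].
    pd.testing.assert_frame_equal(t, expected)


if __name__ == "__main__":
    test_setitem_single_label_multiindex_object_frame()
```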

In [236]: In [2]: t = pd.DataFrame('a', index=range(2),
     ...:                               columns=pd.MultiIndex.from_product([range(2), range(2)]))

In [237]: t
Out[237]:
   0     1
   0  1  0  1
0  a  a  a  a
1  a  a  a  a

In [238]: In [3]: t.loc[0, [(0,1)]] = t.loc[0, [(0,1)]]

In [239]: t
Out[239]:
   0     1
   0  1  0  1
0  a  a  a  a
1  a  a  a  a

In [240]: pd.__version__
Out[240]: '0.26.0.dev0+593.g9d45934af'

@mroeschke added the good first issue and Needs Tests labels and removed the Bug, Indexing, and MultiIndex labels on Oct 21, 2019
@jreback modified the milestones: Contributions Welcome, 1.1 on Jan 20, 2020