
DataFrame.loc[n] = dict(..) fails with some type combinations #16309


Closed
bmcfee opened this issue May 9, 2017 · 5 comments · Fixed by #41607
Labels: Dtype Conversions (unexpected or buggy dtype conversions), good first issue, Indexing (related to indexing on series/frames, not to indexes themselves), Needs Tests (unit test(s) needed to prevent regressions)

Comments

bmcfee (Contributor) commented May 9, 2017

Code Sample, a copy-pastable example if possible

This one fails:

In [9]: d = pd.DataFrame(columns=['time', 'value'])                    
In [9]: d.loc[0] = dict(time=pd.to_timedelta(5, unit='s'), value='foo')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-b557eb950858> in <module>()
----> 1 d.loc[0] = dict(time=pd.to_timedelta(5, unit='s'), value='foo')

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    177             key = com._apply_if_callable(key, self.obj)
    178         indexer = self._get_setitem_indexer(key)
--> 179         self._setitem_with_indexer(indexer, value)
    180 
    181     def _has_valid_type(self, k, axis):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    423                                        name=indexer)
    424 
--> 425                     self.obj._data = self.obj.append(value)._data
    426                     self.obj._maybe_update_cacher(clear=True)
    427                     return self.obj

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity)
   4628             other = DataFrame(other.values.reshape((1, len(other))),
   4629                               index=index,
-> 4630                               columns=combined_columns)
   4631             other = other._convert(datetime=True, timedelta=True)
   4632             if not self.columns.equals(combined_columns):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    304             else:
    305                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 306                                          copy=copy)
    307         elif isinstance(data, (list, types.GeneratorType)):
    308             if isinstance(data, types.GeneratorType):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    481             values = maybe_infer_to_datetimelike(values)
    482 
--> 483         return create_block_manager_from_blocks([values], [columns, index])
    484 
    485     @property

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4294                                      placement=slice(0, len(axes[0])))]
   4295 
-> 4296         mgr = BlockManager(blocks, axes)
   4297         mgr._consolidate_inplace()
   4298         return mgr

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2790                     raise AssertionError('Number of Block dimensions (%d) '
   2791                                          'must equal number of axes (%d)' %
-> 2792                                          (block.ndim, self.ndim))
   2793 
   2794         if do_integrity_check:

AssertionError: Number of Block dimensions (1) must equal number of axes (2)

But this one succeeds:

In [11]: d.loc[0] = dict(time=pd.to_timedelta(5, unit='s'), value=5)

In [12]: d
Out[12]: 
      time value
0 00:00:05     5

This one also succeeds:

In [13]: d = pd.DataFrame(columns=['time', 'value'])

In [14]: d.loc[0] = dict(time=3, value='foo')

In [15]: d
Out[15]: 
  time value
0    3   foo

Problem description


The current behavior is a problem because it is inconsistent and depends on the type of data provided. Mixing timedelta with str fails, but timedelta with int works, as does int with str.

I believe this is related to aggressive type inference previously noted in #13829.

Expected Output

Not crashing: the row should be inserted, as it is for the timedelta/int and int/str combinations above.
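A minimal sketch of the expected output written as a check, assuming a current pandas release (later comments in this thread report that the original failure no longer occurs). It runs all three type combinations from the report through the same `.loc` enlargement on an empty frame; the helper `insert_row` is hypothetical and not part of pandas.

```python
import pandas as pd


def insert_row(time, value):
    """Hypothetical helper: enlarge an empty frame with a dict via .loc."""
    d = pd.DataFrame(columns=["time", "value"])
    d.loc[0] = dict(time=time, value=value)
    return d


# The three combinations from the report: timedelta+str (failed on 0.20.1),
# timedelta+int and int+str (both worked on 0.20.1).
cases = [
    (pd.to_timedelta(5, unit="s"), "foo"),
    (pd.to_timedelta(5, unit="s"), 5),
    (3, "foo"),
]

frames = [insert_row(t, v) for t, v in cases]
for (t, v), d in zip(cases, frames):
    # Each frame should hold exactly one row carrying the assigned values.
    assert len(d) == 1
    assert d.loc[0, "time"] == t
    assert d.loc[0, "value"] == v
```

Comparing values rather than dtypes keeps the check independent of how a given pandas version infers the column dtypes.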

Output of pd.show_versions()

In [16]: pd.show_versions()
/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/xarray/core/formatting.py:16: FutureWarning: The pandas.tslib module is deprecated and will be removed in a future version.
  from pandas.tslib import OutOfBoundsDatetime

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-77-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.5
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: None
numexpr: 2.6.0
feather: None
matplotlib: 2.0.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: 0.1.0
pandas_gbq: None
pandas_datareader: None

sinhrks added the Indexing label on May 11, 2017
sinhrks (Member) commented May 11, 2017

Thanks for the report. Yes, it looks like #13829. I prepared a draft fix once and hope to work on it again.

sinhrks added the Dtype Conversions label on May 11, 2017
jorisvandenbossche (Member) commented

@bmcfee I cannot reproduce this anymore on master or with 0.20.2 (but fails on 0.20.1), so it seems this somehow got fixed.
Can you see if you can confirm it is fixed?

If so, it would also be nice to add some tests to keep it working.

jorisvandenbossche added the Regression label (functionality that used to work in a prior pandas version) on Jun 19, 2017
jorisvandenbossche added this to the 0.20.3 milestone on Jun 19, 2017
bmcfee (Contributor, Author) commented Jun 19, 2017

> Can you see if you can confirm it is fixed?

Confirmed that the above example now works on 0.20.2 (conda distribution).

However, I now get a different error if I try to update an existing record, even with identical contents:

In [1]: import pandas as pd

In [2]: d = pd.DataFrame(columns=['time', 'value'])

In [3]: d.loc[1] = dict(time=pd.to_timedelta(6, unit='s'), value='foo')

In [4]: d.loc[1] = dict(time=pd.to_timedelta(6, unit='s'), value='foo')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-20345bf5ca35> in <module>()
----> 1 d.loc[1] = dict(time=pd.to_timedelta(6, unit='s'), value='foo')

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    177             key = com._apply_if_callable(key, self.obj)
    178         indexer = self._get_setitem_indexer(key)
--> 179         self._setitem_with_indexer(indexer, value)
    180 
    181     def _has_valid_type(self, k, axis):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    581 
    582                     for item, v in zip(labels, value):
--> 583                         setter(item, v)
    584             else:
    585 

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in setter(item, v)
    511                     s._consolidate_inplace()
    512                     s = s.copy()
--> 513                     s._data = s._data.setitem(indexer=pi, value=v)
    514                     s._maybe_update_cacher(clear=True)
    515 

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in setitem(self, **kwargs)
   3201 
   3202     def setitem(self, **kwargs):
-> 3203         return self.apply('setitem', **kwargs)
   3204 
   3205     def putmask(self, **kwargs):

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3089 
   3090             kwargs['mgr'] = self
-> 3091             applied = getattr(b, f)(**kwargs)
   3092             result_blocks = _extend_blocks(applied, result_blocks)
   3093 

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in setitem(self, indexer, value, mgr)
    684 
    685         # coerce args
--> 686         values, _, value, _ = self._try_coerce_args(self.values, value)
    687         arr_value = np.array(value)
    688 

/home/bmcfee/miniconda/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in _try_coerce_args(self, values, other)
   1754         else:
   1755             # scalar
-> 1756             other = Timedelta(other)
   1757             other_mask = isnull(other)
   1758             other = other.value

pandas/_libs/tslib.pyx in pandas._libs.tslib.Timedelta.__new__ (pandas/_libs/tslib.c:50070)()

pandas/_libs/tslib.pyx in pandas._libs.tslib.parse_timedelta_string (pandas/_libs/tslib.c:60542)()

ValueError: unit abbreviation w/o a number

I suspect this is an entirely different kind of error, so it might make sense to close this one out and start a new issue, but I'll leave that call to you.
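A hypothesis for the second error, not confirmed in the thread: the failing frame loops over `zip(labels, value)`, and iterating over a dict yields its keys, not its values. If that is what happened, `Timedelta` was handed the string `'time'`, which is not a parseable timedelta string. The snippet only demonstrates those two behaviours in plain Python and current pandas; `labels` is a stand-in list, and nothing here reproduces the 0.20.2 internals.

```python
import pandas as pd

row = dict(time=pd.to_timedelta(6, unit="s"), value="foo")
labels = ["time", "value"]  # stand-in for the frame's column labels

# Iterating a dict yields its keys, so each label is paired with a key string.
pairs = list(zip(labels, row))
print(pairs)  # [('time', 'time'), ('value', 'value')]

# Pairing labels with the values requires looking each label up explicitly.
value_pairs = [(label, row[label]) for label in labels]

# A bare word such as 'time' is not a valid timedelta string.
try:
    pd.Timedelta("time")
    parsed = True
except ValueError as exc:
    parsed = False
    print("ValueError:", exc)
```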

jorisvandenbossche (Member) commented

Ah, yes, I see that as well (i.e. when the label already exists). This already raises in 0.19.2, so it is not a new bug.

jorisvandenbossche added the Bug label and removed the Regression label on Jun 19, 2017
jreback modified the milestones: Next Major Release, 0.20.3 on Jul 6, 2017
phofl (Member) commented Nov 15, 2020

Works now, both when setting once and when setting twice.
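The issue was tagged "Needs Tests" at this point. A minimal regression check along the lines requested, assuming a current pandas release; the test name is hypothetical, and this is not the test that was eventually merged in #41607. It covers both the enlargement case from the original report and overwriting an existing label, which raised on 0.19.2 and 0.20.2.

```python
import pandas as pd


def test_loc_setitem_dict_timedelta_str_twice():
    # Hypothetical test name, written in pytest style (plain asserts).
    d = pd.DataFrame(columns=["time", "value"])
    row = dict(time=pd.to_timedelta(6, unit="s"), value="foo")

    d.loc[1] = row  # label 1 is new: setting with enlargement
    d.loc[1] = row  # label 1 now exists: overwrite the row in place

    assert list(d.index) == [1]
    assert d.loc[1, "time"] == pd.Timedelta(seconds=6)
    assert d.loc[1, "value"] == "foo"


test_loc_setitem_dict_timedelta_str_twice()
```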

phofl added the good first issue and Needs Tests labels and removed the Bug label on Nov 15, 2020
mroeschke mentioned this issue on May 21, 2021
jreback modified the milestones: Contributions Welcome, 1.3 on May 21, 2021
5 participants