BUG: indexing error when setting item in empty Series which has a frequency #10193

Closed
rekcahpassyla opened this issue May 22, 2015 · 2 comments
Labels
API Design Bug Indexing Related to indexing on series/frames, not to indexes themselves

@rekcahpassyla
Contributor

Another somewhat degenerate case that works in 0.15.2 but fails in 0.16.1:

import pandas as pd

pd.show_versions()


ts = pd.TimeSeries()

ts[pd.datetime(2012, 1, 1)] = 47


ts2 = pd.TimeSeries(0, pd.date_range('2011-01-01', '2011-01-01'))[:0]

ts2[pd.datetime(2012, 1, 1)] = 47

0.15.2 output:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 3.0.0-dev
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.2
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None

0.16.1 output:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.1-46-g0aceb38
nose: 1.3.6
Cython: 0.22
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.2
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.4
pymysql: None
psycopg2: None

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
c:\test_set_empty_series_with_freq.py in <module>()
     11 ts2 = pd.TimeSeries(0, pd.date_range('2011-01-01', '2011-01-01'))[:0]
     12
---> 13 ts2[pd.datetime(2012, 1, 1)] = 47

c:\python\envs\pandas-0.16.1\lib\site-packages\pandas\core\series.pyc in __setitem__(self, key, value)
    687         # do the setitem
    688         cacher_needs_updating = self._check_is_chained_assignment_possible()
--> 689         setitem(key, value)
    690         if cacher_needs_updating:
    691             self._maybe_update_cacher()

c:\python\envs\pandas-0.16.1\lib\site-packages\pandas\core\series.pyc in setitem(key, value)
    660                             pass
    661                 try:
--> 662                     self.loc[key] = value
    663                 except:
    664                     print ""

c:\python\envs\pandas-0.16.1\lib\site-packages\pandas\core\indexing.pyc in __setitem__(self, key, value)
    113     def __setitem__(self, key, value):
    114         indexer = self._get_setitem_indexer(key)
--> 115         self._setitem_with_indexer(indexer, value)
    116
    117     def _has_valid_type(self, k, axis):

c:\python\envs\pandas-0.16.1\lib\site-packages\pandas\core\indexing.pyc in _setitem_with_indexer(self, indexer, value)
    272                 if self.ndim == 1:
    273                     index = self.obj.index
--> 274                     new_index = index.insert(len(index),indexer)
    275
    276                     # this preserves dtype of the value

c:\python\envs\pandas-0.16.1\lib\site-packages\pandas\tseries\index.pyc in insert(self, loc, item)
   1523             # check freq can be preserved on edge cases
   1524             if self.freq is not None:
-> 1525                 if (loc == 0 or loc == -len(self)) and item + self.freq == self[0]:
   1526                     freq = self.freq
   1527                 elif (loc == len(self)) and item - self.freq == self[-1]:

c:\python\envs\pandas-0.16.1\lib\site-packages\pandas\tseries\index.pyc in __getitem__(self, key)
   1351         getitem = self._data.__getitem__
   1352         if np.isscalar(key):
-> 1353             val = getitem(key)
   1354             return Timestamp(val, offset=self.offset, tz=self.tz)
   1355         else:

IndexError: index 0 is out of bounds for axis 0 with size 0
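
The traceback shows `insert` evaluating `self[0]` on an index of size 0 while checking whether the frequency can be preserved. Below is a minimal pure-Python sketch of that edge-case check with a size guard added. The function name `freq_after_insert`, the list-of-datetimes index, and the `timedelta` frequency are illustrative stand-ins rather than pandas internals, and returning `None` for an empty index is an assumption about the intended behaviour.

```python
from datetime import datetime, timedelta


def freq_after_insert(index, loc, item, freq):
    """Return freq if inserting item at loc keeps the spacing regular, else None.

    index is a sorted list of datetimes and freq a timedelta (or None);
    both are simplified stand-ins for a DatetimeIndex and its offset.
    """
    if freq is None or len(index) == 0:
        # Hypothetical guard: an empty index has no index[0] or index[-1]
        # to compare against, so the edge checks are skipped entirely.
        return None
    if (loc == 0 or loc == -len(index)) and item + freq == index[0]:
        return freq  # prepending exactly one step before the first element
    if loc == len(index) and item - freq == index[-1]:
        return freq  # appending exactly one step after the last element
    return None


day = timedelta(days=1)
empty_result = freq_after_insert([], 0, datetime(2012, 1, 1), day)
append_result = freq_after_insert([datetime(2011, 1, 1)], 1, datetime(2011, 1, 2), day)
gap_result = freq_after_insert([datetime(2011, 1, 1)], 1, datetime(2012, 1, 1), day)
```

Without the `len(index) == 0` test, the first call would index into an empty list and raise `IndexError`, which mirrors the failure reported above.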

Proposed #10194

@jreback
Contributor

jreback commented May 22, 2015

This is actually a bit trickier. You are slicing another object and then trying to set an item on the slice. That should hit the SettingWithCopyWarning path, though the assignment itself should still work.

Side note: we don't use TimeSeries anymore (it's just a synonym for Series and will be deprecated in 0.17.0).

In [2]: ts2 = Series(0, pd.date_range('2011-01-01', '2011-01-01')) 

In [3]: ts2
Out[3]: 
2011-01-01    0
Freq: D, dtype: int64

In [4]: ts3 = ts2[:0]

In [5]: ts3
Out[5]: Series([], Freq: D, dtype: int64)

# this is the original structure
In [8]: ts3._data.blocks[0].values.base
Out[8]: array([0])
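
The same view relationship can be seen on a plain NumPy array, independent of pandas. This is only a sketch of the idea: the names `original` and `empty_view` are illustrative, and it does not touch the private `_data.blocks` attribute used in the session above.

```python
import numpy as np

original = np.array([0])    # stands in for the values behind ts2
empty_view = original[:0]   # the same [:0] slice, taken on the raw array

# The slice holds no elements, yet it still references its parent array.
view_is_empty = empty_view.size == 0
view_keeps_parent = empty_view.base is original
```

Because the empty slice still points at the original buffer, an assignment to it looks like a write to a view of another object.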

jreback added the Bug, Indexing, API Design, and Difficulty Intermediate labels on May 22, 2015
jreback added this to the 0.17.0 milestone on May 22, 2015
@rekcahpassyla
Contributor Author

Actually, I've just found that setting an item on a newly created empty series with a frequency still raises an error. Since it's a new series, there is no view involved.

In [142]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None

In [143]: series = pd.Series([], freq='D')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-143-123100163557> in <module>()
----> 1 series = pd.Series([], freq='D')

TypeError: __init__() got an unexpected keyword argument 'freq'

In [144]: series = pd.Series([], pd.DatetimeIndex([], freq='D'))

In [145]: dt = pd.Timestamp('2011-01-01')

In [146]: series.loc[dt] = 9
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-146-c577a9a95757> in <module>()
----> 1 series.loc[dt] = 9

C:\dev\bin\Anaconda\envs\pandas-0.16.2\lib\site-packages\pandas-0.16.2-py2.7-win-amd64.egg\pandas\core\indexing.pyc in __setitem__(self, key, value)
    113     def __setitem__(self, key, value):
    114         indexer = self._get_setitem_indexer(key)
--> 115         self._setitem_with_indexer(indexer, value)
    116
    117     def _has_valid_type(self, k, axis):

C:\dev\bin\Anaconda\envs\pandas-0.16.2\lib\site-packages\pandas-0.16.2-py2.7-win-amd64.egg\pandas\core\indexing.pyc in _setitem_with_indexer(self, indexer, value)
    281                 if self.ndim == 1:
    282                     index = self.obj.index
--> 283                     new_index = index.insert(len(index),indexer)
    284
    285                     # this preserves dtype of the value

C:\dev\bin\Anaconda\envs\pandas-0.16.2\lib\site-packages\pandas-0.16.2-py2.7-win-amd64.egg\pandas\tseries\index.pyc in insert(self, loc, item)
   1499             # check freq can be preserved on edge cases
   1500             if self.freq is not None:
-> 1501                 if (loc == 0 or loc == -len(self)) and item + self.freq == self[0]:
   1502                     freq = self.freq
   1503                 elif (loc == len(self)) and item - self.freq == self[-1]:

C:\dev\bin\Anaconda\envs\pandas-0.16.2\lib\site-packages\pandas-0.16.2-py2.7-win-amd64.egg\pandas\tseries\base.pyc in __getitem__(self, key)
     73         getitem = self._data.__getitem__
     74         if np.isscalar(key):
---> 75             val = getitem(key)
     76             return self._box_func(val)
     77         else:

IndexError: index 0 is out of bounds for axis 0 with size 0


In [148]: series_nofreq = pd.Series([], pd.DatetimeIndex([]))

In [149]: series_nofreq.loc[pd.Timestamp('2011-01-01')] = 47

In [150]:

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jul 2, 2015
* commit 'v0.16.2-42-g383865f': (72 commits)
  BUG: provide categorical concat always on axis 0, pandas-dev#10430     numpy 1.10 makes this an error for 1-d on axis != 0
  DOC: update missing.rst with ref to groupby.rst
  BUG: Timedeltas with no specified units (and frac) should raise, pandas-dev#10426
  BUG: using .loc[:,column] fails when the object is a multi-index, pandas-dev#10408
  Removed scikit-timeseries migration docs from FAQ
  BUG: GH10395 bug in DataFrame.interpolate with axis=1 and inplace=True
  BUG: GH10392 bug where Table.select_column does not preserve column name
  TST: Use unicode literals in string test
  PERF: fix _get_level_indexer to accept an intermediate indexer result
  PERF: bench for pandas-dev#10287
  BUG: drop_duplicates drops name(s).
  ENH: Enable ExcelWriter to construct in-memory sheets
  BLD: remove support for 3.2, pandas-dev#9118
  PERF: timedelta and datetime64 ops improvements
  PERF: parse timedelta strings in cython pandas-dev#6755
  closes bug in reset_index when index contains NaT
  Check for size=0 before setting item Fixes pandas-dev#10193
  closes bug in apply when function returns categorical
  BUG: frequencies.get_freq_code raises an error against offset with n != 1
  CI: run doc-tests always
  ...