AttributeError: 'Hour' object has no attribute 'normalize' when masking a time series #7748

leroygr · 2014-07-14T13:10:54Z

Hi All,

I have a strange bug when I want to mask values of a time series (that I get from a pickled Panel).

I have the following Series:

In [30]: ts = pd.read_pickle('df_issue.pkl')['wind_speed']

In [31]: ts
Out[31]: 
2013-12-31 16:00:00         NaN
2013-12-31 17:00:00         NaN
2013-12-31 18:00:00    9.845031
2013-12-31 19:00:00         NaN
2013-12-31 20:00:00         NaN
2013-12-31 21:00:00         NaN
2013-12-31 22:00:00         NaN
2013-12-31 23:00:00         NaN
Freq: H, Name: wind_speed, dtype: float64

And I get an exception when I try to mask it:

In [32]: ts<0.
Traceback (most recent call last):

  File "<ipython-input-32-534147d368f7>", line 1, in <module>
    ts<0.

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 585, in wrapper
    res[mask] = masker

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 637, in __setitem__
    self.where(~key, value, inplace=True)

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3238, in where
    inplace=True)

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 2219, in putmask
    return self.apply('putmask', **kwargs)

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 2185, in apply
    b_items = self.items[b.mgr_locs.indexer]

  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.py", line 1387, in __getitem__
    new_offset = key.step * self.offset

  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/offsets.py", line 265, in __rmul__
    return self.__mul__(someInt)

  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/offsets.py", line 262, in __mul__
    return self.__class__(n=someInt * self.n, normalize=self.normalize, **self.kwds)

AttributeError: 'Hour' object has no attribute 'normalize'

If I build the same Series manually, it works:

ts1 = pd.Series(data=[np.nan, np.nan, 9.845031] + [np.nan] * 5, index=pd.date_range('2013-12-31 16:00:00', periods=8, freq='H'))

In [36]: ts1
Out[36]: 
2013-12-31 16:00:00         NaN
2013-12-31 17:00:00         NaN
2013-12-31 18:00:00    9.845031
2013-12-31 19:00:00         NaN
2013-12-31 20:00:00         NaN
2013-12-31 21:00:00         NaN
2013-12-31 22:00:00         NaN
2013-12-31 23:00:00         NaN
Freq: H, dtype: float64

In [37]: ts1<0.
Out[37]: 
2013-12-31 16:00:00    False
2013-12-31 17:00:00    False
2013-12-31 18:00:00    False
2013-12-31 19:00:00    False
2013-12-31 20:00:00    False
2013-12-31 21:00:00    False
2013-12-31 22:00:00    False
2013-12-31 23:00:00    False
Freq: H, dtype: bool

I really don't understand why...
You can find the pickle file here: http://we.tl/lrsFvmanVl

Thanks,
Greg

Here is the show_version output:

INSTALLED VERSIONS

commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.8.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.0
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.6.0.dev-Unknown
IPython: 1.1.0
sphinx: 1.1.3
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: 0.6.0
tables: 3.1.1
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.2
xlsxwriter: None
lxml: 3.0.0
bs4: 4.3.2
html5lib: None
httplib2: 0.7.2
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

jreback · 2014-07-14T13:13:25Z

When/how did you create the pickle? E.g. was it in a prior version, and if so which one?

leroygr · 2014-07-14T13:19:28Z

df_pickle.pkl is created using pandas 0.14.1. However, the raw data in the pickle file are results of calculations done on data originally stored in HDF5 (created with pandas 0.12).

jreback · 2014-07-14T13:26:27Z

can you show the original creation of the df?

leroygr · 2014-07-14T13:29:41Z

Not really, because it is the output of several steps that are impossible to show here...

FYI, I just downgraded to pandas 0.13.1 and the issue above doesn't appear anymore.

Do you think all this could be due to the raw data loaded from HDF5 generated with pandas 0.12?

jreback · 2014-07-14T13:31:42Z

Probably has to do with the pickling of the Hour object from the HDF5. It's an older version. I think you could just do something like

df.index = df.index.copy()

in 0.14.1 and then it should work.
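For readers following along, here is a minimal sketch of the suspected mechanism. It uses a toy `Hour` class rather than the real pandas one, and both the class body and the `restore_like_pickle` helper are hypothetical. By default, unpickling restores the saved instance `__dict__` without calling `__init__`, so an object saved before an attribute existed comes back without it.

```python
class Hour:
    """Toy stand-in for an offset class; not the pandas implementation."""

    def __init__(self, n=1, normalize=False):
        self.n = n
        # Assumption: this attribute did not exist when the old object was saved.
        self.normalize = normalize

    def __mul__(self, some_int):
        # Mirrors the failing line in the traceback: it reads self.normalize.
        return self.__class__(n=some_int * self.n, normalize=self.normalize)


def restore_like_pickle(cls, state):
    """Rebuild an instance the way default unpickling does: no __init__ call."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# State as an older release might have saved it: only `n`, no `normalize`.
old_hour = restore_like_pickle(Hour, {"n": 1})

try:
    old_hour * 2
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)  # an AttributeError message that mentions 'normalize'
```

A freshly constructed `Hour()` multiplies fine, which matches the observation that the manually built Series does not raise.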

leroygr · 2014-07-14T14:07:05Z

df.index = df.index.copy() doesn't work, but the following does:

new_df = pd.DataFrame(df.values,columns=df.columns, index=[i for i in df.index])

new_df['variable'] <0.

It is a bit slow due to [i for i in df.index] though (the df.index can be quite large). Do you have a faster alternative that may work as well?

jreback · 2014-07-14T14:09:18Z

You can try this:

df.index = Index(df.index, freq=df.index.freqstr)

to recreate a new index. The key is to create a new frequency object (rather than reusing the existing one).

leroygr · 2014-07-14T14:10:45Z

It worked! Thanks a lot!

jreback · 2014-07-14T14:14:13Z

gr8. This is not a bug per se, more of a slight change in the backward compatibility of frequencies for pickles. Not a whole lot I think we can do about this.

cc @sinhrks, maybe change these to normalize=getattr(self,'normalize',None) ?

sinhrks · 2014-07-17T08:55:42Z

Yes, will do. Another option is to add normalize as a class variable as well. Which looks better?

jreback · 2014-07-17T09:56:59Z

Hmm, a class variable might be clearer, as long as there is validation of the user-passed param.
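The two options discussed above can be compared in a toy sketch. None of this is the pandas source: the class names, the `restore_without_init` helper, the `False` default, and the type check are all assumptions made for illustration (the thread itself suggests `None` as the `getattr` fallback).

```python
class HourGetattr:
    """Option 1: fall back to a default when the instance lacks the attribute."""

    def __init__(self, n=1, normalize=False):
        self.n = n
        self.normalize = normalize

    def __mul__(self, some_int):
        normalize = getattr(self, "normalize", False)
        return self.__class__(n=some_int * self.n, normalize=normalize)


class HourClassVar:
    """Option 2: a class-level default that old instances inherit on lookup."""

    normalize = False  # used whenever the instance __dict__ has no 'normalize'

    def __init__(self, n=1, normalize=False):
        # Validate the user-passed value, as requested in the thread.
        if not isinstance(normalize, bool):
            raise TypeError("normalize must be a bool")
        self.n = n
        self.normalize = normalize

    def __mul__(self, some_int):
        return self.__class__(n=some_int * self.n, normalize=self.normalize)


def restore_without_init(cls, state):
    """Simulate default unpickling of an old object: __init__ is skipped."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# Both variants tolerate an old-style state that carries only `n`.
doubled_a = restore_without_init(HourGetattr, {"n": 1}) * 2
doubled_b = restore_without_init(HourClassVar, {"n": 1}) * 2
print(doubled_a.n, doubled_a.normalize, doubled_b.n, doubled_b.normalize)
```

With the `getattr` approach every method that reads the attribute needs the fallback, while the class variable fixes attribute lookup in one place, which may be why it reads as clearer.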

because something odd changed in the way DataFrames get pickled into HDF5. The only change I made to fix the problem was to re-create our synthetic datasets (in nilmtk/data) using Pandas 0.14.1 and all is fine again. See pandas-dev/pandas#7748

jreback added the Compat label Jul 14, 2014

jreback added this to the 0.15.0 milestone Jul 14, 2014

armaganthis3 mentioned this issue Jul 14, 2014

GH6848 silently changed series.sort from stable to unstable sort #7750

Closed

sinhrks mentioned this issue Jul 18, 2014

BUG/COMPAT: pickled dtindex with freq raises AttributeError in normalize... #7789

Merged

jreback closed this as completed in #7789 Jul 19, 2014

JackKelly mentioned this issue Jul 25, 2014

AttributeError: 'Second' object has no attribute 'normalize' nilmtk/nilmtk#143

Closed
