AttributeError: 'Hour' object has no attribute 'normalize' when masking a time series #7748


Closed
leroygr opened this issue Jul 14, 2014 · 11 comments · Fixed by #7789
Labels
Compat pandas objects compatability with Numpy or Python functions


leroygr commented Jul 14, 2014

Hi All,

I hit a strange bug when I try to mask values of a time series (which I load from a pickled Panel).

I have the following Series:

In[30]: ts=pd.read_pickle('df_issue.pkl')['wind_speed']

In [31]:ts
Out[31]: 
2013-12-31 16:00:00         NaN
2013-12-31 17:00:00         NaN
2013-12-31 18:00:00    9.845031
2013-12-31 19:00:00         NaN
2013-12-31 20:00:00         NaN
2013-12-31 21:00:00         NaN
2013-12-31 22:00:00         NaN
2013-12-31 23:00:00         NaN
Freq: H, Name: wind_speed, dtype: float64

And I get an exception when I try to mask it:

In [32]: ts<0.
Traceback (most recent call last):

  File "<ipython-input-32-534147d368f7>", line 1, in <module>
    ts<0.

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 585, in wrapper
    res[mask] = masker

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 637, in __setitem__
    self.where(~key, value, inplace=True)

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3238, in where
    inplace=True)

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 2219, in putmask
    return self.apply('putmask', **kwargs)

  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 2185, in apply
    b_items = self.items[b.mgr_locs.indexer]

  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/index.py", line 1387, in __getitem__
    new_offset = key.step * self.offset

  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/offsets.py", line 265, in __rmul__
    return self.__mul__(someInt)

  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/offsets.py", line 262, in __mul__
    return self.__class__(n=someInt * self.n, normalize=self.normalize, **self.kwds)

AttributeError: 'Hour' object has no attribute 'normalize'

If I build the same Series manually, it works:

ts1=pd.Series(data=[np.nan,np.nan,9.845031]+[np.nan]*5,index=pd.date_range('2013-12-31 16:00:00',periods=8,freq='H'))

In [36]: ts1
Out[36]: 
2013-12-31 16:00:00         NaN
2013-12-31 17:00:00         NaN
2013-12-31 18:00:00    9.845031
2013-12-31 19:00:00         NaN
2013-12-31 20:00:00         NaN
2013-12-31 21:00:00         NaN
2013-12-31 22:00:00         NaN
2013-12-31 23:00:00         NaN
Freq: H, dtype: float64

In [37]: ts1<0.
Out[37]: 
2013-12-31 16:00:00    False
2013-12-31 17:00:00    False
2013-12-31 18:00:00    False
2013-12-31 19:00:00    False
2013-12-31 20:00:00    False
2013-12-31 21:00:00    False
2013-12-31 22:00:00    False
2013-12-31 23:00:00    False
Freq: H, dtype: bool

I really don't understand why...
You can find the pickle file here: http://we.tl/lrsFvmanVl

Thanks,
Greg

Here is the show_version output:

INSTALLED VERSIONS

commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.8.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.0
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.6.0.dev-Unknown
IPython: 1.1.0
sphinx: 1.1.3
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: 0.6.0
tables: 3.1.1
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.2
xlsxwriter: None
lxml: 3.0.0
bs4: 4.3.2
html5lib: None
httplib2: 0.7.2
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

Contributor

jreback commented Jul 14, 2014

when/how did you create the pickle? e.g. in a prior version, which one?

Author

leroygr commented Jul 14, 2014

df_pickle.pkl is created using pandas 0.14.1. However, the raw data in the pickle file are results of calculations done on data originally stored in HDF5 (created with pandas 0.12).

Contributor

jreback commented Jul 14, 2014

can you show the original creation of the df?

Author

leroygr commented Jul 14, 2014

Not really because this is an output of several steps impossible to show here...

FYI, I just downgraded to pandas 0.13.1 and the issue above doesn't appear anymore.

Do you think all this could be due to the raw data loaded from HDF5 generated with pandas 0.12?

Contributor

jreback commented Jul 14, 2014

This probably has to do with the pickling of the Hour object from the HDF5. It's an older version. I think you could just do something like

df.index = df.index.copy() in 0.14.1 and then it should work.
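
To illustrate why an object restored from older saved state can lack an attribute, here is a minimal sketch. It assumes the default pickle behaviour, where an instance is allocated without calling __init__ and its saved __dict__ is restored as-is. The Hour class, the restore_from_state helper, and the old_state dictionary are hypothetical stand-ins for illustration only, not the actual pandas code.

```python
class Hour:
    """Toy stand-in for a frequency offset; NOT the real pandas class."""

    def __init__(self, n=1, normalize=False):
        self.n = n
        self.normalize = normalize  # attribute that older saved states lack

    def __mul__(self, some_int):
        # Mirrors the failing line in the traceback: it reads self.normalize.
        return self.__class__(n=some_int * self.n, normalize=self.normalize)

    __rmul__ = __mul__


def restore_from_state(cls, state):
    """Rebuild an instance the way default unpickling does: allocate
    without calling __init__, then restore the saved __dict__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# Hypothetical state written by an older version that had no `normalize`.
old_state = {"n": 1}
stale = restore_from_state(Hour, old_state)

try:
    2 * stale
    error = None
except AttributeError as exc:
    error = str(exc)

# An object built through __init__ has the attribute and multiplies fine.
fresh = Hour(n=1)
doubled = 2 * fresh
```

Under these assumptions, the stale object fails only when code first reads the missing attribute, which would be consistent with the comparison failing deep inside index slicing rather than at load time.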

Author

leroygr commented Jul 14, 2014

df.index = df.index.copy() doesn't work, but the following does:

new_df = pd.DataFrame(df.values,columns=df.columns, index=[i for i in df.index])

new_df['variable'] <0.

It is a bit slow due to [i for i in df.index] though (the df.index can be quite large). Do you have a faster alternative that may work as well?

Contributor

jreback commented Jul 14, 2014

you can try this:

df.index = Index(df.index, freq=df.index.freqstr) to recreate the index. The key is to create a new frequency object (rather than reusing the existing one).

Author

leroygr commented Jul 14, 2014

It worked! Thanks a lot!

Contributor

jreback commented Jul 14, 2014

Great. This is not a bug per se, more of a slight change in the compatibility of frequencies in terms of backward compatibility for pickle. There's not a whole lot I think we can do about this.

cc @sinhrks, maybe change these to normalize=getattr(self,'normalize',None) ?

Member

sinhrks commented Jul 17, 2014

Yes, will do. Another option is to also add normalize as a class variable. Which looks better?

Contributor

jreback commented Jul 17, 2014

Hmm, a class variable might be clearer, as long as there is validation for the user-passed parameter.
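
The two options discussed above can be compared in a rough sketch. Both toy classes, the restore_from_state helper, and the bool check are assumptions made for illustration; this is not the change that was eventually merged into pandas.

```python
class HourGetattr:
    """Option 1 (hypothetical): fall back with getattr at the point of use."""

    def __init__(self, n=1, normalize=False):
        self.n = n
        self.normalize = normalize

    def __mul__(self, some_int):
        # Old saved states without `normalize` fall back to None here.
        normalize = getattr(self, "normalize", None)
        return self.__class__(n=some_int * self.n, normalize=normalize)

    __rmul__ = __mul__


class HourClassDefault:
    """Option 2 (hypothetical): a class-level default plus validation."""

    normalize = False  # found by attribute lookup when the instance has none

    def __init__(self, n=1, normalize=False):
        if not isinstance(normalize, bool):
            raise TypeError("normalize must be a bool")
        self.n = n
        self.normalize = normalize

    def __mul__(self, some_int):
        return self.__class__(n=some_int * self.n, normalize=self.normalize)

    __rmul__ = __mul__


def restore_from_state(cls, state):
    """Mimic default unpickling: no __init__ call, just restore __dict__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# Both variants tolerate a state saved without `normalize`.
a = 3 * restore_from_state(HourGetattr, {"n": 1})
b = 3 * restore_from_state(HourClassDefault, {"n": 1})

# The class-default variant still rejects a bad user-passed value.
try:
    HourClassDefault(normalize="yes")
    rejected = False
except TypeError:
    rejected = True
```

In this sketch the class-level default keeps the fallback in one place, while the getattr approach has to be repeated wherever the attribute is read.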

JackKelly added a commit to nilmtk/nilmtk that referenced this issue Jul 25, 2014
because something odd changed in the way DataFrames get pickled into
HDF5.  The only change I made to fix the problem was to re-create our
synthetic datasets (in nilmtk/data) using Pandas 0.14.1 and all is
fine again.  See pandas-dev/pandas#7748