BUG: support/test of PeriodIndex in HDFStore #7796

Closed
zoof opened this issue Jul 18, 2014 · 13 comments · Fixed by #44314 or #48618
Labels: Enhancement; IO HDF5 (read_hdf, HDFStore); Needs Tests (unit test(s) needed to prevent regressions); Period (Period data type)

Comments

zoof commented Jul 18, 2014

Works for the 'fixed' format, buggy for the 'table' format.

Explicit tests are needed for both.

This may be a MultiIndex issue.

In [7]: weeklyPrices.head()
Out[7]: 
                                   price  
week                  id                                                   
2013-07-28/2013-08-03 0002189585   15.26  
2013-08-04/2013-08-10 0002189585   15.25  
2013-08-11/2013-08-17 0002189585   15.25  
2013-08-18/2013-08-24 0002189585   14.83 
2013-09-01/2013-09-07 0002189585   14.83  

In [8]: pd.read_hdf('weeklyPrices.h5','weeklyPrices').head()
Out[8]: 
                  price  
week id                                                                 
2274 0002189585   15.26  
2275 0002189585   15.25  
2276 0002189585   15.25  
2277 0002189585   14.83  
2279 0002189585   14.83  

jreback commented Jul 18, 2014

this is not implemented ATM; it should raise NotImplementedError until it is. care to do a PR (and implement it, if so desired)? it's just a matter of reconverting the index.

actually, can you show a complete example: create the frame, write it, and read back

also post the output of pd.show_versions() (this DOES work with a MultiIndex in >= 0.14.1 IIRC)

jreback commented Jul 18, 2014

ok, this works for fixed format but not table

In [2]: df = DataFrame(np.random.randn(5,1),index=period_range('20130101',freq='M',periods=5))

In [3]: df
Out[3]: 
                0
2013-01  1.631650
2013-02  0.163391
2013-03  1.141329
2013-04 -1.821027
2013-05 -0.996801

In [4]: df.index
Out[4]: 
<class 'pandas.tseries.period.PeriodIndex'>
[2013-01, ..., 2013-05]
Length: 5, Freq: M

In [6]: df.to_hdf('test.hdf','df',mode='w')

In [7]: pd.read_hdf('test.hdf','df')
Out[7]: 
                0
2013-01  1.631650
2013-02  0.163391
2013-03  1.141329
2013-04 -1.821027
2013-05 -0.996801

In [8]: df.to_hdf('test.hdf','df',mode='w',format='table')

In [9]: pd.read_hdf('test.hdf','df')
Out[9]: 
            0
516  1.631650
517  0.163391
518  1.141329
519 -1.821027
520 -0.996801
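
The integers returned by the table-format read are the period ordinals (for monthly frequency, months counted from 1970-01), so the index can be rebuilt after reading as long as the frequency is known. A minimal sketch of that reconversion, assuming the caller keeps track of the original frequency ('M' here); this is a user-side workaround, not a description of how HDFStore handles it:

```python
import pandas as pd

# Ordinals as they came back from the table-format read above.
ordinals = [516, 517, 518, 519, 520]

# Assumption: the caller remembers the original frequency ("M").
restored = pd.PeriodIndex(
    [pd.Period(ordinal=o, freq="M") for o in ordinals], freq="M"
)

print([str(p) for p in restored])
# ['2013-01', '2013-02', '2013-03', '2013-04', '2013-05']
```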

@jreback jreback added this to the 0.15.0 milestone Jul 18, 2014

zoof commented Jul 18, 2014

I did not pass the format argument, so presumably the file was written in fixed format, which suggests this feature was not yet implemented in 0.14.0. I haven't upgraded to 0.14.1 because #7746 appears to be problematic for me.

In [26]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.12.24-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.2
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: None
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.4
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

jreback commented Jul 18, 2014

ok #7746 is broken in 0.14.0 (for a different reason). You know that you can use timestamps rather than Periods and this will all work.
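
A minimal sketch of the timestamp workaround, assuming the frame uses a monthly PeriodIndex like the example above: convert the periods to timestamps before writing, then convert back after reading. The HDF write and read are left out so the snippet runs without PyTables.

```python
import pandas as pd

periods = pd.period_range("2013-01", freq="M", periods=5)

# Convert to timestamps (the start of each period) before writing to HDF.
stamps = periods.to_timestamp()

# After reading back, convert to periods again; the frequency must be supplied.
restored = stamps.to_period("M")

print(restored.equals(periods))  # True
```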

@jreback jreback changed the title from "to_hdf reverts PeriodIndex to Int64" to "BUG: support/test of PeriodIndex in HDFStore" Jul 18, 2014

zoof commented Jul 18, 2014

My problem is that I am converting daily data to weekly data and I want freq='W-SAT' but the builtin functionality of DatetimeIndex complains about the inferred frequency not conforming to my chosen frequency.

jreback commented Jul 18, 2014

can you show an example?

zoof commented Jul 18, 2014

Sure:

In [33]: pd.DatetimeIndex(['2013-01-05','2013-01-13','2013-01-20'],freq='W-SAT')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-e05feb09416f> in <module>()
----> 1 pd.DatetimeIndex(['2013-01-05','2013-01-13','2013-01-20'],freq='W-SAT')

/usr/lib/python2.7/site-packages/pandas/tseries/index.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, **kwds)
    304                     if not np.array_equal(subarr.asi8, on_freq.asi8):
    305                         raise ValueError('Inferred frequency {0} from passed dates does not'
--> 306                                          'conform to passed frequency {1}'.format(inferred, freq.freqstr))
    307 
    308         if freq_infer:

ValueError: Inferred frequency None from passed dates does notconform to passed frequency W-SAT

jreback commented Jul 18, 2014

your example does NOT have the right frequency: the gaps between the dates are 8 days, then 7 days. Not sure what is possible with that.

In [5]: date_range('20130105',freq='W-SAT',periods=3)
Out[5]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-05, ..., 2013-01-19]
Length: 3, Freq: W-SAT, Timezone: None

In [7]: date_range('20130105',freq='W-SAT',periods=3).tolist()
Out[7]: 
[Timestamp('2013-01-05 00:00:00', offset='W-SAT'),
 Timestamp('2013-01-12 00:00:00', offset='W-SAT'),
 Timestamp('2013-01-19 00:00:00', offset='W-SAT')]
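
The uneven spacing can be checked directly. A small sketch using the three dates from the failing call:

```python
import pandas as pd

dates = pd.DatetimeIndex(["2013-01-05", "2013-01-13", "2013-01-20"])

# Day gaps between consecutive dates.
gaps = (dates[1:] - dates[:-1]).days.tolist()
print(gaps)  # [8, 7]

# With uneven spacing, no frequency can be inferred.
print(pd.infer_freq(dates))  # None
```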

zoof commented Jul 18, 2014

Right -- with daily data, dates are not going to conform to a 7 day frequency. For freq='W-SAT', PeriodIndex correctly puts dates into the 7 day period ending on a Saturday into a Period object like: '2013-07-28/2013-08-03'
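
A short sketch of that bucketing, reusing the three dates from the earlier traceback: `to_period('W-SAT')` maps each date to the Sunday-to-Saturday week that contains it.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2013-01-05", "2013-01-13", "2013-01-20"])

# Each date is assigned to the 7-day period ending on a Saturday.
weeks = dates.to_period("W-SAT")

print([str(p) for p in weeks])
# ['2012-12-30/2013-01-05', '2013-01-13/2013-01-19', '2013-01-20/2013-01-26']
```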

jreback commented Jul 18, 2014

ok, but you are resampling with this, yes? what is the problem? the dates will land in the correct buckets for that (even with no frequency).
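
A minimal sketch of the resampling approach, assuming a Series indexed by the irregular dates from the earlier example; the values are placeholders, not data from this issue. The index needs no frequency of its own.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2013-01-05", "2013-01-13", "2013-01-20"])
prices = pd.Series([1.0, 2.0, 3.0], index=dates)  # placeholder values

# Bucket into weeks ending on Saturday; each bin is labeled by its Saturday.
weekly = prices.resample("W-SAT").mean()

print(weekly)
```

The week ending 2013-01-12 contains no observations, so it appears in the result as NaN.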

zoof commented Jul 18, 2014

Ah, no. Something else I need to learn I see. Many thanks!

jreback commented Jul 18, 2014

@jreback jreback modified the milestones: Next Major Release, 0.16.0 Mar 6, 2015
@mroeschke mroeschke removed the Bug, Dtype Conversions, and Error Reporting labels Apr 11, 2021
@jreback jreback reopened this Nov 6, 2021

jreback commented Nov 6, 2021

need to fully test this to close after #44314
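
A rough sketch of what such a round-trip check could look like for both formats. It assumes PyTables is installed; the helper names `roundtrip_period_index` and `check_formats` are hypothetical and are not taken from the pandas test suite.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def roundtrip_period_index(fmt):
    """Write a PeriodIndex frame in the given HDF format and read it back."""
    df = pd.DataFrame(
        np.arange(5.0).reshape(5, 1),
        index=pd.period_range("2013-01", freq="M", periods=5),
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")
        df.to_hdf(path, key="df", mode="w", format=fmt)
        result = pd.read_hdf(path, key="df")
    return df, result


def check_formats():
    """Report, per format, whether the PeriodIndex survived the round trip."""
    report = {}
    for fmt in ("fixed", "table"):
        df, result = roundtrip_period_index(fmt)
        report[fmt] = bool(
            isinstance(result.index, pd.PeriodIndex)
            and result.index.equals(df.index)
        )
    return report
```

Calling `check_formats()` returns a dict keyed by format; a real test would assert that both entries are True.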

@jreback jreback modified the milestones: Contributions Welcome, 1.4 Nov 6, 2021
@jreback jreback modified the milestones: 1.4, Contributions Welcome Dec 31, 2021
@lithomas1 lithomas1 added the Needs Tests label Jan 2, 2022
4 participants