datetime index list when pd.concat does not infer frequency #12231

Closed
dacoex opened this issue Feb 4, 2016 · 6 comments
Labels: Datetime (Datetime data dtype), Frequency (DateOffsets), Usage Question

Comments

@dacoex
Contributor

dacoex commented Feb 4, 2016

The cookbook section Data In/Out points to an example of how to read in multiple files and append them to create a single DataFrame.

When I use the suggested procedure on two DataFrames with freq '10T', the index no longer has a freq after the concat.
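For readers who want to reproduce the report, here is a minimal sketch of the read-then-concat pattern. The two in-memory CSV strings, the `value` column, and the use of `asfreq` to assign the frequency are assumptions made for illustration; the original report does not show how the files were read or how the freq was assigned. The alias `'10min'` is used as the equivalent of `'10T'`.

```python
import io

import pandas as pd

# Two hypothetical in-memory "files", sampled every 10 minutes,
# with a gap between the end of the first and the start of the second.
csv_a = "datetime,value\n2015-01-01 00:00,1\n2015-01-01 00:10,2\n2015-01-01 00:20,3\n"
csv_b = "datetime,value\n2015-02-01 00:00,4\n2015-02-01 00:10,5\n2015-02-01 00:20,6\n"

frames = []
for text in (csv_a, csv_b):
    # read_csv parses the dates but does not attach a freq to the index
    frame = pd.read_csv(io.StringIO(text), index_col="datetime", parse_dates=True)
    # assign the frequency explicitly (assumed step, see lead-in)
    frame = frame.asfreq("10min")
    frames.append(frame)

combined = pd.concat(frames)

print([f.index.freq for f in frames])  # each frame carries a 10-minute freq
print(combined.index.freq)             # the gapped, combined index has none
```

Because the two ranges are not contiguous, the combined index cannot be described by a single fixed frequency, so `freq` comes back as `None` even though each input had one.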

@dacoex
Contributor Author

dacoex commented Feb 4, 2016

maybe related to #12195

@TomAugspurger
Contributor

Hi @dacoex can you provide an example that shows your problem? For example

In [2]: idx1 = pd.period_range('2015-01-01', freq='10T', periods=10)

In [3]: idx2 = pd.period_range('2015-02-01', freq='10T', periods=10)

In [4]: pd.concat([pd.Series(1, index=idx1), pd.Series(2, index=idx2)])
Out[4]:
2015-01-01 00:00    1
2015-01-01 00:10    1
2015-01-01 00:20    1
2015-01-01 00:30    1
2015-01-01 00:40    1
                   ..
2015-02-01 00:50    2
2015-02-01 01:00    2
2015-02-01 01:10    2
2015-02-01 01:20    2
2015-02-01 01:30    2
Freq: 10T, dtype: int64

In [5]: pd.concat([pd.Series(1, index=idx1), pd.Series(2, index=idx2)]).index
Out[5]:
PeriodIndex(['2015-01-01 00:00', '2015-01-01 00:10', '2015-01-01 00:20',
             '2015-01-01 00:30', '2015-01-01 00:40', '2015-01-01 00:50',
             '2015-01-01 01:00', '2015-01-01 01:10', '2015-01-01 01:20',
             '2015-01-01 01:30', '2015-02-01 00:00', '2015-02-01 00:10',
             '2015-02-01 00:20', '2015-02-01 00:30', '2015-02-01 00:40',
             '2015-02-01 00:50', '2015-02-01 01:00', '2015-02-01 01:10',
             '2015-02-01 01:20', '2015-02-01 01:30'],
            dtype='int64', freq='10T')

is correct. Please also post your pandas version, since, as you noted, this might have been fixed in #12195.

@jreback
Contributor

jreback commented Feb 5, 2016

freq is lazily computed (though by definition it is set on direct construction):

In [14]: s1 = Series(1, date_range('20130101',periods=3))

In [15]: s2 = Series(2, date_range('20130101',periods=3,freq='H'))

In [16]: s3 = Series(3, date_range('20130104',periods=3))

In [17]: s1.index.freq
Out[17]: <Day>

In [18]: s2.index.freq
Out[18]: <Hour>

In [19]: s3.index.freq
Out[19]: <Day>

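# this is lazy: .freq is not set after the concat (no output, i.e. None),
# while .inferred_freq computes it on demand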
In [21]: pd.concat([s1,s3]).index.freq

In [22]: pd.concat([s1,s3]).index.inferred_freq
Out[22]: 'D'

# this will never have a freq (s1 is daily, s2 is hourly)
In [23]: pd.concat([s1,s2]).index.inferred_freq
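If a regular frequency is needed on the combined index, one option is to take the inferred value and reindex with it. The following is a sketch under the assumption that the combined index is sorted, free of duplicates, and evenly spaced; the variable names `combined`, `inferred`, and `regular` are illustrative only.

```python
import pandas as pd

s1 = pd.Series(1, index=pd.date_range("2013-01-01", periods=3, freq="D"))
s3 = pd.Series(3, index=pd.date_range("2013-01-04", periods=3, freq="D"))

combined = pd.concat([s1, s3])

# inferred_freq looks at the actual timestamps and returns an alias
# string, or None when no single frequency fits
inferred = combined.index.inferred_freq

if inferred is not None:
    # asfreq reindexes onto a regular range and attaches the freq
    regular = combined.asfreq(inferred)
else:
    regular = combined

print(inferred)
print(regular.index.freq)
```

Note that `asfreq` inserts NaN rows wherever the regular range has timestamps the data lacks, so on gapped data it changes the length of the result.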

@jreback closed this as completed Feb 5, 2016
@jreback added the Datetime, Usage Question, and Frequency labels Feb 5, 2016
@dacoex
Contributor Author

dacoex commented Feb 8, 2016

@jreback thanks for the clarification.

In my case, I concatenated two DataFrames with the same frequency, which was assigned beforehand.

The frames were generated in a loop and appended to a list of frames.
Each individual frame has a frequency:

frames[0].index
dtype='datetime64[ns]', name='datetime', length=1000, freq='10T')

frames[1].index
dtype='datetime64[ns]', name='datetime', length=1000, freq='10T')

But not the combined frame:

pd.concat(frames).index
dtype='datetime64[ns]', name='datetime', length=2000, freq=None)

Astonishingly, in the following simple example, it works:

import numpy as np
import pandas as pd


# adapted from the example in:
# https://github.com/pydata/pandas/issues/12231
# note: period_range creates a PeriodIndex, not a DatetimeIndex

idx1 = pd.period_range('2015-01-01', freq='10T', periods=1000)

idx2 = pd.period_range('2016-01-01', freq='10T', periods=1000)

df1 = pd.DataFrame(np.random.randn(1000), index=idx1, 
                   columns=['A'])
df2 = pd.DataFrame(np.random.randn(1000), index=idx2, 
                   columns=['B'])

frames = [df1, df2]

df_concat = pd.concat(frames)

How could I debug this issue further?
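One difference worth checking: the simple example builds its indexes with `pd.period_range`, which yields a `PeriodIndex`, while the real frames show `dtype='datetime64[ns]'`, which is a `DatetimeIndex`. A `PeriodIndex` carries its freq as part of its dtype, so it survives a concat; a `DatetimeIndex` made of two non-contiguous ranges does not keep one. The sketch below contrasts the two, using a small `n` and `'10min'` (equivalent to `'10T'`) as illustrative choices.

```python
import numpy as np
import pandas as pd

n = 5

# PeriodIndex version, as in the simple example above
p1 = pd.period_range("2015-01-01", freq="10min", periods=n)
p2 = pd.period_range("2016-01-01", freq="10min", periods=n)

# DatetimeIndex version, matching the dtype of the real frames
d1 = pd.date_range("2015-01-01", freq="10min", periods=n)
d2 = pd.date_range("2016-01-01", freq="10min", periods=n)

period_concat = pd.concat([
    pd.DataFrame({"A": np.arange(n)}, index=p1),
    pd.DataFrame({"B": np.arange(n)}, index=p2),
])
datetime_concat = pd.concat([
    pd.DataFrame({"A": np.arange(n)}, index=d1),
    pd.DataFrame({"B": np.arange(n)}, index=d2),
])

# The PeriodIndex keeps its freq; the gapped DatetimeIndex does not
print(type(period_concat.index).__name__, period_concat.index.freq)
print(type(datetime_concat.index).__name__, datetime_concat.index.freq)
```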

@dacoex
Contributor Author

dacoex commented Feb 10, 2016

I forgot:


pd.__version__
Out[9]: '0.17.1'

@dacoex changed the title from "datetime index list when pd.concat" to "datetime index list when pd.concat does not infer frequency" Feb 10, 2016
@jreback
Contributor

jreback commented Feb 10, 2016

No idea; you will simply have to step through your code and examine each step.
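As one possible way to examine each step, a small helper can print the index properties that matter for frequency handling: the index type, the stored freq, the inferred freq, and whether the index is sorted and free of duplicates. The helper name `describe_index` and the overlapping sample frames are hypothetical and not part of pandas or of the original report.

```python
import pandas as pd


def describe_index(obj, label):
    """Summarize index properties relevant to frequency handling."""
    idx = obj.index
    return {
        "label": label,
        "type": type(idx).__name__,
        # getattr guards against index types without these attributes
        "freq": getattr(idx, "freq", None),
        "inferred_freq": getattr(idx, "inferred_freq", None),
        "monotonic": bool(idx.is_monotonic_increasing),
        "unique": bool(idx.is_unique),
    }


# Hypothetical frames whose time ranges overlap by two timestamps
a = pd.DataFrame(
    {"x": range(4)},
    index=pd.date_range("2015-01-01 00:00", periods=4, freq="10min"),
)
b = pd.DataFrame(
    {"x": range(4)},
    index=pd.date_range("2015-01-01 00:20", periods=4, freq="10min"),
)
frames = [a, b]

reports = [describe_index(f, f"frames[{i}]") for i, f in enumerate(frames)]
reports.append(describe_index(pd.concat(frames), "concat"))

for report in reports:
    print(report)
```

In this sample the individual frames report a freq, while the concatenated index is neither sorted nor unique, which is enough to rule out any single frequency.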
