Skip to content

Partial datetime indexing of Multiindex by year only #14049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jesrael opened this issue Aug 19, 2016 · 5 comments
Closed

Partial datetime indexing of Multiindex by year only #14049

jesrael opened this issue Aug 19, 2016 · 5 comments
Labels
Datetime Datetime data dtype Duplicate Report Duplicate issue or pull request Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@jesrael
Copy link

jesrael commented Aug 19, 2016

I have DataFrame:

import pandas as pd
import numpy as np

rng = pd.date_range('01-01-1988',periods=130000,freq='H')
long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
dfmi = long_df.stack().sort_index()

print (dfmi)
1988-01-01 00:00:00  bar   -1.129213
                     baz    1.405388
                     foo    0.482324
                     zoo   -2.315226
1988-01-01 01:00:00  bar   -0.533171
                     baz    1.906526
                     foo   -0.745095
                     zoo    0.351300
1988-01-01 02:00:00  bar   -0.162411
                     baz   -0.218296
                     foo    0.327074
                     zoo   -0.508463
1988-01-01 03:00:00  bar   -0.676780
                     baz   -1.137270
                     foo    0.389457
                     zoo   -0.517306
1988-01-01 04:00:00  bar    1.368110
                     baz    1.106048
                     foo    1.342662
                     zoo   -0.215235
1988-01-01 05:00:00  bar   -1.284003
                     baz    0.216028
                     foo    0.889421
                     zoo   -0.965881
1988-01-01 06:00:00  bar    1.522085
                     baz    0.436332
                     foo    0.114714
                     zoo    0.342456
1988-01-01 07:00:00  bar   -0.176803
                     baz   -0.365471

2002-10-30 08:00:00  foo   -1.196314
                     zoo    1.245414
2002-10-30 09:00:00  bar    1.271792
                     baz   -0.688911
                     foo   -0.852729
                     zoo    0.869926
2002-10-30 10:00:00  bar   -0.487386
                     baz   -0.161470
                     foo    0.593564
                     zoo   -0.440397
2002-10-30 11:00:00  bar    0.856882
                     baz   -0.297341
                     foo    2.106393
                     zoo    0.186903
2002-10-30 12:00:00  bar    1.551650
                     baz    1.152780
                     foo    0.726544
                     zoo   -0.334287
2002-10-30 13:00:00  bar    0.483035
                     baz    1.573808
                     foo   -0.734948
                     zoo    0.005149
2002-10-30 14:00:00  bar    0.077451
                     baz    0.783963
                     foo   -0.111543
                     zoo   -0.637871
2002-10-30 15:00:00  bar    0.383890
                     baz   -0.368546
                     foo    0.503162
                     zoo    0.324679
dtype: float64

print (dfmi.index.get_level_values(0))
DatetimeIndex(['1988-01-01 00:00:00', '1988-01-01 00:00:00',
               '1988-01-01 00:00:00', '1988-01-01 00:00:00',
               '1988-01-01 01:00:00', '1988-01-01 01:00:00',
               '1988-01-01 01:00:00', '1988-01-01 01:00:00',
               '1988-01-01 02:00:00', '1988-01-01 02:00:00',
               ...
               '2002-10-30 13:00:00', '2002-10-30 13:00:00',
               '2002-10-30 14:00:00', '2002-10-30 14:00:00',
               '2002-10-30 14:00:00', '2002-10-30 14:00:00',
               '2002-10-30 15:00:00', '2002-10-30 15:00:00',
               '2002-10-30 15:00:00', '2002-10-30 15:00:00'],
              dtype='datetime64[ns]', length=520000, freq='H')

If select by year, month and day, it works nice:

print (dfmi.loc['2001-01-01'])

2001-01-01 00:00:00  bar   -0.349633
                     baz   -1.945182
                     foo    0.422883
                     zoo   -1.183061
2001-01-01 01:00:00  bar   -0.014788
                     baz   -0.646235
                     foo   -1.449877
                     zoo    0.422516
2001-01-01 02:00:00  bar    0.197731
                     baz   -0.307682
                     foo   -1.293725
                     zoo    0.283953
2001-01-01 03:00:00  bar    0.725005
                     baz    1.182862
                     foo   -0.349092
                     zoo    1.354510
2001-01-01 04:00:00  bar   -0.516522
                     baz    0.582963
                     foo   -0.577522
                     zoo    0.710188
2001-01-01 05:00:00  bar   -0.453547
                     baz   -1.054144
                     foo    0.911428
                     zoo   -0.417231
2001-01-01 06:00:00  bar   -1.117797
                     baz   -0.315343
                     foo    1.441117
                     zoo    1.135960
2001-01-01 07:00:00  bar   -0.607946
                     baz    1.210047
                              ...   
2001-01-01 16:00:00  foo   -1.435443
                     zoo    0.045783
2001-01-01 17:00:00  bar   -0.590686
                     baz    1.115333
                     foo   -0.558769
                     zoo   -0.522037
2001-01-01 18:00:00  bar   -0.862442
                     baz    0.917343
                     foo    0.759948
                     zoo   -0.004601
2001-01-01 19:00:00  bar   -1.901064
                     baz    0.281404
                     foo   -0.704072
                     zoo   -0.104176
2001-01-01 20:00:00  bar   -0.703224
                     baz   -0.170848
                     foo    0.547919
                     zoo    1.199431
2001-01-01 21:00:00  bar    1.198095
                     baz   -0.416019
                     foo    1.551871
                     zoo   -0.472543
2001-01-01 22:00:00  bar   -0.239550
                     baz   -0.401000
                     foo    0.959729
                     zoo    0.783011
2001-01-01 23:00:00  bar    0.415624
                     baz    0.093273
                     foo    0.712913
                     zoo   -1.026016
dtype: float64

But if select by year or year and month, it return all DataFrame, there is not slicing.

print (dfmi.loc['2001-01'])

print (dfmi.loc['2001'])

Is it bug or not implemented yet? Thank you.

print (pd.show_versions())
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: sk_SK

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1
None
@jesrael jesrael changed the title Partial datetime indexing by years and months Partial datetime indexing of Multiindex by year only Aug 19, 2016
@jreback
Copy link
Contributor

jreback commented Aug 19, 2016

pls check that your output is from running on 0.18.1 as this DOES work with that version.

Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec  7 2015, 11:24:55) 
Type "copyright", "credits" or "license" for more information.

IPython 5.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: pd.__version__
Out[1]: '0.18.1'

In [2]: rng = pd.date_range('01-01-1988',periods=130000,freq='H')
   ...: long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
   ...: dfmi = long_df.stack().sort_index()
   ...: 

In [3]: len(dfmi.loc['2001'])
Out[3]: 398760

In [4]: len(dfmi.loc['2001-01'])
Out[4]: 390744

In [5]: len(dfmi.loc['2001-01-01'])
Out[5]: 96

@jreback
Copy link
Contributor

jreback commented Aug 19, 2016

though you your LANG setting might not be honoured. Try using that LANG date separators.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Datetime Datetime data dtype MultiIndex labels Aug 19, 2016
@jesrael
Copy link
Author

jesrael commented Aug 20, 2016

For me it work with len also nice. But output get all DataFrame. And it failed only in large DataFrame, in small it works perfectly:

import pandas as pd
import numpy as np

rng = pd.date_range('01-01-1988',periods=130000,freq='H')

long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
dfmi = long_df.stack().sort_index()

print (len(dfmi.loc['2001']))
398760

print (len(dfmi.loc['2001-01']))
390744

print (len(dfmi.loc['2001-01-01']))
96

print (dfmi.loc['2001'].head(10))
1988-01-01 00:00:00  bar   -0.468276
                     baz    0.236022
                     foo    1.220370
1988-01-01 01:00:00  bar   -0.266689
                     baz    0.151896
                     foo    0.162511
1988-01-01 02:00:00  bar   -0.046803
                     baz    0.147449
                     foo   -0.532906
1988-01-01 03:00:00  bar    0.544098
dtype: float64

print (dfmi.loc['2001'].tail(10))
2002-10-30 12:00:00  foo    0.012732
2002-10-30 13:00:00  bar   -0.222702
                     baz    1.099860
                     foo    0.143408
2002-10-30 14:00:00  bar    1.022347
                     baz    1.617577
                     foo   -0.061869
2002-10-30 15:00:00  bar    1.037734
                     baz   -0.759772
                     foo   -1.280097
dtype: float64

rng = pd.date_range('01-01-2000',periods=50,freq='M')
long_df = pd.DataFrame(np.random.randn(50,4),index = rng, columns=['bar','baz','foo','zoo'])
dfmi = long_df.stack()

print (dfmi.loc['2001'].head(10))
2001-01-31  bar    0.670179
            baz   -0.872433
            foo    1.790951
            zoo   -0.359836
2001-02-28  bar   -1.130303
            baz    1.878084
            foo    0.928927
            zoo   -0.728020
2001-03-31  bar   -1.095083
            baz   -0.338816
dtype: float64

print (dfmi.loc['2001'].tail(10))
2001-10-31  foo    0.451467
            zoo    1.488951
2001-11-30  bar    0.820663
            baz    1.302047
            foo    1.062436
            zoo   -0.573341
2001-12-31  bar   -1.325396
            baz    0.428895
            foo   -0.336822
            zoo    0.317958
dtype: float64

@jesrael
Copy link
Author

jesrael commented Aug 20, 2016

@jreback
Copy link
Contributor

jreback commented Aug 20, 2016

this is the same as #12896 and closed by #13117 in master and 0.19.0 (releasing soon)

In [1]: rng = pd.date_range('01-01-1988',periods=130000,freq='H')
   ...: long_df = pd.DataFrame(np.random.randn(130000,4),index = rng, columns=['bar','baz','foo','zoo'])
   ...: dfmi = long_df.stack().sort_index()
   ...: 

In [2]: dfmi.loc['2001-01'].head()
Out[2]: 
2001-01-01 00:00:00  bar    1.632104
                     baz   -1.039070
                     foo    1.371071
                     zoo   -0.661709
2001-01-01 01:00:00  bar    1.372920
dtype: float64

In [3]: dfmi.loc['2001-01'].tail()
Out[3]: 
2001-01-31 22:00:00  zoo    1.972502
2001-01-31 23:00:00  bar    1.414616
                     baz    0.248703
                     foo    1.095618
                     zoo   -1.528967
dtype: float64

In [4]: dfmi.loc['2001-01-01'].head()
Out[4]: 
2001-01-01 00:00:00  bar    1.632104
                     baz   -1.039070
                     foo    1.371071
                     zoo   -0.661709
2001-01-01 01:00:00  bar    1.372920
dtype: float64

In [5]: dfmi.loc['2001-01-01'].tail()
Out[5]: 
2001-01-01 22:00:00  zoo    0.109188
2001-01-01 23:00:00  bar    0.531228
                     baz   -0.675307
                     foo   -0.168099
                     zoo    1.217736
dtype: float64

In [6]: len(dfmi.loc['2001-01'])
Out[6]: 2976

@jreback jreback closed this as completed Aug 20, 2016
@jreback jreback added the Duplicate Report Duplicate issue or pull request label Aug 20, 2016
@jreback jreback added this to the No action milestone Aug 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Duplicate Report Duplicate issue or pull request Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

No branches or pull requests

2 participants