partial slicing FAILS with a datetimeindex #13539

Closed
randomgambit opened this issue Jun 30, 2016 · 6 comments
Labels
Datetime, Duplicate Report, Indexing, MultiIndex

Comments

@randomgambit

randomgambit commented Jun 30, 2016

Hi,

I have a simple dataframe with a multiindex.

stats.index.names
Out[67]: FrozenList([u'day', u'category'])

day is a time variable, category is a string.

stats.index.inferred_type
Out[69]: 'mixed'

stats.index.dtype_str
Out[68]: 'object'

If I type

idx=pd.IndexSlice
stats.loc[idx['2015-01-01',:],:]

I get the correct slice at '2015-01-01', that is all the observations for all the categories on that day.

If I type

idx=pd.IndexSlice
stats.loc[idx['2015',:],:] 

I get back (almost) all my observations, even for years other than 2015.

What can be the problem here? I cannot paste the data unfortunately but I am happy to help in any way.

Thanks
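
For readers following along: `inferred_type` and the dtype reported above describe the MultiIndex as a whole, not the `day` level, so they do not show whether that level is actually datetime-typed. A minimal sketch of checking the level itself, assuming a small stand-in frame (the `stats` built here and its `value` column are hypothetical, since the original data was not posted):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the unposted `stats` frame: a (day, category) MultiIndex.
days = pd.date_range("2015-01-01", periods=3, freq="D")
stats = pd.DataFrame(
    {"value": np.arange(6.0)},
    index=pd.MultiIndex.from_product([days, ["a", "b"]], names=["day", "category"]),
)

# The MultiIndex as a whole reports an object dtype...
whole_index_dtype = stats.index.dtype

# ...so inspect the 'day' level on its own to see whether it is datetime-typed.
day_level = stats.index.get_level_values("day")
day_is_datetime = pd.api.types.is_datetime64_any_dtype(day_level)

print(whole_index_dtype, day_is_datetime)
```

If the `day` level turns out not to be datetime-typed, partial string selection such as `'2015'` would not be expected to apply to it.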

@jreback
Contributor

jreback commented Jun 30, 2016

Please follow the instructions and submit the versions and a minimal example that reproduces the problem.

@randomgambit
Author


pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

@randomgambit
Author

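import numpy as np
import pandas as pd

# 1000 timestamps at 12-hour spacing, crossed with two categories: 2000 rows
dft2 = pd.DataFrame(np.random.randn(2000, 1),
                    columns=['A'],
                    index=pd.MultiIndex.from_product(
                        [pd.date_range('20130101', periods=1000, freq='12H'),
                         ['category1', 'category2']]))
idx = pd.IndexSlice
# Partial string match on a single day: works as expected
dft2.loc[idx['2013-01-03', :], :]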
Out[95]: 
                                      A
2013-01-03 00:00:00 category1  0.295829
                    category2  0.006298
2013-01-03 12:00:00 category1 -0.389461
                    category2  0.124865
idx=pd.IndexSlice
dft2.loc[idx['2013',:],:]


Out[96]: 
                                      A
2013-01-01 00:00:00 category1 -0.050261
                    category2  1.221524
2013-01-01 12:00:00 category1 -0.133423
                    category2  1.029201
2013-01-02 00:00:00 category1  0.543025
                    category2  0.619642
2013-01-02 12:00:00 category1  1.509450
                    category2 -1.405432
2013-01-03 00:00:00 category1  0.295829
                    category2  0.006298
2013-01-03 12:00:00 category1 -0.389461
                    category2  0.124865
2013-01-04 00:00:00 category1  0.332380
                    category2  0.705887
2013-01-04 12:00:00 category1 -0.932210
                    category2  0.602924
2013-01-05 00:00:00 category1  0.156599
                    category2 -0.268288
2013-01-05 12:00:00 category1  0.491792
                    category2  0.281679
2013-01-06 00:00:00 category1  0.835345
                    category2 -0.958069
2013-01-06 12:00:00 category1 -0.880574
                    category2 -1.279748
2013-01-07 00:00:00 category1 -1.652601
                    category2  0.120114
2013-01-07 12:00:00 category1 -1.542260
                    category2  0.469174
2013-01-08 00:00:00 category1  0.890845
                    category2  0.201862
                                ...
2014-05-01 00:00:00 category1 -0.156247
2014-05-01 12:00:00 category1  0.497289
2014-05-02 00:00:00 category1 -0.276293
2014-05-02 12:00:00 category1  0.676201
2014-05-03 00:00:00 category1 -0.014220
2014-05-03 12:00:00 category1  0.101065
2014-05-04 00:00:00 category1  0.171072
2014-05-04 12:00:00 category1 -0.166051
2014-05-05 00:00:00 category1  0.330556
2014-05-05 12:00:00 category1  0.998734
2014-05-06 00:00:00 category1  0.263579
2014-05-06 12:00:00 category1  0.092337
2014-05-07 00:00:00 category1 -1.041369
2014-05-07 12:00:00 category1 -0.425271
2014-05-08 00:00:00 category1 -1.036846
2014-05-08 12:00:00 category1  0.010377
2014-05-09 00:00:00 category1 -0.875446
2014-05-09 12:00:00 category1  0.363559
2014-05-10 00:00:00 category1 -1.586318
2014-05-10 12:00:00 category1 -1.178954
2014-05-11 00:00:00 category1 -0.024954
2014-05-11 12:00:00 category1  0.058637
2014-05-12 00:00:00 category1  1.127713
2014-05-12 12:00:00 category1 -0.259863
2014-05-13 00:00:00 category1  0.053491
2014-05-13 12:00:00 category1  0.193916
2014-05-14 00:00:00 category1 -0.574039
2014-05-14 12:00:00 category1  0.380214
2014-05-15 00:00:00 category1 -3.067484
2014-05-15 12:00:00 category1  1.158423

[1730 rows x 1 columns]

Am I misunderstanding something here?
thanks
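
One way to select a single year without relying on partial string matching inside a MultiIndex is a boolean mask on the datetime level. This is only a sketch of a workaround, not the fix discussed in this thread; it rebuilds the same index shape as the example above, uses a `Timedelta` for the 12-hour frequency, and the names `first_level` and `only_2013` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same index shape as the example in this comment (1000 timestamps x 2 categories).
dates = pd.date_range("2013-01-01", periods=1000, freq=pd.Timedelta(hours=12))
dft2 = pd.DataFrame(
    np.random.default_rng(0).standard_normal((2000, 1)),
    columns=["A"],
    index=pd.MultiIndex.from_product([dates, ["category1", "category2"]]),
)

# Build a boolean mask from the year of the first (datetime) level.
first_level = dft2.index.get_level_values(0)
only_2013 = dft2[first_level.year == 2013]

print(len(only_2013))
```

The mask compares plain integers, so it does not depend on how a given pandas version resolves the string `'2013'` against the index.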

jreback added the Datetime, Indexing, Duplicate Report, and MultiIndex labels on Jul 1, 2016
jreback added this to the No action milestone on Jul 1, 2016
@jreback
Contributor

jreback commented Jul 1, 2016

Duplicate of #12896, which was closed by #13117.

In [8]: pd.options.display.max_rows=10

In [9]: dft2 = pd.DataFrame(np.random.randn(2000, 1),
                     columns=['A'],
                     index=pd.MultiIndex.from_product([pd.date_range('20130101',
                     periods=1000,
                     freq='12H'),
                     ['category1', 'category2']]))

In [10]: dft2
Out[10]: 
                                      A
2013-01-01 00:00:00 category1  0.555222
                    category2  1.067353
2013-01-01 12:00:00 category1 -0.556181
                    category2  1.115773
2013-01-02 00:00:00 category1  0.290784
...                                 ...
2014-05-14 12:00:00 category2 -0.922701
2014-05-15 00:00:00 category1 -0.733799
                    category2 -1.497488
2014-05-15 12:00:00 category1 -0.348072
                    category2  1.387935

[2000 rows x 1 columns]

In [11]: idx=pd.IndexSlice

In [12]: dft2.loc[idx['2013-01-03',:],:]
Out[12]: 
                                      A
2013-01-03 00:00:00 category1 -1.397535
                    category2  0.051541
2013-01-03 12:00:00 category1  0.310371
                    category2 -0.715179

In [13]: dft2.loc[idx['2013',:],:]
Out[13]: 
                                      A
2013-01-01 00:00:00 category1  0.555222
                    category2  1.067353
2013-01-01 12:00:00 category1 -0.556181
                    category2  1.115773
2013-01-02 00:00:00 category1  0.290784
...                                 ...
2013-12-30 12:00:00 category2 -0.494284
2013-12-31 00:00:00 category1 -0.123571
                    category2 -2.059731
2013-12-31 12:00:00 category1  1.465607
                    category2 -0.975960

[1460 rows x 1 columns]
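
The 1460-row result is what the arithmetic predicts: 2013 has 365 days, the index holds 2 timestamps per day, and each timestamp carries 2 categories. A minimal sketch that checks this count with an explicit date-range slice on the first level, assuming a recent pandas version (the zero-filled values and the names `year_2013` and `expected_rows` are illustrative, not from the thread):

```python
import numpy as np
import pandas as pd

# Same index shape as the session above; the values are irrelevant to the row count.
dates = pd.date_range("2013-01-01", periods=1000, freq=pd.Timedelta(hours=12))
dft2 = pd.DataFrame(
    np.zeros((2000, 1)),
    columns=["A"],
    index=pd.MultiIndex.from_product([dates, ["category1", "category2"]]),
)

idx = pd.IndexSlice
# Explicit string bounds on the datetime level; the end date covers the whole day.
year_2013 = dft2.loc[idx["2013-01-01":"2013-12-31", :], :]

# 365 days * 2 timestamps per day * 2 categories
expected_rows = 365 * 2 * 2
print(len(year_2013), expected_rows)
```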

jreback closed this as completed on Jul 1, 2016
@randomgambit
Author

damn I was proud of finding a cool bug, you destroyed my dreams

@jorisvandenbossche
Member

@randomgambit But anyway, thanks for reporting!


3 participants