Problem using panel.to_frame() conversion #3690

Closed
etiennecha opened this issue May 24, 2013 · 4 comments

Comments

@etiennecha

Hi,

Basically I wanted to kind of replicate the example given in the doc about to_frame() with data that I have.

In the doc:
panel = Panel(np.random.randn(3, 5, 4), items=['one', 'two', 'three'], major_axis=date_range('1/1/2000', periods=5), minor_axis=['a', 'b', 'c', 'd'])

gives:

<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: one to three
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-05 00:00:00
Minor_axis axis: a to d

then if you use panel.to_frame().index you will of course get:
MultiIndex
[(2000-01-01 00:00:00, a), (2000-01-01 00:00:00, b), (2000-01-01 00:00:00, c), (2000-01-01 00:00:00, d), (2000-01-02 00:00:00, a), (2000-01-02 00:00:00, b), (2000-01-02 00:00:00, c), (2000-01-02 00:00:00, d), (2000-01-03 00:00:00, a), (2000-01-03 00:00:00, b), (2000-01-03 00:00:00, c), (2000-01-03 00:00:00, d), (2000-01-04 00:00:00, a), (2000-01-04 00:00:00, b), (2000-01-04 00:00:00, c), (2000-01-04 00:00:00, d), (2000-01-05 00:00:00, a), (2000-01-05 00:00:00, b), (2000-01-05 00:00:00, c), (2000-01-05 00:00:00, d)]

I do the same with my panel data (basically I have 254 daily price/brand observations, with some brand changes, at 9995 gas stations):
pd_panel_data_master
<class 'pandas.core.panel.Panel'>
Dimensions: 9995 (items) x 254 (major_axis) x 2 (minor_axis)
Items axis: 10000001 to 9700001
Major_axis axis: 20110904 to 20120514
Minor_axis axis: brand to price

then I use: pd_panel_data_master.to_frame().index
MultiIndex
[(20110904, brand), (20110905, brand), (20110906, brand), (20110907, brand), (20110908, brand), (20110909, brand), (20110910, brand), (20110911, brand), (20110912, brand), (20110913, brand), (20110914, brand), (20110915, brand), (20110916, brand), (20110917, brand), (20110918, brand), (20110919, brand), (20110920, brand), (20110921, brand), (20110922, brand), (20110923, brand), (20110924, brand), (20110925, brand), (20110926, brand), (20110927, brand), (20110928, brand), (20110929, brand), (20110930, brand), (20111001, brand), (20111002, brand), (20111003, brand), (20111004, brand), (20111005, brand), (20111006, brand), (20111007, brand), (20111008, brand), (20111009, brand), (20111010, brand), (20111011, brand), (20111012, brand), (20111013, brand), (20111014, brand), (20111015, brand), (20111016, brand), (20111017, brand), (20111018, brand), (20111019, brand), (20111020, brand), (20111021, brand), (20111022, brand), (20111023, brand), (20120326, brand), (20120327, brand), (20120328, brand), (20120329, brand), (20120330, brand), (20120331, brand), (20120401, brand), (20120402, brand), (20120403, brand), (20120404, brand), (20120405, brand), (20120406, brand), (20120407, brand), (20120408, brand), (20120409, brand), (20120410, brand), (20120411, brand), (20120412, brand), (20120413, brand), (20120414, brand), (20120415, brand), (20120416, brand), (20120417, brand), (20120418, brand), (20120419, brand), (20120420, brand), (20120421, brand), (20120422, brand), (20120423, brand), (20120424, brand), (20120425, brand), (20120426, brand), (20120427, brand), (20120428, brand), (20120429, brand), (20120430, brand), (20120501, brand), (20120502, brand), (20120503, brand), (20120504, brand), (20120505, brand), (20120506, brand), (20120507, brand), (20120508, brand), (20120509, brand), (20120510, brand), (20120511, brand), (20120512, brand), (20120513, brand), (20120514, brand)]

I don't get the (date, price) index entries... For the record, the resulting DataFrame is:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 254 entries, (20110904, brand) to (20120514, brand)
Columns: 9995 entries, 10000001 to 9700001
dtypes: object(9995)

Am I missing something here, or is there a real issue?

Cheers,

Etienne

PS: I upgraded to version 0.11.0 right after posting (:x) but the issue remains essentially the same (though the form of the output changes slightly)

@jreback
Contributor

jreback commented May 24, 2013

try

p.to_frame(filter_observations=False)

@etiennecha
Author

Works indeed, thanks a lot. Should I try to understand why? -_-

@jreback
Contributor

jreback commented May 24, 2013

Let me try to explain.

A Panel is items x major x minor, which to_frame() translates to columns = items and index = (major, minor).

The filtering drops an index entry unless there is a complete set of observations across all of the columns, i.e. a row is kept only if it contains no NaNs. I think the default should really be filter_observations=False, but that might be a back-compat issue (I think Wes wrote this a while back).
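
For anyone reading later, here is a minimal sketch of the behaviour described above. It does not call Panel.to_frame() itself, since Panel was removed from later pandas releases. Instead it assumes a plain NumPy array can stand in for the Panel values, and it assumes the filtering behaves like DataFrame.dropna(how="any"). The station names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: 2 items (stations) x 3 major (dates) x 2 minor (fields).
items = ["station_a", "station_b"]
major = [20110904, 20110905, 20110906]
minor = ["brand", "price"]
values = np.array(
    [
        [["X", 1.50], ["X", 1.52], ["X", 1.51]],
        [["Y", np.nan], ["Y", 1.49], ["Y", 1.48]],  # one missing price
    ],
    dtype=object,
)

# Layout described above: columns = items, index = (major, minor).
index = pd.MultiIndex.from_product([major, minor], names=["major", "minor"])
frame = pd.DataFrame(
    values.reshape(len(items), -1).T,  # one row per (major, minor) pair
    index=index,
    columns=items,
)

# Assumed equivalent of filter_observations=True:
# drop every row that has a NaN in any column.
filtered = frame.dropna(how="any")

# Assumed equivalent of filter_observations=False: keep every row.
unfiltered = frame

print(filtered.index.tolist())
print(unfiltered.index.tolist())
```

In this sketch one missing price at one station is enough to drop that date's price row for every station. With 9995 station columns, that is presumably why only the (date, brand) entries survived in the output above.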

@etiennecha
Author

All clear. Defaulting to False would indeed seem wise, many thanks!
