DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. #22673
Changes from 2 commits
670e768
a7ecbb2
ce5098a
08561d2
93edaca
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2498,11 +2498,15 @@ def to_xarray(self): | |
a Dataset for a DataFrame | ||
a DataArray for higher dims | ||
|
||
See also | ||
-------- | ||
DataFrame.to_csv : Write out to a csv file. | ||
Personally I don't like "recommending" |
||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame({'A' : [1, 1, 2], | ||
'B' : ['foo', 'bar', 'foo'], | ||
'C' : np.arange(4.,7)}) | ||
... 'B' : ['foo', 'bar', 'foo'], | ||
... 'C' : np.arange(4.,7)}) | ||
I think using `np.arange` makes the code more complicated for no reason. Using Also, it would be good to have a meaningful example. The code in the example is difficult to follow, as there is no way to know that the object column is B other than by looking at the example. If we get something few samples with animals with
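A minimal sketch of the point about `np.arange`, not taken from the PR itself: the literal list `[4.0, 5.0, 6.0]` builds the same column as `np.arange(4., 7)`. The `animals` frame is a hypothetical illustration of a more meaningful example. Its column names and values are assumptions, since the reviewer's own suggestion is cut off above.

```python
import numpy as np
import pandas as pd

# Column C built with np.arange, as in the docstring under review.
df_arange = pd.DataFrame({'A': [1, 1, 2],
                          'B': ['foo', 'bar', 'foo'],
                          'C': np.arange(4., 7)})

# The same frame with the values written out as a plain list.
df_literal = pd.DataFrame({'A': [1, 1, 2],
                           'B': ['foo', 'bar', 'foo'],
                           'C': [4.0, 5.0, 6.0]})

# Both constructions produce identical data and dtypes.
same = df_arange.equals(df_literal)

# Hypothetical "meaningful" frame: the column names say what they hold.
animals = pd.DataFrame({'name': ['falcon', 'parrot', 'lion'],
                        'class': ['bird', 'bird', 'mammal'],
                        'num_legs': [2, 2, 4]})
```

With descriptive column names, a reader can tell which column holds strings without inspecting the values.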
||
>>> df | ||
A B C | ||
0 1 foo 4.0 | ||
|
@@ -2520,9 +2524,9 @@ def to_xarray(self): | |
C (index) float64 4.0 5.0 6.0 | ||
|
||
>>> df = pd.DataFrame({'A' : [1, 1, 2], | ||
'B' : ['foo', 'bar', 'foo'], | ||
'C' : np.arange(4.,7)} | ||
).set_index(['B','A']) | ||
... 'B' : ['foo', 'bar', 'foo'], | ||
... 'C' : np.arange(4.,7)} | ||
... ).set_index(['B','A']) | ||
I don't see the need to repeat the previous DataFrame; we can just have
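One way to read the suggestion above, sketched under the assumption that the earlier `df` is meant to be reused: call `set_index` on it rather than rebuilding the frame. The name `df_multiindex` is made up for this illustration, and the `to_xarray` call is guarded because it needs the optional xarray dependency.

```python
import pandas as pd

# The DataFrame from the first example (values written out as a list).
df = pd.DataFrame({'A': [1, 1, 2],
                   'B': ['foo', 'bar', 'foo'],
                   'C': [4.0, 5.0, 6.0]})

# Reuse it: set_index returns a new frame, so df itself is unchanged.
df_multiindex = df.set_index(['B', 'A'])

# Converting requires xarray; skip the conversion if it is not installed.
try:
    dataset = df_multiindex.to_xarray()
except ImportError:
    dataset = None
```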
||
>>> df | ||
C | ||
B A | ||
|
@@ -2539,35 +2543,33 @@ def to_xarray(self): | |
Data variables: | ||
C (B, A) float64 5.0 nan 4.0 6.0 | ||
|
||
>>> p = pd.Panel(np.arange(24).reshape(4,3,2), | ||
items=list('ABCD'), | ||
major_axis=pd.date_range('20130101', periods=3), | ||
minor_axis=['first', 'second']) | ||
>>> p | ||
<class 'pandas.core.panel.Panel'> | ||
Dimensions: 4 (items) x 3 (major_axis) x 2 (minor_axis) | ||
Items axis: A to D | ||
Major_axis axis: 2013-01-01 00:00:00 to 2013-01-03 00:00:00 | ||
Minor_axis axis: first to second | ||
|
||
>>> p.to_xarray() | ||
<xarray.DataArray (items: 4, major_axis: 3, minor_axis: 2)> | ||
array([[[ 0, 1], | ||
[ 2, 3], | ||
[ 4, 5]], | ||
[[ 6, 7], | ||
[ 8, 9], | ||
[10, 11]], | ||
[[12, 13], | ||
[14, 15], | ||
[16, 17]], | ||
[[18, 19], | ||
[20, 21], | ||
[22, 23]]]) | ||
>>> index = pd.MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], | ||
... ['one', 'two']], | ||
... labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]], | ||
... names=['first', 'second']) | ||
|
||
>>> s = pd.Series(np.arange(8), index=index) | ||
>>> s | ||
I find this too complicated for what we need to show. To have a Series with a MultiIndex with a datetime level, we can have something like:

I haven't used xarray much myself, and I'm not sure what makes sense to show here. Maybe:

If that makes sense, I think with the first example, we can have

@jreback does this make sense? Sorry for requesting the changes @Moisan, but I find that the current version gives the impression that we're trying to show something more complex than what we are actually showing.

No problem, I'm happy to make the examples more relevant :).
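The code the reviewer proposed is missing from this page, so the following is only a sketch of the idea described: a Series whose MultiIndex has one datetime level. The names `dates`, `animals`, `speed` and the values are assumptions chosen for illustration. `pd.MultiIndex.from_arrays` is used so that levels and integer codes do not have to be spelled out by hand.

```python
import pandas as pd

# One datetime level and one string level, given as parallel arrays.
dates = pd.to_datetime(['2018-01-01', '2018-01-01',
                        '2018-01-02', '2018-01-02'])
animals = ['falcon', 'parrot', 'falcon', 'parrot']
index = pd.MultiIndex.from_arrays([dates, animals],
                                  names=['date', 'animal'])

# Illustrative values only; they are not measurements from the PR.
speed = pd.Series([350.0, 18.0, 361.0, 15.0], index=index, name='speed')

# Each index level becomes a dimension of the resulting DataArray.
# The conversion needs xarray, so it is skipped if xarray is missing.
try:
    speed_array = speed.to_xarray()
except ImportError:
    speed_array = None
```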
||
first second | ||
bar one 0 | ||
two 1 | ||
baz one 2 | ||
two 3 | ||
foo one 4 | ||
two 5 | ||
qux one 6 | ||
two 7 | ||
dtype: int64 | ||
|
||
>>> s.to_xarray() | ||
<xarray.DataArray (first: 4, second: 2)> | ||
array([[0, 1], | ||
[2, 3], | ||
[4, 5], | ||
[6, 7]]) | ||
Coordinates: | ||
* items (items) object 'A' 'B' 'C' 'D' | ||
* major_axis (major_axis) datetime64[ns] 2013-01-01 2013-01-02 2013-01-03 # noqa | ||
* minor_axis (minor_axis) object 'first' 'second' | ||
* first (first) object 'bar' 'baz' 'foo' 'qux' | ||
It would be better to have a datetime index for one level.
||
* second (second) object 'one' 'two' | ||
|
||
Notes | ||
----- | ||
|
Do you mind changing this to the standard format? Only the type in the first line, and a description in the next. For example:
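The reviewer's own example is not preserved here. As a rough sketch of the convention being described, assuming the comment refers to the Returns section and to the numpydoc layout pandas uses, the type goes alone on the first line and the description is indented on the next. The function `to_xarray_sketch` is a hypothetical wrapper written only to carry the docstring.

```python
def to_xarray_sketch(obj):
    """
    Return an xarray object from a pandas object (hypothetical wrapper).

    Returns
    -------
    xarray.DataArray or xarray.Dataset
        A Dataset if `obj` is a DataFrame, a DataArray if `obj` is a
        Series.
    """
    # Delegates to the pandas method; requires xarray to be installed.
    return obj.to_xarray()
```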