COMPAT: Objects construction compat with xarray.Dataset #12400

max-sixty · 2016-02-20T06:59:54Z

closes Supplying an xarray Dataset to DataFrame constructor breaks #12353
tests added
tests passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

I'm changed the dict / mapping issue, but I can't get this part working (so the tests fail at the moment):

But pd.Series(dict(xr.Dataset({'a': 2, 'b': 3}))) doesn't work either, because the values in the dict are 0-th dimension arrays, and Series doesn't unpack them (list(dict(xr.Dataset({'a': 2, 'b': 3})).values())[0] for the full code)
#12353 (comment)

Is that a reasonable goal? Where would be a good place to do the conversion?

max-sixty · 2016-02-20T07:30:32Z

pandas/tests/series/test_constructors.py

 from pandas.util.testing import assert_series_equal
-import pandas.util.testing as tm
-


This was unintentional - PyCharm - although it is nicer now...

jreback · 2016-03-23T11:57:03Z

@MaximilianR since you are fixing this, thanks!

can you incoporate anything from my original PR that is needed: #12356

max-sixty · 2016-05-03T16:33:11Z

@jreback Ready to go, less a what's new note and removing a commented out test (this is why the build fails).

Apologies for mixing formatting & functional improvements.

Needs a look through from you, in particular the to_dataframe check.

max-sixty · 2016-05-03T16:34:31Z

pandas/tests/series/test_constructors.py

+        result = Series(xr.Dataset(d))
+        expected = Series(d)
+
+        # disabled for the moment (will remove from PR)


Because of the issues with 0-d arrays discussed in the issue

jreback · 2016-05-13T23:38:15Z

@MaximilianR if you'd like to rebase/update

max-sixty · 2016-05-17T04:30:02Z

Rebased.
@jreback, awaiting your review

jreback · 2016-05-17T13:29:12Z

pandas/core/common.py

 from pandas import compat
-from pandas.compat import (long, zip, map, string_types,
-                           iteritems)


put things back in this file as these are not substantive changes (and things getting changed elsewhere)

jreback · 2016-05-17T13:34:52Z

you have some setting on your editor which just messes with existing (correct) linting / formatting. Pls turn that off. Makes it much harder to see what is changing.

Overall I think this approach is nice. .to_dataframe() is a good approach / very duck-like.

max-sixty · 2016-07-14T18:25:43Z

I've been remiss about the final fixes here. Will need a couple of weeks on other commitments but I will get to this

shoyer · 2016-08-22T16:18:14Z

pandas/core/frame.py

@@ -217,11 +217,16 @@ def __init__(self, data=None, index=None, columns=None, dtype=None,
        if isinstance(data, DataFrame):
            data = data._data

+        if hasattr(data, 'to_dataframe'):  # xr.Dataset


I would slightly rather skip this special check. In theory, to_dataframe might return something else.

I don't think it will be that much slower to use the generic dict path, though we then run into the issue that DataFrame._init_dict does a special check for OrderedDict when deciding whether or not to order the keys. Sadly there's no way in check whether an arbitrary Mapping type has intentionally ordered keys or not (Python-ideas discussed adding collections.abc.Ordered but I don't think it was implemented).

The broader question is whether we want objects to be able to define their own conversion to a DataFrame - I think the main options are:

Everything is treated as a dict

We build the code within the DataFrame constructor - i.e. that if statement checks for DataSet (and anything else we want to check)

There is some duck-like method such as to_dataframe that classes can define themselves

Thoughts?

I'm happy with a duck-like method, but we should probably call it something very explicit like _to_pandas_dataframe_. This would be useful for a lot of other projects, not just xarray.

jreback · 2016-10-06T10:57:06Z

@MaximilianR ?

max-sixty · 2016-10-06T23:59:12Z

Apologies! I will do soon. Please don't close.

jreback · 2016-12-21T23:07:04Z

@MaximilianR want to rebase?

I would be onboard with allowing a special method __dataframe__ which is pretty clear IMHO.

shoyer · 2016-12-21T23:09:40Z

I would be onboard with allowing a special method dataframe which is pretty clear IMHO.

At this point there are a number of alternative "dataframe" objects (e.g., dask, spark), many of which probably want to be coercible to pandas, so I think the special method should have pandas in the name to clarify. Hence my suggestion for _to_pandas_dataframe_.

jreback · 2017-01-21T23:17:37Z

status?

jreback · 2017-03-20T13:55:09Z

can you rebase / update

max-sixty · 2017-03-20T15:17:46Z

Yes, thanks for your patience.

What do we want to name the method? __dataframe__? _to_dataframe?

jreback · 2017-03-20T17:52:20Z

__dataframe__. I don't think it matters that others have dataframes. pandas is the standard in python, just as numpy is for arrays.

shoyer · 2017-03-20T18:56:00Z

I feel pretty strongly that the name should not just be __dataframe__ and we should have pandas in it somewhere.

If we were starting over with NumPy, the name probably wouldn't be __array__, either.

jreback · 2017-03-20T19:12:30Z

I don't remember If I suggested the special method, but what is wrong with .to_dataframe()? (or .to_pandas()?

shoyer · 2017-03-20T20:41:52Z

I don't remember If I suggested the special method, but what is wrong with .to_dataframe()? (or .to_pandas()?

The main reason is that it's totally conceivable that some application might use these for something else, e.g., to_dataframe() might make a spark dataframe instead. In fact, xarray sometimes uses to_pandas() to create a pandas.Series (if the array is 1D). The advantage of looking for a special method with a name like _to_pandas_dataframe_ is that there is almost no chance that it will be used incorrectly.

jreback · 2017-03-20T21:00:31Z

@shoyer

Is this supposed to work?

In [30]: x = DataFrame({'A': [1, 2, 3], 'B': [3., 4., 5.]}).to_xarray()

In [31]: x
Out[31]: 
<xarray.Dataset>
Dimensions:  (index: 3)
Coordinates:
  * index    (index) int64 0 1 2
Data variables:
    A        (index) int64 1 2 3
    B        (index) float64 3.0 4.0 5.0

In [32]: np.array(x)
Out[32]: 
array(['index', 'A', 'B'], 
      dtype='<U5')

shoyer · 2017-03-21T01:25:54Z

@jreback xarray.Dataset is not meant to be convertable to a NumPy array, so it doesn't define __array__. But it does subclass collections.Mapping, so NumPy apparently manages to convert it by iterating over the keys, like this class:

class WrapMapping(collections.Mapping):
  def __init__(self, mapping):
    self._map = mapping
  def __getitem__(self, key):
    return self._map[key]
  def __iter__(self):
    return iter(self._map)
  def __len__(self):
    return len(self._map)

>>> np.array(WrapMapping({'a': 1, 'b': 2}))
array(['b', 'a'], 
      dtype='|S1')

This is totally useless though, so I'll make a note to implement a __array__ for Dataset that just raises TypeError with a helpful error message.

jreback · 2017-03-21T13:38:48Z

@shoyer gotcha, ok, then I guess __pandas_dataframe__ is prob reasonable for external converters.

jreback · 2017-05-07T13:57:16Z

@MaximilianR rebase?

jreback · 2017-06-10T19:04:46Z

can you rebase and update?

jreback · 2017-08-17T10:28:43Z

closing as stale

max-sixty changed the title ~~Series dataset compat~~ COMPAT: Series construction compat with xarray.Dataset Feb 20, 2016

max-sixty reviewed Feb 20, 2016
View reviewed changes

max-sixty force-pushed the series-dataset-compat branch 2 times, most recently from 6190d10 to 99f1ee7 Compare March 23, 2016 11:36

jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Compat pandas objects compatability with Numpy or Python functions labels Mar 23, 2016

jreback mentioned this pull request Mar 23, 2016

COMPAT: DataFrame construction compat with xarray.Dataset #12356

Closed

max-sixty force-pushed the series-dataset-compat branch from 2f3b8de to 2477347 Compare March 24, 2016 23:29

max-sixty force-pushed the series-dataset-compat branch 2 times, most recently from ad80374 to 539ccd5 Compare April 6, 2016 23:11

max-sixty changed the title ~~COMPAT: Series construction compat with xarray.Dataset~~ COMPAT: Objects construction compat with xarray.Dataset Apr 22, 2016

max-sixty force-pushed the series-dataset-compat branch from 539ccd5 to dddb479 Compare April 22, 2016 06:17

max-sixty force-pushed the series-dataset-compat branch 4 times, most recently from c2d833c to 97bac68 Compare May 3, 2016 16:22

max-sixty reviewed May 3, 2016
View reviewed changes

max-sixty force-pushed the series-dataset-compat branch from 97bac68 to 836672a Compare May 17, 2016 04:23

jreback reviewed May 17, 2016
View reviewed changes

max-sixty mentioned this pull request Aug 22, 2016

Supplying an xarray Dataset to DataFrame constructor breaks #12353

Closed

shoyer reviewed Aug 22, 2016
View reviewed changes

max-sixty force-pushed the series-dataset-compat branch from 836672a to 969d0c6 Compare August 25, 2016 01:11

dataframe & series take Mappings & xr.Datasets

d1b6c62

max-sixty force-pushed the series-dataset-compat branch from 969d0c6 to d1b6c62 Compare August 25, 2016 01:17

shoyer mentioned this pull request Mar 21, 2017

xarray.Dataset.__array__ should raise TypeError pydata/xarray#1314

Closed

jreback closed this Aug 17, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

COMPAT: Objects construction compat with xarray.Dataset #12400

COMPAT: Objects construction compat with xarray.Dataset #12400

max-sixty commented Feb 20, 2016

max-sixty Feb 20, 2016

jreback commented Mar 23, 2016

max-sixty commented May 3, 2016 •

edited

Loading

max-sixty May 3, 2016

jreback commented May 13, 2016

max-sixty commented May 17, 2016

jreback May 17, 2016

jreback commented May 17, 2016

max-sixty commented Jul 14, 2016

shoyer Aug 22, 2016

max-sixty Aug 22, 2016

shoyer Aug 22, 2016

jreback commented Oct 6, 2016

max-sixty commented Oct 6, 2016

jreback commented Dec 21, 2016

shoyer commented Dec 21, 2016

jreback commented Jan 21, 2017

jreback commented Mar 20, 2017

max-sixty commented Mar 20, 2017

jreback commented Mar 20, 2017

shoyer commented Mar 20, 2017

jreback commented Mar 20, 2017

shoyer commented Mar 20, 2017

jreback commented Mar 20, 2017

shoyer commented Mar 21, 2017 •

edited

Loading

jreback commented Mar 21, 2017

jreback commented May 7, 2017

jreback commented Jun 10, 2017

jreback commented Aug 17, 2017

		from pandas.util.testing import assert_series_equal
		import pandas.util.testing as tm

COMPAT: Objects construction compat with xarray.Dataset #12400

COMPAT: Objects construction compat with xarray.Dataset #12400

Conversation

max-sixty commented Feb 20, 2016

max-sixty Feb 20, 2016

Choose a reason for hiding this comment

jreback commented Mar 23, 2016

max-sixty commented May 3, 2016 • edited Loading

max-sixty May 3, 2016

Choose a reason for hiding this comment

jreback commented May 13, 2016

max-sixty commented May 17, 2016

jreback May 17, 2016

Choose a reason for hiding this comment

jreback commented May 17, 2016

max-sixty commented Jul 14, 2016

shoyer Aug 22, 2016

Choose a reason for hiding this comment

max-sixty Aug 22, 2016

Choose a reason for hiding this comment

shoyer Aug 22, 2016

Choose a reason for hiding this comment

jreback commented Oct 6, 2016

max-sixty commented Oct 6, 2016

jreback commented Dec 21, 2016

shoyer commented Dec 21, 2016

jreback commented Jan 21, 2017

jreback commented Mar 20, 2017

max-sixty commented Mar 20, 2017

jreback commented Mar 20, 2017

shoyer commented Mar 20, 2017

jreback commented Mar 20, 2017

shoyer commented Mar 20, 2017

jreback commented Mar 20, 2017

shoyer commented Mar 21, 2017 • edited Loading

jreback commented Mar 21, 2017

jreback commented May 7, 2017

jreback commented Jun 10, 2017

jreback commented Aug 17, 2017

max-sixty commented May 3, 2016 •

edited

Loading

shoyer commented Mar 21, 2017 •

edited

Loading