-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
COMPAT: Objects construction compat with xarray.Dataset #12400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
from pandas.util.testing import assert_series_equal | ||
import pandas.util.testing as tm | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was unintentional - PyCharm - although it is nicer now...
6190d10
to
99f1ee7
Compare
@MaximilianR since you are fixing this, thanks! can you incoporate anything from my original PR that is needed: #12356 |
2f3b8de
to
2477347
Compare
ad80374
to
539ccd5
Compare
539ccd5
to
dddb479
Compare
c2d833c
to
97bac68
Compare
@jreback Ready to go, less a what's new note and removing a commented out test (this is why the build fails). Apologies for mixing formatting & functional improvements. Needs a look through from you, in particular the |
result = Series(xr.Dataset(d)) | ||
expected = Series(d) | ||
|
||
# disabled for the moment (will remove from PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the issues with 0-d arrays discussed in the issue
@MaximilianR if you'd like to rebase/update |
97bac68
to
836672a
Compare
Rebased. |
from pandas import compat | ||
from pandas.compat import (long, zip, map, string_types, | ||
iteritems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put things back in this file as these are not substantive changes (and things getting changed elsewhere)
you have some setting on your editor which just messes with existing (correct) linting / formatting. Pls turn that off. Makes it much harder to see what is changing. Overall I think this approach is nice. |
I've been remiss about the final fixes here. Will need a couple of weeks on other commitments but I will get to this |
@@ -217,11 +217,16 @@ def __init__(self, data=None, index=None, columns=None, dtype=None, | |||
if isinstance(data, DataFrame): | |||
data = data._data | |||
|
|||
if hasattr(data, 'to_dataframe'): # xr.Dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would slightly rather skip this special check. In theory, to_dataframe
might return something else.
I don't think it will be that much slower to use the generic dict path, though we then run into the issue that DataFrame._init_dict
does a special check for OrderedDict when deciding whether or not to order the keys. Sadly there's no way in check whether an arbitrary Mapping type has intentionally ordered keys or not (Python-ideas discussed adding collections.abc.Ordered
but I don't think it was implemented).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The broader question is whether we want objects to be able to define their own conversion to a DataFrame
- I think the main options are:
- Everything is treated as a
dict
- We build the code within the
DataFrame
constructor - i.e. thatif
statement checks forDataSet
(and anything else we want to check) - There is some duck-like method such as
to_dataframe
that classes can define themselves
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with a duck-like method, but we should probably call it something very explicit like _to_pandas_dataframe_
. This would be useful for a lot of other projects, not just xarray.
836672a
to
969d0c6
Compare
969d0c6
to
d1b6c62
Compare
Apologies! I will do soon. Please don't close. |
@MaximilianR want to rebase? I would be onboard with allowing a special method |
At this point there are a number of alternative "dataframe" objects (e.g., dask, spark), many of which probably want to be coercible to pandas, so I think the special method should have |
status? |
can you rebase / update |
Yes, thanks for your patience. What do we want to name the method? |
|
I feel pretty strongly that the name should not just be If we were starting over with NumPy, the name probably wouldn't be |
I don't remember If I suggested the special method, but what is wrong with |
The main reason is that it's totally conceivable that some application might use these for something else, e.g., |
Is this supposed to work?
|
@jreback class WrapMapping(collections.Mapping):
def __init__(self, mapping):
self._map = mapping
def __getitem__(self, key):
return self._map[key]
def __iter__(self):
return iter(self._map)
def __len__(self):
return len(self._map)
>>> np.array(WrapMapping({'a': 1, 'b': 2}))
array(['b', 'a'],
dtype='|S1') This is totally useless though, so I'll make a note to implement a |
@shoyer gotcha, ok, then I guess |
@MaximilianR rebase? |
can you rebase and update? |
closing as stale |
git diff upstream/master | flake8 --diff
I'm changed the dict / mapping issue, but I can't get this part working (so the tests fail at the moment):
Is that a reasonable goal? Where would be a good place to do the conversion?