
Inserting subclass/composition of Series into DataFrame strips 'extra' functions/properties #1713

Closed
carsonfarmer opened this issue Jul 31, 2012 · 15 comments

Comments

@carsonfarmer

When trying to insert/append a subclass (or composition) of a pandas Series into a DataFrame, any and all of the 'extra' functions that come with my subclass (or composition) are stripped and a Series is created:

In [7]: df = read_csv('some/data/from/file.csv')

In [8]: sp = SpatialSeries(df.the_geom) # SpatialSeries is subclass, the_geom is spatial location (WKT)

In [9]: type(sp)
Out[9]: spseries.SpatialSeries

In [10]: type(df)
Out[10]: pandas.core.frame.DataFrame

In [11]: df['geoms'] = sp

In [12]: type(df['geoms'])
Out[12]: pandas.core.series.Series

I suspect that, for the most part, this kind of behaviour is useful. However, I need the extra functions and classes associated with SpatialSeries, and I'd rather not have to subclass DataFrame to create a special DataFrame that allows this. It looks like the culprit is here in frame.py at lines 1761-1772:

    def _set_item(self, key, value):
        """
        Add series to DataFrame in specified column.

        If series is a numpy-array (not a Series/TimeSeries), it must be the
        same length as the DataFrame's index or an error will be thrown.

        Series/TimeSeries will be conformed to the DataFrame's index to
        ensure homogeneity.
        """
        value = self._sanitize_column(key, value)
        NDFrame._set_item(self, key, value)

In particular, I'm looking at value = self._sanitize_column(key, value), which appears to use np.asarray(value) before it returns the input array (even if the input column is a Series). Is there any way to avoid this behaviour? Or alternatively, a better way to implement this so that useful subclasses can be used within a DataFrame? I hope I'm not missing something simple/vital here.
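The behaviour described above can be reproduced with a minimal sketch. It assumes a recent pandas release rather than 0.8.0b1, so the internals differ from the `_set_item` code quoted here, and the `SpatialSeries` class and its `is_point` method are hypothetical stand-ins for the subclass in the report:

```python
import numpy as np
import pandas as pd


class SpatialSeries(pd.Series):
    """Hypothetical stand-in for the SpatialSeries subclass in the report."""

    @property
    def _constructor(self):
        return SpatialSeries

    def is_point(self):
        # Illustrative 'extra' method: flag WKT strings that describe points.
        return [str(v).startswith("POINT") for v in self]


df = pd.DataFrame({"the_geom": ["POINT (0 0)", "POINT (1 1)"]})
sp = SpatialSeries(df["the_geom"])

# np.asarray keeps the values but discards the Series subclass entirely.
stripped = np.asarray(sp)

# Storing the subclass in a plain DataFrame and reading it back
# yields an ordinary Series without the extra method.
df["geoms"] = sp
col = df["geoms"]
```

Here `sp.is_point()` works, while `col` no longer has that method because a plain DataFrame hands its columns back as plain Series.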

FYI:

In [13]: pandas.__version__
Out[13]: '0.8.0b1'

@lodagro
Contributor

lodagro commented Aug 9, 2012

see also #60

In [44]: def say_hello(s):
   ....:     print("hello")
   ....:     

In [45]: def my_max(s):
   ....:     print("my_max")
   ....:     

In [46]: pandas.Series.say_hello = say_hello

In [47]: pandas.Series.max = my_max

In [48]: s = pandas.Series(np.random.rand(4), index=list('abcd'))

In [49]: s.say_hello()
hello

In [50]: s.max()
my_max

In [51]: df = pandas.DataFrame(np.random.randint(0, 10, (4,4)), index=list('abcd'), columns=list('ABCD'))

In [52]: df
Out[52]: 
   A  B  C  D
a  6  0  9  5
b  9  1  1  3
c  1  9  6  6
d  8  4  8  7

In [53]: df['A'].say_hello()
hello

In [55]: df['A'].max()
my_max
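The session above attaches functions directly to `pandas.Series`, so every column pulled out of a DataFrame picks them up. A minimal Python 3 sketch of the same idea follows; the `say_hello` helper is illustrative, and it returns its string instead of printing so the result can be checked. Unlike the session above, it does not override the built-in `max`, since patching a class this way affects every Series in the process:

```python
import numpy as np
import pandas as pd


def say_hello(self):
    # Illustrative extra method; 'self' is the Series it is called on.
    return "hello"


# Attach the function to the Series class itself (monkey-patching).
pd.Series.say_hello = say_hello

df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=list("abcd"),
    columns=list("ABCD"),
)

# Columns come back as Series, so the patched method is available.
greeting = df["A"].say_hello()
```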

@wesm
Member

wesm commented Aug 12, 2012

We need per-column metadata to be able to do what you're describing. I'm still unsure about the design.

@dalejung
Contributor

Can you see if https://github.com/dalejung/pandas-composition works for you?

@carsonfarmer
Author

This is excellent! Would solve pretty much any of the issues I can think of
at this stage.

On Tue, Jul 16, 2013 at 10:20 PM, dalejung wrote:

> Can you see if https://github.com/dalejung/pandas-composition works for
> you?


Dr. Carson J. Q. Farmer
Assistant Professor
Department of Geography
Hunter College - CUNY
695 Park Avenue
New York, NY, 10065
[email protected]
@carsonfarmer
www.CarsonFarmer.com

@jtratner
Contributor

I'm glad you've found something that works. Does that mean this is going to
be closed? Or do you all want to consider it? If so, I could pick this up
later on after I prep the other PRs I'm working on for merge.

@carsonfarmer
Author

While I think pandas_composition is a 'solution' for now, per-column
metadata would certainly be a real bonus to pandas DataFrames. It is
probably sufficient to close this item and open a new enhancement item for
per-column metadata. I've started a related PR that needs lots of work
before anything will happen with it:
#4271

Carson

On Thu, Jul 18, 2013 at 6:18 PM, Jeff Tratner wrote:

> I'm glad you've found something that works. Does that mean this is going to
> be closed? Or do you all want to consider it? If so, I could pick this up
> later on after I prep the other PRs I'm working on for merge.



@jreback
Contributor

jreback commented Jul 22, 2013

@cfarmer that is issue #39

@carsonfarmer
Author

Right, missed that one.

On Mon, Jul 22, 2013 at 1:58 PM, jreback wrote:

> @cfarmer (https://github.com/cfarmer) that is issue #39



@dalejung
Contributor

If it helps, pandas_composition does have column metadata that persists.

@jreback
Contributor

jreback commented Jul 22, 2013

@cfarmer and #3482 coming in very shortly (right after 0.12 release); it encompasses the changes in this PR (and it makes sub-classing a bit easier, but still not completely trivial)

@carsonfarmer
Author

@jreback Now that I'm taking a closer look at your PR, I see all the lovely
(self._constructor)s in there. Excellent! I think this is sufficient for me
to label my pull request as a duplicate and close it (can I do that?).
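For readers unfamiliar with the `_constructor` hooks mentioned above, here is a minimal sketch of how they are used for subclassing in recent pandas releases. It is not taken from #3482. The `SpatialSeries` and `SpatialFrame` classes and the `wkt_lengths` method are hypothetical, and the approach does require a DataFrame subclass, which the original report hoped to avoid:

```python
import pandas as pd


class SpatialSeries(pd.Series):
    """Hypothetical Series subclass carrying one extra method."""

    @property
    def _constructor(self):
        # Operations that build a new Series use this instead of pd.Series.
        return SpatialSeries

    @property
    def _constructor_expanddim(self):
        # Used when a Series is expanded into a frame, e.g. to_frame().
        return SpatialFrame

    def wkt_lengths(self):
        # Illustrative extra method: length of each WKT string.
        return [len(str(v)) for v in self]


class SpatialFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass whose columns are SpatialSeries."""

    @property
    def _constructor(self):
        return SpatialFrame

    @property
    def _constructor_sliced(self):
        # Column access (frame[key]) builds its result with this class.
        return SpatialSeries


sf = SpatialFrame({"the_geom": ["POINT (0 0)", "POINT (10 10)"]})
sf["geoms"] = sf["the_geom"]

col = sf["geoms"]            # comes back as a SpatialSeries
lengths = col.wkt_lengths()  # the extra method survives the round trip
```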

On Mon, Jul 22, 2013 at 2:29 PM, jreback wrote:

> @cfarmer (https://github.com/cfarmer) and #3482 (https://github.com/pydata/pandas/issues/3482) coming in very shortly (right after 0.12 release); it encompasses the
> changes in this PR (and it makes sub-classing a bit easier, but still not
> completely trivial)



@jreback
Contributor

jreback commented Jul 22, 2013

Sure... (this has been in the works for a while, FYI)

@carsonfarmer
Author

Yes I see that now, wish I had seen this sooner!

On Mon, Jul 22, 2013 at 2:50 PM, jreback wrote:

> Sure... (this has been in the works for a while, FYI)



@jreback
Contributor

jreback commented Jul 22, 2013

Ironically, your PR caused me to rebase back to current (this was mostly written right around the time of the 0.11 release)... so thanks!

@TomAugspurger
Contributor

covered by #2485
