Inserting subclass/composition of Series into DataFrame strips 'extra' functions/properties #1713

carsonfarmer · 2012-07-31T16:34:30Z

When trying to insert/append a subclass (or composition) of a pandas Series into a DataFrame, any and all of the 'extra' functions that come with my subclass (or composition) are stripped and a Series is created:

In [7]: df = read_csv('some/data/from/file.csv')

In [8]: sp = SpatialSeries(df.the_geom) # SpatialSeries is subclass, the_geom is spatial location (WKT)

In [9]: type(sp)
Out[9]: spseries.SpatialSeries

In [10]: type(df)
Out[10]: pandas.core.frame.DataFrame

In [11]: df['geoms'] = sp

In [12]: type(df['geoms'])
Out[12]: pandas.core.series.Series

I suspect that this behaviour is useful for the most part. However, I need the extra functions and classes associated with SpatialSeries, and I'd rather not have to subclass DataFrame to create a special DataFrame that allows this. The culprit appears to be in frame.py at lines 1761-1772:

    def _set_item(self, key, value):
        """
        Add series to DataFrame in specified column.

        If series is a numpy-array (not a Series/TimeSeries), it must be the
        same length as the DataFrame's index or an error will be thrown.

        Series/TimeSeries will be conformed to the DataFrame's index to
        ensure homogeneity.
        """
        value = self._sanitize_column(key, value)
        NDFrame._set_item(self, key, value)

In particular, I'm looking at value = self._sanitize_column(key, value), which appears to call np.asarray(value) before returning the input array (even if the input column is a Series). Is there any way to avoid this behaviour? Alternatively, is there a better way to implement this so that useful subclasses can be used within a DataFrame? I hope I'm not missing something simple or vital here.

FYI:

In [13]: pandas.__version__
Out[13]: '0.8.0b1'
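
For readers who want to reproduce the report without the original CSV, the following is a minimal sketch, assuming a recent pandas release rather than 0.8.0b1. The `SpatialSeries` class and its `geometry_types` method are hypothetical stand-ins for the reporter's subclass, and the WKT strings are made-up sample data.

```python
import pandas as pd


class SpatialSeries(pd.Series):
    """Hypothetical stand-in for the SpatialSeries subclass in the report."""

    def geometry_types(self):
        # Return the leading WKT keyword of each value, e.g. "POINT".
        return [str(v).split(" ", 1)[0] for v in self]


# Made-up sample data in place of 'some/data/from/file.csv'.
df = pd.DataFrame({"the_geom": ["POINT (0 0)", "LINESTRING (0 0, 1 1)"]})

sp = SpatialSeries(df["the_geom"])
sp_type = type(sp)                 # the subclass, as in Out[9]

df["geoms"] = sp                   # insert the subclass instance as a column
column_type = type(df["geoms"])    # a plain Series comes back, as in Out[12]
has_extra = hasattr(df["geoms"], "geometry_types")
```

In this sketch the data survives the round trip but the subclass identity does not: a plain DataFrame hands back plain Series objects on column access, so the extra method is no longer reachable from `df["geoms"]`.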

lodagro · 2012-08-09T20:17:57Z

see also #60

In [44]: def say_hello(s):
   ....:     print "hello"
   ....:     

In [45]: def my_max(s):
   ....:     print "my_max"
   ....:     

In [46]: pandas.Series.say_hello = say_hello

In [47]: pandas.Series.max = my_max

In [48]: s = pandas.Series(np.random.rand(4), index=list('abcd'))

In [49]: s.say_hello()
hello

In [50]: s.max()
my_max

In [51]: df = pandas.DataFrame(np.random.randint(0, 10, (4,4)), index=list('abcd'), columns=list('ABCD'))

In [52]: df
Out[52]: 
   A  B  C  D
a  6  0  9  5
b  9  1  1  3
c  1  9  6  6
d  8  4  8  7

In [53]: df['A'].say_hello()
hello

In [55]: df['A'].max()
my_max

wesm · 2012-08-12T19:19:52Z

We need per-column metadata to be able to do what you're describing. I'm still unsure about the design.

dalejung · 2013-07-17T02:20:19Z

Can you see if https://github.com/dalejung/pandas-composition works for you?

carsonfarmer · 2013-07-18T20:42:46Z

This is excellent! Would solve pretty much any of the issues I can think of
at this stage.

Dr. Carson J. Q. Farmer
Assistant Professor
Department of Geography
Hunter College - CUNY
695 Park Avenue
New York, NY, 10065
[email protected]
@carsonfarmer
www.CarsonFarmer.com

jtratner · 2013-07-18T22:18:04Z

I'm glad you've found something that works, does that mean this is going to
be closed? Or do you all want to consider it? If so, I could pick this up
later on after I prep the other PRs I'm working on for merge.

carsonfarmer · 2013-07-22T17:50:26Z

While I think pandas_composition is a 'solution' for now, per-column
metadata would certainly be a real bonus to pandas DataFrames. It is
probably sufficient to close this item and open a new enhancement item for
per-column metadata. I've started a related PR that needs lots of work
before anything will happen with it:
#4271

Carson

jreback · 2013-07-22T17:57:54Z

@cfarmer that is issue #39

carsonfarmer · 2013-07-22T18:20:09Z

Right, missed that one.

dalejung · 2013-07-22T18:22:50Z

If it helps, pandas_composition does have column meta data that persists.

jreback · 2013-07-22T18:29:28Z

@cfarmer and #3482 is coming in very shortly (right after the 0.12 release); it encompasses the changes in this PR (and it makes sub-classing a bit easier, but still not completely trivial)

carsonfarmer · 2013-07-22T18:49:22Z

@jreback Now that I'm taking a closer look at your PR, I see all the lovely
(self._constructor)s in there. Excellent! I think this is sufficient for me
to label my pull request as a duplicate and close it (can I do that?).
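
The `self._constructor` pattern mentioned above can be sketched as follows. This is only an illustration, assuming a recent pandas release in which the `_constructor`, `_constructor_sliced`, and `_constructor_expanddim` properties are the documented subclassing hooks; it is not the code from the PR under discussion. `SpatialSeries`, `SpatialFrame`, and `geometry_types` are hypothetical names, and the WKT strings are made-up sample data.

```python
import pandas as pd


class SpatialSeries(pd.Series):
    """Hypothetical Series subclass that carries one extra method."""

    @property
    def _constructor(self):
        # Operations that return a new Series build it with this class.
        return SpatialSeries

    @property
    def _constructor_expanddim(self):
        # Operations that expand to two dimensions (e.g. to_frame) use this.
        return SpatialFrame

    def geometry_types(self):
        # Return the leading WKT keyword of each value, e.g. "POINT".
        return [str(v).split(" ", 1)[0] for v in self]


class SpatialFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass whose columns are SpatialSeries."""

    @property
    def _constructor(self):
        return SpatialFrame

    @property
    def _constructor_sliced(self):
        # Selecting a single column builds it with this class.
        return SpatialSeries


df = SpatialFrame({"the_geom": ["POINT (0 0)", "LINESTRING (0 0, 1 1)"]})
col = df["the_geom"]    # built via _constructor_sliced
head = col.head(1)      # built via SpatialSeries._constructor
kinds = col.geometry_types()
```

Note that this approach still requires a DataFrame subclass, which the original report hoped to avoid, and that every column of a `SpatialFrame` comes back as a `SpatialSeries`, not only the spatial ones.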

jreback · 2013-07-22T18:50:47Z

sure.....(this has been in the works for a while FYI)

carsonfarmer · 2013-07-22T18:54:46Z

Yes I see that now, wish I had seen this sooner!

jreback · 2013-07-22T18:57:47Z

ironically your PR caused me to rebase back to current (as this was written right about the time of the 0.11 release) (mostly)...so thanks!

TomAugspurger · 2017-05-04T13:35:39Z

covered by #2485

carsonfarmer mentioned this issue Jul 16, 2013

Easier sub-classing for Series and DataFrame #4271

Closed

TomAugspurger closed this as completed May 4, 2017
