BUG: GH10536 in concat for SparseSeries #10626

kawochen · 2015-07-19T06:54:19Z

To address #10536, but it's clearly not enough. What should be done for SparseSeries of different kinds and different fill values?

jreback · 2015-07-19T10:38:15Z

you can only promote if the things you are concatenating have the same fill value

the kind doesn't matter
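A minimal sketch of the rule stated above, using a toy container rather than the pandas classes. `ToySparse`, `same_fill`, `to_dense`, and `concat` are hypothetical names introduced only for illustration. The assumption is that a result can stay sparse only when every input shares one fill value (with NaN treated as equal to NaN), and that the storage kind plays no part in that decision.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ToySparse:
    """Hypothetical stand-in for a sparse 1-D array; not the pandas class."""
    length: int
    positions: List[int]   # indices that hold a stored (non-fill) value
    values: List[float]    # stored values, aligned with positions
    fill_value: float = math.nan
    kind: str = "block"    # carried along, but ignored when concatenating


def same_fill(a, b) -> bool:
    # NaN != NaN in Python, so NaN fill values need an explicit check.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def to_dense(arr: ToySparse) -> List[float]:
    dense = [arr.fill_value] * arr.length
    for pos, val in zip(arr.positions, arr.values):
        dense[pos] = val
    return dense


def concat(arrays: List[ToySparse]):
    """Stay sparse only when every input shares one fill value."""
    first = arrays[0]
    if all(same_fill(first.fill_value, a.fill_value) for a in arrays[1:]):
        positions, values, offset = [], [], 0
        for a in arrays:
            positions.extend(p + offset for p in a.positions)
            values.extend(a.values)
            offset += a.length
        return ToySparse(offset, positions, values, first.fill_value, first.kind)
    # Mixed fill values: no single fill describes the result, so densify.
    dense: List[float] = []
    for a in arrays:
        dense.extend(to_dense(a))
    return dense


a = ToySparse(3, [1], [5.0], fill_value=0.0, kind="block")
b = ToySparse(2, [0], [7.0], fill_value=0.0, kind="integer")
c = ToySparse(2, [1], [9.0], fill_value=-1.0)

same = concat([a, b])    # same fill, different kind: result stays sparse
mixed = concat([a, c])   # different fill: result falls back to a dense list
```

Falling back to a dense list is only one possible policy for mismatched fill values; raising an error would be another.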

artemyk · 2015-07-20T19:21:30Z

Not sure if this is accurate, but would calling values on all the concatenated objects convert them to dense first? It would be nice if sparse concat didn't do this.

kawochen · 2015-07-20T21:06:47Z

.values returns a SparseArray and there is a _concat_compat specialized for sparse types, but beyond that I haven't really checked.

jreback · 2015-07-20T21:58:37Z

pandas/tools/merge.py

@@ -894,13 +896,21 @@ def get_result(self):
            if self.axis == 0:
                new_data = com._concat_compat([x.values for x in self.objs])
                name = com._consensus_name_attr(self.objs)
-                return Series(new_data, index=self.new_axes[0], name=name).__finalize__(self, method='concat')
+                if self._is_sp_series:


I don't like all of these if-thens. The basic problem is the existence of SparseSeries/SparseDataFrame, which don't really need to exist at all. The blocks actually hold the sparse data. But that is a bit bigger project.

max-sixty · Jul 20, 2015

Could you use ._constructor to solve the immediate issue and allow for those objects to be removed in time?

kawochen · Jul 21, 2015

That was my first approach, but I also wanted to keep the symmetry between axis = 1 and axis = 0. And if you want to check fill values etc., the case work seems necessary (if ugly) without refactoring.

max-sixty · Jul 21, 2015

You can't replace this block with self._constructor(...?

    if self._is_sp_series:
        klass = SparseDataFrame
    else:
        klass = DataFrame
    return klass(...

kawochen · Jul 21, 2015

Sorry, I'm not seeing how that would work. Do Series and DataFrame share the same constructor?

max-sixty · Jul 21, 2015

I was a little fast on the trigger. I didn't realize that you need ._is_frame etc. elsewhere.

For this code, this is what I was originally thinking. Tell me if I'm still off, though.

In __init__:

    self.obj_constructor = sample._constructor

and then in get_result:

    self.obj_constructor(...

kawochen · Jul 22, 2015

But the result could be a SparseDataFrame when the sample is a SparseSeries.

max-sixty · Jul 22, 2015

If sample is a SparseSeries, then sample._constructor will also be a SparseSeries...?

kawochen · Jul 22, 2015

Yes, you can do that if self.axis == 0, but concatenation could happen along a different axis, so I thought the code would be easier to understand if the different cases were handled in roughly the same way.
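The disagreement in this review thread can be sketched with toy classes. Everything below is hypothetical and is not the pandas implementation: the classes are stand-ins, `Concatenator` is a simplified model of the object whose `get_result` appears in the diff, and `_frame_constructor` is a name invented here to show one way the axis = 1 case could avoid explicit if-thens. It shows why `sample._constructor` is enough along axis 0 but not along axis 1, where the result is a frame.

```python
class Series:
    def __init__(self, data):
        self.data = list(data)

    @property
    def _constructor(self):
        # Class to use when the result has the same dimensionality.
        return Series

    @property
    def _frame_constructor(self):
        # Hypothetical hook: class to use when the result gains a dimension.
        return DataFrame


class DataFrame:
    def __init__(self, columns):
        self.columns = [list(c) for c in columns]


class SparseSeries(Series):
    @property
    def _constructor(self):
        return SparseSeries

    @property
    def _frame_constructor(self):
        return SparseDataFrame


class SparseDataFrame(DataFrame):
    pass


class Concatenator:
    def __init__(self, objs, axis=0):
        self.objs = list(objs)
        self.axis = axis
        sample = self.objs[0]
        # Capture both constructors once, from a sample input.
        self.obj_constructor = sample._constructor
        self.frame_constructor = sample._frame_constructor

    def get_result(self):
        if self.axis == 0:
            # Stacking end to end: the result is the same kind of series.
            merged = [v for obj in self.objs for v in obj.data]
            return self.obj_constructor(merged)
        # Side by side: the result is a frame, so the series constructor
        # would be the wrong class here.
        return self.frame_constructor([obj.data for obj in self.objs])


sparse_inputs = [SparseSeries([1.0, 2.0]), SparseSeries([3.0])]
stacked = Concatenator(sparse_inputs, axis=0).get_result()
side_by_side = Concatenator(sparse_inputs, axis=1).get_result()
```

In this sketch `stacked` is a `SparseSeries` and `side_by_side` is a `SparseDataFrame`. Checking fill values, which kawochen mentions as a reason for the case work, is not modelled here.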

kawochen · 2015-07-22T15:56:32Z

Since SparseSeries is for float64 only according to this doc string, are the tests here valid?

artemyk · 2015-07-22T18:50:43Z

@jreback Out of curiosity --- is there any particular reason for the SparseDataFrame vs DataFrame class distinction (as opposed to just having sparse vs dense underlying block managers?)

jreback · 2015-07-22T19:05:42Z

@artemyk I made this comment in another issue I think, but can't seem to find it.

The short answer is sort-of. I think the original intent of SparseDataFrame was to have a matrix like object that understood indexes. Kind of like a csr/coo (or pick your favorite storage format). That would then for example be all a single dtype, and thus could handle auto-filling.

However, after the refactor to move Series out of sub-classing ndarray (0.13), this was no longer necessary (e.g. the SparseBlock became a real thing). So at this point SparseDataFrame is not really necessary at all. SparseSeries OTOH does have utility, as it carries around the filling property, for example.

So bottom line, I think we could entirely drop SparseDataFrame and everything would work (and of course drop SparsePanel which has an original non-useful internal implementation).

artemyk · 2015-07-22T19:30:56Z

@jreback Thanks, that makes sense. I'm actually feeling like the underlying sparse implementation could use an overhaul. For example, if I understand correctly, sparse blocks cannot be consolidated right now. So, if get_dummies is called on a column with 100,000 values, we get a dataframe with 100,000 blocks. I imagine this can strongly hurt performance in some areas. An underlying sparse block manager that can truly handle sparse matrices (not just series) of various data types would be nice.

jreback · 2015-07-22T19:33:46Z

well, we have been thinking about what to do with sparse. I am not 100% sure why @wesm originally wrote this. But then again I don't know the state of scipy sparse back in 2010.

So we could have an alternative (or maybe THE) sparse repr that actually is something from scipy, e.g. coo/csr. And just use the current index stuff. It would be something of a project, but yes, could be very nice.

artemyk · 2015-07-22T19:38:06Z

Unfortunately it seems that scipy.sparse only supports numeric datatypes. Perhaps a wrapper around something like the Boost sparse matrix library (http://www.boost.org/doc/libs/1_45_0/libs/numeric/ublas/doc/matrix_sparse.htm#2CompressedMatrix) could be used instead.

wesm · 2015-07-22T20:40:44Z

@jreback AFAIK isn't pandas.sparse the only solution out there for "mostly NA" sparse tables (scipy.sparse is "mostly 0", which is not the same thing at all)? That was the original use case: panel regressions involving future contract data over a long history. So you might have a time series that only appears for 3-6 months out of a 30 year period, and want to convert that to a long (dense) panel form without insane memory use
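A small sketch of the "mostly NA" versus "mostly 0" distinction, in plain Python with no pandas or scipy calls. `compress` is a hypothetical helper that stores only the entries differing from a chosen fill value, and the data is made up for illustration.

```python
import math


def compress(values, fill_value):
    """Keep only entries that differ from fill_value (NaN-aware)."""
    def is_fill(v):
        if isinstance(fill_value, float) and math.isnan(fill_value):
            return isinstance(v, float) and math.isnan(v)
        return v == fill_value

    positions = [i for i, v in enumerate(values) if not is_fill(v)]
    stored = [values[i] for i in positions]
    return positions, stored


nan = math.nan
# A series observed for only 3 of 12 periods; 0.0 is a real observation.
series = [nan, nan, nan, nan, 1.5, 0.0, 2.5, nan, nan, nan, nan, nan]

# NaN as the implicit value: only the 3 observed points are stored.
na_positions, na_values = compress(series, fill_value=nan)

# Zero as the implicit value: every NaN must be stored explicitly,
# and the genuine 0.0 observation becomes indistinguishable from "absent".
zero_positions, zero_values = compress(series, fill_value=0.0)
```

With NaN as the fill value, 3 entries are stored; with 0.0 as the fill value, 11 of the 12 are stored, so a zero-based representation saves almost nothing on data that is mostly missing.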

jreback · 2015-07-28T10:55:54Z

yeah, have been thinking about sparse and what to do. A suggestion was made to split this off into pandas-sparse, to allow more freedom to experiment (but still keep some integration). The actual sparse impl is pretty ok. I think we need to blow away SparsePanel (as it has an older impl and is not really supported much). Maybe SparseDataFrame too (as the blocks can simply be included in a DataFrame), though there is a use case for a 2-d sparse repr. Suggestions welcome, esp. someone to lead this effort.

jreback · 2015-10-11T15:58:49Z

@kawochen do you want to rebase / update? Then I can review.

jreback · 2015-11-10T01:27:48Z

@kawochen yeh, the sparse stuff needs an overhaul....

closing, but if you'd like to update, pls reopen

BUG: GH10536 in concat for SparseSeries

1ed31ed

sinhrks added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Sparse (Sparse Data Type) labels Jul 20, 2015

sinhrks added this to the 0.17.0 milestone Jul 20, 2015

jreback reviewed Jul 20, 2015

jreback modified the milestones: Next Major Release, 0.17.0 Aug 16, 2015

jreback closed this Nov 10, 2015

jorisvandenbossche modified the milestones: No action, Next Major Release Jul 21, 2016
