-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: GH10536 in concat for SparseSeries #10626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
you can only promote if the things u r concatting have the same fill the kind doesn't matter |
Not sure if this is accurate, but would calling |
|
@@ -894,13 +896,21 @@ def get_result(self): | |||
if self.axis == 0: | |||
new_data = com._concat_compat([x.values for x in self.objs]) | |||
name = com._consensus_name_attr(self.objs) | |||
return Series(new_data, index=self.new_axes[0], name=name).__finalize__(self, method='concat') | |||
if self._is_sp_series: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't like all of these if-thens. The basic problem is the existence of SparseSeries/SparseDataFrame
, which don't really need to exist at all. The blocks actually hold the sparse data. But that is a bit bigger project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you do ._constructor
to solve the immediate issue and allow for those objects to be removed in time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first approach but I also wanted to keep the symmetry between axis = 1
and axis = 0
. And if you want to check fill values etc, the case work seems necessary (if ugly) without refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't replace this block with self._constructor(...
?
if self._is_sp_series:
klass = SparseDataFrame
else:
klass = DataFrame
return klass(...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not seeing how that would work? Do Series
and DataFrame
share the same constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little fast on the trigger. I didn't realize that you need ._is_frame
etc elsewhere.
For this code, this is what I was originally thinking. Tell me if I'm still off, though
In __init__
self.obj_constructor=sample._constructor
and then in get_result
self.obj_constructor(...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the result could be a SparseDataFrame
when the sample is a SparseSeries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If sample is a SparseSeries
, then sample._constructor
will also be a SparseSeries
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can do that if self.axis==0
, but concatenation could happen along a different axis, so I thought the code would be easier to understand if the different cases were handled in roughly the same way.
Since |
@jreback Out of curiosity --- is there any particular reason for the |
@artemyk I made this comment in another issue I think, but can't seem to find it. The short answer is sort-of. I think the original intent of However after the refactor to move Series out of sub-classing ndarray (0.13). This was no longer necessary (e.g. the SparseBlock became a real thing). So at this point So bottom line, I think we could entirely drop |
@jreback Thanks, that makes sense. I"m actually feeling like the underlying sparse implementation could use an overhaul. For example, if I understand correctly, sparse blocks cannot be consolidated right now. So, if get_dummies is called on a column with 100,000 values, we get a dataframe with 100,000 blocks. I imagine this can strongly hurt performance in some areas. An underlying sparse block manager that can truly handle sparse matrices (not just series) of various data types would be nice. |
well, we have been thinking about what to do with sparse. I am not 100% sure why @wesm originally wrote this. But then again I don't know the state of scipy sparse back in 2010. So we could have an alternative (or maybe THE) sparse repr that actually is something from scipy, e.g. coo/csr. And just use the current index stuff. It would be something of a project, but yes, could be very nice. |
Unfortunately it seems that scipy.sparse only supports numeric datatypes. Perhaps a wrapper around something like the Boost sparse matrix library (http://www.boost.org/doc/libs/1_45_0/libs/numeric/ublas/doc/matrix_sparse.htm#2CompressedMatrix) could be used instead. |
@jreback AFAIK isn't |
yeah, have been thinking about sparse and what to do. a suggestion was made to split this off into |
@kawochen you want to rebase / update and can review |
@kawochen yeh, the sparse stuff needs an overhaul.... closing, but if you'd like to update, pls reopen |
To address #10536, but it's clearly not enough. What should be done for
SparseSeries
of differentkind
s and differentfill
values?