Series/DataFrame.isna/notna with sparse data densifies #23745

Closed
jorisvandenbossche opened this issue Nov 16, 2018 · 3 comments
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Sparse Sparse Data Type

Comments

@jorisvandenbossche
Member

In [1]: df = pd.SparseDataFrame({'A': [np.nan, np.nan, 1, 2, np.nan], 
   ...:                          'B': [0, np.nan, np.nan, 2, np.nan]})                                                                                                          

In [2]: res = df.notna()                                                                                                                                                        

In [3]: res                                                                                                                                                                     
Out[3]: 
       A      B
0  False   True
1  False  False
2   True  False
3   True   True
4  False  False

In [4]: res['A']                                                                                                                                                                
Out[4]: 
0    False
1    False
2     True
3     True
4    False
Name: A, dtype: Sparse[bool, False]
BlockIndex
Block locations: array([2], dtype=int32)
Block lengths: array([2], dtype=int32)

In [5]: df2 = pd.DataFrame(df)                                                                                                                                                  

In [6]: res2 = df2.notna()                                                                                                                                                      

In [7]: res2                                                                                                                                                                    
Out[7]: 
       A      B
0  False   True
1  False  False
2   True  False
3   True   True
4  False  False

In [8]: res2['A']                                                                                                                                                               
Out[8]: 
0    False
1    False
2     True
3     True
4    False
Name: A, dtype: bool

The result of isna / notna on a regular DataFrame with sparse columns could be of type Sparse[bool, False] here as well, instead of being densified to bool.
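SparseDataFrame was removed in pandas 1.0, so the example above no longer runs on current versions. A minimal sketch of the requested behaviour, assuming a regular DataFrame whose columns are built with pd.arrays.SparseArray, is shown below. The data mirrors the example above; the variable names are only illustrative, and the fill value of the resulting dtype may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Regular DataFrame whose columns are backed by sparse arrays
# (stand-in for the removed pd.SparseDataFrame).
df = pd.DataFrame({
    "A": pd.arrays.SparseArray([np.nan, np.nan, 1, 2, np.nan]),
    "B": pd.arrays.SparseArray([0, np.nan, np.nan, 2, np.nan]),
})

res = df.isna()

# Requested behaviour: every column of the mask keeps a sparse dtype.
result_is_sparse = [isinstance(dt, pd.SparseDtype) for dt in res.dtypes]

# The mask values are the same as they would be for dense data.
mask_a = np.asarray(res["A"], dtype=bool)
mask_b = np.asarray(res["B"], dtype=bool)
```

If the mask were densified, the entries of result_is_sparse would be False and res.dtypes would show plain bool.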

@jorisvandenbossche jorisvandenbossche added Sparse Sparse Data Type ExtensionArray Extending pandas with custom dtypes or arrays. labels Nov 16, 2018
@JustinZhengBC
Contributor

Similar to #23744, I think the issue is the constructor: pd.DataFrame can only return its own class, so it cannot self-cast to SparseDataFrame. Using df2 = pd.SparseDataFrame(df) achieves the expected results.

@jorisvandenbossche
Member Author

See #23744 (comment) for an answer.

@mroeschke mroeschke added the Bug label Apr 25, 2020
@mroeschke mroeschke added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jun 23, 2021
@mzeitlin11
Member

This looks ok on master:

In [3]: df = pd.DataFrame({"a": [np.nan, 1.0], "b": [np.nan, 1.0]},
   ...:                   dtype=pd.SparseDtype("float"))

In [4]: df.isna().dtypes
Out[4]:
a    Sparse[bool, True]
b    Sparse[bool, True]
dtype: object

In [5]: df["a"].isna().dtype
Out[5]: Sparse[bool, True]

This is also covered by the TestMissing class of the extension tests, so I'm going to close this.
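A check in the same spirit could look like the following minimal sketch. It is not the code from the pandas test suite; the helper name isna_stays_sparse is hypothetical and exists only for illustration.

```python
import numpy as np
import pandas as pd


def isna_stays_sparse(obj):
    """Return True if obj.isna() yields only sparse dtypes.

    Hypothetical helper; accepts either a Series or a DataFrame.
    """
    mask = obj.isna()
    # A Series has a single dtype, a DataFrame has one per column.
    dtypes = [mask.dtype] if isinstance(mask, pd.Series) else list(mask.dtypes)
    return all(isinstance(dt, pd.SparseDtype) for dt in dtypes)


# Same data as in the session above.
df = pd.DataFrame({"a": [np.nan, 1.0], "b": [np.nan, 1.0]},
                  dtype=pd.SparseDtype("float"))

frame_ok = isna_stays_sparse(df)        # DataFrame.isna
series_ok = isna_stays_sparse(df["a"])  # Series.isna

# Dense input for contrast: the mask is plain bool, so the check fails.
dense_ok = isna_stays_sparse(pd.DataFrame({"a": [np.nan, 1.0]}))
```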
