
ENH: add sparse op for int64 dtypes #13848

Closed
wants to merge 1 commit into from


@sinhrks (Member) commented Jul 30, 2016

As a first step for #667, numeric ops can now preserve the int64 dtype. On current master, the dtype is reset to float64 after an op.

# current master
import numpy as np
import pandas as pd

a = pd.SparseArray([1, 2], dtype=np.int64)
a.dtype
# dtype('int64')

(a + a).dtype
# dtype('float64')
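What "preserving int64" means can be sketched in plain NumPy. This is a minimal sketch, not the pandas implementation: `ToySparse`, `_values_at` and `sparse_binop` are hypothetical names. It assumes each operand stores its non-fill values in `sp_values` at sorted integer positions `sp_index`, that every other position equals `fill_value`, and that the fill value is representable in the stored dtype. The op is applied to the stored values on the union of positions and separately to the two fill values, so the result dtype is whatever NumPy produces for the inputs (int64 + int64 stays int64) rather than being forced to float64.

```python
import operator

import numpy as np


class ToySparse:
    """Hypothetical minimal sparse vector (not pandas' SparseArray)."""

    def __init__(self, sp_values, sp_index, fill_value, length):
        self.sp_values = np.asarray(sp_values)                # stored (non-fill) values
        self.sp_index = np.asarray(sp_index, dtype=np.int64)  # sorted positions of those values
        self.fill_value = fill_value                          # value at every other position
        self.length = length

    @property
    def dtype(self):
        return self.sp_values.dtype

    def to_dense(self):
        out = np.full(self.length, self.fill_value, dtype=self.dtype)
        out[self.sp_index] = self.sp_values
        return out


def _values_at(arr, positions):
    """Value of `arr` at each position, using fill_value where nothing is stored."""
    out = np.full(len(positions), arr.fill_value, dtype=arr.dtype)
    stored = np.isin(positions, arr.sp_index)
    # sp_index is sorted, so searchsorted locates each stored position
    out[stored] = arr.sp_values[np.searchsorted(arr.sp_index, positions[stored])]
    return out


def sparse_binop(left, right, op):
    """Apply `op` elementwise without densifying and without forcing float64."""
    if left.length != right.length:
        raise ValueError("operands must have the same length")
    positions = np.union1d(left.sp_index, right.sp_index)
    values = op(_values_at(left, positions), _values_at(right, positions))
    # Wherever neither side stores a value, both sides equal their fill values.
    fill = op(left.dtype.type(left.fill_value), right.dtype.type(right.fill_value))
    return ToySparse(values, positions, fill, left.length)


a = ToySparse(np.array([1, 2], dtype=np.int64), [0, 3], 0, 5)  # dense [1, 0, 0, 2, 0]
b = ToySparse(np.array([5], dtype=np.int64), [1], 0, 5)        # dense [0, 5, 0, 0, 0]
c = sparse_binop(a, b, operator.add)
print(c.dtype, c.to_dense())  # int64 [1 5 0 2 0]
```

Computing the fill value with the same op is what lets the result stay sparse: positions stored by neither operand never need to be touched.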

NOTE: The int64 SparseSeries.__floordiv__ test is skipped because dense Series also handles nan/inf inconsistently (#13843). For now, int64 floordiv outputs the same result as float64.
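Floor division is the awkward case because int64 has no representation for inf or NaN. The short check below uses NumPy only and does not show what pandas itself returns: integer floor division by zero yields 0 (NumPy emits a warning, silenced here), while the float64 version yields non-finite values.

```python
import numpy as np

num = np.array([1, 0, -1], dtype=np.int64)
den = np.zeros(3, dtype=np.int64)

# Silence the divide-by-zero / invalid-value RuntimeWarnings for the demo.
with np.errstate(divide="ignore", invalid="ignore"):
    int_result = num // den                                          # stays int64
    float_result = num.astype(np.float64) // den.astype(np.float64)  # inf / nan

print(int_result.dtype, int_result)               # int64 [0 0 0]
print(float_result.dtype, np.isfinite(float_result))  # float64 [False False False]
```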

@sinhrks added the Enhancement, Numeric Operations, and Sparse labels Jul 30, 2016
@sinhrks added this to the 0.19.0 milestone Jul 30, 2016
@sinhrks added the Dtype Conversions label Jul 30, 2016
@codecov-io commented Jul 30, 2016

Current coverage is 85.28% (diff: 98.00%)

Merging #13848 into master will increase coverage by <.01%

@@             master     #13848   diff @@
==========================================
  Files           139        139          
  Lines         50020      50046    +26   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42657      42682    +25   
- Misses         7363       7364     +1   
  Partials          0          0          

Powered by Codecov. Last update 97de42a...f101e66

@TomAugspurger (Contributor)

Haven't had a chance to look through the code yet, but what are the rules around alignment and potentially recasting the dtype?

import numpy as np
import pandas as pd

s1 = pd.SparseSeries(np.arange(4), dtype=np.int64, fill_value=0)
s2 = pd.SparseSeries(np.arange(4), index=range(1, 5), dtype=np.int64, fill_value=0)

s1 + s1  # OK
s1 + s2  # error
Traceback (most recent call last):
  File "script.py", line 8, in <module>
    s1 + s2  # error
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/series.py", line 56, in wrapper
    return _sparse_series_op(self, other, op, name)
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/series.py", line 81, in _sparse_series_op
    series=True)
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/array.py", line 119, in _sparse_array_op
    sparse_op = getattr(splib, opname)
AttributeError: module 'pandas._sparse' has no attribute 'sparse_add_float64'
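The thread leaves @TomAugspurger's question about alignment rules open, so the following only illustrates the trade-off in plain NumPy, using a hypothetical helper `reindex_with_fill`. When two series with different labels are aligned, each side gains labels it did not have. If those labels are filled with the series' integer fill_value (0 here), int64 can be kept; if they are filled with NaN, as dense pandas Series alignment does, the result must be upcast to float64 because int64 cannot hold NaN.

```python
import numpy as np


def reindex_with_fill(values, labels, new_labels, fill):
    """Hypothetical helper: place `values` at `new_labels`, filling unknown labels."""
    values = np.asarray(values)
    # The result dtype must hold both the existing values and the fill value.
    dtype = np.result_type(values.dtype, fill)
    out = np.full(len(new_labels), fill, dtype=dtype)
    position = {label: i for i, label in enumerate(labels)}
    for j, label in enumerate(new_labels):
        if label in position:
            out[j] = values[position[label]]
    return out


# Mirrors s1 and s2 from the snippet above: same values, labels shifted by one.
v1, labels1 = np.arange(4, dtype=np.int64), [0, 1, 2, 3]
v2, labels2 = np.arange(4, dtype=np.int64), [1, 2, 3, 4]
union = sorted(set(labels1) | set(labels2))  # [0, 1, 2, 3, 4]

# Filling new labels with the integer fill value keeps int64.
sum_int = reindex_with_fill(v1, labels1, union, 0) + reindex_with_fill(v2, labels2, union, 0)
# Filling new labels with NaN forces float64.
sum_nan = reindex_with_fill(v1, labels1, union, np.nan) + reindex_with_fill(v2, labels2, union, np.nan)

print(sum_int.dtype, sum_int)  # int64 [0 1 3 5 3]
print(sum_nan.dtype)           # float64
```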

@sinhrks (Member, Author) commented Jul 30, 2016

@TomAugspurger The latter case works on my branch; the error suggests that sparse.pyx was not recompiled properly.

I'm adding more tests related to alignment :)

@TomAugspurger (Contributor)

My bad, just got to that section of the code. Recompiled and it does indeed work 👍
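The AttributeError in the traceback above comes from looking up a kernel by a name built from the op and the dtype (`getattr(splib, opname)` with `opname` like `sparse_add_float64`), so a build that predates the new kernels simply lacks the attribute. Below is a minimal sketch of that dispatch pattern, assuming a stand-in namespace for the compiled `pandas._sparse` module; `build_splib`, `_make_kernel` and `lookup_kernel` are hypothetical, and the kernels are plain NumPy rather than Cython.

```python
import operator
import types

import numpy as np


def _make_kernel(op, dtype):
    """Hypothetical typed kernel: cast both inputs to `dtype`, then apply `op`."""
    def kernel(x, y):
        return op(np.asarray(x, dtype=dtype), np.asarray(y, dtype=dtype))
    return kernel


def build_splib(dtypes, ops=("add", "sub", "mul")):
    """Stand-in for a compiled module exposing one kernel per (op, dtype) pair."""
    module = types.SimpleNamespace()
    for name in ops:
        for dtype in dtypes:
            setattr(module, f"sparse_{name}_{dtype}", _make_kernel(getattr(operator, name), dtype))
    return module


def lookup_kernel(splib, name, dtype):
    # Same naming scheme as in the traceback, e.g. "sparse_add_float64".
    opname = f"sparse_{name}_{np.dtype(dtype).name}"
    return getattr(splib, opname)  # AttributeError if this build lacks the kernel


fresh = build_splib(["float64", "int64"])
print(lookup_kernel(fresh, "add", np.int64)([1, 2], [3, 4]))  # [4 6]

stale = types.SimpleNamespace()  # e.g. an old build with no dtype-suffixed kernels
try:
    lookup_kernel(stale, "add", np.float64)
except AttributeError as exc:
    print("AttributeError:", exc)
```

Because the name is resolved at call time, a stale extension fails only when the new code path runs, which matches the error disappearing after recompiling.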

Sparse changes
~~~~~~~~~~~~~~

These changes conform sparse data to support more dtypes, and for work to make a smoother experience with data handling.

Contributor

These changes allow pandas to handle sparse data with more dtypes.

@jreback (Contributor) commented Aug 2, 2016

rebase in light of the changes in #13787

@sinhrks sinhrks force-pushed the sparse_op2 branch 2 times, most recently from 9f69e99 to 75dfca7 Compare August 2, 2016 21:00
@jreback (Contributor) commented Aug 3, 2016

thanks!

nice cleanup

@jreback jreback closed this in 45d54d0 Aug 3, 2016
@sinhrks sinhrks deleted the sparse_op2 branch August 3, 2016 23:06
@jreback (Contributor) commented Aug 4, 2016

FYI: 8ec7406

as we no longer depend on generated; it was causing algos.pyx to be recompiled every time :<

@jreback (Contributor) commented Aug 4, 2016

small dtype adjustment needed on Windows

(Pdb) c
E........................................................................................................................................
..............................................................................S.........................S................................
...........................................
======================================================================
ERROR: test_int_array_comparison (pandas.sparse.tests.test_arithmetics.TestSparseArrayArithmetics)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\tests\test_arithmetics.py", line 292, in test_int_array_comparison
    self._check_comparison_ops(a, b, values, rvalues)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\tests\test_arithmetics.py", line 93, in _check_comparison_ops
    self._check_bool_result(a == b_dense)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\array.py", line 54, in wrapper
    return _sparse_array_op(self, other, op, name)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\array.py", line 98, in _sparse_array_op
    dtype = _maybe_match_dtype(left, right)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\array.py", line 75, in _maybe_match_dtype
    raise NotImplementedError('dtypes must be identical')
NotImplementedError: dtypes must be identical

----------------------------------------------------------------------
Ran 331 tests in 49.517s

FAILED (SKIP=2, errors=1)
(pandas3.5) C:\Users\conda\Documents\pandas3.5>nosetests pandas\sparse --pdb
........> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(75)_maybe_match_dtype()
-> raise NotImplementedError('dtypes must be identical')
(Pdb) u
> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(98)_sparse_array_op()
-> dtype = _maybe_match_dtype(left, right)
(Pdb) u
> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(54)wrapper()
-> return _sparse_array_op(self, other, op, name)
(Pdb) d
> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(98)_sparse_array_op()
-> dtype = _maybe_match_dtype(left, right)
(Pdb) p left
[0, 1, 2, 0, 0, 0, 1, 2, 1, 0]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

(Pdb) p right
[2, 0, 2, 3, 0, 0, 1, 5, 2, 0]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

(Pdb) p left.dtype
dtype('int64')
(Pdb) p right.dtype
dtype('int32')
(Pdb) u
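The pdb session shows `left` as int64 and `right` as int32. That matches NumPy's default integer on Windows, where NumPy versions before 2.0 create int32 arrays from Python ints, and `_maybe_match_dtype` rejects any mismatch. Two possible adjustments are sketched below; neither is claimed to be the fix that was actually committed. One is to build test arrays with an explicit dtype; the other is a hypothetical relaxed check, `match_dtype`, that promotes compatible numeric dtypes with `np.promote_types` instead of raising.

```python
import numpy as np


def match_dtype(left_dtype, right_dtype):
    """Hypothetical relaxed variant of a 'dtypes must be identical' check."""
    left_dtype, right_dtype = np.dtype(left_dtype), np.dtype(right_dtype)
    if left_dtype == right_dtype:
        return left_dtype
    if left_dtype.kind in "biuf" and right_dtype.kind in "biuf":
        # e.g. int32 with int64 -> int64, int64 with float64 -> float64
        return np.promote_types(left_dtype, right_dtype)
    raise NotImplementedError("dtypes must be identical")


# Option 1: pin the dtype so tests do not depend on the platform default integer.
values = np.array([2, 0, 2, 3, 0, 0, 1, 5, 2, 0], dtype=np.int64)
print(values.dtype)  # int64

# Option 2: promote instead of raising.
print(match_dtype(np.int32, np.int64))    # int64
print(match_dtype(np.int64, np.float64))  # float64
```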

@sinhrks (Member, Author) commented Aug 4, 2016

Thx, will fix.
