ENH: add sparse op for int64 dtypes #13848

sinhrks · 2016-07-30T10:02:48Z

related to Support dtypes other than float in sparse data structures #667
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

As a first step toward #667, numeric ops can now preserve the int64 dtype. On current master, the dtype is reset to float64 after an op.

import numpy as np
import pandas as pd

# current master
a = pd.SparseArray([1, 2], dtype=np.int64)
a.dtype
# dtype('int64')

(a + a).dtype
# dtype('float64')

NOTE: the int64 SparseSeries.__floordiv__ test is skipped because dense Series is also inconsistent in its nan/inf handling (#13843). Currently it outputs the same result as float64.
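
For readers on a recent pandas release, the dtype-preserving behaviour described above can be checked with a minimal sketch. It assumes `pd.arrays.SparseArray` (the current spelling; the `pd.SparseArray` alias used in this thread was later removed) and an explicit `fill_value=0`, neither of which appears in the description above.

```python
import numpy as np
import pandas as pd

# Assumption: a recent pandas where SparseArray lives under pd.arrays.
a = pd.arrays.SparseArray([1, 0, 2], dtype=np.int64, fill_value=0)

result = a + a

# The sparse dtype wraps a NumPy subtype; the subtype is what should stay int64.
print(a.dtype.subtype)
print(result.dtype.subtype)
```

If the op preserves the dtype, both printed subtypes are int64 rather than float64.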

codecov-io · 2016-07-30T10:49:47Z

Current coverage is 85.28% (diff: 98.00%)

Merging #13848 into master will increase coverage by <.01%

@@             master     #13848   diff @@
==========================================
  Files           139        139          
  Lines         50020      50046    +26   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42657      42682    +25   
- Misses         7363       7364     +1   
  Partials          0          0

Powered by Codecov. Last update 97de42a...f101e66

TomAugspurger · 2016-07-30T12:01:05Z

Haven't had a chance to look through the code yet, but what are the rules around alignment and potentially recasting the dtype?

import numpy as np
import pandas as pd

s1 = pd.SparseSeries(np.arange(4), dtype=np.int64, fill_value=0)
s2 = pd.SparseSeries(np.arange(4), index=range(1, 5), dtype=np.int64, fill_value=0)

s1 + s1  # OK
s1 + s2  # error

Traceback (most recent call last):
  File "script.py", line 8, in <module>
    s1 + s2  # error
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/series.py", line 56, in wrapper
    return _sparse_series_op(self, other, op, name)
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/series.py", line 81, in _sparse_series_op
    series=True)
  File "/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/sparse/array.py", line 119, in _sparse_array_op
    sparse_op = getattr(splib, opname)
AttributeError: module 'pandas._sparse' has no attribute 'sparse_add_float64'

sinhrks · 2016-07-30T12:12:55Z

@TomAugspurger The latter case appears to work on my branch; the error suggests that sparse.pyx was not recompiled properly.

I'm adding more tests related to alignment :)

TomAugspurger · 2016-07-30T12:20:28Z

My bad, just got to that section of the code. Recompiled and it does indeed work 👍
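
For reference on the alignment question raised above, the dense analogue illustrates the recasting at stake. This sketch uses plain `pd.Series` rather than the sparse types from the thread, so it shows the dense rule only and makes no claim about what the sparse path returned at the time.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(4), dtype=np.int64)
s2 = pd.Series(np.arange(4), index=range(1, 5), dtype=np.int64)

# Identical indexes: no missing labels, so the integer dtype can be kept.
same_index = s1 + s1

# Misaligned indexes: the result covers the union of labels, and labels
# present on only one side become NaN, which forces a float result.
misaligned = s1 + s2

print(same_index.dtype)
print(misaligned.dtype)
print(misaligned.index.tolist())
```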

jreback · 2016-08-01T10:45:56Z

doc/source/whatsnew/v0.19.0.txt

+Sparse changes
+~~~~~~~~~~~~~~
+
+These changes conform sparse data to support more dtypes, and for work to make a smoother experience with data handling.


These changes allow pandas to handle sparse data with more dtypes.

jreback · 2016-08-02T10:48:39Z

Please rebase in light of the changes in #13787.

jreback · 2016-08-03T22:48:45Z

thanks!

nice cleanup

jreback · 2016-08-04T10:31:54Z

FYI: 8ec7406

as we no longer depend on generated; was causing recompilation of algos.pyx every time :<

jreback · 2016-08-04T10:34:54Z

A small dtype adjustment is needed on Windows:

(Pdb) c
E........................................................................................................................................
..............................................................................S.........................S................................
...........................................
======================================================================
ERROR: test_int_array_comparison (pandas.sparse.tests.test_arithmetics.TestSparseArrayArithmetics)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\tests\test_arithmetics.py", line 292, in test_int_array_comparison
    self._check_comparison_ops(a, b, values, rvalues)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\tests\test_arithmetics.py", line 93, in _check_comparison_ops
    self._check_bool_result(a == b_dense)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\array.py", line 54, in wrapper
    return _sparse_array_op(self, other, op, name)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\array.py", line 98, in _sparse_array_op
    dtype = _maybe_match_dtype(left, right)
  File "C:\Users\conda\Documents\pandas3.5\pandas\sparse\array.py", line 75, in _maybe_match_dtype
    raise NotImplementedError('dtypes must be identical')
NotImplementedError: dtypes must be identical

----------------------------------------------------------------------
Ran 331 tests in 49.517s

FAILED (SKIP=2, errors=1)

(pandas3.5) C:\Users\conda\Documents\pandas3.5>nosetests pandas\sparse --pdb
........> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(75)_maybe_match_dtype()
-> raise NotImplementedError('dtypes must be identical')
(Pdb) u
> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(98)_sparse_array_op()
-> dtype = _maybe_match_dtype(left, right)
(Pdb) u
> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(54)wrapper()
-> return _sparse_array_op(self, other, op, name)
(Pdb) d
> c:\users\conda\documents\pandas3.5\pandas\sparse\array.py(98)_sparse_array_op()
-> dtype = _maybe_match_dtype(left, right)
(Pdb) p left
[0, 1, 2, 0, 0, 0, 1, 2, 1, 0]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

(Pdb) p right
[2, 0, 2, 3, 0, 0, 1, 5, 2, 0]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

(Pdb) p left.dtype
dtype('int64')
(Pdb) p right.dtype
dtype('int32')
(Pdb) u
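
The session above ends with `left.dtype` as int64 and `right.dtype` as int32, which is the mismatch that triggers `NotImplementedError: dtypes must be identical`. One way such a mismatch could be reconciled is to upcast both operands to a common dtype before comparing. This is a sketch only: `match_dtypes` is a hypothetical helper, and it is not claimed to be the adjustment that was eventually made in pandas.

```python
import numpy as np


def match_dtypes(left, right):
    """Hypothetical helper: upcast both arrays to their common NumPy dtype."""
    common = np.result_type(left.dtype, right.dtype)
    return left.astype(common, copy=False), right.astype(common, copy=False)


left = np.array([0, 1, 2, 0], dtype=np.int64)
right = np.array([2, 0, 2, 3], dtype=np.int32)

left2, right2 = match_dtypes(left, right)

# Both operands now share one dtype, so an "identical dtypes" check would pass.
print(left2.dtype, right2.dtype)
print(left2 == right2)
```

Pinning an explicit dtype in the test inputs would be another option; either way the point is that the two sides need to agree before the op is dispatched.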

sinhrks · 2016-08-04T10:40:27Z

Thx, will fix.

sinhrks added the Enhancement, Numeric Operations (Arithmetic, Comparison, and Logical operations), and Sparse (Sparse Data Type) labels Jul 30, 2016

sinhrks added this to the 0.19.0 milestone Jul 30, 2016

sinhrks mentioned this pull request Jul 30, 2016

ENH: Sparse int64 and bool dtype support enhancement #13849

Merged

4 tasks

sinhrks added the Dtype Conversions (Unexpected or buggy dtype conversions) label Jul 30, 2016

sinhrks force-pushed the sparse_op2 branch from 66ccad5 to 723bb08 Compare July 30, 2016 11:16

sinhrks force-pushed the sparse_op2 branch from 723bb08 to af8706b Compare July 30, 2016 12:17

jreback reviewed Aug 1, 2016

sinhrks force-pushed the sparse_op2 branch from af8706b to 4177c8e Compare August 1, 2016 13:22

sinhrks force-pushed the sparse_op2 branch 2 times, most recently from 9f69e99 to 75dfca7 Compare August 2, 2016 21:00

ENH: add sparse op for other dtypes

f101e66

sinhrks force-pushed the sparse_op2 branch from 75dfca7 to f101e66 Compare August 3, 2016 13:31

jreback closed this in 45d54d0 Aug 3, 2016

sinhrks deleted the sparse_op2 branch August 3, 2016 23:06
