-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: multi-type SparseDataFrame fixes and improvements #13917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: multi-type SparseDataFrame fixes and improvements #13917
Conversation
Current coverage is 85.30% (diff: 100%)@@ master #13917 diff @@
==========================================
Files 139 139
Lines 50157 50157
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 42785 42785
Misses 7372 7372
Partials 0 0
|
@@ -4440,7 +4440,10 @@ def _lcd_dtype(l): | |||
""" find the lowest dtype that can accomodate the given types """ | |||
m = l[0].dtype | |||
for x in l[1:]: | |||
if x.dtype.itemsize > m.itemsize: | |||
# the new dtype must either be wider or a strict subtype | |||
if (x.dtype.itemsize > m.itemsize or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't np.find_common_type
do this? (or .common_type
)?
should create a pandas version of this to isolate (e.g. this won't handle non-numpy types)
as sparse mainly supports |
Types were incorrectly determined when slicing SparseDataFrames with multiple dtypes (such as float and object). Also enables type inference for SparseArrays by default.
I used numpy's find_common_type instead of that local function, this changed a test ( Another (maybe non-minor) change is the default parameter in I also added some new tests as suggested, but don't feel confident in adding the proposed pandas alternative to the numpy find_common_type - my knowledge of pandas dtypes isn't really great. It would likely be similar to |
def _lcd_dtype(l): | ||
""" find the lowest dtype that can accomodate the given types """ | ||
m = l[0].dtype | ||
for x in l[1:]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I meant was can start off by simply moving this (your version or new one with np.find_common_type
to pandas.types.cast
(and if we have specific tests move similarly; if not, ideally add some). we can add the pandas specific functionaility later.
can u also add tests to check normal current default ( |
It turns out this PR works without the default argument change, I was too hasty to change it. Your PR fixes that better, so I reverted the change. Common type discovery moved to |
yep, need to fix this. @sstanovnik can you create a new issue for this.
|
|
||
values = self.mixed_int.as_matrix(['A', 'D']) | ||
self.assertEqual(values.dtype, np.int64) | ||
|
||
# guess all ints are cast to uints.... | ||
# B uint64 forces float because there are other signed int types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might fix another bug, can you search for uint64 issues and see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue as a reference here
and in the whatsnew
Opened issue. Moved the tests, added new tests. Found and processed #10364. |
@@ -188,6 +191,42 @@ def test_possibly_convert_objects_copy(self): | |||
self.assertTrue(values is not out) | |||
|
|||
|
|||
class TestCommonTypes(tm.TestCase): | |||
def setUp(self): | |||
super(TestCommonTypes, self).setUp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need setUp here
self.assertEqual(_find_common_type([np.object]), np.object) | ||
|
||
self.assertEqual(_find_common_type([np.int16, np.int64]), | ||
np.int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group them
eg ints
floats
easier to read
@@ -437,6 +437,7 @@ API changes | |||
- ``pd.Timedelta(None)`` is now accepted and will return ``NaT``, mirroring ``pd.Timestamp`` (:issue:`13687`) | |||
- ``Timestamp``, ``Period``, ``DatetimeIndex``, ``PeriodIndex`` and ``.dt`` accessor have gained a ``.is_leap_year`` property to check whether the date belongs to a leap year. (:issue:`13727`) | |||
- ``pd.read_hdf`` will now raise a ``ValueError`` instead of ``KeyError``, if a mode other than ``r``, ``r+`` and ``a`` is supplied. (:issue:`13623`) | |||
- ``.values`` will now return ``np.float64`` with a ``DataFrame`` with ``np.int64`` and ``np.uint64`` dtypes, conforming to ``np.find_common_type`` (:issue:`10364`, :issue:`13917`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame.values
will now ....with a frame of mixed int64
and uint64
dtypes.....
@@ -764,6 +765,7 @@ Note that the limitation is applied to ``fill_value`` which default is ``np.nan` | |||
- Bug in ``SparseDataFrame`` doesn't respect passed ``SparseArray`` or ``SparseSeries`` 's dtype and ``fill_value`` (:issue:`13866`) | |||
- Bug in ``SparseArray`` and ``SparseSeries`` don't apply ufunc to ``fill_value`` (:issue:`13853`) | |||
- Bug in ``SparseSeries.abs`` incorrectly keeps negative ``fill_value`` (:issue:`13853`) | |||
- Bug in single row slicing on multi-type ``SparseDataFrame``s: types were previously forced to float (:issue:`13917`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, types where previously...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads fine to me? There's a colon after SparseDataFrame
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change : to ,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait sorry I misread your comment.
lgtm. @sinhrks ? |
Thanks for your patience. |
ha! thanks for yours |
lgtm, thx @sstanovnik ! |
thanks! |
git diff upstream/master | flake8 --diff
Types were incorrectly determined when slicing SparseDataFrames with
multiple dtypes (such as float and object) into SparseSeries.
No existing issue covers this.