ENH: Use IntergerArray to avoid forced conversion from integer to float #27335

jiangyue12392 · 2019-07-11T04:56:41Z

This is a new attempt to solve issue #16918 after an attempt with PR #27073

WillAyd · 2019-07-11T14:33:40Z

Is there a way to make this work without adding a keyword to the DataFrame constructor? That approach seems off to me

Also, be sure to add tests for this; it's not entirely clear that this does anything as is.

jreback

@jiangyue12392 this is not what I wanted here. Sure, fixing maybe_convert_numeric to allow returning an integer_array might close #27283, but the rest is completely orthogonal, and we cannot propagate a keyword through DataFrame.

jiangyue12392 · 2019-07-11T14:44:23Z

@WillAyd @jreback If we keep the current implementation, where DataFrame expands the list of dicts into a 2D array and determines the column types, then I see no way to avoid propagating a new keyword through DataFrame while staying backwards compatible. Would it be preferable to pull the expansion and type-determination logic out of DataFrame, at the cost of some code duplication?

I will add tests later, but I first want to know which approach is preferred here.

jreback · 2019-07-11T14:48:00Z

@jiangyue12392 at some point we will likely infer an int dtype with nulls as integer-na (rather than float), but currently this is NOT the default. We need some more infrastructure first (see the linked issue).
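
For readers following along, the cost of the float fallback can be shown without pandas at all. The sketch below only assumes IEEE 754 double precision (which Python's float uses); the variable names are illustrative and not taken from the patch.

```python
# float64 has a 53-bit significand, so not every int64 survives a round trip.
big = 2**53 + 1          # fits comfortably in int64
as_float = float(big)    # what an upcast to float64 does to the value

print(big)               # 9007199254740993
print(int(as_float))     # 9007199254740992 (off by one)

# Small integers are unaffected, which is why the problem is easy to miss.
small_ok = int(float(2)) == 2
```

An integer column that gains a single missing value and is upcast to float can therefore silently change large identifiers, which is the motivation for keeping such data in a nullable integer type.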

jiangyue12392 · 2019-07-12T01:57:56Z

@jreback Ok, I will wait till the infrastructure is in place after #27283 is sorted out

jreback · 2019-07-12T02:20:16Z

@jiangyue12392 actually part of this PR could partially satisfy that issue

jiangyue12392 · 2019-07-12T02:34:11Z

@jreback I suppose that is the part where the change in lib.pyx distinguishes an integer_na. However, on that issue, both @TomAugspurger and @jorisvandenbossche expressed concern about keeping behaviour consistent with the past, where the default dtype is float64. I am not sure how users would choose between the new Int64 behaviour and the old float64 one for backward compatibility.

jreback · 2019-07-12T12:45:02Z

so what you do is make the return string integer-na (this is a small API change).
there are then a small number of places that would have to interpret this type (as float) for casting purposes

that’s it
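
The suggestion above can be sketched roughly as follows. This is a pure-Python illustration, not the pandas implementation: `infer_kind`, `CAST_TARGET` and `cast_target` are hypothetical names, and the label set is deliberately reduced.

```python
import math


def infer_kind(values):
    """Return a coarse type label for a list of Python scalars (illustrative only)."""
    seen_int = seen_float = seen_null = False
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            seen_null = True
        elif isinstance(v, bool):
            return "mixed"  # bools are not treated as integers in this sketch
        elif isinstance(v, int):
            seen_int = True
        elif isinstance(v, float):
            seen_float = True
        else:
            return "mixed"
    if seen_int and seen_float:
        return "mixed-integer-float"
    if seen_float:
        return "floating"
    if seen_int:
        # The new label: integers plus missing values.
        return "integer-na" if seen_null else "integer"
    return "empty"


# Existing callers keep their old behaviour by treating "integer-na" as float.
CAST_TARGET = {"integer": "int64", "integer-na": "float64", "floating": "float64"}


def cast_target(values):
    return CAST_TARGET.get(infer_kind(values), "object")
```

The point of the design is that inference reports more information ("integer-na"), while the few casting call sites decide what to do with it, so the default result stays float64.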

jreback · 2019-09-08T15:57:26Z

@jiangyue12392 can you merge master

jiangyue12392 · 2019-09-09T04:18:08Z

> @jiangyue12392 can you merge master

Sorry, I have been busy with other stuff recently; I will merge as soon as I get my hands free.

jreback · 2019-10-06T22:51:38Z

@jiangyue12392 if you can merge master; I think parts of this patch we would want to merge.

jiangyue12392 · 2019-10-07T07:38:00Z

> @jiangyue12392 if you can merge master; I think parts of this patch we would want to merge.

Master merged. Sorry for the delay.

jreback

if you just do the internal change in maybe_convert_objects and add a test for that we can get this in. pls merge master as well.

jreback · 2019-10-18T21:35:13Z

pandas/core/frame.py

@@ -397,6 +397,7 @@ def __init__(
        columns: Optional[Axes] = None,
        dtype: Optional[Dtype] = None,
        copy: bool = False,
+        to_integer_array: bool = False,


we can't have this keyword here

External keywords removed

jreback · 2019-10-18T21:36:24Z

pandas/_libs/lib.pyx

@@ -1951,7 +1953,7 @@ def maybe_convert_numeric(ndarray[object] values, set na_values,
 @cython.wraparound(False)
 def maybe_convert_objects(ndarray[object] objects, bint try_float=0,
                          bint safe=0, bint convert_datetime=0,
-                          bint convert_timedelta=0):
+                          bint convert_timedelta=0, to_integer_array=False):


ok with a keyword here (as this is an internal function), but can you name it
bint convert_to_nullable_integer=False

Variable name refactored.
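
As a rough picture of what the flag switches between, here is a pure-Python sketch. It is not the Cython code in lib.pyx: `convert_objects_sketch` and `_is_null` are hypothetical helpers, and the (values, mask) pair stands in for an IntegerArray.

```python
import math


def _is_null(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def convert_objects_sketch(objects, convert_to_nullable_integer=False):
    """Convert a list of ints with missing values (illustrative only).

    Returns the input unchanged unless it is "integers plus nulls".
    In that case it returns floats (old behaviour) or a (values, mask)
    pair when convert_to_nullable_integer is set.
    """
    non_null = [v for v in objects if not _is_null(v)]
    all_ints = all(isinstance(v, int) and not isinstance(v, bool) for v in non_null)
    has_null = len(non_null) != len(objects)

    if not non_null or not all_ints or not has_null:
        return list(objects)  # nothing to do in this sketch

    if convert_to_nullable_integer:
        mask = [_is_null(v) for v in objects]
        # Masked slots hold an arbitrary placeholder (0 here).
        values = [0 if m else v for v, m in zip(objects, mask)]
        return values, mask

    # Default path: upcast to float and represent missing values as NaN.
    return [float("nan") if _is_null(v) else float(v) for v in objects]
```

With the flag set, `[2, nan]` becomes values `[2, 0]` and mask `[False, True]`; without it, the same input becomes `[2.0, nan]`.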

jiangyue12392 · 2019-10-27T13:38:05Z

> if you just do the internal change in maybe_convert_objects and add a test for that we can get this in. pls merge master as well.

I have made the required changes and added the test.

jreback · 2019-10-30T12:13:18Z

pandas/tests/dtypes/test_inference.py

+        # GH27335
+        arr = np.array([2, np.NaN], dtype=object)
+        result = lib.maybe_convert_objects(arr, convert_to_nullable_integer=1)
+        from pandas.core.arrays import IntegerArray


can import at the top.

Shifted import statement

jreback · 2019-10-30T12:14:53Z

pandas/tests/dtypes/test_inference.py

+        result = lib.maybe_convert_objects(arr, convert_to_nullable_integer=1)
+        from pandas.core.arrays import IntegerArray
+
+        exp = IntegerArray(np.array([2, 0], dtype="i8"), np.array([False, True]))


can you parameterize this test with an int64 array as well

Parameterized as requested

jreback · 2019-10-30T12:15:30Z

pandas/tests/dtypes/test_inference.py

+        from pandas.core.arrays import IntegerArray
+
+        exp = IntegerArray(np.array([2, 0], dtype="i8"), np.array([False, True]))
+        tm.assert_equal(result, exp)


use tm.assert_extension_array_equal

Assertion method changed
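
The same kind of check can be tried against the public API. The sketch below does not call the internal lib.maybe_convert_objects (its signature is an implementation detail); it assumes a pandas version that provides pd.arrays.IntegerArray and pd.testing.assert_extension_array_equal, and it uses a plain loop instead of pytest parametrization.

```python
import numpy as np
import pandas as pd

# Public-API route to a nullable integer array with one missing value.
result = pd.array([2, None], dtype="Int64")

# "i8" and "int64" are two spellings of the same NumPy dtype.
for np_dtype in ("i8", "int64"):
    values = np.array([2, 0], dtype=np_dtype)  # 0 is a placeholder under the mask
    mask = np.array([False, True])
    expected = pd.arrays.IntegerArray(values, mask)
    # Compares dtype, the missing-value mask, and the non-missing values.
    pd.testing.assert_extension_array_equal(result, expected)

# For contrast, the default inference still upcasts to float64.
default_dtype = str(pd.Series([2, None]).dtype)
```

An extension-array-aware assertion is used because it compares the mask and the valid values separately, so the placeholder stored under a masked slot does not affect the result.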

jreback · 2019-10-30T12:16:28Z

pandas/_libs/lib.pyx

@@ -2085,11 +2094,19 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

    if not seen.object_:
        if not safe:
-            if seen.null_:
+            if seen.null_ or seen.nan_:


separate issue is we should change line 2083 to use a DatetimeArray (can be separate PR) or here if it works out.

can you also update the doc-string (well, add it really :->)? thanks for working on this.

I will leave line 2083 out as it is a separate issue.
Which version of the doc-string shall I modify?

jreback

small comment, pls rebase and ping on green.

jreback · 2019-11-08T14:58:06Z

pandas/_libs/lib.pyx

@@ -1955,7 +1957,8 @@ def maybe_convert_numeric(ndarray[object] values, set na_values,
 @cython.wraparound(False)
 def maybe_convert_objects(ndarray[object] objects, bint try_float=0,
                          bint safe=0, bint convert_datetime=0,
-                          bint convert_timedelta=0):
+                          bint convert_timedelta=0,
+                          bint convert_to_nullable_integer=0):
    """


if you can update this doc-string here e.g. Returns / Parameters

Doc-string added

jiangyue12392 · 2019-11-12T02:46:31Z

> small comment, pls rebase and ping on green.

Green now.

jreback · 2019-11-13T02:06:56Z

thanks @jiangyue12392, nice patch. This should enable us to selectively convert to nullable integers in the user-facing routines.

jiangyue12392 changed the title ~~Use IntergerArray to avoid forced conversion from integer to float~~ ENH: Use IntergerArray to avoid forced conversion from integer to float Jul 11, 2019

jiangyue12392 mentioned this pull request Jul 11, 2019

ENH: Json fill_value for missing fields #27073

Closed

4 tasks

jreback requested changes Jul 11, 2019

gfyoung added Enhancement IO JSON read_json, to_json, json_normalize labels Jul 12, 2019

jreback mentioned this pull request Jul 12, 2019

ENH: infer_dtype should infer integer-na #27283

Closed

jiangyue12392 mentioned this pull request Jul 15, 2019

ENH: Infer integer-na in infer_dtype #27392

Merged

5 tasks

jiangyue12392 force-pushed the dataframe_int_dtype branch from 4ac8859 to 6a5f33b Compare October 7, 2019 07:02

jreback requested changes Oct 18, 2019

jreback requested changes Oct 30, 2019

jreback mentioned this pull request Nov 4, 2019

Series of object/strings cannot be converted to Int64Dtype() #28599

Closed

jreback requested changes Nov 8, 2019

Jiang Yue and others added 6 commits November 11, 2019 21:18

Use IntergerArray for integer arrays with null

475ed6d

Reformat with black

f47b60f

Remove to_integer_array keyword in frame.py

57c613a

Remove to_integer_array keyword in internals/construction.py

b5698c0

Remove to_integer_array keyword in io/json/_normalize.py

e2b9803

Refactor keyword 'to_integer_array' to 'convert_to_nullable_integer'

63d4bdd

jiangyue12392 added 5 commits November 11, 2019 21:19

Add test for IntegerArray conversion

a74e473

Reformat with black

7672fa6

Parameterize test and use assert_extension_array_equal instead of assert_equal

c24de12

Sort import sequence with isort

f071bf6

Add doc-string for maybe_convert_objects

e8591ef

jiangyue12392 force-pushed the dataframe_int_dtype branch from 09c7168 to e8591ef Compare November 11, 2019 13:21

Solve unwanted import pattern

cc179be

jreback added this to the 1.0 milestone Nov 13, 2019

jreback approved these changes Nov 13, 2019

jreback merged commit 6b62e50 into pandas-dev:master Nov 13, 2019

Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019

ENH: Use IntergerArray to avoid forced conversion from integer to float (pandas-dev#27335)

5d6482a

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

ENH: Use IntergerArray to avoid forced conversion from integer to float (pandas-dev#27335)

f4bd2ba

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

ENH: Use IntergerArray to avoid forced conversion from integer to float (pandas-dev#27335)

f3a26d8

jorisvandenbossche mentioned this pull request Feb 21, 2020

REGR: Series repr of object Index with bools and NaN is wrong #32146

Closed

jzwinck mentioned this pull request Apr 28, 2020

ENH: json_normalize() avoid loss of precision for int64 with missing values #16918

Closed

jiangyue12392 deleted the dataframe_int_dtype branch April 6, 2022 11:25
