ENH: Add nullable keyword to read_sql #50048

phofl · 2022-12-03T22:01:39Z

Sits on top of #50047

Functionality-wise, this should work now.
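For readers who want to try the behaviour, here is a minimal sketch against an in-memory SQLite database. It assumes pandas 2.0 or later, where the keyword is spelled `dtype_backend="numpy_nullable"` rather than the `use_nullable_dtypes` flag that appears in this PR's diff. The table name `t` and its columns are made up for illustration.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (None, "y"), (3, None)],
)

# Default behaviour: the NULL forces the integer column to a float dtype.
default_df = pd.read_sql("SELECT a, b FROM t", conn)

# Nullable behaviour (assumes pandas >= 2.0 and the released keyword name).
nullable_df = pd.read_sql(
    "SELECT a, b FROM t", conn, dtype_backend="numpy_nullable"
)

print(default_df.dtypes)
print(nullable_df.dtypes)
conn.close()
```

With the nullable backend the integer column keeps an integer dtype and the NULL is represented as a missing value instead of NaN.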

More broadly, I am not sure that this is the best approach we could take here. Since the convert_to_nullable_type in lib.maybe_convert_objects is not used right now except here, we could also make this strict and return the appropriate Array from the Cython code, not only when nulls are present. This would avoid the re-cast in the non-cython code part.
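To make the trade-off concrete, here is a rough pure-Python sketch of the two policies for an integer column. This is not the Cython implementation: `convert_lenient`, `convert_strict` and `recast` are hypothetical names, and the sketch only covers integers.

```python
import numpy as np
import pandas as pd


def convert_lenient(values):
    """Hypothetical stand-in for the current behaviour described above:
    return a nullable array only when a null is actually present."""
    if any(v is None for v in values):
        return pd.array(values, dtype="Int64")
    return np.array(values, dtype="int64")


def convert_strict(values):
    """Hypothetical 'strict' variant: always return the nullable array."""
    return pd.array(values, dtype="Int64")


def recast(arr):
    """The extra Python-level cast that the lenient policy makes necessary."""
    if isinstance(arr, np.ndarray):
        return pd.array(arr, dtype="Int64")
    return arr


no_nulls = [1, 2, 3]
with_nulls = [1, None, 3]

print(type(convert_lenient(no_nulls)).__name__)
print(type(recast(convert_lenient(no_nulls))).__name__)
print(type(convert_strict(no_nulls)).__name__)
```

Under the strict policy the caller never needs `recast`, which is the re-cast the comment above proposes to avoid.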

WillAyd · 2022-12-10T19:15:31Z

I think this makes sense, and I like your vision for the interaction with Cython. Is there any performance difference here from moving away from the .from_records construction of the DataFrame?

phofl · 2022-12-10T19:55:41Z

I also discussed this with @mroeschke in another PR; he leaned more towards keeping maybe_convert_objects as is.

Maybe a bit faster? The previous implementation ran through the same function that we now call directly. Definitely no slowdown.

WillAyd

I'm OK with this as is - @mroeschke

mroeschke · 2022-12-13T02:15:26Z

> Since the convert_to_nullable_type in lib.maybe_convert_objects is not used right now except here, we could also make this strict and return the appropriate Array from the Cython code, not only when nulls are present

I think in a future PR it might make sense to do this; I didn't realize this was the only location where this is used. I guess the only slight downside is importing the Arrays at runtime in the Cython code?

mroeschke · 2022-12-13T02:15:50Z

Thanks @phofl

jbrockmendel · 2023-05-03T22:45:09Z

pandas/core/internals/construction.py


            if dtype is None:
                if arr.dtype == np.dtype("O"):
                    # i.e. maybe_convert_objects didn't convert
                    arr = maybe_infer_to_datetimelike(arr)
+                    if use_nullable_dtypes and arr.dtype == np.dtype("O"):
+                        arr = StringDtype().construct_array_type()._from_sequence(arr)


this seems weird to me. why are we casting potentially non-strings to strings?
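One possible way to address this concern, sketched under assumptions: cast to the string dtype only when the object array is inferred to hold strings, and leave everything else (bytes, mixed values) untouched. `maybe_cast_to_string` is a hypothetical helper, not what the PR does; `infer_dtype` is the public `pandas.api.types.infer_dtype`.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def maybe_cast_to_string(arr):
    """Hypothetical guard: only cast object arrays that really hold strings."""
    if arr.dtype == np.dtype("O") and infer_dtype(arr, skipna=True) == "string":
        return pd.array(arr, dtype="string")
    # Anything else (bytes, mixed objects, ...) is returned unchanged.
    return arr


strings = np.array(["a", None, "b"], dtype=object)
blobs = np.array([b"\x00\x01", b"\x02"], dtype=object)
mixed = np.array(["a", 1], dtype=object)

for name, arr in [("strings", strings), ("blobs", blobs), ("mixed", mixed)]:
    print(name, infer_dtype(arr, skipna=True), maybe_cast_to_string(arr).dtype)
```

The unconditional cast in the diff above would instead hand the bytes and mixed arrays to the string array constructor, which appears to be the kind of behaviour reported later in #59242.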

jbrockmendel · 2023-05-03T22:45:33Z

pandas/core/internals/construction.py

+                        arr = IntegerArray(arr, np.zeros(arr.shape, dtype=np.bool_))
+                    elif is_bool_dtype(arr.dtype):
+                        arr = BooleanArray(arr, np.zeros(arr.shape, dtype=np.bool_))
+                    elif is_float_dtype(arr.dtype):


could re-use pd.array for L1011-L1016?
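A rough sketch of what that suggestion could look like, assuming the goal is only to wrap already-converted NumPy arrays with an all-False mask. `wrap_manually`, `wrap_with_pd_array` and the `NUMPY_TO_NULLABLE` mapping are hypothetical names; the mapping covers just the three dtypes used in this example.

```python
import numpy as np
import pandas as pd
from pandas.arrays import BooleanArray, FloatingArray, IntegerArray


def wrap_manually(arr):
    """Mirrors the shape of the branches in the diff: build each masked
    array by hand with an all-False mask."""
    mask = np.zeros(arr.shape, dtype=np.bool_)
    if arr.dtype.kind in "iu":
        return IntegerArray(arr, mask)
    if arr.dtype.kind == "b":
        return BooleanArray(arr, mask)
    if arr.dtype.kind == "f":
        return FloatingArray(arr, mask)
    return arr


# Hypothetical lookup table; extend for other widths as needed.
NUMPY_TO_NULLABLE = {"int64": "Int64", "bool": "boolean", "float64": "Float64"}


def wrap_with_pd_array(arr):
    """Same result via the public constructor, with an explicit dtype."""
    target = NUMPY_TO_NULLABLE.get(arr.dtype.name)
    return arr if target is None else pd.array(arr, dtype=target)


samples = [
    np.array([1, 2, 3], dtype="int64"),
    np.array([True, False], dtype="bool"),
    np.array([0.5, 1.5], dtype="float64"),
]
for arr in samples:
    print(wrap_with_pd_array(arr).dtype)
```

One difference to keep in mind: with a nullable float target, `pd.array` treats NaN as missing, whereas the hand-built all-False mask does not, so the two paths only agree for inputs without NaN.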

jbrockmendel · 2023-05-03T22:47:16Z

pandas/io/sql.py

+        dtype=None,
+        coerce_float=coerce_float,
+        use_nullable_dtypes=use_nullable_dtypes,
+    )


looks like this has gotten some Arrow-specific logic added here in the interim. can we move the backend-specific stuff from core.internals here?

phofl added 15 commits December 3, 2022 19:08

Start sql implementation

4d44e84

BUG: Fix bug in maybe_convert_objects with None and nullable

aff00f1

Add gh ref

9070acd

Merge branch 'maybe_convert_objects' into sql

a66e8bd

Continue sql implementation

f69d6d8

ENH: maybe_convert_objects add boolean support with NA

3d6958d

Merge remote-tracking branch 'upstream/main' into maybe_convert_objec…

48f41a2

…ts_nullable_boolean # Conflicts: # pandas/_libs/lib.pyx

Fix merge error

43545c5

Add gh ref

1ed72bf

Merge remote-tracking branch 'upstream/main' into sql

9675341

Merge branch 'maybe_convert_objects_nullable_boolean' into sql

e5bec77

Add to api

4e93577

Add tests

d5d0047

Fix test

62c798f

Merge branch 'maybe_convert_objects_nullable_boolean' into sql

a8e7fcd

phofl marked this pull request as draft December 3, 2022 22:01

phofl added 11 commits December 4, 2022 01:26

Fix test

da4f4ae

Simplify

85c995a

Merge branch 'maybe_convert_objects_nullable_boolean' into sql

8f6b859

Implement string support

87743f3

Add support for table

3363466

Add docstring

a0233cd

Add whatsnew

58561d7

Fix tests

6540859

Merge remote-tracking branch 'upstream/main' into sql

0b882ae

Fix pylint

80b61e3

Fix docstring

940bca7

phofl changed the title ~~ENH: Add nullable keyword to read_sql/ DRAFT~~ ENH: Add nullable keyword to read_sql Dec 5, 2022

phofl marked this pull request as ready for review December 5, 2022 14:40

phofl added the IO SQL to_sql, read_sql, read_sql_query label Dec 5, 2022

phofl added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Dec 5, 2022

Merge remote-tracking branch 'upstream/main' into sql

839ef57

Merge remote-tracking branch 'upstream/main' into sql

e67a0c8

WillAyd approved these changes Dec 10, 2022

mroeschke added this to the 2.0 milestone Dec 13, 2022

mroeschke approved these changes Dec 13, 2022

mroeschke merged commit 7c5017c into pandas-dev:main Dec 13, 2022

phofl deleted the sql branch December 13, 2022 22:32

asishm mentioned this pull request May 2, 2023

BUG: #53028

Open

3 tasks

jbrockmendel reviewed May 3, 2023

jorisvandenbossche mentioned this pull request May 6, 2023

REGR?: read_sql no longer supports duplicate column names #53117

Closed

asishm mentioned this pull request Jul 4, 2024

BUG: read_sql_query duplicates column names in cells in pandas v2.0.0 #52437

Open

3 tasks

asishm mentioned this pull request Jul 15, 2024

BUG: read_sql tries to convert blob/varbinary to string with pyarrow backend #59242

Open

3 tasks
