
Commit 1a1bd66

Don't identify decimals as strings. (#7710)
As documented in [this pandas issue](pandas-dev/pandas#15585), pandas' `is_string_dtype` is not strict and will characterize many things as strings that aren't. For our purposes this is problematic, because essentially all subclasses of `ExtensionDtype` will be classified as strings by that function. This is clearly not appropriate, so I modified our version of `is_string_dtype` to explicitly reject all of our extension dtypes (previously it excluded only categorical types).

I'm not 100% confident that no other parts of the code base rely on the current (erroneous) behavior. However, the cudf tests all passed for me locally, and every call to `utils.is_string_dtype` that I traced looks like a place where the change gives more correct behavior, so I think our best bet is to move forward with this change. Any future problems caused by other code relying on the current behavior should be treated as bugs in the calling code and fixed there. The same goes for external code that relied on this behavior: this change is potentially breaking for such code too, but again, that is for its maintainers to address.

Authors:
- Vyas Ramasubramani (@vyasr)

Approvers:
- Keith Kraus (@kkraus14)

URL: #7710
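To make the failure mode concrete, here is a minimal pure-Python model of the lax check. It assumes, as the linked pandas issue describes, that the pandas check of that era was roughly `dtype.kind in ("O", "S", "U")`, and that extension dtypes which don't override `kind` inherit pandas' default of `"O"`. The `Fake*Dtype` classes and `lax_is_string_dtype` are hypothetical stand-ins, not the real pandas or cudf objects.

```python
class FakeStringDtype:
    """Hypothetical stand-in for a genuine string dtype (object kind)."""
    kind = "O"
    name = "str"


class FakeDecimalDtype:
    """Hypothetical stand-in for a cudf-style decimal extension dtype.

    Extension dtypes that do not override ``kind`` report "O" (object),
    which is what trips up a kind-based string check.
    """
    kind = "O"
    name = "decimal"


class FakeInt64Dtype:
    """Hypothetical stand-in for a plain integer dtype."""
    kind = "i"
    name = "int64"


def lax_is_string_dtype(dtype) -> bool:
    # Simplified model of the lax check: anything whose kind is
    # object ("O"), bytes ("S") or unicode ("U") counts as a "string".
    return dtype.kind in ("O", "S", "U")


for dt in (FakeStringDtype(), FakeDecimalDtype(), FakeInt64Dtype()):
    print(dt.name, lax_is_string_dtype(dt))
# str True
# decimal True   <- false positive
# int64 False
```

The decimal stand-in is reported as a string purely because of its `kind`, which is the misclassification this commit guards against.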
1 parent eb92145


1 file changed: +9 additions, -1 deletion


python/cudf/cudf/utils/dtypes.py (+9 -1)
```diff
@@ -154,7 +154,15 @@ def is_numerical_dtype(obj):
 
 
 def is_string_dtype(obj):
-    return pd.api.types.is_string_dtype(obj) and not is_categorical_dtype(obj)
+    return (
+        pd.api.types.is_string_dtype(obj)
+        # Reject all cudf extension types.
+        and not is_categorical_dtype(obj)
+        and not is_decimal_dtype(obj)
+        and not is_list_dtype(obj)
+        and not is_struct_dtype(obj)
+        and not is_interval_dtype(obj)
+    )
 
 
 def is_datetime_dtype(obj):
```
