BUG (string dtype): comparison of string column to mixed object column fails #60228 (fixed) #60392

TEARFEAR · 2024-11-22T00:47:29Z

Changes:

Updated the comparison_op function in pandas/core/ops/array_ops.py:
- Added handling for comparisons between string and object dtypes.
- Ensured string and object values are cast to a common type (string) before comparison.

Added a test case to validate the fix:
- Verified that comparison works for string vs. object with homogeneous and mixed types.
- Verified behavior with PyArrow-based strings enabled (pd.options.future.infer_string = True).

--

- closes BUG (string dtype): comparison of string column to mixed object column fails #60228
- Tests added and passed if fixing a bug or adding a new feature.
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jorisvandenbossche

Thanks for working on this!
Added a few suggestions

jorisvandenbossche · 2024-11-23T09:17:19Z

pandas/core/ops/array_ops.py

+    if (is_string_dtype(lvalues) and is_object_dtype(rvalues)) or (
+        is_object_dtype(lvalues) and is_string_dtype(rvalues)


Checking for string dtype on an array can be expensive when the array is object dtype (at that point it scans all values to check whether they are strings), so we might want to avoid that at this level.
I think we could handle the issue specifically for the ArrowExtensionArray itself (see the code I referenced in #60228 (comment))
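
To illustrate the cost concern above, here is a minimal sketch that contrasts the two checks on small object arrays. The array names are made up for illustration, and the value-dependent behavior of is_string_dtype is what I understand pandas 2.x to do; older versions may behave differently.

```python
import numpy as np
from pandas.api.types import is_object_dtype, is_string_dtype

# Two hypothetical object-dtype arrays: one holding only strings, one mixed.
homogeneous = np.array(["a", "b"], dtype=object)
mixed = np.array([1, "b"], dtype=object)

# is_object_dtype only looks at the dtype, so it is cheap and
# gives the same answer for both arrays.
object_flags = [is_object_dtype(homogeneous), is_object_dtype(mixed)]

# For object arrays, is_string_dtype (pandas 2.x, as far as I know) has to
# inspect the values, so its cost grows with the array length and its
# answer depends on the contents.
string_flags = [is_string_dtype(homogeneous), is_string_dtype(mixed)]

print(object_flags, string_flags)
```

Because the second check depends on the data, calling it on every comparison in comparison_op would add a scan over the values to a hot path.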

jorisvandenbossche · 2024-11-23T09:20:50Z

pandas/core/ops/array_ops.py

+    ):
+        if lvalues.dtype.name == "string" and rvalues.dtype == object:
+            lvalues = lvalues.astype("string")
+            rvalues = pd_array(rvalues, dtype="string")


We might need to do the casting the other way around. Instead of casting the object to string and then compare both as strings, I think we have to cast the string to object and compare both as object dtype.

The reason for this is that casting to string might actually convert values to strings, and then we are no longer doing the comparison for the original values.

>>> ser_string = pd.Series(["1", "b"])
>>> ser_mixed = pd.Series([1, "b"])
>>> ser_string == ser_mixed
0    False
1     True
dtype: bool
>>> ser_string == ser_mixed.astype("string")
0    True
1    True
dtype: bool

So if we would do that casting under the hood, the result would change in this case.

And we should add this case to the tests!
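
A rough sketch of what such a test case could check, comparing the two casting directions side by side. This is only an illustration with made-up variable names, not the fix itself; the explicit astype(object) stands in for the cast that would happen inside pandas.

```python
import pandas as pd

ser_string = pd.Series(["1", "b"], dtype="string")
ser_mixed = pd.Series([1, "b"], dtype=object)

# Casting the string side to object keeps the original values,
# so the string "1" is compared against the integer 1.
as_object = ser_string.astype(object) == ser_mixed

# Casting the object side to string turns the integer 1 into "1",
# which changes the outcome of the first comparison.
as_string = ser_string == ser_mixed.astype("string")

print(as_object.tolist())
print(as_string.tolist())
```

The first row differs between the two results, which is the case the test should pin down.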

jorisvandenbossche · 2024-11-23T09:23:07Z

pandas/tests/series/methods/test_compare.py

+
+def test_comparison_string_mixed_object():
+    # Issue https://github.com/pandas-dev/pandas/issues/60228
+    pd.options.future.infer_string = True


You don't need to add this for CI, because we have a separate CI build that enables this option for the full test suite.

Now, this can still be useful to test locally, but the way to do that is by setting an environment variable (on Linux I can do PANDAS_FUTURE_INFER_STRING=1 pytest ... to run the test with the option enabled).

jorisvandenbossche · 2024-11-29T09:37:21Z

@TEARFEAR git workflow related comment: it seems you have picked up some unrelated commits. In general doing something like

git fetch upstream
git merge upstream/main
git push origin bug-update-60228

should fix that (assuming origin is your fork and upstream is the pandas-dev/pandas remote)

jorisvandenbossche · 2024-12-18T18:47:30Z

(small ping to see if you have some time to update this PR)

github-actions · 2025-01-18T00:06:47Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

jorisvandenbossche · 2025-01-24T19:39:36Z

@TEARFEAR would you have time to update this PR? (otherwise I am also happy to update it, as we would like to include this in the 2.3 release)

github-actions · 2025-02-24T00:07:57Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2025-04-02T16:22:49Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

TEARFEAR and others added 9 commits November 21, 2024 14:35

fixed comparison of string column to mixed object column (issue pandas-dev#60228) · 4bc49ed
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 0def761
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · c4da919
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 900f3b1
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 8db4edc
Merge branch 'main' into bug-update-60228 · d4ea527
CI/BUG: Remove trim() function on comment-commands.yml (pandas-dev#60397) · d4ae654
  remove trim function on comment-commands.yml
DOC: Fixed spelling of 'behaviour' to 'behavior' (pandas-dev#60398) · eaa8b47
BUG: Convert output type in Excel for MultiIndex with period levels (pandas-dev#60182) · ee0902a

jorisvandenbossche added this to the 2.3 milestone Nov 23, 2024

jorisvandenbossche added the Strings (String extension data type and string data) label Nov 23, 2024

jorisvandenbossche reviewed Nov 23, 2024

partev and others added 18 commits November 25, 2024 10:36

fix issue pandas-dev#60410 (pandas-dev#60412) · a2ceb52
DOC: fix SA01 for pandas.errors.UnsortedIndexError (pandas-dev#60404) · e78df6f
Fix BUG: Cannot shift Intervals that are not closed='right' (the default) (pandas-dev#60407) · cbd90ba
  first
DOC: fix SA01,ES01 for pandas.errors.PossibleDataLossError (pandas-dev#60403) · bca4b1c
DOC: fix SA01 for pandas.errors.OutOfBoundsTimedelta (pandas-dev#60402) · 582740b
DOC: fix SA01,ES01 for pandas.errors.DuplicateLabelError (pandas-dev#60399) · 9fab4eb
DOC: fix SA01,ES01 for pandas.errors.InvalidIndexError (pandas-dev#60400) · 00c2207
DOC: fix SA01 for pandas.errors.NumExprClobberingError (pandas-dev#60401) · 39dcbb4
TST: Avoid hashing np.timedelta64 without unit (pandas-dev#60416) · 0b6cece
BUG: Fix formatting of complex numbers with exponents (pandas-dev#60417) · 759874e
  Fix formatting of complex numbers with exponents
DOC: fix docstring api.types.is_re_compilable (pandas-dev#60419) · ab757ff
  * fix docstring api.types.is_re_compilable
  * fix lint error
replace twitter->X (pandas-dev#60426) · fd570f4
String dtype: use ObjectEngine for indexing for now correctness over performance (pandas-dev#60329) · 98f7e4d
DOC: Add type hint for squeeze method (pandas-dev#60415) · 106f33c
  Co-authored-by: Matthew Roeschke
fixed comparison of string column to mixed object column (issue pandas-dev#60228) · 89e2efc

TEARFEAR added 6 commits November 28, 2024 10:06

BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · a832418
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 7152b01
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 61ffbc0
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 104a60f
Merge remote-tracking branch 'origin/bug-update-60228' into bug-update-60228 · 82625a3
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 658f757

TEARFEAR requested review from MarcoGorelli, WillAyd and mroeschke as code owners November 28, 2024 01:27

TEARFEAR added 4 commits November 28, 2024 10:28

BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 0129c68
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 65ae2e2
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 497e8a6
Merge remote-tracking branch 'origin/bug-update-60228' into bug-update-60228 · 62239ee

TEARFEAR marked this pull request as draft November 28, 2024 01:47

TEARFEAR added 3 commits November 28, 2024 14:19

BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 56bc8b1
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · b301ac0
BUG (string dtype): comparison of string column to mixed object column fails pandas-dev#60228 · 01887f8

TEARFEAR marked this pull request as ready for review November 28, 2024 15:15

TEARFEAR requested a review from jorisvandenbossche November 29, 2024 07:18

github-actions bot added the Stale label Jan 18, 2025

jorisvandenbossche removed the Stale label Jan 24, 2025

github-actions bot added the Stale label Feb 24, 2025

mroeschke closed this Apr 2, 2025
