BUG: Fixed bug when creating new column with missing values when setting a single string value #56321

MarcoGorelli · 2023-12-04T15:15:15Z

closes #56204 (BUG: Creating a string column on a mask results in NaN being stringyfied and potentially truncated based on the 'maxchar' value of the column)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

phofl · 2023-12-06T17:14:07Z

doc/source/whatsnew/v2.1.4.rst

@@ -13,6 +13,7 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Fixed bug when creating new column with missing values when setting a single string value (:issue:`56204`)


Can you put this into the bug fixes section?

pandas/core/dtypes/missing.py

jbrockmendel · 2023-12-06T18:43:50Z

pandas/tests/frame/indexing/test_indexing.py

+def test_adding_new_conditional_column_with_string() -> None:
+    # https://github.com/pandas-dev/pandas/issues/56204
+    df = DataFrame({"a": [1, 2], "b": [3, 4]})
+    df.loc[lambda x: x.a == 1, "c"] = "1"


is it important that this be a lambda?
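
For context on the question above, a minimal sketch (not part of the PR) comparing the callable form used in the test with an explicit boolean mask. The frame names `df_callable` and `df_mask` are made up for illustration. Both forms should select the same rows, so the lambda does not look essential to reproducing the issue.

```python
import pandas as pd

df_callable = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df_mask = df_callable.copy()

# Callable form: .loc calls the function with the DataFrame and uses
# the boolean Series it returns as the row indexer.
df_callable.loc[lambda x: x.a == 1, "c"] = "1"

# Explicit form: build the same boolean mask by hand.
df_mask.loc[df_mask["a"] == 1, "c"] = "1"

# Both assignments go through the same setitem-with-expansion path,
# so the resulting frames should be identical.
pd.testing.assert_frame_equal(df_callable, df_mask)
```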

jbrockmendel · 2023-12-06T18:44:44Z

pandas/tests/frame/indexing/test_indexing.py

+    tm.assert_frame_equal(df, expected)
+
+
+def test_adding_new_conditional_column_with_infer_string() -> None:


maybe make this one test and parametrize? in 3.0 we'll only want to end up with one test here right?

MarcoGorelli · 2023-12-06T19:44:04Z

thanks for your reviews! have updated

jbrockmendel · 2023-12-06T23:36:27Z

Just tracked down an old branch where past-me was trying to get rid of infer_fill_value. The only recoverable bit of that branch looks like it would go just before L1880 in indexing.py

                        elif not is_list_like(value):
                            # Find our empty_value dtype by constructing an array
                            #  from our value and doing a .take on it
                            arr = sanitize_array(value, Index(range(1)), copy=False)
                            taker = -1 * np.ones(len(self.obj), dtype=np.intp)
                            empty_value = algos.take_nd(arr, taker)
                            self.obj[key] = empty_value

would that solve the issue here robustly? i like past-me's idea of using sanitize_array rather than yet-another constructor-esque function
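
To illustrate the idea in the snippet above outside of pandas internals, here is a small sketch using only public API. `pd.api.extensions.take` stands in for the internal `algos.take_nd`, and a plain `np.array` call stands in for `sanitize_array`; the helper name `empty_like_value` is hypothetical. Taking position `-1` with `allow_fill=True` yields a missing value, so the result is an all-missing array whose dtype is derived from the scalar.

```python
import numpy as np
import pandas as pd


def empty_like_value(value, length):
    """Hypothetical helper: all-missing array with a dtype derived from a scalar."""
    # Assumption: strings are stored as object, mirroring what the internal
    # constructor does; other scalars use NumPy's inferred dtype.
    dtype = object if isinstance(value, str) else None
    arr = np.array([value], dtype=dtype)

    # With allow_fill=True, -1 means "missing" rather than "last element".
    taker = np.full(length, -1, dtype=np.intp)
    return pd.api.extensions.take(arr, taker, allow_fill=True)


string_fill = empty_like_value("1", 3)   # object dtype, every entry missing
float_fill = empty_like_value(1.5, 2)    # float64 dtype, every entry NaN
```

Because the fill array comes from a take, the missing entries are never passed through a string conversion, which is the failure mode described in the linked issue.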

MarcoGorelli · 2023-12-07T11:04:13Z

That would also avoid the warning when expanding column-wise:

main:

In [1]: df = pd.DataFrame({'a': [.1, .2], 'b': [.3, .4]})

In [2]: df.loc[1, 'c'] = True
<ipython-input-2-d3d953ae007a>:1: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'True' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df.loc[1, 'c'] = True

In [3]: df
Out[3]: 
     a    b     c
0  0.1  0.3   NaN
1  0.2  0.4  True

In [4]: df.dtypes
Out[4]: 
a    float64
b    float64
c     object
dtype: object

here:

In [1]: df = pd.DataFrame({'a': [.1, .2], 'b': [.3, .4]})

In [2]: df.loc[1, 'c'] = True

In [3]: df
Out[3]: 
     a    b     c
0  0.1  0.3   NaN
1  0.2  0.4  True

In [4]: df.dtypes
Out[4]: 
a    float64
b    float64
c     object
dtype: object

jbrockmendel

LGTM

mroeschke · 2023-12-20T18:07:35Z

doc/source/whatsnew/v2.1.4.rst

@@ -35,6 +35,7 @@ Bug fixes
 - Fixed bug in :meth:`Series.reset_index` not preserving object dtype when ``infer_string`` is set (:issue:`56160`)
 - Fixed bug in :meth:`Series.str.split` and :meth:`Series.str.rsplit` when ``pat=None`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`56271`)
 - Fixed bug in :meth:`Series.str.translate` losing object dtype when string option is set (:issue:`56152`)
+- Fixed bug when creating new column with missing values when setting a single string value (:issue:`56204`)


Could you move this to 2.2.0.rst?

MarcoGorelli · 2023-12-20T21:04:28Z

thanks for your reviews! merging then

MarcoGorelli added 3 commits December 4, 2023 13:55

wip

910ed9b

Fixed bug when creating new column with missing values when setting a…

b3774b1

… single string value

typing

5ab3d09

MarcoGorelli force-pushed the pdep6-regression branch from 00bed88 to b3774b1 Compare December 4, 2023 15:16

MarcoGorelli added 2 commits December 6, 2023 09:44

Merge remote-tracking branch 'upstream/main' into pdep6-regression

8c87c1d

use pd.array instead

61cdef1

MarcoGorelli marked this pull request as ready for review December 6, 2023 10:38

MarcoGorelli requested review from jbrockmendel and phofl December 6, 2023 15:03

phofl reviewed Dec 6, 2023

pandas/core/dtypes/missing.py (outdated, resolved)

mroeschke added the Strings (String extension data type and string data) and Arrow (pyarrow functionality) labels Dec 6, 2023

move in whatsnew section, parametrise

9e7055a

jbrockmendel reviewed Dec 6, 2023

MarcoGorelli added 3 commits December 7, 2023 10:36

Merge remote-tracking branch 'upstream/main' into pdep6-regression

9dca984

find empty_value dtype by constructing array and doing take

20898db

revert

8c30505

Merge branch 'main' into pdep6-regression

1f6095d

jbrockmendel approved these changes Dec 20, 2023

mroeschke reviewed Dec 20, 2023

MarcoGorelli added 2 commits December 20, 2023 19:08

Merge remote-tracking branch 'upstream/main' into pdep6-regression

1b27efe

move whatsnew note to 2.2.0

40093c2

MarcoGorelli merged commit 6a65c64 into pandas-dev:main Dec 20, 2023

cbpygit pushed a commit to cbpygit/pandas that referenced this pull request Jan 2, 2024

BUG: Fixed bug when creating new column with missing values when sett…

9420ae2

…ing a single string value (pandas-dev#56321)

MarcoGorelli mentioned this pull request Feb 15, 2024

BUG: wrong future Warning on string assignment in certain condition #57402

Merged

5 tasks
