BUG: Interchange protocol implementation allows non-string column names #57174

MarcoGorelli · 2024-01-31T14:22:15Z

closes BUG: Interchange protocol implementation allows non-string column names #56701 (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

MarcoGorelli · 2024-01-31T15:27:35Z

pandas/tests/interchange/test_impl.py

-    df2 = df.__dataframe__()
-


not sure what this line was ever meant to achieve, as nothing is done with df2, it's just re-assigned to in L172

keeping it in results in

> df.loc[null_idx, col] = None pandas/tests/interchange/test_impl.py:172: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pandas/core/indexing.py:[91](https://github.com/pandas-dev/pandas/actions/runs/7727513398/job/21066166133#step:8:93)2: in __setitem__ iloc._setitem_with_indexer(indexer, value, self.name) pandas/core/indexing.py:1[94](https://github.com/pandas-dev/pandas/actions/runs/7727513398/job/21066166133#step:8:96)6: in _setitem_with_indexer self._setitem_with_indexer_split_path(indexer, value, name) pandas/core/indexing.py:2039: in _setitem_with_indexer_split_path self._setitem_single_column(loc, value, pi) pandas/core/indexing.py:2170: in _setitem_single_column self.obj._mgr.column_setitem(loc, plane_indexer, value) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = BlockManager Items: Index(['x', 'y', 'z'], dtype='object') Axis 1: RangeIndex(start=0, stop=5, step=1) NumpyBlock: slice(0, 3, 1), 3 x 5, dtype: float64 loc = 0, idx = array([2, 3, 1, 0]), value = None, inplace_only = False def column_setitem( self, loc: int, idx: int | slice | np.ndarray, value, inplace_only: bool = False ) -> None: """ Set values ("setitem") into a single column (not setting the full column). This is a method on the BlockManager level, to avoid creating an intermediate Series at the DataFrame level (`s = df[loc]; s[idx] = value`) """ needs_to_warn = False if warn_copy_on_write() and not self._has_no_reference(loc): if not isinstance( self.blocks[self.blknos[loc]].values, (ArrowExtensionArray, ArrowStringArray), ): # We might raise if we are in an expansion case, so defer # warning till we actually updated needs_to_warn = True elif using_copy_on_write() and not self._has_no_reference(loc): blkno = self.blknos[loc] # Split blocks to only copy the column we want to modify blk_loc = self.blklocs[loc] # Copy our values values = self.blocks[blkno].values if values.ndim == 1: values = values.copy() else: # Use [blk_loc] as indexer to keep ndim=2, this already results in a # copy values = values[[blk_loc]] self._iset_split_block(blkno, [blk_loc], values) # this manager is only created temporarily to mutate the values in place # so don't track references, otherwise the `setitem` would perform CoW again col_mgr = self.iget(loc, track_ref=False) if inplace_only: col_mgr.setitem_inplace(idx, value) else: new_mgr = col_mgr.setitem((idx,), value) self.iset(loc, new_mgr._block.values, inplace=True) if needs_to_warn: > warnings.warn( COW_WARNING_GENERAL_MSG, FutureWarning, stacklevel=find_stack_level(), ) E FutureWarning: Setting a value on a view: behaviour will change in pandas 3.0. E You are mutating a Series or DataFrame object, and currently this mutation will E also have effect on other Series or DataFrame objects that share data with this E object. In pandas 3.0 (with Copy-on-Write), updating one Series or DataFrame object E will never modify another.

I think the simplest thing is to just remove it?

mroeschke

Looks good just a merge conflict

WillAyd

lgtm

pandas/tests/interchange/test_impl.py

…n allows non-string column names

mroeschke · 2024-02-02T01:26:33Z

Thanks @MarcoGorelli

…ntation allows non-string column names) (#57203) Backport PR #57174: BUG: Interchange protocol implementation allows non-string column names Co-authored-by: Marco Edward Gorelli <[email protected]>

…es (pandas-dev#57174) * convert non-string colnames to strings in interchange protocol * remove irrelevant statement * informative error message if two columns end up becoming duplicates

MarcoGorelli added 2 commits January 31, 2024 12:58

convert non-string colnames to strings in interchange protocol

3cfa488

remove irrelevant statement

3d3a0f4

MarcoGorelli force-pushed the non-str-names-interchange branch from 05aaf61 to 3cfa488 Compare January 31, 2024 14:23

MarcoGorelli commented Jan 31, 2024

View reviewed changes

MarcoGorelli requested a review from jorisvandenbossche January 31, 2024 15:27

MarcoGorelli marked this pull request as ready for review January 31, 2024 16:42

mroeschke reviewed Jan 31, 2024

View reviewed changes

mroeschke added the Interchange Dataframe Interchange Protocol label Jan 31, 2024

mroeschke modified the milestones: 3.0, 2.2.1 Jan 31, 2024

Merge branch 'main' into non-str-names-interchange

1c347b5

WillAyd approved these changes Jan 31, 2024

View reviewed changes

pandas/tests/interchange/test_impl.py Show resolved Hide resolved

MarcoGorelli and others added 3 commits February 1, 2024 17:16

Merge branch 'main' into non-str-names-interchange

f4657b5

informative error message if two columns end up becoming duplicates

af6b43d

Merge branch 'main' into non-str-names-interchange

ba9eeff

WillAyd approved these changes Feb 1, 2024

View reviewed changes

mroeschke approved these changes Feb 2, 2024

View reviewed changes

mroeschke merged commit 8ed7dae into pandas-dev:main Feb 2, 2024

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 2, 2024

Backport PR pandas-dev#57174: BUG: Interchange protocol implementatio…

27a814b

…n allows non-string column names

meeseeksmachine mentioned this pull request Feb 2, 2024

Backport PR #57174 on branch 2.2.x (BUG: Interchange protocol implementation allows non-string column names) #57203

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Interchange protocol implementation allows non-string column names #57174

BUG: Interchange protocol implementation allows non-string column names #57174

MarcoGorelli commented Jan 31, 2024

MarcoGorelli Jan 31, 2024

mroeschke left a comment

WillAyd left a comment

mroeschke commented Feb 2, 2024

BUG: Interchange protocol implementation allows non-string column names #57174

BUG: Interchange protocol implementation allows non-string column names #57174

Conversation

MarcoGorelli commented Jan 31, 2024

MarcoGorelli Jan 31, 2024

Choose a reason for hiding this comment

mroeschke left a comment

Choose a reason for hiding this comment

WillAyd left a comment

Choose a reason for hiding this comment

mroeschke commented Feb 2, 2024