-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Interchange protocol implementation allows non-string column names #57174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Interchange protocol implementation allows non-string column names #57174
Conversation
05aaf61
to
3cfa488
Compare
df2 = df.__dataframe__() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what this line was ever meant to achieve, as nothing is done with df2
, it's just re-assigned to in L172
keeping it in results in
> df.loc[null_idx, col] = None
pandas/tests/interchange/test_impl.py:172:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/core/indexing.py:[91](https://github.com/pandas-dev/pandas/actions/runs/7727513398/job/21066166133#step:8:93)2: in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
pandas/core/indexing.py:1[94](https://github.com/pandas-dev/pandas/actions/runs/7727513398/job/21066166133#step:8:96)6: in _setitem_with_indexer
self._setitem_with_indexer_split_path(indexer, value, name)
pandas/core/indexing.py:2039: in _setitem_with_indexer_split_path
self._setitem_single_column(loc, value, pi)
pandas/core/indexing.py:2170: in _setitem_single_column
self.obj._mgr.column_setitem(loc, plane_indexer, value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = BlockManager
Items: Index(['x', 'y', 'z'], dtype='object')
Axis 1: RangeIndex(start=0, stop=5, step=1)
NumpyBlock: slice(0, 3, 1), 3 x 5, dtype: float64
loc = 0, idx = array([2, 3, 1, 0]), value = None, inplace_only = False
def column_setitem(
self, loc: int, idx: int | slice | np.ndarray, value, inplace_only: bool = False
) -> None:
"""
Set values ("setitem") into a single column (not setting the full column).
This is a method on the BlockManager level, to avoid creating an
intermediate Series at the DataFrame level (`s = df[loc]; s[idx] = value`)
"""
needs_to_warn = False
if warn_copy_on_write() and not self._has_no_reference(loc):
if not isinstance(
self.blocks[self.blknos[loc]].values,
(ArrowExtensionArray, ArrowStringArray),
):
# We might raise if we are in an expansion case, so defer
# warning till we actually updated
needs_to_warn = True
elif using_copy_on_write() and not self._has_no_reference(loc):
blkno = self.blknos[loc]
# Split blocks to only copy the column we want to modify
blk_loc = self.blklocs[loc]
# Copy our values
values = self.blocks[blkno].values
if values.ndim == 1:
values = values.copy()
else:
# Use [blk_loc] as indexer to keep ndim=2, this already results in a
# copy
values = values[[blk_loc]]
self._iset_split_block(blkno, [blk_loc], values)
# this manager is only created temporarily to mutate the values in place
# so don't track references, otherwise the `setitem` would perform CoW again
col_mgr = self.iget(loc, track_ref=False)
if inplace_only:
col_mgr.setitem_inplace(idx, value)
else:
new_mgr = col_mgr.setitem((idx,), value)
self.iset(loc, new_mgr._block.values, inplace=True)
if needs_to_warn:
> warnings.warn(
COW_WARNING_GENERAL_MSG,
FutureWarning,
stacklevel=find_stack_level(),
)
E FutureWarning: Setting a value on a view: behaviour will change in pandas 3.0.
E You are mutating a Series or DataFrame object, and currently this mutation will
E also have effect on other Series or DataFrame objects that share data with this
E object. In pandas 3.0 (with Copy-on-Write), updating one Series or DataFrame object
E will never modify another.
I think the simplest thing is to just remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good just a merge conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…n allows non-string column names
Thanks @MarcoGorelli |
…ntation allows non-string column names) (#57203) Backport PR #57174: BUG: Interchange protocol implementation allows non-string column names Co-authored-by: Marco Edward Gorelli <[email protected]>
…es (pandas-dev#57174) * convert non-string colnames to strings in interchange protocol * remove irrelevant statement * informative error message if two columns end up becoming duplicates
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.