Slicing columns with mixed types <str>,<int> fails with ValueError #20975

achampion · 2018-05-07T15:28:50Z

Code Sample, a copy-pastable example if possible

In []:
df = pd.DataFrame({'test':1, 1:2, 2:3}, index=[0])
df.loc[:, 'test':]
Out[]:
   test  1  2
0     1  2  3

In []:
df.loc[:, 1:]
Out[]:
TypeError                                 Traceback (most recent call last)
<ipython-input-655-3b97c047dcab> in <module>()
      1 df = pd.DataFrame({'test':1, 1:2, 2:3}, index=[0])
----> 2 df.loc[:, 1:]
[snip]
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py in _invalid_indexer(self, form, key)
   1574                         "indexers [{key}] of {kind}".format(
   1575                             form=form, klass=type(self), key=key,
-> 1576                             kind=type(key)))
   1577 
   1578     def get_duplicates(self):

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>

In []:
df = pd.DataFrame({0:1, 1:2, 2:3}, index=[0])
df.loc[:, 1:]
Out[]:
   1  2
0  2  3

Problem description

When a DataFrame has columns of mixed types, e.g. <str> and <int>, any label slice that starts from one of the int columns raises a TypeError (see the traceback above).

When all columns are ints, slicing by int does not cause an issue.
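On versions where the label slice raises, one possible workaround is to translate the label into a position and slice positionally. This is only a sketch: it assumes the label 1 is unique in the columns (so that `get_loc` returns a single integer), and the names `start` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"test": 1, 1: 2, 2: 3}, index=[0])

# Look up the integer position of the label 1 in the mixed-type columns.
# Assumes the label is unique, so get_loc returns a single int.
start = df.columns.get_loc(1)

# Slice by position instead of by label.
result = df.iloc[:, start:]
print(result)
```

The positional slice avoids label-based slice resolution on the mixed-type index, which is the step that fails in the traceback.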

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.2.2
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.2
patsy: None
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0

phofl · 2020-11-10T20:18:35Z

This works now.

df.loc[:, 1:]

returns

   1  2
0  2  3

hungyiwu · 2020-12-27T10:08:32Z

Hi, has the test been added? If not, I'd like to try adding this test.

~~Is pandas/tests/indexing/test_indexing.py a good place to add a test for this?~~
[update] Is TestLabelSlicing in pandas/tests/indexing/test_loc.py a good place to add a test for this?

hungyiwu · 2020-12-27T10:50:38Z

I tried adding a test under class TestLabelSlicing as below:

def test_loc_getitem_slice_columns_mixed_dtype(self):
    df1 = pd.DataFrame({'test':1, 1:2, 2:3}, index=[0])
    df2 = pd.DataFrame({1:2, 2:3}, index=[0])
    tm.assert_equal(df1.loc[:, 1:], df2)

and got this error:

AssertionError: DataFrame.columns are different

DataFrame.columns classes are not equivalent
[left]:  Index([1, 2], dtype='object')
[right]: Int64Index([1, 2], dtype='int64')

I understand that pandas creates a different index type when the column labels have mixed dtypes. Should I switch from tm.assert_equal to another assertion, such as assert df1.loc[:, 1:].equals(df2)? Or should I stay with tm.assert_equal and work around the dtype difference?
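The mismatch reported in the assertion error can be reproduced by comparing the column dtypes directly. This is a small sketch, assuming a pandas version in which `df.loc[:, 1:]` already works on mixed-type columns; the variable names are illustrative.

```python
import pandas as pd

mixed = pd.DataFrame({"test": 1, 1: 2, 2: 3}, index=[0])
ints_only = pd.DataFrame({1: 2, 2: 3}, index=[0])

# Slicing keeps the dtype of the original (mixed) column index.
sliced = mixed.loc[:, 1:]

print(sliced.columns.dtype)     # object
print(ints_only.columns.dtype)  # int64

# The cell values agree; only the column index dtype differs.
print((sliced.to_numpy() == ints_only.to_numpy()).all())
```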

phofl · 2020-12-27T10:55:20Z

The assert_frame_equal is correct. You could create your df2 (please call it expected) differently, for example with columns=Index([1, 2], dtype=object).
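Applied to the test from the earlier comment, that suggestion might look roughly like the following. It is written as a standalone function using the public `pd.testing.assert_frame_equal` rather than as a method of TestLabelSlicing, and it assumes a pandas version in which the slice no longer raises.

```python
import pandas as pd


def test_loc_getitem_slice_columns_mixed_dtype():
    # Columns mix a str label with int labels, so the column index is object dtype.
    df = pd.DataFrame({"test": 1, 1: 2, 2: 3}, index=[0])

    # Build the expected frame with an object-dtype column index so that
    # it matches the index type produced by slicing the mixed columns.
    expected = pd.DataFrame(
        data=[[2, 3]], index=[0], columns=pd.Index([1, 2], dtype=object)
    )

    pd.testing.assert_frame_equal(df.loc[:, 1:], expected)


test_loc_getitem_slice_columns_mixed_dtype()
```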

hungyiwu · 2020-12-27T11:14:00Z

Thank you for the help! I've opened a pull request.

toobaz added the Indexing, Dtype Conversions, and Difficulty Intermediate labels May 8, 2018

jbrockmendel removed the Difficulty Intermediate label Oct 21, 2019

phofl added the good first issue and Needs Tests labels Nov 10, 2020

hungyiwu mentioned this issue Dec 27, 2020

Slicing columns with mixed types <str>,<int> fails with ValueError #20975 #38717

Merged

5 tasks

jreback added this to the 1.3 milestone Dec 27, 2020

jreback closed this as completed in #38717 Dec 27, 2020

jreback pushed a commit that referenced this issue Dec 27, 2020

Slicing columns with mixed types <str>,<int> fails with ValueError #2…

6d7b1c5

…0975 (#38717)

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this issue Jan 20, 2021

Slicing columns with mixed types <str>,<int> fails with ValueError pa…

5b10ed6

…ndas-dev#20975 (pandas-dev#38717)
