Slicing columns with mixed types <str>,<int> fails with ValueError #20975

Closed
achampion opened this issue May 7, 2018 · 5 comments · Fixed by #38717
Labels: Dtype Conversions (Unexpected or buggy dtype conversions); good first issue; Indexing (Related to indexing on series/frames, not to indexes themselves); Needs Tests (Unit test(s) needed to prevent regressions)

@achampion

Code Sample, a copy-pastable example if possible

In []:
df = pd.DataFrame({'test':1, 1:2, 2:3}, index=[0])
df.loc[:, 'test':]
Out[]:
   test  1  2
0     1  2  3

In []:
df.loc[:, 1:]
Out[]:
TypeError                                 Traceback (most recent call last)
<ipython-input-655-3b97c047dcab> in <module>()
      1 df = pd.DataFrame({'test':1, 1:2, 2:3}, index=[0])
----> 2 df.loc[:, 1:]
[snip]
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py in _invalid_indexer(self, form, key)
   1574                         "indexers [{key}] of {kind}".format(
   1575                             form=form, klass=type(self), key=key,
-> 1576                             kind=type(key)))
   1577 
   1578     def get_duplicates(self):

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>

In []:
df = pd.DataFrame({0:1, 1:2, 2:3}, index=[0])
df.loc[:, 1:]
Out[]:
   1  2
0  2  3

Problem description

When a DataFrame has columns with mixed types, e.g. <str> and <int>, any slicing that uses the int columns raises an error (a TypeError in the traceback above).

When all columns are ints, slicing by int does not cause an issue.
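
A minimal sketch of how the two frames differ, assuming a recent pandas release (index class names and reprs vary by version): mixed str/int keys give an object-dtype column index, while all-int keys give an integer-dtype one.

```python
import pandas as pd

# Mixed str/int column labels: pandas falls back to an object-dtype Index.
mixed = pd.DataFrame({'test': 1, 1: 2, 2: 3}, index=[0])

# All-int column labels: pandas builds an integer-dtype Index.
ints = pd.DataFrame({0: 1, 1: 2, 2: 3}, index=[0])

print(mixed.columns.dtype)  # object
print(ints.columns.dtype)   # an integer dtype, typically int64
```

The failing slice in the report goes through the object-dtype index; the working one goes through the integer index.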

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.2.2
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.2
patsy: None
dateutil: 2.6.1
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0

@toobaz toobaz added Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions Difficulty Intermediate labels May 8, 2018
@phofl
Member

phofl commented Nov 10, 2020

This works now.

df.loc[:, 1:]

returns

   1  2
0  2  3

@phofl phofl added good first issue Needs Tests Unit test(s) needed to prevent regressions labels Nov 10, 2020
@hungyiwu
Contributor

hungyiwu commented Dec 27, 2020

Hi, has the test been added? If not, I'd like to try adding this test.

Is pandas/tests/indexing/test_indexing.py a good place to add a test for this?
[update] Is TestLabelSlicing in pandas/tests/indexing/test_loc.py a good place to add a test for this?

@hungyiwu
Contributor

I tried adding a test under class TestLabelSlicing as below:

def test_loc_getitem_slice_columns_mixed_dtype(self):
    df1 = pd.DataFrame({'test':1, 1:2, 2:3}, index=[0])
    df2 = pd.DataFrame({1:2, 2:3}, index=[0])
    tm.assert_equal(df1.loc[:, 1:], df2)

and got this error:

AssertionError: DataFrame.columns are different

DataFrame.columns classes are not equivalent
[left]:  Index([1, 2], dtype='object')
[right]: Int64Index([1, 2], dtype='int64')

I understand pandas creates different indices when there's mixed dtype. The question is should I change from tm.assert_equal to other assertion methods, like assert df1.loc[:, 1:].equals(df2)? Or should I stay with tm.assert_equal and try to work around the dtype difference?

@phofl
Member

phofl commented Dec 27, 2020

The assert_frame_equal check is correct. You could create your df2 differently (please call it expected). For example: columns=Index([1, 2], dtype=object)
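
The suggestion above could look roughly like the sketch below. It assumes a pandas version where the slice already works, and it uses the public pd.testing.assert_frame_equal instead of the internal tm helpers. The test name is taken from the earlier comment, and this is not necessarily the test that was eventually merged.

```python
import pandas as pd


def test_loc_getitem_slice_columns_mixed_dtype():
    df = pd.DataFrame({'test': 1, 1: 2, 2: 3}, index=[0])
    # Build the expected columns with dtype=object so the index matches
    # what slicing a mixed-type column index returns.
    expected = pd.DataFrame(
        data=[[2, 3]], index=[0], columns=pd.Index([1, 2], dtype=object)
    )
    pd.testing.assert_frame_equal(df.loc[:, 1:], expected)


# Run the check directly; under pytest the function is collected by name.
test_loc_getitem_slice_columns_mixed_dtype()
```

Building expected with an explicit object-dtype Index keeps the strict index-class comparison in place instead of loosening the assertion.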

@hungyiwu
Contributor

Thank you for the help! I've opened a pull request.


6 participants