Typing names argument in select_columns_by_name #212

Closed
mroeschke opened this issue Jul 27, 2023 · 3 comments

Comments

@mroeschke

Currently, `DataFrame.select_columns_by_name` types its `names` argument as `names: Sequence[str]`.

I think the intention here is that `names` is a non-scalar container of string labels, but `Sequence[str]` also matches a plain `str`.
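To make the overlap concrete, here is a small runtime analogue of the `Sequence[str]` check. The helper `is_sequence_of_str` is hypothetical and only approximates what a static type checker does (typeshed declares `str` as a subclass of `Sequence[str]`); it is not part of the interchange protocol.

```python
from collections.abc import Sequence


def is_sequence_of_str(obj: object) -> bool:
    """Hypothetical runtime approximation of the static Sequence[str] check."""
    return isinstance(obj, Sequence) and all(isinstance(x, str) for x in obj)


# A list of labels matches, as intended...
print(is_sequence_of_str(["a", "b"]))  # True
# ...but so does a bare string: str is registered as a Sequence,
# and iterating over it yields one-character str objects.
print(is_sequence_of_str("ab"))  # True
print(list("ab"))  # ['a', 'b']
```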

@MarcoGorelli
Contributor

Hey, this looks correct to me.

A `str` is a sequence of `str`. If you have single-letter column names and pass a string made up of those names, I'd expect it to work,

and it does:

```
In [20]: pd.api.interchange.from_dataframe(pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [7,8,9]}).__dataframe__().select_columns_by_name('ab'))
Out[20]:
   a  b
0  1  4
1  2  5
2  3  6
```
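A minimal, pandas-free sketch of why this works, assuming the implementation simply iterates over `names` and looks each entry up as a column label. The `ToyFrame` class is hypothetical and much simpler than the real interchange `DataFrame`; pandas' actual implementation may differ in detail.

```python
from collections.abc import Sequence


class ToyFrame:
    """Hypothetical stand-in for an interchange DataFrame (not the real protocol class)."""

    def __init__(self, columns: dict[str, list[int]]) -> None:
        self._columns = columns

    def column_names(self) -> list[str]:
        return list(self._columns)

    def select_columns_by_name(self, names: Sequence[str]) -> "ToyFrame":
        # Iterates over `names`; a bare str is iterated character by character.
        return ToyFrame({name: self._columns[name] for name in names})


df = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
print(df.select_columns_by_name(["a", "b"]).column_names())  # ['a', 'b']
print(df.select_columns_by_name("ab").column_names())  # ['a', 'b']
```

If the frame instead had a single column named `'ab'`, passing `'ab'` to this toy method would look up `'a'` and `'b'` separately and raise `KeyError`.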

@mroeschke
Author

Ah, okay, I forgot about that possibility. Closing, then.

@rgommers
Member

.select_columns_by_name('ab')

seems pretty ambiguous and bug-prone to me. I'd expect that to give me a single column named 'ab', not two columns named 'a' and 'b'. I think the intention was for this to be spelled ['a', 'b'].

Unfortunately, I'm not sure there's a way to make this unambiguous while still allowing all non-string sequences, and `list[str]` may be too restrictive. So I guess we have to leave it as is either way.
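The trade-off can be sketched with two hypothetical runtime guards (neither is part of the interchange protocol or pandas): one accepts any sequence of strings except a bare `str`, the other mirrors a `list[str] | tuple[str, ...]` annotation. The standard `typing` module has no way to express "a sequence, but not `str`", so the first rule can only be enforced at runtime, not by the type annotation itself.

```python
from collections.abc import Sequence


def check_names_permissive(names: object) -> list[str]:
    """Hypothetical guard: accept any Sequence[str] except a bare str."""
    if isinstance(names, str):
        raise TypeError(
            f"expected a sequence of column names, got str {names!r}; "
            f"did you mean [{names!r}]?"
        )
    if not isinstance(names, Sequence) or not all(isinstance(n, str) for n in names):
        raise TypeError("expected a sequence of str")
    return list(names)


def check_names_strict(names: object) -> list[str]:
    """Hypothetical guard mirroring a `list[str] | tuple[str, ...]` annotation."""
    if not isinstance(names, (list, tuple)) or not all(isinstance(n, str) for n in names):
        raise TypeError("expected a list or tuple of str")
    return list(names)


class Labels(Sequence):
    """A user-defined, read-only sequence of labels (hypothetical)."""

    def __init__(self, *labels: str) -> None:
        self._labels = labels

    def __getitem__(self, i):
        return self._labels[i]

    def __len__(self) -> int:
        return len(self._labels)


print(check_names_permissive(("a", "b")))  # ['a', 'b']
print(check_names_permissive(Labels("a", "b")))  # ['a', 'b']

try:
    check_names_permissive("ab")  # rejected: a bare str is ambiguous
except TypeError as exc:
    print(exc)

try:
    check_names_strict(Labels("a", "b"))  # rejected: valid sequence, but not list/tuple
except TypeError as exc:
    print(exc)
```

The permissive guard removes the `'ab'` ambiguity but lives outside the type system; the strict one is expressible as an annotation but turns away legitimate sequence types.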
