Skip to content

add __len__ and __getitem__ to Column #140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 4, 2023

Conversation

MarcoGorelli
Copy link
Contributor

No description provided.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @MarcoGorelli. Let's leave it open for a couple of days to give others a chance to comment.

Comment on lines 9 to 12
def __getitem__(self, row) -> object:
"""
Get the element at row index `row`.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expected type of "row" here? An integer?

I wonder if we should have more explicit methods than __getitem__ since intuitively people will try to feed both scalars and Sequences / Columns into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, forgot to annotate that one, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have more explicit methods than getitem since intuitively people will try to feed both scalars and Sequences / Columns into it.

could you expand please?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically, Pandas and other dataframe libraries have allowed passing scalar-like objects here and getting a scalar-like object in return or passing Sequence-like / Column-like objects here and getting Column-like objects in return. I.E. for Pandas:

import pandas as pd

pds = pd.Series(['a', 'b', 'c', 'd', 'e'])

print(pds[0])  # Scalar-like input --> Scalar-like output
print(pds[[0]])  # Sequence-like input --> Column-like output

For our DataFrame object we have the get_rows method that takes a Sequence-like (should probably take a Column-like) where we could presumably have the same function for our Column object, and then instead of using __getitem__ we could have something like get_element or get_scalar or something like that.

If we wanted to keep the __getitem__ method I would suggest we handle both the Scalar-like and Sequence-like / Column-like cases as opposed to just the Scalar-like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have the get_rows method that takes a Sequence-like (should probably take a Column-like)

yeah, I came across this - I'll open a separate issue

Copy link
Contributor Author

@MarcoGorelli MarcoGorelli Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to keep the getitem method I would suggest we handle both the Scalar-like and Sequence-like / Column-like cases as opposed to just the Scalar-like.

I've pushed a commit but am not overly keen on it, how about Column.get_rows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added another commit with get_rows, better to avoid overloads if we can help it

rgommers added a commit to rgommers/dataframe-api that referenced this pull request Apr 26, 2023
This addresses a concern that has come up a number of times about
whether it's okay to implement a library-specific object instead
of a builtin type. E.g.,
data-apis#140 (comment)
Comment on lines 11 to 14
def __getitem__(self, row: int) -> object:
"""
Get the element at row index `key`.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still feels a bit funky to me to only support the scalar case via __getitem__ where I think it would go against folks natural intuition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's already get_rows for that - perhaps we could start like this, and then we can always overload this one if necessary based on feedback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another use case for getitem (or separate method) is slicing (#89 indicates adding a separate method for that as well)

MarcoGorelli pushed a commit that referenced this pull request Apr 27, 2023
* Add design topic page on use of Python builtin types

This addresses a concern that has come up a number of times about
whether it's okay to implement a library-specific object instead
of a builtin type. E.g.,
#140 (comment)

* skipna -> `skip_nulls`
@MarcoGorelli
Copy link
Contributor Author

as discussed yesterday, I've removed __getitem__ and added get_value

for some existing methods (e.g. Column.mean), we have dtype at the return value to mean "something of the dtype of the column". I've done the same here - orthogonal to this issue, but we should fine a better way of expressing that

@jorisvandenbossche
Copy link
Member

for some existing methods (e.g. Column.mean), we have dtype at the return value to mean "something of the dtype of the column". I've done the same here - orthogonal to this issue, but we should fine a better way of expressing that

Should that use Scalar instead? (or Scalar[dtype] if we want to indicate it is a scalar of the specific dtype)

@MarcoGorelli
Copy link
Contributor Author

yeah I like that more. will address in a separate PR though as there's some unrelated methods which would need changing too, so easier to do them all in one go and stay consistent with this PR

@MarcoGorelli
Copy link
Contributor Author

right, let's move forwards with this, thanks for reviews

@MarcoGorelli MarcoGorelli merged commit e6de485 into data-apis:main May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants