-
Notifications
You must be signed in to change notification settings - Fork 21
add __len__ and __getitem__ to Column #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add __len__ and __getitem__ to Column #140
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @MarcoGorelli. Let's leave it open for a couple of days to give others a chance to comment.
def __getitem__(self, row) -> object: | ||
""" | ||
Get the element at row index `row`. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expected type of "row" here? An integer?
I wonder if we should have more explicit methods than __getitem__
since intuitively people will try to feed both scalars and Sequences / Columns into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, forgot to annotate that one, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have more explicit methods than getitem since intuitively people will try to feed both scalars and Sequences / Columns into it.
could you expand please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically, Pandas and other dataframe libraries have allowed passing scalar-like objects here and getting a scalar-like object in return or passing Sequence-like / Column-like objects here and getting Column-like objects in return. I.E. for Pandas:
import pandas as pd
pds = pd.Series(['a', 'b', 'c', 'd', 'e'])
print(pds[0]) # Scalar-like input --> Scalar-like output
print(pds[[0]]) # Sequence-like input --> Column-like output
For our DataFrame object we have the get_rows
method that takes a Sequence-like (should probably take a Column-like) where we could presumably have the same function for our Column object, and then instead of using __getitem__
we could have something like get_element
or get_scalar
or something like that.
If we wanted to keep the __getitem__
method I would suggest we handle both the Scalar-like and Sequence-like / Column-like cases as opposed to just the Scalar-like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have the get_rows method that takes a Sequence-like (should probably take a Column-like)
yeah, I came across this - I'll open a separate issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to keep the getitem method I would suggest we handle both the Scalar-like and Sequence-like / Column-like cases as opposed to just the Scalar-like.
I've pushed a commit but am not overly keen on it, how about Column.get_rows
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added another commit with get_rows, better to avoid overloads if we can help it
This addresses a concern that has come up a number of times about whether it's okay to implement a library-specific object instead of a builtin type. E.g., data-apis#140 (comment)
def __getitem__(self, row: int) -> object: | ||
""" | ||
Get the element at row index `key`. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still feels a bit funky to me to only support the scalar case via __getitem__
where I think it would go against folks natural intuition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's already get_rows
for that - perhaps we could start like this, and then we can always overload this one if necessary based on feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another use case for getitem (or separate method) is slicing (#89 indicates adding a separate method for that as well)
* Add design topic page on use of Python builtin types This addresses a concern that has come up a number of times about whether it's okay to implement a library-specific object instead of a builtin type. E.g., #140 (comment) * skipna -> `skip_nulls`
as discussed yesterday, I've removed for some existing methods (e.g. |
Should that use |
yeah I like that more. will address in a separate PR though as there's some unrelated methods which would need changing too, so easier to do them all in one go and stay consistent with this PR |
right, let's move forwards with this, thanks for reviews |
No description provided.