Add a fill_nan method to dataframe and column #167
Changes from 2 commits
1b98168
3c7f4a5
8c6e694
5acee9d
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -456,3 +456,17 @@ def unique_indices(self, *, skip_nulls: bool = True) -> Column[int]: | |
To get the unique values, you can do ``col.get_rows(col.unique_indices())``. | ||
""" | ||
... | ||
|
||
def fill_nan(self, value: float | 'null', /): | ||
A bit unrelated to this PR, but having [...]

We don't have numpy-style scalars (i.e., instances of a dtype) though? That's why we need a separate [...] We could add dtype instances and specify that [...]

nitpick: we can construct a column with [...]

Yes, but I think the thing that was agreed as the correct path forward was 0d arrays, which we don't have on the DataFrame side. Those 0d arrays are strongly typed and don't have to deal with nulls. The issue that I see is that someone could do something like: [...] For example, PyArrow handles this by having an explicit [...] Maybe we just need an explicit [...]

We do have exactly that already: docs for [...]

That is an object for a [...] It feels counter-intuitive that Columns are type-erased (i.e. just a [...]

Either way, this should go into a new issue instead of this PR. Just the typing felt a bit funky to me here. I'll open a new issue for discussion and approve this.

Thanks, a new issue sounds good for this. I had not thought before about a need for a null dtype; if there is one we should indeed consider it.

Small clarification here: while pyarrow indeed has a "null" data type, we also have type-specific null scalars for each data type. And so in your specific example, the [...]
||
""" | ||
Fill floating point ``nan`` values with the given fill value. | ||
|
||
Parameters | ||
---------- | ||
value : float or `null` | ||
Value used to replace any ``nan`` in the column with. Must be | ||
of the Python scalar type matching the dtype of the column (or | ||
be `null`). | ||
|
||
""" | ||
... |
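To make the proposed semantics concrete, here is a minimal sketch of how `fill_nan` might behave on a toy column. `ToyColumn`, `Null`, and `to_list` are hypothetical names used only for illustration and are not part of the PR. The sketch also assumes that `fill_nan` returns a new column and leaves existing `null` entries untouched, neither of which the docstring above states.

```python
import math
from typing import Union


class Null:
    """Hypothetical stand-in for the `null` sentinel mentioned in the docstring."""

    def __repr__(self) -> str:
        return "null"


null = Null()


class ToyColumn:
    """Hypothetical minimal float column backed by a Python list."""

    def __init__(self, values):
        self._values = list(values)

    def fill_nan(self, value: Union[float, Null], /) -> "ToyColumn":
        # Replace only floating point NaN entries with `value`.
        # Assumption: existing nulls are kept and a new column is returned.
        def replace(v):
            if isinstance(v, float) and math.isnan(v):
                return value
            return v

        return ToyColumn(replace(v) for v in self._values)

    def to_list(self):
        return list(self._values)


col = ToyColumn([1.0, float("nan"), null, 4.0])
filled_with_zero = col.fill_nan(0.0).to_list()   # NaN becomes 0.0
filled_with_null = col.fill_nan(null).to_list()  # NaN becomes null
```

Passing `null` as the fill value turns every `nan` into a missing value, which is the case the `float or null` parameter type is meant to cover.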