-
Notifications
You must be signed in to change notification settings - Fork 21
Add 2d constructor #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 2d constructor #208
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable addition to me, especially given that scikit-learn would need it and we want a dataframe standard <--> array standard bridge.
@@ -96,6 +96,20 @@ def dataframe_from_dict(data: Mapping[str, Column[Any]]) -> DataFrame: | |||
""" | |||
... | |||
|
|||
def dataframe_from_2d_array(array: Any) -> DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have *, names: list[str]
I think, with the length of the list matching the number of columns?
Also a dtype
keyword, because we don't really want to get in the business of inferring it (going from array.dtype
to a dtype in this namespace is nontrivial; much easier for the caller who has the array namespace at hand).
""" | ||
Construct DataFrame from 2D array. | ||
|
||
See `column_from_sequece` for related 1D function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in sequence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More importantly, I foresee a column_from_1d_array
request, because a 1-D array is not a sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware, thanks
just for my understanding, what makes it not a sequence? from the docs https://docs.python.org/3/glossary.html#term-sequence :
An iterable which supports efficient element access using integer indices via the getitem() special method and defines a len() method that returns the length of the sequence
, don't arrays meet the requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See data-apis/array-api#481 for len()
. A second issue is that iterating over it is going to return 0-D arrays of the dtype of the input array, not Python scalars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are arrays iterable per the array API? I would strongly prefer to differentiate array handling vs iterable handling since you can often handle array zero copy or at least in very efficient ways relative to iterables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I've added column from 1d too
dtypes : Mapping[str, DType] | ||
Dtype of each column. Must be the same length as ``array.shape[1]``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What type of casting do we allow vs disallow here? I.E. if someone passes an array of type float32
and a mapping that uses bool
dtypes here, is that something we expect to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No cross-kind casting? So, float32 to bool would be disallowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we allow downcasting within kind or only upcasting? I.e. is float64 array --> float32 column allowed? Or only float32 array --> float32 column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any array-api precedent to go off of? (haven't looked to carefully yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is. Here are the casting rules: https://data-apis.org/array-api/latest/API_specification/type_promotion.html#type-promotion. In summary:
- cross-kind casting is undefined, given that not all libraries allow that
- downcasting is disallow in general, and if you're sure that that is what you want, there is a single function (
astype
) to explicitly downcast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks - OK to just go with that?
merging then, thanks for discussions and review! |
Also a request from scikit-learn