Skip to content

Add 2d constructor #208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 1, 2023
Merged

Conversation

MarcoGorelli
Copy link
Contributor

Also a request from scikit-learn

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable addition to me, especially given that scikit-learn would need it and we want a dataframe standard <--> array standard bridge.

@@ -96,6 +96,20 @@ def dataframe_from_dict(data: Mapping[str, Column[Any]]) -> DataFrame:
"""
...

def dataframe_from_2d_array(array: Any) -> DataFrame:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have *, names: list[str] I think, with the length of the list matching the number of columns?

Also a dtype keyword, because we don't really want to get in the business of inferring it (going from array.dtype to a dtype in this namespace is nontrivial; much easier for the caller who has the array namespace at hand).

"""
Construct DataFrame from 2D array.

See `column_from_sequece` for related 1D function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in sequence

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More importantly, I foresee a column_from_1d_array request, because a 1-D array is not a sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware, thanks

just for my understanding, what makes it not a sequence? from the docs https://docs.python.org/3/glossary.html#term-sequence :

An iterable which supports efficient element access using integer indices via the getitem() special method and defines a len() method that returns the length of the sequence

, don't arrays meet the requirement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See data-apis/array-api#481 for len(). A second issue is that iterating over it is going to return 0-D arrays of the dtype of the input array, not Python scalars.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are arrays iterable per the array API? I would strongly prefer to differentiate array handling vs iterable handling since you can often handle array zero copy or at least in very efficient ways relative to iterables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I've added column from 1d too

Comment on lines +139 to +140
dtypes : Mapping[str, DType]
Dtype of each column. Must be the same length as ``array.shape[1]``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type of casting do we allow vs disallow here? I.E. if someone passes an array of type float32 and a mapping that uses bool dtypes here, is that something we expect to support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No cross-kind casting? So, float32 to bool would be disallowed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow downcasting within kind or only upcasting? I.e. is float64 array --> float32 column allowed? Or only float32 array --> float32 column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any array-api precedent to go off of? (haven't looked to carefully yet)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is. Here are the casting rules: https://data-apis.org/array-api/latest/API_specification/type_promotion.html#type-promotion. In summary:

  • cross-kind casting is undefined, given that not all libraries allow that
  • downcasting is disallow in general, and if you're sure that that is what you want, there is a single function (astype) to explicitly downcast

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks - OK to just go with that?

@MarcoGorelli
Copy link
Contributor Author

merging then, thanks for discussions and review!

@MarcoGorelli MarcoGorelli merged commit d6ac338 into data-apis:main Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants