Skip to content

Add details to expectations for scalars #308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Nov 17, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
751a131
note what may raise
MarcoGorelli Oct 31, 2023
7cb90ea
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 7, 2023
fc65648
list required methods
MarcoGorelli Nov 8, 2023
7c24afd
add scalar class
MarcoGorelli Nov 8, 2023
2714c13
reword
MarcoGorelli Nov 8, 2023
99b91a5
fixup
MarcoGorelli Nov 8, 2023
a867f00
fixup
MarcoGorelli Nov 8, 2023
9e13924
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 8, 2023
f197672
fixup
MarcoGorelli Nov 8, 2023
a417b1c
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 14, 2023
409d8f3
replace Scalar|NullType with Scalar
MarcoGorelli Nov 14, 2023
0db9871
type null as Scalar
MarcoGorelli Nov 14, 2023
6520ac4
add example of working with scalars
MarcoGorelli Nov 14, 2023
46bc08c
use AnyScalar;
MarcoGorelli Nov 15, 2023
b879f31
add Scalar.dtype and Scalar.persist
MarcoGorelli Nov 15, 2023
456c152
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 15, 2023
b8011c7
update shift arg
MarcoGorelli Nov 15, 2023
97d8f9a
use BoolScalar
MarcoGorelli Nov 15, 2023
3b7bcb6
use float scalar in some parts
MarcoGorelli Nov 15, 2023
d598a8d
use float scalar in some parts
MarcoGorelli Nov 15, 2023
a12585b
string scalar for rename
MarcoGorelli Nov 15, 2023
35cd4ed
intscalar for shift
MarcoGorelli Nov 15, 2023
fade164
numeric scalar for correction
MarcoGorelli Nov 15, 2023
29ceed2
simplify
MarcoGorelli Nov 15, 2023
bee402f
update python builtin types desc
MarcoGorelli Nov 15, 2023
d1f4daf
Merge remote-tracking branch 'upstream/main' into expand-on-scalars
MarcoGorelli Nov 15, 2023
15090ac
fixup
MarcoGorelli Nov 15, 2023
24d2ad8
enable extra ruff rule, note AnyScalar
MarcoGorelli Nov 15, 2023
8360d96
remove some unnecessary nitpick ignores
MarcoGorelli Nov 15, 2023
f69679a
return Self from Scalar.persist, add column.persist
MarcoGorelli Nov 17, 2023
216b5e6
fixup
MarcoGorelli Nov 17, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions spec/API_specification/dataframe_api/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,12 @@
"""
from __future__ import annotations

from typing import Dict, Sequence, Any, TYPE_CHECKING
from typing import TYPE_CHECKING, Any, Dict, Sequence

from .column_object import *
from .dataframe_object import DataFrame
from .groupby_object import *
from .dtypes import *
from .groupby_object import *

if TYPE_CHECKING:
from .typing import DType, Scalar
Expand Down
6 changes: 4 additions & 2 deletions spec/API_specification/dataframe_api/column_object.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@
from __future__ import annotations

from typing import Any,NoReturn, TYPE_CHECKING, Literal, Protocol
from typing import TYPE_CHECKING, Any, Literal, NoReturn, Protocol

if TYPE_CHECKING:
from .typing import NullType, Scalar, DType, Namespace
from typing_extensions import Self

from .scalar_object import Scalar
from .typing import DType, Namespace, NullType


__all__ = ['Column']

Expand Down
9 changes: 5 additions & 4 deletions spec/API_specification/dataframe_api/dataframe_object.py
Original file line number Diff line number Diff line change
@@ -1,13 +1,14 @@
from __future__ import annotations

from typing import Any, Literal, Mapping, Sequence, TYPE_CHECKING, NoReturn, Protocol

from typing import TYPE_CHECKING, Any, Literal, Mapping, NoReturn, Protocol, Sequence

if TYPE_CHECKING:
from typing_extensions import Self

from .column_object import Column
from .groupby_object import GroupBy
from .typing import NullType, Scalar, Namespace, DType, SupportsDataFrameAPI
from typing_extensions import Self
from .scalar_object import Scalar
from .typing import DType, Namespace, NullType, SupportsDataFrameAPI


__all__ = ["DataFrame"]
Expand Down
88 changes: 88 additions & 0 deletions spec/API_specification/dataframe_api/scalar_object.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
from __future__ import annotations

from typing import Any, Protocol

__all__ = ['Scalar']


class Scalar(Protocol):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a dtype property similar to what we have with Columns?

Copy link
Contributor Author

@MarcoGorelli MarcoGorelli Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be honest I'd be fine with departing completely from the idea of ducktyped Python scalars and just adding extra things (like dtype or persist) if they're useful

let's discuss this part in the next call

this would mean that Python scalars would no longer implement the Scalar Protocol

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a clean way from a typing perspective to allow people to pass either Scalar objects or typical Python scalars that we can handle constructing Scalars from without issue? Without that I think the API will be a bit of a pain with people having to sprinkle Scalar(...) or Scalar.from_py(...) all over.

We should also have some way to go from Python scalars to these Scalar objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've added an example (spec/API_specification/examples/05_scalars_example.py)

Python scalars do in fact implement the Scalar protocol, so they can be passed without any issue

I think this is an argument against adding dtype and other methods to our scalars which aren't supported by Python ones

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python scalars do in fact implement the Scalar protocol, so they can be passed without any issue

at least, Python floats do

trying to do fill_null('foo') wouldn't be fine

I'm starting to see the writing on the wall for FloatScalar / IntScalar / StringScalar ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine Scalars would be typed following the Column dtypes, which then allows us to define a consistent set of type handling and type promotion rules. Otherwise, if Scalars are not typed similarly to Columns, you may need to introspect the value of a Scalar for example in calling fill_null against an Int8 dtype column with an IntScalar where the value is above what is supported by Int8.

"""
Scalar object

Not meant to be instantiated directly, but rather created via
`:meth:Column.get_value` or one of the column reductions such
as `:meth:`Column.sum`.
"""
def __lt__(self, other: Any) -> Scalar:
...

def __le__(self, other: Any) -> Scalar:
...

def __eq__(self, other: Any) -> Scalar: # type: ignore[override]
...

def __ne__(self, other: Any) -> Scalar: # type: ignore[override]
...

def __gt__(self, other: Any) -> Scalar:
...

def __ge__(self, other: Any) -> Scalar:
...

def __add__(self, other: Any) -> Scalar:
...

def __radd__(self, other: Any) -> Scalar:
...

def __sub__(self, other: Any) -> Scalar:
...

def __rsub__(self, other: Any) -> Scalar:
...

def __mul__(self, other: Any) -> Scalar:
...

def __rmul__(self, other: Any) -> Scalar:
...

def __mod__(self, other: Any) -> Scalar:
...

def __rmod__(self, other: Any) -> Scalar:
...

def __pow__(self, other: Any) -> Scalar:
...

def __rpow__(self, other: Any) -> Scalar:
...

def __floordiv__(self, other: Any) -> Scalar:
...

def __rfloordiv__(self, other: Any) -> Scalar:
...

def __truediv__(self, other: Any) -> Scalar:
...

def __rtruediv__(self, other: Any) -> Scalar:
...

def __neg__(self) -> Scalar:
...

def __abs__(self) -> Scalar:
...

def __bool__(self) -> bool:
"""
Note that this return a Python scalar.

Depending on the implementation, this may raise or trigger computation.
"""
...
37 changes: 14 additions & 23 deletions spec/API_specification/dataframe_api/typing.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,38 +3,32 @@
"""
from __future__ import annotations

from typing import (
TYPE_CHECKING,
Any,
Literal,
Dict,
Protocol,
Sequence,
Union,
)
from typing import TYPE_CHECKING, Any, Dict, Literal, Protocol, Sequence, Union

from dataframe_api.column_object import Column
from dataframe_api.dataframe_object import DataFrame
from dataframe_api.groupby_object import GroupBy, Aggregation as AggregationT
from dataframe_api.groupby_object import Aggregation as AggregationT
from dataframe_api.groupby_object import GroupBy

if TYPE_CHECKING:
from .dtypes import (
Bool,
Float64,
Float32,
Int64,
Int32,
Int16,
Int8,
UInt64,
UInt32,
UInt16,
UInt8,
Date,
Datetime,
Duration,
Float32,
Float64,
Int8,
Int16,
Int32,
Int64,
String,
UInt8,
UInt16,
UInt32,
UInt64,
)
from .scalar_object import Scalar

DType = Union[
Bool,
Expand All @@ -54,9 +48,6 @@
Duration,
]

# Type alias: Mypy needs Any, but for readability we need to make clear this
# is a Python scalar (i.e., an instance of `bool`, `int`, `float`, `str`, etc.)
Scalar = Any
# null is a special object which represents a missing value.
# It is not valid as a type.
NullType = Any
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can just have the special NULL scalar singleton be of type Scalar instead of a special type here? Not sure what NULL.dtype would yield though since we don't have a null or empty dtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, brilliant point, thanks

I guess it could just return None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from #308 (comment) , I think we may need to end up with NullScalar

Expand Down