Add some more dtypes: Date, Datetime, Duration, String #197

Merged: 13 commits, merged on Sep 30, 2023
6 changes: 5 additions & 1 deletion spec/API_specification/dataframe_api/__init__.py
@@ -3,7 +3,7 @@
"""
from __future__ import annotations

from typing import Mapping, Sequence, Any, TYPE_CHECKING
from typing import Mapping, Sequence, Any, Literal, TYPE_CHECKING

from .column_object import *
from .dataframe_object import DataFrame
@@ -35,6 +35,10 @@
"Float64",
"Float32",
"Bool",
"Date",
"Datetime",
"Duration",
"String",
"is_dtype",
]

23 changes: 20 additions & 3 deletions spec/API_specification/dataframe_api/_types.py
@@ -33,6 +33,9 @@
UInt32,
UInt16,
UInt8,
Date,
Datetime,
String,
)

DType = Union[Bool, Float64, Float32, Int64, Int32, Int16, Int8, UInt64, UInt32, UInt16, UInt8]
@@ -57,14 +60,16 @@ def Column() -> ColumnType:
...

@staticmethod
def Int64() -> Int64:...
@staticmethod
def Int16() -> Int16:...
def Int64() -> Int64:
...

@staticmethod
def Int32() -> Int32:
...

@staticmethod
def Int16() -> Int16:
...

@staticmethod
def Int8() -> Int8:
@@ -98,6 +103,18 @@ def Float32() -> Float32:
def Bool() -> Bool:
...

@staticmethod
def Date() -> Date:
...

@staticmethod
def Datetime(time_unit: Literal['ms', 'us'], time_zone: str | None) -> Datetime:
...

@staticmethod
def String() -> String:
...

@staticmethod
def concat(dataframes: Sequence[DataFrameType]) -> DataFrameType:
...
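In `_types.py` the namespace-level `Datetime(time_unit, time_zone)` takes positional parameters, whereas the `Datetime` class in `dtypes.py` has a keyword-only `__init__`, so an implementation has to forward the arguments by keyword. Because the dtype is parametrised, it is also natural for two `Datetime` dtypes to compare equal only when both parameters match. A minimal sketch of both points; the `Namespace` class and the `__eq__`/`__hash__`/validation logic are illustrative assumptions, not something the spec defines:

```python
from __future__ import annotations

from typing import Literal, Optional

TimeUnit = Literal["ms", "us"]


class Datetime:
    """Illustrative parametrised Datetime dtype (not the spec's own code)."""

    def __init__(self, *, time_unit: TimeUnit, time_zone: Optional[str]):
        if time_unit not in ("ms", "us"):
            raise ValueError(f"unsupported time_unit: {time_unit!r}")
        self.time_unit = time_unit
        self.time_zone = time_zone  # None means time-zone-naive

    def __eq__(self, other: object) -> bool:
        # Assumption: same dtype only if both parameters match.
        if not isinstance(other, Datetime):
            return NotImplemented
        return (self.time_unit, self.time_zone) == (other.time_unit, other.time_zone)

    def __hash__(self) -> int:
        return hash(("Datetime", self.time_unit, self.time_zone))

    def __repr__(self) -> str:
        return f"Datetime(time_unit={self.time_unit!r}, time_zone={self.time_zone!r})"


class Namespace:
    """Hypothetical namespace exposing dtype constructors as static methods."""

    @staticmethod
    def Datetime(time_unit: TimeUnit, time_zone: Optional[str]) -> Datetime:
        # The dtype's __init__ is keyword-only, so forward by keyword.
        return Datetime(time_unit=time_unit, time_zone=time_zone)


naive_us = Namespace.Datetime("us", None)
print(naive_us == Datetime(time_unit="us", time_zone=None))   # True
print(naive_us == Datetime(time_unit="us", time_zone="UTC"))  # False
```

Calling `Datetime("us", None)` directly on the class raises `TypeError`, which is the point of the keyword-only signature.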
35 changes: 35 additions & 0 deletions spec/API_specification/dataframe_api/dtypes.py
@@ -1,3 +1,6 @@
from typing import Literal


class Int64:
"""Integer type with 64 bits of precision."""

@@ -31,3 +34,35 @@ class Float32:
class Bool:
"""Boolean type with 8 bits of precision."""

class Date:
"""
Date type.

There is no guarantee about the range of dates available.
"""
Comment on lines +39 to +44

Member
How is this date supposed to be represented? (Keith asked the same question at the time of review (#197 (comment)), but the comment was resolved without an answer.)

I certainly understand that we can say there is no guarantee about the range of dates each library supports, but we still need to specify how to interpret the data. This is an interchange protocol where users access the buffers, not a standard API where only behaviour matters. So what does the buffer of (supposedly) integers mean here?

Is it like a NumPy datetime64[D]?
Arrow, for example, supports integers representing either days (date32) or milliseconds (date64) since the UNIX epoch.
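The two Arrow encodings differ only in the unit of the stored integer. A minimal stdlib sketch (the helper names are illustrative, not part of either spec) computes both for the same calendar date:

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Integer stored by a day-resolution date type (e.g. Arrow date32)."""
    return (d - EPOCH).days


def millis_since_epoch(d: date) -> int:
    """Integer stored by a millisecond-resolution date type (e.g. Arrow date64)."""
    return days_since_epoch(d) * 86_400_000


merge_day = date(2023, 9, 30)
print(days_since_epoch(merge_day))    # 19630
print(millis_since_epoch(merge_day))  # 1696032000000
```

A NumPy `datetime64[D]` value viewed as `int64` holds the same day count as the first helper.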

Member

Whoops, sorry, ignore my comment: I thought I was commenting on a PR for the interchange protocol, but this is for the standard API ;) So here, of course, only behaviour matters, and the underlying buffers are an implementation detail of the library.

I was confused because we linked to this PR from a discussion about supporting "date" in the interchange protocol in the pyarrow implementation.


class Datetime:
"""
Datetime type.

Attributes
----------
time_unit : Literal['ms', 'us']
Precision of the datetime type. There is no guarantee that the full
range of dates available for the specified precision is supported.
time_zone : str | None
Time zone of the datetime type. Only IANA time zones are supported.
`None` indicates time-zone-naive data.
"""
def __init__(self, *, time_unit: Literal['ms', 'us'], time_zone: str | None):
...

time_unit: Literal['ms', 'us']
time_zone: str | None # Only IANA time zones are supported
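The caveat in the `time_unit` docstring can be made concrete. The spec does not say how values are stored; assuming, for illustration only, a signed 64-bit count of units since 1970-01-01 (a common choice, not a requirement), the span reachable on either side of the epoch is:

```python
from typing import Literal

# Assumption for illustration: values are stored as a signed 64-bit integer
# count of `time_unit` since the UNIX epoch. The spec does not mandate this.
INT64_MAX = 2**63 - 1
UNITS_PER_SECOND = {"ms": 1_000, "us": 1_000_000}
SECONDS_PER_YEAR = 365.2425 * 86_400  # mean Gregorian year


def max_span_years(time_unit: Literal["ms", "us"]) -> float:
    """Approximate years on either side of the epoch that fit in an int64 count."""
    return INT64_MAX / UNITS_PER_SECOND[time_unit] / SECONDS_PER_YEAR


for unit in ("ms", "us"):
    print(f"{unit}: about {max_span_years(unit):,.0f} years either side of 1970")
```

That is roughly 292 million years for `'ms'` and 292 thousand years for `'us'`, far beyond what a library's calendar code may actually handle (Python's own `datetime` stops at year 9999), which is why the docstring avoids promising the full range.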

class Duration:
"""Duration type."""
time_unit: Literal['ms', 'us']
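`Duration` carries only a `time_unit`, so a value is a count of that unit. A minimal sketch, using a hypothetical helper `duration_to_int` (not part of the spec), of turning a Python `timedelta` into that count:

```python
from datetime import timedelta
from typing import Literal


def duration_to_int(value: timedelta, time_unit: Literal["ms", "us"]) -> int:
    """Hypothetical helper: express `value` as an integer count of `time_unit`."""
    # timedelta normalises to (days, seconds, microseconds); combine them exactly.
    total_us = (value.days * 86_400 + value.seconds) * 1_000_000 + value.microseconds
    if time_unit == "us":
        return total_us
    if time_unit == "ms":
        # Floor division rounds toward -inf, so negative durations stay consistent.
        return total_us // 1_000
    raise ValueError(f"unsupported time_unit: {time_unit!r}")


print(duration_to_int(timedelta(seconds=1, microseconds=1_500), "us"))  # 1001500
print(duration_to_int(timedelta(seconds=1, microseconds=1_500), "ms"))  # 1001
```

Flooring for `'ms'` is a design choice of this sketch; a library could equally reject durations that are not exactly representable in the requested unit.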

class String:
"""String type."""
4 changes: 4 additions & 0 deletions spec/API_specification/index.rst
@@ -27,6 +27,10 @@ of objects and functions in the top-level namespace. The latter are:
Float64
Float32
Bool
Date
Datetime
Duration
String
is_dtype
column_from_sequence
column_from_1d_array