WIP: prototype for unit support #10349 #17153

Closed
wants to merge 9 commits into from
245 changes: 245 additions & 0 deletions pandas/core/dimensioned.py
@@ -0,0 +1,245 @@
import numpy as np

from pandas.core.base import (PandasObject)
from pandas.util._decorators import cache_readonly
from pandas import compat
from pandas.core.common import is_null_slice


class Dimensional(PandasObject):
    """
Contributor

Is this meant to be a dimensioned scalar?

Contributor Author

@Bernhard10 Bernhard10 Aug 2, 2017

It is meant to be an array with units. (I did not find a better word for "quantity with unit" than "dimensional"; see https://english.stackexchange.com/a/48069)
It stores the data as a float array but has its own dtype (modelled after the Categorical class).

Contributor

Units are a property of the dtype, NOT of the array itself. You don't need another array-like class, just a proper dtype.

Contributor Author

I tried that. But I didn't get it to work properly. I guess I don't know enough about the internals of pandas to get it to work.

Contributor Author

@Bernhard10 Bernhard10 Aug 2, 2017

NumPy did not allow me to create an array with my custom dtype. Maybe I missed something in the dtype class construction.
NumPy gives me TypeError: data type not understood, which is the desired behaviour according to test_dtypes.Base.test_numpy_informed.

Contributor

We barely use NumPy except to hold plain vanilla things. The way to handle this is to simply make a regular float Series with a parameterized dtype of the unit. These would be constructed similarly to DatetimeTZDtype, maybe float64['ergs'] or whatever. The machinery is already in place to do this.
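
As a rough illustration of the "parameterized dtype" idea in the comment above, here is a minimal pure-Python sketch. The class name `UnitFloatDtype` and its methods are hypothetical and do not come from this PR or from pandas; only the `construct_from_string` naming mirrors the classmethod used in the diff, and the `dimensionedFloat[...]` string spelling is taken from the PR rather than the `float64['ergs']` spelling floated in the comment.

```python
import re


class UnitFloatDtype:
    """Hypothetical parameterized dtype: float values tagged with a unit."""

    # Accepts strings such as "dimensionedFloat[meter]" (spelling assumed).
    _pattern = re.compile(r"^dimensionedFloat\[(?P<unit>[^\[\]]+)\]$")

    def __init__(self, unit):
        self.unit = unit

    @property
    def name(self):
        return "dimensionedFloat[{}]".format(self.unit)

    @classmethod
    def construct_from_string(cls, string):
        # Raise TypeError on a mismatch so callers can try other dtypes next.
        match = cls._pattern.match(string)
        if match is None:
            raise TypeError(
                "cannot construct a UnitFloatDtype from {!r}".format(string))
        return cls(match.group("unit"))

    def __eq__(self, other):
        # Compare equal to the string spelling and to dtypes with the same unit.
        if isinstance(other, str):
            return other == self.name
        return isinstance(other, UnitFloatDtype) and other.unit == self.unit

    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return self.name
```

With this sketch, `UnitFloatDtype.construct_from_string("dimensionedFloat[meter]")` yields a dtype whose `unit` is `"meter"`, while a plain string such as `"float64"` raises `TypeError`.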

Contributor Author

Thanks for the feedback. What you said is exactly what I planned to do, but I don't see how to do it.

The problem is that Series does not seem to have a dtype other than the NumPy array's dtype.
If I understood the code correctly, the following happens:

If I do Series([1,2,3], dtype=DimensionedFloatDtype("meter")), then the code hits data = _sanitize_array(data, index, dtype, copy, raise_cast_failure=True), which returns an np.array.
This array gets passed into the SingleBlockManager, which is passed to NDFrame.__init__, where it is stored as _data.

Now Series.dtype is a property that just retrieves the dtype from the SingleBlockManager (which was constructed with a np.array, and according to _interleaved_dtype(blocks) disallows ExtensionDtypes).

DatetimeTZDtype does it by creating a pandas.core.indexes.DatetimeIndex; categorical data uses Categorical to mimic an array.

I'm sorry if I missed something obvious, but I cannot see how to do this without the need for a wrapper class that holds my float data.
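
The construction path described in this comment can be mimicked with a toy model. Everything below is hypothetical (`PlainFloatStorage`, `UnitAwareStorage`, `MiniSeries` and `sanitize` are illustrative stand-ins, not pandas classes); it only shows why a container that reads its dtype from its storage cannot report a custom dtype unless the storage itself carries one.

```python
class PlainFloatStorage:
    """Hypothetical stand-in for a block holding a plain float array."""

    def __init__(self, values):
        self.values = [float(v) for v in values]
        self.dtype = "float64"  # the only dtype this storage knows about


class UnitAwareStorage(PlainFloatStorage):
    """Hypothetical stand-in for a wrapper that carries its own dtype."""

    def __init__(self, values, dtype):
        super().__init__(values)
        self.dtype = dtype


class MiniSeries:
    """Toy container: the dtype is read from the storage, never kept here."""

    def __init__(self, storage):
        self._data = storage

    @property
    def dtype(self):
        return self._data.dtype


def sanitize(values, dtype):
    # Mimics the situation described above: the requested dtype is dropped
    # because the result is a plain float array.
    return PlainFloatStorage(values)


lost = MiniSeries(sanitize([1, 2, 3], "dimensionedFloat[meter]"))
kept = MiniSeries(UnitAwareStorage([1, 2, 3], "dimensionedFloat[meter]"))
print(lost.dtype)  # float64
print(kept.dtype)  # dimensionedFloat[meter]
```

In this toy model the unit survives only when the storage object knows about it, which is the role a wrapper class or a dedicated block type would play.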

Contributor Author

The only other option I see would be to explicitly store the dtype in the Series object, but that would be a change that potentially affects things unrelated to numbers with units.

Contributor

You need to step through code where we use an existing dtype and see how it's done. It's not easy, but it is all there. Start with simple tests.

Contributor Author

Sorry, I don't get it.
I looked at DatetimeTZDtype, but it has its own block type DatetimeTZBlock.
I looked at categorical data, it has its own class Categorical(PandasObject).
I looked at the PeriodDtype, but it is only used for indexes and I cannot instantiate a Series like pd.Series([1,2,3], dtype="period[S]").

I'm sorry if I overlooked it, but if it is all there, I just don't see where it is.
I can create the Dtype easily enough, I just can't get it to integrate well with pd.Series.

    """

    __array_priority__ = 10
    _typ = 'dimensional'

    def __init__(self, values, dtype):
        # TODO: Sanitize
        self.values = values
        self.dtype = dtype

    @property
    def _constructor(self):
        return Dimensional

    def copy(self):
        """ Copy constructor. """
        return self._constructor(self.values.copy(), self.dtype)

    def astype(self, dtype, copy=True):
        """
        Coerce this type to another dtype
        """
        return np.array(self, dtype=dtype, copy=copy)

    @cache_readonly
    def ndim(self):
        """Number of dimensions """
        return self.values.ndim

    @cache_readonly
    def size(self):
        """ return the len of myself """
        return len(self)

    @property
    def base(self):
        """ compat, we are always our own object """
        return None

    # for Series/ndarray like compat
    @property
    def shape(self):
        """ Shape of the Categorical.

        For internal compatibility with numpy arrays.

        Returns
        -------
        shape : tuple
        """
        return tuple([len(self.values)])

    def __array__(self, dtype=None):
        """
        The numpy array interface.

        Returns
        -------
        values : numpy array
            A numpy array of either the specified dtype or,
            if dtype==None (default), the same dtype as
            categorical.categories.dtype
        """
        if dtype:
            return np.asarray(self.values, dtype)
        return self.values

    @property
    def T(self):
        return self

    def isna(self):
        raise NotImplementedError
    isnull = isna

    def notna(self):
        """
        Inverse of isna

        Both missing values (-1 in .codes) and NA as a category are detected as
        null.

        Returns
        -------
        a boolean array of whether my values are not null

        See also
        --------
        notna : top-level notna
        notnull : alias of notna
        Categorical.isna : boolean inverse of Categorical.notna

        """
        return ~self.isna()
    notnull = notna

    def put(self, *args, **kwargs):
        """
        Replace specific elements in the Categorical with given values.
        """
        raise NotImplementedError(("'put' is not yet implemented "
                                   "for Categorical"))

    def dropna(self):
        raise NotImplementedError

    def get_values(self):
        """ Return the values.

        For internal compatibility with pandas formatting.

        Returns
        -------
        values : numpy array
            A numpy array of the same dtype as categorical.categories.dtype or
            Index if datetime / periods
        """
        return np.array(self)

    def ravel(self, order='C'):
        """ Return a flattened (numpy) array.

        For internal compatibility with numpy arrays.

        Returns
        -------
        raveled : numpy array
        """
        return np.array(self)

    def view(self):
        """Return a view of myself.

        For internal compatibility with numpy arrays.

        Returns
        -------
        view : Categorical
           Returns `self`!
        """
        return self

    def to_dense(self):
        """Return my 'dense' representation

        For internal compatibility with numpy arrays.

        Returns
        -------
        dense : array
        """
        return np.asarray(self)

    def fillna(self, value=None, method=None, limit=None):
        """ Fill NA/NaN values using the specified method.

        Parameters
        ----------
        method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
            Method to use for filling holes in reindexed Series
            pad / ffill: propagate last valid observation forward to next valid
            backfill / bfill: use NEXT valid observation to fill gap
        value : scalar
            Value to use to fill holes (e.g. 0)
        limit : int, default None
            (Not implemented yet for Categorical!)
            If method is specified, this is the maximum number of consecutive
            NaN values to forward/backward fill. In other words, if there is
            a gap with more than this number of consecutive NaNs, it will only
            be partially filled. If method is not specified, this is the
            maximum number of entries along the entire axis where NaNs will be
            filled.

        Returns
        -------
        filled : Categorical with NA/NaN filled
        """
        raise NotImplementedError

    def _slice(self, slicer):
        """ Return a slice of myself.

        For internal compatibility with numpy arrays.
        """

        # only allow 1 dimensional slicing, but can
        # in a 2-d case be passed (slice(None),....)
        if isinstance(slicer, tuple) and len(slicer) == 2:
            if not is_null_slice(slicer[0]):
                raise AssertionError("invalid slicing for a 1-ndim "
                                     "categorical")
            slicer = slicer[1]

        return self._constructor(self.values[slicer], self.dtype)

    def __len__(self):
        """The length of this Categorical."""
        return len(self.values)

    def __iter__(self):
        """Returns an Iterator over the values of this Categorical."""
        return iter(self.get_values())

    def _tidy_repr(self, max_vals=10, footer=True):
        """ a short repr displaying only max_vals and an optional (but default
        footer)
        """
        num = max_vals // 2
        head = self[:num]._get_repr(length=False, footer=False)
        tail = self[-(max_vals - num):]._get_repr(length=False, footer=False)

        result = '%s, ..., %s' % (head[:-1], tail[1:])
        if footer:
            result = '%s\n%s' % (result, self._repr_footer())

        return compat.text_type(result)

    def _repr_footer(self):
        return 'Length: %d' % (len(self))

    def _get_repr(self, length=True, na_rep='NaN', footer=True):
        return "Dimensional {}".format(self.__array__())
        # TODO: Implement properly

    def __unicode__(self):
        """ Unicode representation. """
        # TODO: implement
        return self._tidy_repr()

    def __getitem__(self, key):
        """ Return an item. """
        return Dimensional(values=self.values[key], dtype=self.dtype)

    def __setitem__(self, key, value):
        raise NotImplementedError
26 changes: 25 additions & 1 deletion pandas/core/dtypes/common.py
@@ -9,6 +9,7 @@
                     PeriodDtype, PeriodDtypeType,
                     IntervalDtype, IntervalDtypeType,
                     ExtensionDtype)
from .units import DimensionedFloatDtype, DimensionedFloatDtypeType
from .generic import (ABCCategorical, ABCPeriodIndex,
                      ABCDatetimeIndex, ABCSeries,
                      ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex,
@@ -508,6 +509,12 @@ def is_categorical_dtype(arr_or_dtype):
    return CategoricalDtype.is_dtype(arr_or_dtype)


def is_dimensionedFloat_dtype(arr_or_dtype):
    if arr_or_dtype is None:
        return False
    return DimensionedFloatDtype.is_dtype(arr_or_dtype)


def is_string_dtype(arr_or_dtype):
    """
    Check whether the provided array or dtype is of the string dtype.
@@ -686,7 +693,6 @@ def is_dtype_equal(source, target):
        target = _get_dtype(target)
        return source == target
    except (TypeError, AttributeError):

        # invalid comparison
        # object == category will hit this
        return False
@@ -1615,6 +1621,8 @@ def is_extension_type(arr):
        return True
    elif is_datetimetz(arr):
        return True
    elif is_dimensionedFloat_dtype(arr):
        return True
    return False


@@ -1717,6 +1725,9 @@ def _get_dtype(arr_or_dtype):
        return arr_or_dtype
    elif isinstance(arr_or_dtype, IntervalDtype):
        return arr_or_dtype
    elif isinstance(arr_or_dtype, DimensionedFloatDtype):
        return arr_or_dtype

    elif isinstance(arr_or_dtype, string_types):
        if is_categorical_dtype(arr_or_dtype):
            return CategoricalDtype.construct_from_string(arr_or_dtype)
@@ -1726,6 +1737,8 @@ def _get_dtype(arr_or_dtype):
            return PeriodDtype.construct_from_string(arr_or_dtype)
        elif is_interval_dtype(arr_or_dtype):
            return IntervalDtype.construct_from_string(arr_or_dtype)
        elif arr_or_dtype.startswith("dimensionedFloat"):
            return DimensionedFloatDtype.construct_from_string(arr_or_dtype)
    elif isinstance(arr_or_dtype, (ABCCategorical, ABCCategoricalIndex)):
        return arr_or_dtype.dtype

@@ -1762,6 +1775,8 @@ def _get_dtype_type(arr_or_dtype):
        return IntervalDtypeType
    elif isinstance(arr_or_dtype, PeriodDtype):
        return PeriodDtypeType
    elif isinstance(arr_or_dtype, DimensionedFloatDtype):
        return DimensionedFloatDtypeType
    elif isinstance(arr_or_dtype, string_types):
        if is_categorical_dtype(arr_or_dtype):
            return CategoricalDtypeType
@@ -1771,6 +1786,8 @@ def _get_dtype_type(arr_or_dtype):
            return PeriodDtypeType
        elif is_interval_dtype(arr_or_dtype):
            return IntervalDtypeType
        elif arr_or_dtype.startswith("dimensionedFloat"):
            return DimensionedFloatDtypeType
        return _get_dtype_type(np.dtype(arr_or_dtype))
    try:
        return arr_or_dtype.dtype.type
@@ -1879,6 +1896,8 @@ def pandas_dtype(dtype):

    if isinstance(dtype, DatetimeTZDtype):
        return dtype
    elif isinstance(dtype, DimensionedFloatDtype):
        return dtype
    elif isinstance(dtype, PeriodDtype):
        return dtype
    elif isinstance(dtype, CategoricalDtype):
@@ -1904,6 +1923,11 @@ def pandas_dtype(dtype):
            except TypeError:
                pass

        elif dtype.startswith('dimensionedFloat['):
            try:
                return DimensionedFloatDtype.construct_from_string(dtype)
            except TypeError:
                pass
        try:
            return CategoricalDtype.construct_from_string(dtype)
        except TypeError:
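
The string handling touched by the hunks above follows a "try each constructor, swallow `TypeError`, move on" pattern. A minimal stand-alone sketch of that pattern is given below; `make_prefix_parser`, `PARSERS` and `resolve_dtype_string` are hypothetical names, and the final `raise` is a simplification of this sketch rather than a description of what `pandas_dtype` does when nothing matches.

```python
def make_prefix_parser(prefix):
    """Return a hypothetical parser for strings shaped like 'prefix[param]'."""
    def parse(string):
        if not (string.startswith(prefix + "[") and string.endswith("]")):
            raise TypeError(
                "not a {} dtype string: {!r}".format(prefix, string))
        # Return the prefix and the text between the brackets.
        return (prefix, string[len(prefix) + 1:-1])
    return parse


# Order matters: the first parser that accepts the string wins.
PARSERS = [
    make_prefix_parser("period"),
    make_prefix_parser("interval"),
    make_prefix_parser("dimensionedFloat"),
]


def resolve_dtype_string(string):
    for parse in PARSERS:
        try:
            return parse(string)
        except TypeError:
            continue  # this parser does not apply, try the next one
    raise TypeError("data type {!r} not understood".format(string))
```

Under these assumptions, `resolve_dtype_string("dimensionedFloat[meter]")` returns `("dimensionedFloat", "meter")`, and an unrecognised string such as `"float64"` ends in `TypeError`.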