WIP: prototype for unit support #10349 #17153
Closed
517f068 Started to implement extentionDtype for units (Bernhard10)
10e04d6 Added files missing from prev. commit (Bernhard10)
59daf76 Basic concept seems to work (Bernhard10)
cfa7d43 Removed debug-print statements (Bernhard10)
586fd2d Added missing file (Bernhard10)
541cc8d pep8 (Bernhard10)
7bfbee2 Added pint to travis requirenments. (In the future, we need to remove… (Bernhard10)
b71b77b Let Series store the dtype and get rig of Dimensional class (Bernhard10)
950d922 Restructured Code. Now providing a framework with which external libr… (Bernhard10)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,245 @@ | ||
import numpy as np | ||
|
||
from pandas.core.base import (PandasObject) | ||
from pandas.util._decorators import cache_readonly | ||
from pandas import compat | ||
from pandas.core.common import is_null_slice | ||
|
||
|
||
class Dimensional(PandasObject): | ||
""" | ||
""" | ||
|
||
__array_priority__ = 10 | ||
_typ = 'dimensional' | ||
|
||
def __init__(self, values, dtype): | ||
# TODO: Sanitize | ||
self.values = values | ||
self.dtype = dtype | ||
|
||
@property | ||
def _constructor(self): | ||
return Dimensional | ||
|
||
def copy(self): | ||
""" Copy constructor. """ | ||
return self._constructor(self.values.copy(), self.dtype) | ||
|
||
def astype(self, dtype, copy=True): | ||
""" | ||
Coerce this type to another dtype | ||
""" | ||
return np.array(self, dtype=dtype, copy=copy) | ||
|
||
@cache_readonly | ||
def ndim(self): | ||
"""Number of dimensions """ | ||
return self.values.ndim | ||
|
||
@cache_readonly | ||
def size(self): | ||
""" return the len of myself """ | ||
return len(self) | ||
|
||
@property | ||
def base(self): | ||
""" compat, we are always our own object """ | ||
return None | ||
|
||
# for Series/ndarray like compat | ||
@property | ||
def shape(self): | ||
""" Shape of the Categorical. | ||
|
||
For internal compatibility with numpy arrays. | ||
|
||
Returns | ||
------- | ||
shape : tuple | ||
""" | ||
return tuple([len(self.values)]) | ||
|
||
def __array__(self, dtype=None): | ||
""" | ||
The numpy array interface. | ||
|
||
Returns | ||
------- | ||
values : numpy array | ||
A numpy array of either the specified dtype or, | ||
if dtype==None (default), the same dtype as | ||
categorical.categories.dtype | ||
""" | ||
if dtype: | ||
return np.asarray(self.values, dtype) | ||
return self.values | ||
|
||
@property | ||
def T(self): | ||
return self | ||
|
||
def isna(self): | ||
raise NotImplementedError | ||
isnull = isna | ||
|
||
def notna(self): | ||
""" | ||
Inverse of isna | ||
|
||
Both missing values (-1 in .codes) and NA as a category are detected as | ||
null. | ||
|
||
Returns | ||
------- | ||
a boolean array of whether my values are not null | ||
|
||
See also | ||
-------- | ||
notna : top-level notna | ||
notnull : alias of notna | ||
Categorical.isna : boolean inverse of Categorical.notna | ||
|
||
""" | ||
return ~self.isna() | ||
notnull = notna | ||
|
||
def put(self, *args, **kwargs): | ||
""" | ||
Replace specific elements in the Categorical with given values. | ||
""" | ||
raise NotImplementedError(("'put' is not yet implemented " | ||
"for Categorical")) | ||
|
||
def dropna(self): | ||
raise NotImplementedError | ||
|
||
def get_values(self): | ||
""" Return the values. | ||
|
||
For internal compatibility with pandas formatting. | ||
|
||
Returns | ||
------- | ||
values : numpy array | ||
A numpy array of the same dtype as categorical.categories.dtype or | ||
Index if datetime / periods | ||
""" | ||
return np.array(self) | ||
|
||
def ravel(self, order='C'): | ||
""" Return a flattened (numpy) array. | ||
|
||
For internal compatibility with numpy arrays. | ||
|
||
Returns | ||
------- | ||
raveled : numpy array | ||
""" | ||
return np.array(self) | ||
|
||
def view(self): | ||
"""Return a view of myself. | ||
|
||
For internal compatibility with numpy arrays. | ||
|
||
Returns | ||
------- | ||
view : Categorical | ||
Returns `self`! | ||
""" | ||
return self | ||
|
||
def to_dense(self): | ||
"""Return my 'dense' representation | ||
|
||
For internal compatibility with numpy arrays. | ||
|
||
Returns | ||
------- | ||
dense : array | ||
""" | ||
return np.asarray(self) | ||
|
||
def fillna(self, value=None, method=None, limit=None): | ||
""" Fill NA/NaN values using the specified method. | ||
|
||
Parameters | ||
---------- | ||
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None | ||
Method to use for filling holes in reindexed Series | ||
pad / ffill: propagate last valid observation forward to next valid | ||
backfill / bfill: use NEXT valid observation to fill gap | ||
value : scalar | ||
Value to use to fill holes (e.g. 0) | ||
limit : int, default None | ||
(Not implemented yet for Categorical!) | ||
If method is specified, this is the maximum number of consecutive | ||
NaN values to forward/backward fill. In other words, if there is | ||
a gap with more than this number of consecutive NaNs, it will only | ||
be partially filled. If method is not specified, this is the | ||
maximum number of entries along the entire axis where NaNs will be | ||
filled. | ||
|
||
Returns | ||
------- | ||
filled : Categorical with NA/NaN filled | ||
""" | ||
raise NotImplementedError | ||
|
||
def _slice(self, slicer): | ||
""" Return a slice of myself. | ||
|
||
For internal compatibility with numpy arrays. | ||
""" | ||
|
||
# only allow 1 dimensional slicing, but can | ||
# in a 2-d case be passd (slice(None),....) | ||
if isinstance(slicer, tuple) and len(slicer) == 2: | ||
if not is_null_slice(slicer[0]): | ||
raise AssertionError("invalid slicing for a 1-ndim " | ||
"categorical") | ||
slicer = slicer[1] | ||
|
||
return self._constructor(self.values[slicer], self.dtype) | ||
|
||
def __len__(self): | ||
"""The length of this Categorical.""" | ||
return len(self.values) | ||
|
||
def __iter__(self): | ||
"""Returns an Iterator over the values of this Categorical.""" | ||
return iter(self.get_values()) | ||
|
||
def _tidy_repr(self, max_vals=10, footer=True): | ||
""" a short repr displaying only max_vals and an optional (but default | ||
footer) | ||
""" | ||
num = max_vals // 2 | ||
head = self[:num]._get_repr(length=False, footer=False) | ||
tail = self[-(max_vals - num):]._get_repr(length=False, footer=False) | ||
|
||
result = '%s, ..., %s' % (head[:-1], tail[1:]) | ||
if footer: | ||
result = '%s\n%s' % (result, self._repr_footer()) | ||
|
||
return compat.text_type(result) | ||
|
||
def _repr_footer(self): | ||
return 'Length: %d' % (len(self)) | ||
|
||
def _get_repr(self, length=True, na_rep='NaN', footer=True): | ||
return "Dimensional {}".format(self.__array__()) | ||
# TODO: Implement properly | ||
|
||
def __unicode__(self): | ||
""" Unicode representation. """ | ||
# TODO: implement | ||
return self._tidy_repr() | ||
|
||
def __getitem__(self, key): | ||
""" Return an item. """ | ||
return Dimensional(values=self.values[key], dtype=self.dtype) | ||
|
||
def __setitem__(self, key, value): | ||
raise NotImplementedError |
Is it meant to be a dimensioned scalar?
It is meant to be an array with units. (I did not find a better word for "quantity with unit" than "dimensional"; see https://english.stackexchange.com/a/48069.)
It stores the data as a float array but has its own dtype. (Modelled after the `Categorical` class.)
Units are a property of the dtype, NOT of the array itself. You don't need another array-like class, just a proper dtype.
I tried that, but I didn't get it to work properly. I guess I don't know enough about the internals of pandas.
NumPy did not allow me to create an array with my custom dtype. Maybe I missed something in the dtype class construction.
NumPy gives me `TypeError: data type not understood`, which is the desired behaviour according to `test_dtypes.Base.test_numpy_informed`.
We barely use numpy except to hold plain vanilla things. The way to handle this is to simply make a regular float Series with a parameterized dtype of the unit. These would be constructed similarly to `DatetimeTZDtype`, maybe `float64['ergs']` or whatever. The machinery is already in place to do this.
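To make the "parameterized dtype of the unit" suggestion concrete, here is a minimal sketch in plain Python. It is not code from this PR or from pandas: the class name `UnitFloatDtype` and the `float64[unit]` string format are assumptions for illustration, and `construct_from_string` only borrows the method name that pandas dtypes use for string parsing.

```python
import re


class UnitFloatDtype:
    """Hypothetical dtype: float64 storage parameterized by a unit name."""

    # Accepts strings such as "float64[ergs]" or "float64['ergs']".
    _pattern = re.compile(r"^float64\[(?P<unit>[^\[\]]+)\]$")

    def __init__(self, unit):
        # Drop surrounding quotes so 'ergs' and ergs name the same unit.
        self.unit = str(unit).strip("'\"")

    @property
    def name(self):
        return "float64[{}]".format(self.unit)

    @classmethod
    def construct_from_string(cls, string):
        match = cls._pattern.match(string)
        if match is None:
            raise TypeError(
                "cannot construct a UnitFloatDtype from {!r}".format(string))
        return cls(match.group("unit"))

    def __eq__(self, other):
        # Allow comparison against the string form, e.g. "float64[ergs]".
        if isinstance(other, str):
            try:
                other = self.construct_from_string(other)
            except TypeError:
                return False
        return isinstance(other, UnitFloatDtype) and other.unit == self.unit

    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return self.name


dtype = UnitFloatDtype.construct_from_string("float64[ergs]")
print(dtype.unit)
```

The unit lives only on the dtype object, so the values themselves could stay a plain float array. How such a dtype would be wired into `Series` is the open question in the rest of this thread.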
Thanks for the feedback. What you said is exactly what I planned to do, but I don't see how to do it.
The problem is that `Series` does not seem to have a dtype other than the numpy array's dtype. If I understood the code correctly, the following happens:
If I do `Series([1,2,3], dtype=DimensionedFloatDtype("meter"))`, the code hits `data = _sanitize_array(data, index, dtype, copy, raise_cast_failure=True)`, which returns an `np.array`.
This array gets passed into the `SingleBlockManager`, which is passed to `NDFrame.__init__`, where it is stored as `_data`.
Now `Series.dtype` is a property that just retrieves the dtype from the `SingleBlockManager` (which was constructed with an `np.array` and which, according to `_interleaved_dtype(blocks)`, disallows ExtensionDtypes).
The way `DatetimeTZDtype` does it is by creating a `pandas.core.indexes.DatetimeIndex`; categorical uses `Categorical` to mimic an array.
I'm sorry if I missed something obvious, but I cannot see how to do this without a wrapper class that holds my float data.
The only other option I see would be to explicitly store the dtype in the Series object, but that would be a change that potentially affects things unrelated to numbers with units.
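The difference between looking the dtype up on the storage and storing it on the Series can be illustrated with a toy model. None of the classes below are pandas code: `ToyBlockManager`, `DelegatingSeries` and `StoringSeries` are hypothetical stand-ins, and the dtypes are plain strings, so this only sketches the control flow described above.

```python
class ToyBlockManager:
    """Stand-in for the storage layer; it only knows the raw array's dtype."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    @property
    def dtype(self):
        return "float64"


class DelegatingSeries:
    """dtype is looked up on the storage, so a custom dtype is lost."""

    def __init__(self, data, dtype=None):
        # The requested dtype is never passed on to the storage.
        self._data = ToyBlockManager(data)

    @property
    def dtype(self):
        return self._data.dtype


class StoringSeries(DelegatingSeries):
    """Alternative: remember the requested dtype on the Series itself."""

    def __init__(self, data, dtype=None):
        super().__init__(data)
        self._dtype = dtype

    @property
    def dtype(self):
        # Fall back to the storage dtype when no custom dtype was given.
        return self._dtype if self._dtype is not None else self._data.dtype


lost = DelegatingSeries([1, 2, 3], dtype="float64[meter]")
kept = StoringSeries([1, 2, 3], dtype="float64[meter]")
print(lost.dtype)  # float64
print(kept.dtype)  # float64[meter]
```

In the toy version the change is local, but in a real code base every operation that builds a new Series would also have to carry the stored dtype along, which is the kind of side effect the comment above is concerned about.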
You need to step through code where we use an existing dtype and see how it's done. It's not easy, but it is all there. Start with simple tests.
Sorry, I don't get it.
I looked at `DatetimeTZDtype`, but it has its own block type, `DatetimeTZBlock`.
I looked at categorical data; it has its own `class Categorical(PandasObject)`.
I looked at the `PeriodDtype`, but it is only used for indexes, and I cannot instantiate a Series like `pd.Series([1,2,3], dtype="period[S]")`.
I'm sorry if I overlooked it, but if it is all there, I just don't see where it is.
I can create the Dtype easily enough; I just can't get it to integrate well with `pd.Series`.