-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
WIP: prototype for unit support #10349 #17153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @Bernhard10! Thanks for updating the PR.
Comment last updated on August 03, 2017 at 10:49 Hours UTC |
pandas/core/dimensioned.py
Outdated
|
||
|
||
class Dimensional(PandasObject): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is meant to be a dimensioned scalar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is ment to be an array with units. (I did not find a better word for "quantity with unit" than dimensional - see https://english.stackexchange.com/a/48069)
It stores the data as float array, but has its own dtype. (Modelled after the Categorical
class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
units is a property of the dtype, NOT the array itself. You don't need another array-like class. Just a proper dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that. But I didn't get it to work properly. I guess I don't know enough about the internals of pandas to get it to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy did not allow me to create an array with my costum dtype. Maybe I missed something in the dtype class construction.
Numpy gives me TypeError: data type not understood
, which is desired according to test_dtypes.Base.test_numpy_informed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we barely use numpy except to hold plain vanilla things. The way to handle this is to simply make a regular float Series with a parameterized dtype of the unit. These would be constructed similar to DatetimeTZDtype
, maybe float64['ergs']
or whatever The machinery is already in place to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. What you said is exactly what I planned to do, but I somehow don't get it how to do it.
The problem is that Series
does not seem to habe a dtype other than the numpy array's dtype:
If I understood the code correctly, then the following happens:
If I do Series([1,2,3], dtype=DimensionedFloatDtype("meter"))
, then the code
hits data = _sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
which returns an np.array
.
This array gets passed into the SingleBlockManager which is passed to NDFrame.__init__
, where it is stored as _data
.
Now Series.dtype
is a property that just retrieves the dtype from the SingleBlockManager (which was constructed with a np.array, and according to _interleaved_dtype(blocks)
disallows ExtensionDtypes).
The way DatetimeTZDtype does it is by creating a pandas.core.indexes.DatetimeIndex
, categorical uses Categorical
to mimic an array.
I'm sorry if I missed something obvious, but I cannot see how to do this without the need for a wrapper class that holds my float data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other option I see would be to explicitly store the dtype in the Series object, but that would be a change that potentially affects things unrelated to numbers with units.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to step thru code where we use an existing dtype and see how its done. its not easy but all is there. start with simple tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get it.
I looked at DatetimeTZDtype
, but it has its own block type DatetimeTZBlock
.
I looked at categorical data, it has its own class Categorical(PandasObject)
.
I looked at the PeriodDtype, but it is only used for indexes and I cannot instantiate a Series like pd.Series([1,2,3], dtype="period[S]")
.
I'm sorry if I overlooked it, but if it is all there, I just don't see where it is.
I can create the Dtype easily enough, I just can't get it to integrate well with pd.Series.
If I am allowed to store the dtype in the Series class (for some reason I assumed I was not allowed to do this), then everything is indeed there already in pandas. The latest commit stored the dtype directly in Series and works straight forward now. |
The next step would be to remove the direct dependency on pint and make it generic. |
…aries can interface with pandas dtypes.
With Series storing the dtype, everything else seems to perfectly fall in place. The current state of the code has (currently) nothing to do with units, but provides a framework how The idea of There is still some work to do: *) this works only with Series, not with SparseSeries etc. Unfortunately I currently have no time to complete this pull-request, but I will continue to work on it in September, if the basic direction of where this is going is deemed fitting for pandas. Then I could also provide a sample implementation for the use with Pint. |
@Bernhard10 welcome to have this, though this would require extensive effort / testing, e.g. see for example #16015 |
@Bernhard10 good effort here. going to close. but if you would like to continue working you certainly can. This would need extensive testing in order to be accepted I think. |
Thanks for the feedback and sorry for the delayed response. I hope I will find time to continue working on this. |
Trying to follow @jreback idea of adding a dtype for holding the unit.
Related to issue #10349
git diff upstream/master -u -- "*.py" | flake8 --diff
(except for one warning)