Assume we have a Categorical and want to convert it to a dense (decoded, not code-encoded) array. We have np.asarray(..) and the to_dense() method (which uses np.asarray under the hood):
In [1]: cat = pd.Categorical(['a', 'b', 'a'])
In [2]: np.asarray(cat)
Out[2]: array(['a', 'b', 'a'], dtype=object)
In [3]: cat.to_dense()
Out[3]: array(['a', 'b', 'a'], dtype=object)
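To make "dense (not encoded)" concrete: a Categorical stores small integer codes plus the categories those codes point into, and densifying means looking each code up in the categories. The following is a minimal sketch. It uses only np.asarray because to_dense was later deprecated. The manual decode assumes there are no missing values, i.e. no -1 codes.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# Encoded form: integer codes that index into cat.categories.
codes = cat.codes                          # [0, 1, 0]
categories = np.asarray(cat.categories)    # ['a', 'b']

# Manual decode; only valid without missing values, since a -1 code
# would silently pick the last category instead of producing NaN.
decoded = categories[codes]

# np.asarray(cat) performs the same decode (and maps -1 codes to NaN).
dense = np.asarray(cat)
print(dense.tolist(), decoded.tolist())
```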
In addition, we also have get_values:
In [4]: cat.get_values()
Out[4]: array(['a', 'b', 'a'], dtype=object)
get_values is mostly the same, except that it returns an Index for datetime/period/timedelta categories, and an object array instead of a float array for integer categories when there are missing values:
In [10]: cat = pd.Categorical(pd.date_range("2012", periods=3))
In [11]: cat.to_dense()
Out[11]:
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
'2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
In [12]: cat.get_values()
Out[12]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')
As a result, it preserves the dtype somewhat better, although only for datetime-like categories; it does not do so for extension arrays (EAs) in general.
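For comparison, np.asarray always returns a plain numpy array. It gives a datetime64 ndarray rather than a DatetimeIndex, and it gives float64 with NaN for integer categories that have missing values. This sketch assumes a recent pandas release. get_values has since been deprecated and removed there, so only the np.asarray side is shown:

```python
import numpy as np
import pandas as pd

# Datetime categories: a numpy datetime64 array, not a DatetimeIndex.
dt_cat = pd.Categorical(pd.date_range("2012", periods=3))
dt_dense = np.asarray(dt_cat)
print(type(dt_dense), dt_dense.dtype)

# Integer categories with a missing value: the -1 code is filled with NaN,
# which upcasts the dense result to float64.
int_cat = pd.Categorical([1, 2, None])
int_dense = np.asarray(int_cat)
print(int_dense.dtype, int_dense)
```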
While looking into the deprecation of get_values (#26409), I was wondering: do we want a method that actually gets a "dense" version of the array, but with the exact same dtype (so returning an EA when the categories have an extension dtype)?
And should we deprecate to_dense()?
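Here is one possible shape for such a method, written as a hypothetical stand-alone helper. The name densify is made up for illustration, and pandas 2.x is assumed. The idea is to take from the categories' backing array (Index.array) instead of converting to numpy. That way an extension dtype survives, and -1 codes become that dtype's own NA value:

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray


def densify(cat: pd.Categorical) -> ExtensionArray:
    """Hypothetical helper: decode a Categorical, keeping the categories' dtype.

    Index.array returns the categories' backing ExtensionArray (a
    NumpyExtensionArray wrapper for plain numpy dtypes); allow_fill=True
    turns -1 codes into that array's NA value (NaT, <NA>, ...).
    """
    return cat.categories.array.take(cat.codes, allow_fill=True)


# Datetime categories stay datetime64 (as a DatetimeArray, not an ndarray).
dt = densify(pd.Categorical(pd.date_range("2012", periods=3)))

# Nullable-integer categories stay Int64, with <NA> for the missing entry.
ints = densify(pd.Categorical(pd.array([1, 2, None], dtype="Int64")))
print(dt.dtype, ints.dtype)
```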
I'd be +1 to such a method, so that we can round-trip Categorical and other ExtensionArrays. It could also be useful to simplify going to/from MultiIndex and DataFrame columns.
to_dense is just a wrapper around np.asarray, so it is not very useful. I'm +1 on deprecating it.
I'm on board with getting rid of Categorical.to_dense in its current form. If deprecation cycles weren't an issue, I would actually be inclined to alias to_dense to _internal_get_values, since to_dense is a more descriptive name and that would match the SparseArray behavior.
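For reference, SparseArray.to_dense is an existing pandas method that materializes a sparse array into a plain numpy array of its subtype. Here is a minimal illustration of that behaviour:

```python
import numpy as np
import pandas as pd

# A sparse array storing only the non-fill values (fill value inferred as 0).
sparse = pd.arrays.SparseArray([0, 0, 1, 0])

# to_dense materializes every position as a numpy array of the subtype.
dense = sparse.to_dense()
print(type(dense), dense.dtype, dense.tolist())
```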
FWIW, in the ongoing work to get rid of _internal_get_values, it looks like Categorical is going to be the sticking point.
I see you deprecated Categorical.to_dense (#32639).
I was just thinking: if we decide that to_dense is the best name for this "get densified values" operation (I am not yet fully sure about it, but I also can't think of a better alternative), we could actually keep the method and deprecate only its default behaviour.
More specifically, we could introduce a new keyword that opts into the new behaviour (returning a densified array, potentially an extension array), and raise the deprecation warning only if that keyword is not specified (internally, we would then always pass the keyword).
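The keyword-based deprecation can be sketched as follows. Everything here is hypothetical. to_dense is written as a stand-alone function rather than a Categorical method, and keep_dtype is a placeholder name for the new keyword. A sentinel default lets the function tell "keyword not passed" apart from an explicit value. In the first case it warns and keeps the old behaviour. In the second case it raises no warning:

```python
import warnings

import numpy as np
import pandas as pd

# Sentinel distinguishing "not passed" from any real value (True/False/None).
_NO_DEFAULT = object()


def to_dense(cat: pd.Categorical, keep_dtype=_NO_DEFAULT):
    """Hypothetical stand-alone version of the proposed Categorical.to_dense.

    keep_dtype is a made-up keyword name used only for illustration.
    """
    if keep_dtype is _NO_DEFAULT:
        warnings.warn(
            "The default behaviour of to_dense will change; pass "
            "keep_dtype=True for the new behaviour or keep_dtype=False "
            "to keep the old one.",
            FutureWarning,
            stacklevel=2,
        )
        keep_dtype = False
    if keep_dtype:
        # New behaviour: decode via the categories' backing array (may be an EA).
        return cat.categories.array.take(cat.codes, allow_fill=True)
    # Old behaviour: always a numpy array, same as np.asarray(cat).
    return np.asarray(cat)


cat = pd.Categorical(pd.date_range("2012", periods=3))
old = to_dense(cat)                   # emits FutureWarning, returns ndarray
new = to_dense(cat, keep_dtype=True)  # no warning, returns DatetimeArray
```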