-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: remove special-casing for internal EAs from format_array #26965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: remove special-casing for internal EAs from format_array #26965
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick comments (didn't look in detail yet). This can certainly be a nice clean-up, but I don't think it closes the discussion in #26837
@@ -881,7 +881,7 @@ def _formatter( | |||
``boxed=True``. | |||
""" | |||
if boxed: | |||
return str | |||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can None ever be a correct formatter function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to pass nested values through unchanged to allow custom formatting options on the underlying floats
>>> import scipy.sparse
>>> from pandas import SparseSeries
>>> sparse = SparseSeries.from_coo(scipy.sparse.rand(350, 18))
C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\sparse\scipy_sparse.py:151: FutureWarning: Series.to_sparse is deprecated and will be removed in a future version
s = s.to_sparse() # TODO: specify kind?
>>> sparse._formatting_values()._formatting_values()
array([0.1057938 , 0.12346779, 0.66926098, 0.63135293, 0.51537009,
0.83084751, 0.11707738, 0.94582471, 0.53306677, 0.76971943,
0.43649237, 0.21300667, 0.36538291, 0.75462779, 0.2525123 ,
0.97204424, 0.62077108, 0.77137921, 0.77416063, 0.06010261,
0.0758256 , 0.81359155, 0.493831 , 0.86517892, 0.99948485,
0.16560856, 0.10064932, 0.2234172 , 0.37112586, 0.35745351,
0.03196741, 0.99648038, 0.39904145, 0.17162981, 0.98518366,
0.85796259, 0.91980664, 0.78111288, 0.00913085, 0.61513879,
0.68807133, 0.13308309, 0.99738934, 0.41772456, 0.30984901,
0.23902977, 0.52296826, 0.90989797, 0.25194294, 0.28347783,
0.76426202, 0.68137012, 0.00918782, 0.6749851 , 0.3858443 ,
0.82595983, 0.72062635, 0.63630261, 0.13853372, 0.40467435,
0.03361887, 0.90591611, 0.50676725])
>>>
>>> sparse._formatting_values()._formatter
<bound method SparseArray._formatter of [0.10579380195508936, 0.123467793554345, 0.6692609807557325, 0.631352928127322, 0.5153700914920036, 0.8308475120205755, 0.11707738027292702, 0.9458247148462586, 0.53306677
34689285, 0.7697194280043602, 0.436492370827794, 0.21300667109238547, 0.3653829053917591, 0.7546277896220214, 0.2525123038343048, 0.9720442422636566, 0.62077107713076, 0.7713792114209488, 0.7741606333773862, 0.0
60102611123280636, 0.07582560258774373, 0.8135915546849382, 0.4938310020146096, 0.865178922460138, 0.9994848548516182, 0.16560855997590118, 0.10064932274817606, 0.22341720411731247, 0.3711258583673639, 0.3574535
112638453, 0.03196740969420808, 0.9964803793350895, 0.399041447277493, 0.17162981330873828, 0.985183664900777, 0.8579625879222734, 0.9198066431764218, 0.7811128816049261, 0.009130849663441576, 0.6151387882173937
, 0.6880713270054183, 0.13308308845790984, 0.997389339850816, 0.4177245615919253, 0.3098490139048895, 0.23902976566001577, 0.5229682615072562, 0.9098979745269118, 0.25194293787078936, 0.28347782689329026, 0.7642
620222948321, 0.6813701212402914, 0.009187820750616082, 0.6749850993894757, 0.3858442970367967, 0.8259598335077263, 0.7206263455689199, 0.6363026132729022, 0.1385337245480187, 0.4046743457416371, 0.0336188668302
50727, 0.905916112003116, 0.5067672489799548]
Fill: nan
BlockIndex
Block locations: array([0])
Block lengths: array([63])>
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the example you show.
But in any case, I would argue that such a fix should not be in the base array, but only in those EAs that need it? (but maybe that doesn't make sense, as I don't really understand the fix :-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the formatting is now recursive. so multiple formatters can be applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may be a better example of recursing until a numpy array returned.
>>> from pandas import period_range, Categorical
>>> idx = period_range('2011-01-01 09:00', freq='H', periods=5)
>>> c = Categorical(idx, ordered=True)
>>> exp = """[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
... Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 <
... 2011-01-01 13:00]""" # noqa
>>>
>>> c
[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 <
2011-01-01 13:00]
>>>
>>>
>>> type(c)
<class 'pandas.core.arrays.categorical.Categorical'>
>>>
>>> c._formatting_values()
PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
'2011-01-01 12:00', '2011-01-01 13:00'],
dtype='period[H]', freq='H')
>>>
>>> type(c._formatting_values())
<class 'pandas.core.indexes.period.PeriodIndex'>
>>>
>>> c._formatting_values().dtype
period[H]
>>>
>>> type(c._formatter(boxed=True))
<class 'NoneType'>
>>>
>>> c._formatting_values()._formatting_values()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'PeriodIndex' object has no attribute '_formatting_values'
>>>
>>> type(c.get_values())
<class 'pandas.core.indexes.period.PeriodIndex'>
>>>
>>> c.get_values()
PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
'2011-01-01 12:00', '2011-01-01 13:00'],
dtype='period[H]', freq='H')
>>>
>>> c.get_values()._values
<PeriodArray>
['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
'2011-01-01 12:00', '2011-01-01 13:00']
Length: 5, dtype: period[H]
>>>
>>> c.get_values()._values._formatter
<bound method PeriodArray._formatter of <PeriodArray>
['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
'2011-01-01 12:00', '2011-01-01 13:00']
Length: 5, dtype: period[H]>
>>>
>>> c.get_values()._values._formatting_values()
array([Period('2011-01-01 09:00', 'H'), Period('2011-01-01 10:00', 'H'),
Period('2011-01-01 11:00', 'H'), Period('2011-01-01 12:00', 'H'),
Period('2011-01-01 13:00', 'H')], dtype=object)
>>>
@@ -457,6 +457,9 @@ def _formatter(self, boxed=False): | |||
# Defer to CategoricalFormatter's formatter. | |||
return None | |||
|
|||
def _formatting_values(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_formatting_values
is currently deprecated on EAs, so why was this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to support the recursive behaviour and remove the special casing.
Codecov Report
@@ Coverage Diff @@
## master #26965 +/- ##
==========================================
- Coverage 91.87% 91.86% -0.01%
==========================================
Files 180 180
Lines 50746 50743 -3
==========================================
- Hits 46623 46616 -7
- Misses 4123 4127 +4
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26965 +/- ##
==========================================
- Coverage 91.87% 90.46% -1.42%
==========================================
Files 180 180
Lines 50746 50741 -5
==========================================
- Hits 46623 45902 -721
- Misses 4123 4839 +716
Continue to review full report at Codecov.
|
Sorry, that is still too cryptic for me. Can you try to explain what is going on? |
So this is specific for Categorical ? Where the actual values (whether they be string, datetime, etc) can still format themselves? |
if the special-casing is removed we can close that issue. New issues can be raised for the other points being raised during that discussion. |
I personally think we need a better reason to un-deprecate |
these are internal EAs. EA authors may need similar nesting functionality. Internal EAs should not be special cased and use an interface available to all. |
@jorisvandenbossche do you want to move the discussion back to the issue? This PR is a proof of concept and will obviously require further discussion and agreement before _formatting_values is undeprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice code removal; do we need some tests around _get_formatted_values()?
@@ -883,14 +880,34 @@ def format_array(values, formatter, float_format=None, na_rep='NaN', | |||
List[str] | |||
""" | |||
|
|||
def _get_formatted_values(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type & add a doc-string
except AttributeError: | ||
formatter = None | ||
|
||
def _format_values(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a module level function (with types & doc-string), maybe _format_as_array
|
||
try: | ||
values = values._formatting_values() | ||
return _format_values(_get_formatted_values(values)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the try/except just be around values = values._formatting_values()
?
@simonjayhawkins can you rebase |
@jbrockmendel I don't think is ready for getting merged, see above discussion. |
closing for now |
git diff upstream/master -u -- "*.py" | flake8 --diff
cc @jorisvandenbossche
changes to tests for spacing before Datetime64TZ, which now have additional space.