REF: remove special-casing for internal EAs from format_array #26965

simonjayhawkins · 2019-06-20T12:58:31Z

closes Refactor format_array in pandas\io\formats\format.py #26837
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
[n/a ] whatsnew entry

changes to tests for spacing before Datetime64TZ, which now have additional space.

jorisvandenbossche

Some quick comments (didn't look in detail yet). This can certainly be a nice clean-up, but I don't think it closes the discussion in #26837

jorisvandenbossche · 2019-06-20T13:10:05Z

pandas/core/arrays/base.py

@@ -881,7 +881,7 @@ def _formatter(
            ``boxed=True``.
        """
        if boxed:
-            return str
+            return None


How can None ever be a correct formatter function?

to pass nested values through unchanged to allow custom formatting options on the underlying floats

>>> import scipy.sparse >>> from pandas import SparseSeries >>> sparse = SparseSeries.from_coo(scipy.sparse.rand(350, 18)) C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\sparse\scipy_sparse.py:151: FutureWarning: Series.to_sparse is deprecated and will be removed in a future version s = s.to_sparse() # TODO: specify kind? >>> sparse._formatting_values()._formatting_values() array([0.1057938 , 0.12346779, 0.66926098, 0.63135293, 0.51537009, 0.83084751, 0.11707738, 0.94582471, 0.53306677, 0.76971943, 0.43649237, 0.21300667, 0.36538291, 0.75462779, 0.2525123 , 0.97204424, 0.62077108, 0.77137921, 0.77416063, 0.06010261, 0.0758256 , 0.81359155, 0.493831 , 0.86517892, 0.99948485, 0.16560856, 0.10064932, 0.2234172 , 0.37112586, 0.35745351, 0.03196741, 0.99648038, 0.39904145, 0.17162981, 0.98518366, 0.85796259, 0.91980664, 0.78111288, 0.00913085, 0.61513879, 0.68807133, 0.13308309, 0.99738934, 0.41772456, 0.30984901, 0.23902977, 0.52296826, 0.90989797, 0.25194294, 0.28347783, 0.76426202, 0.68137012, 0.00918782, 0.6749851 , 0.3858443 , 0.82595983, 0.72062635, 0.63630261, 0.13853372, 0.40467435, 0.03361887, 0.90591611, 0.50676725]) >>> >>> sparse._formatting_values()._formatter <bound method SparseArray._formatter of [0.10579380195508936, 0.123467793554345, 0.6692609807557325, 0.631352928127322, 0.5153700914920036, 0.8308475120205755, 0.11707738027292702, 0.9458247148462586, 0.53306677 34689285, 0.7697194280043602, 0.436492370827794, 0.21300667109238547, 0.3653829053917591, 0.7546277896220214, 0.2525123038343048, 0.9720442422636566, 0.62077107713076, 0.7713792114209488, 0.7741606333773862, 0.0 60102611123280636, 0.07582560258774373, 0.8135915546849382, 0.4938310020146096, 0.865178922460138, 0.9994848548516182, 0.16560855997590118, 0.10064932274817606, 0.22341720411731247, 0.3711258583673639, 0.3574535 112638453, 0.03196740969420808, 0.9964803793350895, 0.399041447277493, 0.17162981330873828, 0.985183664900777, 0.8579625879222734, 0.9198066431764218, 0.7811128816049261, 0.009130849663441576, 0.6151387882173937 , 0.6880713270054183, 0.13308308845790984, 0.997389339850816, 0.4177245615919253, 0.3098490139048895, 0.23902976566001577, 0.5229682615072562, 0.9098979745269118, 0.25194293787078936, 0.28347782689329026, 0.7642 620222948321, 0.6813701212402914, 0.009187820750616082, 0.6749850993894757, 0.3858442970367967, 0.8259598335077263, 0.7206263455689199, 0.6363026132729022, 0.1385337245480187, 0.4046743457416371, 0.0336188668302 50727, 0.905916112003116, 0.5067672489799548] Fill: nan BlockIndex Block locations: array([0]) Block lengths: array([63])> >>>

I don't fully understand the example you show.

But in any case, I would argue that such a fix should not be in the base array, but only in those EAs that need it? (but maybe that doesn't make sense, as I don't really understand the fix :-))

the formatting is now recursive. so multiple formatters can be applied.

this may be a better example of recursing until a numpy array returned.

>>> from pandas import period_range, Categorical >>> idx = period_range('2011-01-01 09:00', freq='H', periods=5) >>> c = Categorical(idx, ordered=True) >>> exp = """[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00] ... Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 < ... 2011-01-01 13:00]""" # noqa >>> >>> c [2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00] Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 < 2011-01-01 13:00] >>> >>> >>> type(c) <class 'pandas.core.arrays.categorical.Categorical'> >>> >>> c._formatting_values() PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00', '2011-01-01 12:00', '2011-01-01 13:00'], dtype='period[H]', freq='H') >>> >>> type(c._formatting_values()) <class 'pandas.core.indexes.period.PeriodIndex'> >>> >>> c._formatting_values().dtype period[H] >>> >>> type(c._formatter(boxed=True)) <class 'NoneType'> >>> >>> c._formatting_values()._formatting_values() Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'PeriodIndex' object has no attribute '_formatting_values' >>> >>> type(c.get_values()) <class 'pandas.core.indexes.period.PeriodIndex'> >>> >>> c.get_values() PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00', '2011-01-01 12:00', '2011-01-01 13:00'], dtype='period[H]', freq='H') >>> >>> c.get_values()._values <PeriodArray> ['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00', '2011-01-01 12:00', '2011-01-01 13:00'] Length: 5, dtype: period[H] >>> >>> c.get_values()._values._formatter <bound method PeriodArray._formatter of <PeriodArray> ['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00', '2011-01-01 12:00', '2011-01-01 13:00'] Length: 5, dtype: period[H]> >>> >>> c.get_values()._values._formatting_values() array([Period('2011-01-01 09:00', 'H'), Period('2011-01-01 10:00', 'H'), Period('2011-01-01 11:00', 'H'), Period('2011-01-01 12:00', 'H'), Period('2011-01-01 13:00', 'H')], dtype=object) >>>

jorisvandenbossche · 2019-06-20T13:10:31Z

pandas/core/arrays/categorical.py

@@ -457,6 +457,9 @@ def _formatter(self, boxed=False):
        # Defer to CategoricalFormatter's formatter.
        return None

+    def _formatting_values(self):


_formatting_values is currently deprecated on EAs, so why was this needed?

to support the recursive behaviour and remove the special casing.

codecov · 2019-06-20T13:33:29Z

Codecov Report

Merging #26965 into master will decrease coverage by <.01%.
The diff coverage is 94.28%.

@@            Coverage Diff             @@
##           master   #26965      +/-   ##
==========================================
- Coverage   91.87%   91.86%   -0.01%     
==========================================
  Files         180      180              
  Lines       50746    50743       -3     
==========================================
- Hits        46623    46616       -7     
- Misses       4123     4127       +4

Flag	Coverage Δ
#multiple	`90.46% <94.28%> (-0.01%)`	⬇️
#single	`41.12% <62.85%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/arrays/base.py	`100% <100%> (+0.56%)`	⬆️
pandas/io/formats/format.py	`98% <100%> (+0.09%)`	⬆️
pandas/core/arrays/timedeltas.py	`88.82% <100%> (ø)`	⬆️
pandas/core/indexes/interval.py	`96.44% <100%> (ø)`	⬆️
pandas/core/arrays/categorical.py	`95.81% <100%> (-0.12%)`	⬇️
pandas/core/arrays/datetimelike.py	`97.6% <66.66%> (-0.33%)`	⬇️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/frame.py	`96.89% <0%> (-0.12%)`	⬇️
pandas/io/formats/printing.py	`87.2% <0%> (+0.47%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfd65e9...5d023cc. Read the comment docs.

codecov · 2019-06-20T13:33:36Z

Codecov Report

Merging #26965 into master will decrease coverage by 1.41%.
The diff coverage is 94.28%.

@@            Coverage Diff             @@
##           master   #26965      +/-   ##
==========================================
- Coverage   91.87%   90.46%   -1.42%     
==========================================
  Files         180      180              
  Lines       50746    50741       -5     
==========================================
- Hits        46623    45902     -721     
- Misses       4123     4839     +716

Flag	Coverage Δ
#multiple	`90.46% <94.28%> (-0.01%)`	⬇️
#single	`?`

Impacted Files	Coverage Δ
pandas/core/internals/blocks.py	`94.37% <ø> (-0.01%)`	⬇️
pandas/core/arrays/base.py	`100% <100%> (+0.56%)`	⬆️
pandas/io/formats/format.py	`98% <100%> (+0.09%)`	⬆️
pandas/core/arrays/timedeltas.py	`88.82% <100%> (ø)`	⬆️
pandas/core/indexes/interval.py	`96.44% <100%> (ø)`	⬆️
pandas/core/arrays/categorical.py	`95.68% <100%> (-0.25%)`	⬇️
pandas/core/arrays/datetimelike.py	`97.6% <66.66%> (-0.33%)`	⬇️
pandas/core/computation/pytables.py	`62.5% <0%> (-27.75%)`	⬇️
pandas/io/pytables.py	`63.82% <0%> (-26.48%)`	⬇️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfd65e9...1ef1969. Read the comment docs.

jorisvandenbossche · 2019-06-20T13:51:45Z

the formatting is now recursive. so multiple formatters can be applied.

Sorry, that is still too cryptic for me. Can you try to explain what is going on?

jorisvandenbossche · 2019-06-20T14:23:03Z

the formatting is now recursive. so multiple formatters can be applied.

So this is specific for Categorical ? Where the actual values (whether they be string, datetime, etc) can still format themselves?
And before, this was handled with some special casing in formats.py itself I suppose?

simonjayhawkins · 2019-06-20T14:24:45Z

Some quick comments (didn't look in detail yet). This can certainly be a nice clean-up, but I don't think it closes the discussion in #26837

if the special-casing is removed we can close that issue. New issues can be raised for the other points being raised during that discussion.

jorisvandenbossche · 2019-06-20T14:25:48Z

I personally think we need a better reason to un-deprecate EA._formatting_values. There needs to be a clear, recommended interface for EAs, now we are adding something back just for Categorical (and maybe Sparse?). That moves the special case from our formatting code to the EA interface, not sure that is a net improvement.

simonjayhawkins · 2019-06-20T14:32:50Z

these are internal EAs. EA authors may need similar nesting functionality.

Internal EAs should not be special cased and use an interface available to all.

simonjayhawkins · 2019-06-20T14:36:28Z

@jorisvandenbossche do you want to move the discussion back to the issue? This PR is a proof of concept and will obviously require further discussion and agreement before _formatting_values is undeprecated.

jreback

nice code removal; do we need some tests around _get_formatted_values()?

jreback · 2019-06-21T00:56:47Z

pandas/io/formats/format.py

@@ -883,14 +880,34 @@ def format_array(values, formatter, float_format=None, na_rep='NaN',
    List[str]
    """

+    def _get_formatted_values(values):


can you type & add a doc-string

jreback · 2019-06-21T00:57:36Z

pandas/io/formats/format.py

+        except AttributeError:
+            formatter = None
+
+        def _format_values(values):


I would make this a module level function (with types & doc-string), maybe _format_as_array

jreback · 2019-06-21T00:59:27Z

pandas/io/formats/format.py

+
+        try:
+            values = values._formatting_values()
+            return _format_values(_get_formatted_values(values))


can the try/except just be around values = values._formatting_values()?

jbrockmendel · 2019-08-03T00:35:53Z

@simonjayhawkins can you rebase

jorisvandenbossche · 2019-08-04T08:22:06Z

@jbrockmendel I don't think is ready for getting merged, see above discussion.

simonjayhawkins · 2019-08-28T16:46:04Z

closing for now

REF: remove special-casing for internal EAs from format_array

5d023cc

simonjayhawkins added ExtensionArray Extending pandas with custom dtypes or arrays. Output-Formatting __repr__ of pandas objects, to_string labels Jun 20, 2019

jorisvandenbossche reviewed Jun 20, 2019

View reviewed changes

simonjayhawkins added 2 commits June 20, 2019 15:13

remove DeprecationWarning

96546ae

remove test_formatting_values_deprecated

1ef1969

jreback requested changes Jun 21, 2019

View reviewed changes

simonjayhawkins closed this Aug 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REF: remove special-casing for internal EAs from format_array #26965

REF: remove special-casing for internal EAs from format_array #26965

simonjayhawkins commented Jun 20, 2019

jorisvandenbossche left a comment

jorisvandenbossche Jun 20, 2019

simonjayhawkins Jun 20, 2019

jorisvandenbossche Jun 20, 2019

simonjayhawkins Jun 20, 2019

simonjayhawkins Jun 20, 2019

jorisvandenbossche Jun 20, 2019

simonjayhawkins Jun 20, 2019

codecov bot commented Jun 20, 2019

codecov bot commented Jun 20, 2019 •

edited

Loading

jorisvandenbossche commented Jun 20, 2019

jorisvandenbossche commented Jun 20, 2019

simonjayhawkins commented Jun 20, 2019

jorisvandenbossche commented Jun 20, 2019

simonjayhawkins commented Jun 20, 2019

simonjayhawkins commented Jun 20, 2019

jreback left a comment

jreback Jun 21, 2019

jreback Jun 21, 2019

jreback Jun 21, 2019

jbrockmendel commented Aug 3, 2019

jorisvandenbossche commented Aug 4, 2019

simonjayhawkins commented Aug 28, 2019

REF: remove special-casing for internal EAs from format_array #26965

REF: remove special-casing for internal EAs from format_array #26965

Conversation

simonjayhawkins commented Jun 20, 2019

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Jun 20, 2019

Codecov Report

codecov bot commented Jun 20, 2019 • edited Loading

Codecov Report

jorisvandenbossche commented Jun 20, 2019

jorisvandenbossche commented Jun 20, 2019

simonjayhawkins commented Jun 20, 2019

jorisvandenbossche commented Jun 20, 2019

simonjayhawkins commented Jun 20, 2019

simonjayhawkins commented Jun 20, 2019

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbrockmendel commented Aug 3, 2019

jorisvandenbossche commented Aug 4, 2019

simonjayhawkins commented Aug 28, 2019

codecov bot commented Jun 20, 2019 •

edited

Loading