diff --git a/doc/source/ecosystem.rst b/doc/source/ecosystem.rst index e72a9d86daeaf..3b1a3c5e380d3 100644 --- a/doc/source/ecosystem.rst +++ b/doc/source/ecosystem.rst @@ -98,7 +98,8 @@ which can be used for a wide variety of time series data mining tasks. Visualization ------------- -While :ref:`pandas has built-in support for data visualization with matplotlib `, +`Pandas has its own Styler class for table visualization `_, and while +:ref:`pandas also has built-in support for data visualization through charts with matplotlib `, there are a number of other pandas-compatible libraries. `Altair `__ diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst index 901f42097b911..6b6e212cde635 100644 --- a/doc/source/user_guide/index.rst +++ b/doc/source/user_guide/index.rst @@ -38,12 +38,12 @@ Further information on any specific method can be obtained in the integer_na boolean visualization + style computation groupby window timeseries timedeltas - style options enhancingperf scale diff --git a/doc/source/user_guide/style.ipynb b/doc/source/user_guide/style.ipynb index a67bac0c65462..b8119477407c0 100644 --- a/doc/source/user_guide/style.ipynb +++ b/doc/source/user_guide/style.ipynb @@ -4,30 +4,28 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Styling\n", + "# Table Visualization\n", "\n", - "This document is written as a Jupyter Notebook, and can be viewed or downloaded [here](https://nbviewer.ipython.org/github/pandas-dev/pandas/blob/master/doc/source/user_guide/style.ipynb).\n", + "This section demonstrates visualization of tabular data using the [Styler][styler]\n", + "class. For information on visualization with charting please see [Chart Visualization][viz]. 
This document is written as a Jupyter Notebook, and can be viewed or downloaded [here][download].\n", "\n", - "You can apply **conditional formatting**, the visual styling of a DataFrame\n", - "depending on the data within, by using the ``DataFrame.style`` property.\n", - "This is a property that returns a ``Styler`` object, which has\n", - "useful methods for formatting and displaying DataFrames.\n", - "\n", - "The styling is accomplished using CSS.\n", - "You write \"style functions\" that take scalars, `DataFrame`s or `Series`, and return *like-indexed* DataFrames or Series with CSS `\"attribute: value\"` pairs for the values.\n", - "These functions can be incrementally passed to the `Styler` which collects the styles before rendering.\n", - "\n", - "CSS is a flexible language and as such there may be multiple ways of achieving the same result, with potential\n", - "advantages or disadvantages, which we try to illustrate." + "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n", + "[viz]: visualization.rst\n", + "[download]: https://nbviewer.ipython.org/github/pandas-dev/pandas/blob/master/doc/source/user_guide/style.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Styler Object\n", + "## Styler Object and HTML \n", + "\n", + "Styling should be performed after the data in a DataFrame has been processed. The [Styler][styler] creates an HTML `<table>` and leverages the CSS styling language to manipulate many parameters including colors, fonts, borders, background, etc. See [here][w3schools] for more information on styling HTML tables. This allows a lot of flexibility out of the box, and even enables web developers to integrate DataFrames into their existing user interface designs.\n", + " \n", + "The `DataFrame.style` attribute is a property that returns a [Styler][styler] object. 
It has a `_repr_html_` method defined on it so it is rendered automatically in Jupyter Notebook.\n", "\n", - "The `DataFrame.style` attribute is a property that returns a `Styler` object. `Styler` has a `_repr_html_` method defined on it so they are rendered automatically. If you want the actual HTML back for further processing or for writing to file call the `.render()` method which returns a string." + "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n", + "[w3schools]: https://www.w3schools.com/html/html_tables.asp" ] }, { @@ -52,12 +50,9 @@ "import pandas as pd\n", "import numpy as np\n", "\n", - "np.random.seed(24)\n", - "df = pd.DataFrame({'A': np.linspace(1, 10, 10)})\n", - "df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],\n", - " axis=1)\n", - "df.iloc[3, 3] = np.nan\n", - "df.iloc[0, 2] = np.nan\n", + "df = pd.DataFrame([[38.0, 2.0, 18.0, 22.0, 21, np.nan],[19, 439, 6, 452, 226,232]], \n", + " index=pd.Index(['Tumour (Positive)', 'Non-Tumour (Negative)'], name='Actual Label:'), \n", + " columns=pd.MultiIndex.from_product([['Decision Tree', 'Regression', 'Random'],['Tumour', 'Non-Tumour']], names=['Model:', 'Predicted:']))\n", "df.style" ] }, @@ -65,49 +60,105 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The above output looks very similar to the standard DataFrame HTML representation. But we've done some work behind the scenes to attach CSS classes to each cell. We can view these by calling the `.render` method." + "The above output looks very similar to the standard DataFrame HTML representation. But the HTML here has already attached some CSS classes to each cell, even if we haven't yet created any styles. We can view these by calling the [.render()][render] method, which returns the raw HTML as a string; this is useful for further processing or for writing to a file - read on in [More about CSS and HTML](#More-About-CSS-and-HTML). 
Below we will show how we can use these to format the DataFrame to be more communicative. For example how we can build `s`:\n", + "\n", + "[render]: ../reference/api/pandas.io.formats.style.Styler.render.rst" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "nbsphinx": "hidden" + }, "outputs": [], "source": [ - "df.style.render().split('\\n')[:10]" + "# Hidden cell to just create the below example: code is covered throughout the guide.\n", + "s = df.style\\\n", + " .hide_columns([('Random', 'Tumour'), ('Random', 'Non-Tumour')])\\\n", + " .format('{:.0f}')\\\n", + " .set_table_styles([{\n", + " 'selector': '',\n", + " 'props': 'border-collapse: separate;'\n", + " },{\n", + " 'selector': 'caption',\n", + " 'props': 'caption-side: bottom; font-size:1.3em;'\n", + " },{\n", + " 'selector': '.index_name',\n", + " 'props': 'font-style: italic; color: darkgrey; font-weight:normal;'\n", + " },{\n", + " 'selector': 'th:not(.index_name)',\n", + " 'props': 'background-color: #000066; color: white;'\n", + " },{\n", + " 'selector': 'th.col_heading',\n", + " 'props': 'text-align: center;'\n", + " },{\n", + " 'selector': 'th.col_heading.level0',\n", + " 'props': 'font-size: 1.5em;'\n", + " },{\n", + " 'selector': 'th.col2',\n", + " 'props': 'border-left: 1px solid white;'\n", + " },{\n", + " 'selector': '.col2',\n", + " 'props': 'border-left: 1px solid #000066;'\n", + " },{\n", + " 'selector': 'td',\n", + " 'props': 'text-align: center; font-weight:bold;'\n", + " },{\n", + " 'selector': '.true',\n", + " 'props': 'background-color: #e6ffe6;'\n", + " },{\n", + " 'selector': '.false',\n", + " 'props': 'background-color: #ffe6e6;'\n", + " },{\n", + " 'selector': '.border-red',\n", + " 'props': 'border: 2px dashed red;'\n", + " },{\n", + " 'selector': '.border-green',\n", + " 'props': 'border: 2px dashed green;'\n", + " },{\n", + " 'selector': 'td:hover',\n", + " 'props': 'background-color: #ffffb3;'\n", + " }])\\\n", + " 
.set_td_classes(pd.DataFrame([['true border-green', 'false', 'true', 'false border-red', '', ''],\n", + " ['false', 'true', 'false', 'true', '', '']], \n", + " index=df.index, columns=df.columns))\\\n", + " .set_caption(\"Confusion matrix for multiple cancer prediction models.\")\\\n", + " .set_tooltips(pd.DataFrame([['This model has a very strong true positive rate', '', '', \"This model's total number of false negatives is too high\", '', ''],\n", + " ['', '', '', '', '', '']], \n", + " index=df.index, columns=df.columns),\n", + " css_class='pd-tt', props=\n", + " 'visibility: hidden; position: absolute; z-index: 1; border: 1px solid #000066;'\n", + " 'background-color: white; color: #000066; font-size: 0.8em;' \n", + " 'transform: translate(0px, -24px); padding: 0.6em; border-radius: 0.5em;')\n" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "The `row0_col2` is the identifier for that particular cell. We've also prepended each row/column identifier with a UUID unique to each DataFrame so that the style from one doesn't collide with the styling from another within the same notebook or page (you can set the `uuid` if you'd like to tie together the styling of two DataFrames, or remove it if you want to optimise HTML transfer for larger tables)." + "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Building styles\n", + "## Formatting the Display\n", "\n", - "There are 3 primary methods of adding custom styles to DataFrames using CSS and matching it to cells:\n", + "### Formatting Values\n", "\n", - "- Directly linking external CSS classes to your individual cells using `Styler.set_td_classes`.\n", - "- Using `table_styles` to control broader areas of the DataFrame with internal CSS.\n", - "- Using the `Styler.apply` and `Styler.applymap` functions for more specific control with internal CSS. 
\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Linking External CSS\n", + "Before adding styles it is useful to show that the [Styler][styler] can distinguish the *display* value from the *actual* value. The display value is the text printed in each cell; to control it we can use the [.format()][formatfunc] method, which manipulates it according to a [format spec string][format] or a callable that takes a single value and returns a string. It is possible to define this for the whole table or for individual columns. \n", "\n", - "*New in version 1.2.0*\n", + "Additionally, the format function has a **precision** argument to specifically help format floats, an **na_rep** argument to display missing data, and an **escape** argument to help display safe HTML. The default formatter is configured to adopt pandas' regular `display.precision` option, controllable using `with pd.option_context('display.precision', 2):`\n", "\n", - "If you have designed a website then it is likely you will already have an external CSS file that controls the styling of table and cell objects within your website.\n", + "Here is an example of using multiple options to control the formatting generally and with specific column formatters.\n", "\n", - "For example, suppose we have an external CSS which controls table properties and has some additional classes to style individual elements (here we manually add one to this notebook):" + "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n", + "[format]: https://docs.python.org/3/library/string.html#format-specification-mini-language\n", + "[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst" ] }, { @@ -116,22 +167,28 @@ "metadata": {}, "outputs": [], "source": [ - "from IPython.display import HTML\n", - "style = \\\n", - "\"\"\n", - "HTML(style)" + "df.style.format(precision=0, na_rep='MISSING', \n", + " formatter={('Decision Tree', 'Tumour'): \"{:.2f}\",\n", + " 
('Regression', 'Non-Tumour'): lambda x: \"$ {:,.1f}\".format(x*-1e3)\n", + " })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now we can manually link these to our DataFrame using the `Styler.set_table_attributes` and `Styler.set_td_classes` methods (note that table level 'table-cls' is overwritten here by Jupyters own CSS, but in HTML the default text color will be grey)." + "### Hiding Data\n", + "\n", + "The index can be hidden from rendering by calling [.hide_index()][hideidx], which might be useful if your index is integer based.\n", + "\n", + "Columns can be hidden from rendering by calling [.hide_columns()][hidecols] and passing in the name of a column, or a slice of columns.\n", + "\n", + "Hiding does not change the integer arrangement of CSS classes, e.g. hiding the first two columns of a DataFrame means the column class indexing will start at `col2`, since `col0` and `col1` are simply ignored.\n", + "\n", + "We can update our `Styler` object to hide some data and format the values.\n", + "\n", + "[hideidx]: ../reference/api/pandas.io.formats.style.Styler.hide_index.rst\n", + "[hidecols]: ../reference/api/pandas.io.formats.style.Styler.hide_columns.rst" ] }, { @@ -140,33 +197,65 @@ "metadata": {}, "outputs": [], "source": [ - "css_classes = pd.DataFrame(data=[['cls1', None], ['cls3', 'cls2 cls3']], index=[0,2], columns=['A', 'C'])\n", - "df.style.\\\n", - " set_table_attributes('class=\"table-cls\"').\\\n", - " set_td_classes(css_classes)" + "s = df.style.format('{:.0f}').hide_columns([('Random', 'Tumour'), ('Random', 'Non-Tumour')])\n", + "s" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Hidden cell to avoid CSS clashes and latter code upcoding previous formatting \n", + "s.set_uuid('after_hide')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "The **advantage** of linking to external CSS is that it can be applied very easily. 
One can build a DataFrame of (multiple) CSS classes to add to each cell dynamically using traditional `DataFrame.apply` and `DataFrame.applymap` methods, or otherwise, and then add those to the Styler. It will integrate with your website's existing CSS styling.\n", + "## Methods to Add Styles\n", + "\n", + "There are **3 primary methods of adding custom CSS styles** to [Styler][styler]:\n", "\n", - "The **disadvantage** of this approach is that it is not easy to transmit files standalone. For example the external CSS must be included or the styling will simply be lost. It is also, as this example shows, not well suited (at a table level) for Jupyter Notebooks. Also this method cannot be used for exporting to Excel, for example, since the external CSS cannot be referenced either by the exporters or by Excel itself." + "- Using [.set_table_styles()][table] to control broader areas of the table with specified internal CSS. Although table styles allow the flexibility to add CSS selectors and properties controlling all individual parts of the table, they are unwieldy for individual cell specifications. Also, note that table styles cannot be exported to Excel. \n", + "- Using [.set_td_classes()][td_class] to directly link either external CSS classes to your data cells or link the internal CSS classes created by [.set_table_styles()][table]. See [here](#Setting-Classes-and-Linking-to-External-CSS). These cannot be used on column header rows or indexes, and also won't export to Excel. \n", + "- Using the [.apply()][apply] and [.applymap()][applymap] functions to add direct internal CSS to specific data cells. See [here](#Styler-Functions). These cannot be used on column header rows or indexes, but only these methods add styles that will export to Excel. 
These methods work in a similar way to [DataFrame.apply()][dfapply] and [DataFrame.applymap()][dfapplymap].\n", + "\n", + "[table]: ../reference/api/pandas.io.formats.style.Styler.set_table_styles.rst\n", + "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n", + "[td_class]: ../reference/api/pandas.io.formats.style.Styler.set_td_classes.rst\n", + "[apply]: ../reference/api/pandas.io.formats.style.Styler.apply.rst\n", + "[applymap]: ../reference/api/pandas.io.formats.style.Styler.applymap.rst\n", + "[dfapply]: ../reference/api/pandas.DataFrame.apply.rst\n", + "[dfapplymap]: ../reference/api/pandas.DataFrame.applymap.rst" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Using Table Styles\n", + "## Table Styles\n", + "\n", + "Table styles are flexible enough to control all individual parts of the table, including column headers and indexes. \n", + "However, they can be unwieldy to type for individual data cells or for any kind of conditional formatting, so we recommend that table styles are used for broad styling, such as entire rows or columns at a time.\n", "\n", - "Table styles allow you to control broader areas of the DataFrame, i.e. the whole table or specific columns or rows, with minimal HTML transfer. Much of the functionality of `Styler` uses individual HTML id tags to manipulate the output, which may be inefficient for very large tables. Using `table_styles` and otherwise avoiding using id tags in data cells can greatly reduce the rendered HTML.\n", + "Table styles are also used to control features which can apply to the whole table at once such as creating a generic hover functionality. The `:hover` pseudo-selector, as well as other pseudo-selectors, can only be used this way.\n", "\n", - "Table styles are also used to control features which can apply to the whole table at once such as greating a generic hover functionality. 
This `:hover` pseudo-selectors, as well as others, can only be used this way.\n", + "To replicate the normal format of CSS selectors and properties (attribute value pairs), e.g. \n", "\n", - "`table_styles` are extremely flexible, but not as fun to type out by hand.\n", - "We hope to collect some useful ones either in pandas, or preferable in a new package that [builds on top](#Extensibility) the tools here." + "```\n", + "tr:hover {\n", + " background-color: #ffff99;\n", + "}\n", + "```\n", + "\n", + "the necessary format to pass styles to [.set_table_styles()][table] is as a list of dicts, each with a CSS-selector tag and CSS-properties. Properties can either be a list of 2-tuples, or a regular CSS-string, for example:\n", + "\n", + "[table]: ../reference/api/pandas.io.formats.style.Styler.set_table_styles.rst" ] }, { @@ -175,23 +264,38 @@ "metadata": {}, "outputs": [], "source": [ - "def hover(hover_color=\"#ffff99\"):\n", - " return {'selector': \"tr:hover\",\n", - " 'props': [(\"background-color\", \"%s\" % hover_color)]}\n", - "\n", - "styles = [\n", - " hover(),\n", - " {'selector': \"th\", 'props': [(\"font-size\", \"150%\"), (\"text-align\", \"center\")]}\n", - "]\n", - "\n", - "df.style.set_table_styles(styles)" + "cell_hover = { # for row hover use <tr> instead of <td>
\n", + " 'selector': 'td:hover',\n", + " 'props': [('background-color', '#ffffb3')]\n", + "}\n", + "index_names = {\n", + " 'selector': '.index_name',\n", + " 'props': 'font-style: italic; color: darkgrey; font-weight:normal;'\n", + "}\n", + "headers = {\n", + " 'selector': 'th:not(.index_name)',\n", + " 'props': 'background-color: #000066; color: white;'\n", + "}\n", + "s.set_table_styles([cell_hover, index_names, headers])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Hidden cell to avoid CSS clashes and latter code upcoding previous formatting \n", + "s.set_uuid('after_tab_styles1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "If `table_styles` is given as a dictionary each key should be a specified column or index value and this will map to specific class CSS selectors of the given column or row." + "Next we just add a couple more styling artifacts targeting specific parts of the table, and we add some internally defined CSS classes that we need for the next section. Be careful here, since we are *chaining methods* we need to explicitly instruct the method **not to** ``overwrite`` the existing styles." 
] }, { @@ -200,31 +304,37 @@ "metadata": {}, "outputs": [], "source": [ - "df.style.set_table_styles({\n", - " 'A': [{'selector': '',\n", - " 'props': [('color', 'red')]}],\n", - " 'B': [{'selector': 'td',\n", - " 'props': [('color', 'blue')]}]\n", - "}, axis=0)" + "s.set_table_styles([\n", + " {'selector': 'th.col_heading', 'props': 'text-align: center;'},\n", + " {'selector': 'th.col_heading.level0', 'props': 'font-size: 1.5em;'},\n", + " {'selector': 'td', 'props': 'text-align: center; font-weight: bold;'},\n", + " # internal CSS classes\n", + " {'selector': '.true', 'props': 'background-color: #e6ffe6;'},\n", + " {'selector': '.false', 'props': 'background-color: #ffe6e6;'},\n", + " {'selector': '.border-red', 'props': 'border: 2px dashed red;'},\n", + " {'selector': '.border-green', 'props': 'border: 2px dashed green;'},\n", + "], overwrite=False)" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "nbsphinx": "hidden" + }, "outputs": [], "source": [ - "df.style.set_table_styles({\n", - " 3: [{'selector': 'td',\n", - " 'props': [('color', 'green')]}]\n", - "}, axis=1)" + "# Hidden cell to avoid CSS clashes and latter code upcoding previous formatting \n", + "s.set_uuid('after_tab_styles2')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can also chain all of the above by setting the `overwrite` argument to `False` so that it preserves previous settings. We also show the CSS string input rather than the list of tuples." + "As a convenience method (*since version 1.2.0*) we can also pass a **dict** to [.set_table_styles()][table] which contains row or column keys. 
Behind the scenes Styler just indexes the keys and adds relevant `.col` or `.row` classes as necessary to the given CSS selectors.\n", + "\n", + "[table]: ../reference/api/pandas.io.formats.style.Styler.set_table_styles.rst" ] }, { @@ -233,27 +343,37 @@ "metadata": {}, "outputs": [], "source": [ - "from pandas.io.formats.style import Styler\n", - "s = Styler(df, cell_ids=False, uuid_len=0).\\\n", - " set_table_styles(styles).\\\n", - " set_table_styles({\n", - " 'A': [{'selector': '',\n", - " 'props': 'color:red;'}],\n", - " 'B': [{'selector': 'td',\n", - " 'props': 'color:blue;'}]\n", - " }, axis=0, overwrite=False).\\\n", - " set_table_styles({\n", - " 3: [{'selector': 'td',\n", - " 'props': 'color:green;font-weight:bold;'}]\n", - " }, axis=1, overwrite=False)\n", - "s" + "s.set_table_styles({\n", + " ('Regression', 'Tumour'): [{'selector': 'th', 'props': 'border-left: 1px solid white'},\n", + " {'selector': 'td', 'props': 'border-left: 1px solid #000066'}]\n", + "}, overwrite=False, axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Hidden cell to avoid CSS clashes and latter code upcoding previous formatting \n", + "s.set_uuid('xyz01')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "By using these `table_styles` and the additional `Styler` arguments to optimize the HTML we have compressed these styles to only a few lines withing the \\ tags and none of the \\ cells require any `id` attributes. " + "## Setting Classes and Linking to External CSS\n", + "\n", + "If you have designed a website then it is likely you will already have an external CSS file that controls the styling of table and cell objects within it. 
You may want to use these native files rather than duplicate all the CSS in Python (and duplicate any maintenance work).\n", + "\n", + "### Table Attributes\n", + "\n", + "It is very easy to add a `class` to the main `<table>` using [.set_table_attributes()][tableatt]. This method can also attach inline styles - read more in [CSS Hierarchies](#CSS-Hierarchies).\n", + "\n", + "[tableatt]: ../reference/api/pandas.io.formats.style.Styler.set_table_attributes.rst" ] }, { @@ -262,50 +382,114 @@ "metadata": {}, "outputs": [], "source": [ - "s.render().split('\\n')[:16]" + "out = s.set_table_attributes('class=\"my-table-cls\"').render()\n", + "print(out[out.find('` elements of the `
`. Here we add our `.true` and `.false` classes that we created previously. We will save adding the borders until the [section on tooltips](#Tooltips).\n", + "\n", + "[tdclass]: ../reference/api/pandas.io.formats.style.Styler.set_td_classes.rst\n", + "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cell_color = pd.DataFrame([['true ', 'false ', 'true ', 'false '], \n", + " ['false ', 'true ', 'false ', 'true ']], \n", + " index=df.index, \n", + " columns=df.columns[:4])\n", + "s.set_td_classes(cell_color)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Hidden cell to avoid CSS clashes and latter code upcoding previous formatting \n", + "s.set_uuid('after_classes')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Styler Functions\n", - "\n", - "Thirdly we can use the method to pass your style functions into one of the following methods:\n", - "\n", - "- ``Styler.applymap``: elementwise\n", - "- ``Styler.apply``: column-/row-/table-wise\n", - "\n", - "Both of those methods take a function (and some other keyword arguments) and applies your function to the DataFrame in a certain way.\n", - "`Styler.applymap` works through the DataFrame elementwise.\n", - "`Styler.apply` passes each column or row into your DataFrame one-at-a-time or the entire table at once, depending on the `axis` keyword argument.\n", - "For columnwise use `axis=0`, rowwise use `axis=1`, and for the entire table at once use `axis=None`.\n", - "\n", - "For `Styler.applymap` your function should take a scalar and return a single string with the CSS attribute-value pair.\n", + "## Styler Functions\n", "\n", - "For `Styler.apply` your function should take a Series or DataFrame (depending on the axis parameter), and return a Series or DataFrame with an identical 
shape where each value is a string with a CSS attribute-value pair.\n", + "We use the following methods to pass your style functions. Both of those methods take a function (and some other keyword arguments) and apply it to the DataFrame in a certain way, rendering CSS styles.\n", "\n", - "The **advantage** of this method is that there is full granular control and the output is isolated and easily transferrable, especially in Jupyter Notebooks.\n", + "- [.applymap()][applymap] (elementwise): accepts a function that takes a single value and returns a string with the CSS attribute-value pair.\n", + "- [.apply()][apply] (column-/row-/table-wise): accepts a function that takes a Series or DataFrame and returns a Series, DataFrame, or numpy array with an identical shape where each element is a string with a CSS attribute-value pair. This method passes each column or row of your DataFrame one-at-a-time or the entire table at once, depending on the `axis` keyword argument. For columnwise use `axis=0`, rowwise use `axis=1`, and for the entire table at once use `axis=None`.\n", "\n", - "The **disadvantage** is that the HTML/CSS required to produce this needs to be directly generated from the Python code and it can lead to inefficient data transfer for large tables.\n", + "This method is powerful for applying multiple, complex logic to data cells. We create a new DataFrame to demonstrate this.\n", "\n", - "Let's see some examples." 
+ "[apply]: ../reference/api/pandas.io.formats.style.Styler.apply.rst\n", + "[applymap]: ../reference/api/pandas.io.formats.style.Styler.applymap.rst" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(0)\n", + "df2 = pd.DataFrame(np.random.randn(10,4), columns=['A','B','C','D'])\n", + "df2.style" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example we can build a function that colors text if it is negative, and chain this with a function that partially fades cells of negligible value. Since this looks at each element in turn we use ``applymap``." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def style_negative(v, props=''):\n", + " return props if v < 0 else None\n", + "s2 = df2.style.applymap(style_negative, props='color:red;')\\\n", + " .applymap(lambda v: 'opacity: 20%;' if (v < 0.3) and (v > -0.3) else None)\n", + "s2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Hidden cell to avoid CSS clashes and latter code upcoding previous formatting \n", + "s2.set_uuid('after_applymap')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's write a simple style function that will color negative numbers red and positive numbers black." + "We can also build a function that highlights the maximum value across rows, cols, and the DataFrame all at once. In this case we use ``apply``. Below we highlight the maximum in a column." 
] }, { @@ -314,19 +498,28 @@ "metadata": {}, "outputs": [], "source": [ - "def color_negative_red(val):\n", - " \"\"\"Color negative scalars red.\"\"\"\n", - " css = 'color: red;'\n", - " if val < 0: return css\n", - " return None" + "def highlight_max(s, props=''):\n", + " return np.where(s == np.nanmax(s.values), props, '')\n", + "s2.apply(highlight_max, props='color:white;background-color:darkblue', axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "nbsphinx": "hidden" + }, + "outputs": [], + "source": [ + "# Hidden cell to avoid CSS clashes and latter code upcoding previous formatting \n", + "s2.set_uuid('after_apply')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In this case, the cell's style depends only on its own value.\n", - "That means we should use the `Styler.applymap` method which works elementwise." + "We can use the same function across the different axes, highlighting here the DataFrame maximum in purple, and row maximums in pink." ] }, { @@ -335,28 +528,34 @@ "metadata": {}, "outputs": [], "source": [ - "s = df.style.applymap(color_negative_red)\n", - "s" + "s2.apply(highlight_max, props='color:white;background-color:pink;', axis=1)\\\n", + " .apply(highlight_max, props='color:white;background-color:purple', axis=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Notice the similarity with the standard `df.applymap`, which operates on DataFrames elementwise. We want you to be able to reuse your existing knowledge of how to interact with DataFrames.\n", + "This last example shows how some styles have been overwritten by others. In general the most recent style applied is active but you can read more in the [section on CSS hierarchies](#CSS-Hierarchies). 
You can also apply these styles to more granular parts of the DataFrame - read more in the section on [subset slicing](#Finer-Control-with-Slicing).\n", "\n", - "Notice also that our function returned a string containing the CSS attribute and value, separated by a colon just like in a `'.format(css))" + "# HTML(''.format(css))" ] } ], diff --git a/doc/source/user_guide/visualization.rst b/doc/source/user_guide/visualization.rst index 8b41cc24829c5..7b2c8478e71af 100644 --- a/doc/source/user_guide/visualization.rst +++ b/doc/source/user_guide/visualization.rst @@ -2,9 +2,12 @@ {{ header }} -************* -Visualization -************* +******************* +Chart Visualization +******************* + +This section demonstrates visualization through charting. For information on +visualization of tabular data please see the section on `Table Visualization `_. We use the standard convention for referencing the matplotlib API:
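
As a reading aid for the `style.ipynb` changes in this diff: the list-of-dicts format that the new "Table Styles" section passes to `.set_table_styles()` can be related back to ordinary CSS with a small sketch. The helper below, `table_styles_to_css`, is a hypothetical name introduced only for illustration; it is not part of pandas or of this patch, and it does not claim to mirror how `Styler` renders internally. It assumes only the two `props` forms the notebook text describes (a list of 2-tuples, or a regular CSS string).

```python
# Hypothetical helper for illustration only; NOT a pandas API.
def table_styles_to_css(table_styles):
    """Render a list of {'selector', 'props'} dicts as plain CSS rules."""
    rules = []
    for style in table_styles:
        props = style["props"]
        if isinstance(props, str):
            # CSS-string form: split "a: b; c: d;" into (attribute, value) pairs.
            pairs = [
                tuple(part.strip() for part in decl.split(":", 1))
                for decl in props.split(";")
                if decl.strip()
            ]
        else:
            # List-of-2-tuples form already holds (attribute, value) pairs.
            pairs = list(props)
        body = "\n".join(f"  {attr}: {value};" for attr, value in pairs)
        rules.append(f"{style['selector']} {{\n{body}\n}}")
    return "\n".join(rules)


# The same two styles used in the notebook, one in each `props` form.
cell_hover = {"selector": "td:hover", "props": [("background-color", "#ffffb3")]}
headers = {
    "selector": "th:not(.index_name)",
    "props": "background-color: #000066; color: white;",
}

css = table_styles_to_css([cell_hover, headers])
print(css)
```

Both `props` forms end up as the same kind of `selector { attribute: value; }` rule, which is why the notebook can mix them freely. The sketch does no validation: a declaration without a colon would raise an error here.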