DOC: update style.ipynb for 2.0 (#50973)

attack68 · mroeschke · web-flow · commit 4510c6f16a2c · 2023-02-09T13:42:39.000-08:00
* update style.ipynb for 2.0

* Update doc/source/user_guide/style.ipynb

* Update doc/source/user_guide/style.ipynb

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/user_guide/style.ipynb

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/user_guide/style.ipynb

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* ensure imports at top of file

---------

Co-authored-by: JHM Darbyshire (iMac) <attack68@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
diff --git a/doc/source/user_guide/style.ipynb b/doc/source/user_guide/style.ipynb
@@ -9,23 +9,33 @@
     "This section demonstrates visualization of tabular data using the [Styler][styler]\n",
     "class. For information on visualization with charting please see [Chart Visualization][viz]. This document is written as a Jupyter Notebook, and can be viewed or downloaded [here][download].\n",
     "\n",
-    "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n",
-    "[viz]: visualization.rst\n",
-    "[download]: https://nbviewer.ipython.org/github/pandas-dev/pandas/blob/main/doc/source/user_guide/style.ipynb"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Styler Object and HTML \n",
+    "## Styler Object and Customising the Display\n",
+    "Styling and output display customisation should be performed **after** the data in a DataFrame has been processed. The Styler is **not** dynamically updated if further changes to the DataFrame are made. The `DataFrame.style` attribute is a property that returns a [Styler][styler] object. It has a `_repr_html_` method defined on it so it is rendered automatically in Jupyter Notebook.\n",
     "\n",
-    "Styling should be performed after the data in a DataFrame has been processed. The [Styler][styler] creates an HTML `<table>` and leverages CSS styling language to manipulate many parameters including colors, fonts, borders, background, etc. See [here][w3schools] for more information on styling HTML tables. This allows a lot of flexibility out of the box, and even enables web developers to integrate DataFrames into their exiting user interface designs.\n",
-    "    \n",
-    "The `DataFrame.style` attribute is a property that returns a [Styler][styler] object. It has a `_repr_html_` method defined on it so they are rendered automatically in Jupyter Notebook.\n",
+    "The Styler, which can be used for large data but is primarily designed for small data, currently has the ability to output to these formats:\n",
+    "\n",
+    "  - HTML\n",
+    "  - LaTeX\n",
+    "  - String (and CSV by extension)\n",
+    "  - Excel\n",
+    "  - (JSON is not currently available)\n",
+    "\n",
+    "The first three of these have display customisation methods designed to format and customise the output. These include:\n",
     "\n",
+    "  - Formatting values, the index and column headers, using [.format()][formatfunc] and [.format_index()][formatfuncindex]\n",
+    "  - Renaming the index or column header labels, using [.relabel_index()][relabelfunc]\n",
+    "  - Hiding certain columns, the index and/or column headers, or index names, using [.hide()][hidefunc]\n",
+    "  - Concatenating similar DataFrames, using [.concat()][concatfunc]\n",
+    "  \n",
     "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n",
-    "[w3schools]: https://www.w3schools.com/html/html_tables.asp"
+    "[viz]: visualization.rst\n",
+    "[download]: https://nbviewer.ipython.org/github/pandas-dev/pandas/blob/main/doc/source/user_guide/style.ipynb\n",
+    "[format]: https://docs.python.org/3/library/string.html#format-specification-mini-language\n",
+    "[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst\n",
+    "[formatfuncindex]: ../reference/api/pandas.io.formats.style.Styler.format_index.rst\n",
+    "[relabelfunc]: ../reference/api/pandas.io.formats.style.Styler.relabel_index.rst\n",
+    "[hidefunc]: ../reference/api/pandas.io.formats.style.Styler.hide.rst\n",
+    "[concatfunc]: ../reference/api/pandas.io.formats.style.Styler.concat.rst"
    ]
   },
   {
@@ -41,6 +51,25 @@
     "# This cell is hidden from the output"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Formatting the Display\n",
+    "\n",
+    "### Formatting Values\n",
+    "\n",
+    "The [Styler][styler] distinguishes the *display* value from the *actual* value, in both data values and index or column headers. To control the display value, the text is printed in each cell as a string, and we can use the [.format()][formatfunc] and [.format_index()][formatfuncindex] methods to manipulate this according to a [format spec string][format] or a callable that takes a single value and returns a string. It is possible to define this for the whole table, or index, or for individual columns, or MultiIndex levels. We can also overwrite index labels using [.relabel_index()][relabelfunc].\n",
+    "\n",
+    "Additionally, the format function has a **precision** argument to specifically help format floats, as well as **decimal** and **thousands** separators to support other locales, an **na_rep** argument to display missing data, and **escape** and **hyperlinks** arguments to help display safe-HTML or safe-LaTeX. The default formatter is configured to adopt pandas' global options such as `styler.format.precision`, controllable using `with pd.option_context('styler.format.precision', 2):` \n",
+    "\n",
+    "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n",
+    "[format]: https://docs.python.org/3/library/string.html#format-specification-mini-language\n",
+    "[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst\n",
+    "[formatfuncindex]: ../reference/api/pandas.io.formats.style.Styler.format_index.rst\n",
+    "[relabelfunc]: ../reference/api/pandas.io.formats.style.Styler.relabel_index.rst"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -51,19 +80,157 @@
     "import numpy as np\n",
     "import matplotlib as mpl\n",
     "\n",
-    "df = pd.DataFrame([[38.0, 2.0, 18.0, 22.0, 21, np.nan],[19, 439, 6, 452, 226,232]], \n",
-    "                  index=pd.Index(['Tumour (Positive)', 'Non-Tumour (Negative)'], name='Actual Label:'), \n",
-    "                  columns=pd.MultiIndex.from_product([['Decision Tree', 'Regression', 'Random'],['Tumour', 'Non-Tumour']], names=['Model:', 'Predicted:']))\n",
-    "df.style"
+    "df = pd.DataFrame({\n",
+    "    \"strings\": [\"Adam\", \"Mike\"],\n",
+    "    \"ints\": [1, 3],\n",
+    "    \"floats\": [1.123, 1000.23]\n",
+    "})\n",
+    "df.style \\\n",
+    "  .format(precision=3, thousands=\".\", decimal=\",\") \\\n",
+    "  .format_index(str.upper, axis=1) \\\n",
+    "  .relabel_index([\"row 1\", \"row 2\"], axis=0)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The above output looks very similar to the standard DataFrame HTML representation. But the HTML here has already attached some CSS classes to each cell, even if we haven't yet created any styles. We can view these by calling the  [.to_html()][tohtml] method, which returns the raw HTML as string, which is useful for further processing or adding to a file - read on in [More about CSS and HTML](#More-About-CSS-and-HTML). Below we will show how we can use these to format the DataFrame to be more communicative. For example how we can build `s`:\n",
+    "Using Styler to manipulate the display is a useful feature because maintaining the indexing and data values for other purposes gives greater control. You do not have to overwrite your DataFrame to display it how you like. Here is a more comprehensive example of using the formatting functions whilst still relying on the underlying data for indexing and calculations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "weather_df = pd.DataFrame(np.random.rand(10,2)*5, \n",
+    "                          index=pd.date_range(start=\"2021-01-01\", periods=10),\n",
+    "                          columns=[\"Tokyo\", \"Beijing\"])\n",
     "\n",
-    "[tohtml]: ../reference/api/pandas.io.formats.style.Styler.to_html.rst"
+    "def rain_condition(v): \n",
+    "    if v < 1.75:\n",
+    "        return \"Dry\"\n",
+    "    elif v < 2.75:\n",
+    "        return \"Rain\"\n",
+    "    return \"Heavy Rain\"\n",
+    "\n",
+    "def make_pretty(styler):\n",
+    "    styler.set_caption(\"Weather Conditions\")\n",
+    "    styler.format(rain_condition)\n",
+    "    styler.format_index(lambda v: v.strftime(\"%A\"))\n",
+    "    styler.background_gradient(axis=None, vmin=1, vmax=5, cmap=\"YlGnBu\")\n",
+    "    return styler\n",
+    "\n",
+    "weather_df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "weather_df.loc[\"2021-01-04\":\"2021-01-08\"].style.pipe(make_pretty)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Hiding Data\n",
+    "\n",
+    "The index and column headers can be completely hidden, as well as subselecting rows or columns that one wishes to exclude. Both these options are performed using the same methods.\n",
+    "\n",
+    "The index can be hidden from rendering by calling [.hide()][hideidx] without any arguments, which might be useful if your index is integer based. Similarly column headers can be hidden by calling [.hide(axis=\"columns\")][hideidx] without any further arguments.\n",
+    "\n",
+    "Specific rows or columns can be hidden from rendering by calling the same [.hide()][hideidx] method and passing in a row/column label, a list-like or a slice of row/column labels for the ``subset`` argument.\n",
+    "\n",
+    "Hiding does not change the integer arrangement of CSS classes, e.g. hiding the first two columns of a DataFrame means the column class indexing will still start at `col2`, since `col0` and `col1` are simply ignored.\n",
+    "\n",
+    "[hideidx]: ../reference/api/pandas.io.formats.style.Styler.hide.rst"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.DataFrame(np.random.randn(5, 5))\n",
+    "df.style \\\n",
+    "  .hide(subset=[0, 2, 4], axis=0) \\\n",
+    "  .hide(subset=[0, 2, 4], axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To invert the function into a **show** functionality, it is best practice to compose the list of items to hide from the items you want to show."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "show = [0, 2, 4]\n",
+    "df.style \\\n",
+    "  .hide([row for row in df.index if row not in show], axis=0) \\\n",
+    "  .hide([col for col in df.columns if col not in show], axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Concatenating DataFrame Outputs\n",
+    "\n",
+    "Two or more Stylers can be concatenated together provided they share the same columns. This is very useful for showing summary statistics for a DataFrame, and is often used in combination with `DataFrame.agg`.\n",
+    "\n",
+    "Since the objects concatenated are Stylers, they can be styled independently, as will be shown below, and their concatenation preserves those styles."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "summary_styler = df.agg([\"sum\", \"mean\"]).style \\\n",
+    "                   .format(precision=3) \\\n",
+    "                   .relabel_index([\"Sum\", \"Average\"])\n",
+    "df.style.format(precision=1).concat(summary_styler)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Styler Object and HTML \n",
+    "\n",
+    "The [Styler][styler] was originally constructed to support the wide array of HTML formatting options. Its HTML output creates an HTML `<table>` and leverages CSS styling language to manipulate many parameters including colors, fonts, borders, background, etc. See [here][w3schools] for more information on styling HTML tables. This allows a lot of flexibility out of the box, and even enables web developers to integrate DataFrames into their existing user interface designs.\n",
+    "\n",
+    "Below we demonstrate the default output, which looks very similar to the standard DataFrame HTML representation. But the HTML here has already attached some CSS classes to each cell, even if we haven't yet created any styles. We can view these by calling the [.to_html()][tohtml] method, which returns the raw HTML as a string; this is useful for further processing or adding to a file. Read on in [More about CSS and HTML](#More-About-CSS-and-HTML). This section also walks through how to convert this default output into a DataFrame display that is more communicative. For example, how we can build `s`:\n",
+    "\n",
+    "[tohtml]: ../reference/api/pandas.io.formats.style.Styler.to_html.rst\n",
+    "\n",
+    "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n",
+    "[w3schools]: https://www.w3schools.com/html/html_tables.asp"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.DataFrame([[38.0, 2.0, 18.0, 22.0, 21, np.nan],[19, 439, 6, 452, 226,232]], \n",
+    "                  index=pd.Index(['Tumour (Positive)', 'Non-Tumour (Negative)'], name='Actual Label:'), \n",
+    "                  columns=pd.MultiIndex.from_product([['Decision Tree', 'Regression', 'Random'],['Tumour', 'Non-Tumour']], names=['Model:', 'Predicted:']))\n",
+    "df.style"
    ]
   },
   {
@@ -147,90 +314,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Formatting the Display\n",
-    "\n",
-    "### Formatting Values\n",
-    "\n",
-    "Before adding styles it is useful to show that the [Styler][styler] can distinguish the *display* value from the *actual* value, in both datavalues and index or columns headers. To control the display value, the text is printed in each cell as string, and we can use the [.format()][formatfunc] and [.format_index()][formatfuncindex] methods to manipulate this according to a [format spec string][format] or a callable that takes a single value and returns a string. It is possible to define this for the whole table, or index, or for individual columns, or MultiIndex levels. \n",
-    "\n",
-    "Additionally, the format function has a **precision** argument to specifically help formatting floats, as well as **decimal** and **thousands** separators to support other locales, an **na_rep** argument to display missing data, and an **escape** argument to help displaying safe-HTML or safe-LaTeX. The default formatter is configured to adopt pandas' `styler.format.precision` option, controllable using `with pd.option_context('format.precision', 2):` \n",
-    "\n",
-    "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n",
-    "[format]: https://docs.python.org/3/library/string.html#format-specification-mini-language\n",
-    "[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst\n",
-    "[formatfuncindex]: ../reference/api/pandas.io.formats.style.Styler.format_index.rst"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "df.style.format(precision=0, na_rep='MISSING', thousands=\" \",\n",
-    "                formatter={('Decision Tree', 'Tumour'): \"{:.2f}\",\n",
-    "                           ('Regression', 'Non-Tumour'): lambda x: \"$ {:,.1f}\".format(x*-1e6)\n",
-    "                          })"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Using Styler to manipulate the display is a useful feature because maintaining the indexing and datavalues for other purposes gives greater control. You do not have to overwrite your DataFrame to display it how you like. Here is an example of using the formatting functions whilst still relying on the underlying data for indexing and calculations."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "weather_df = pd.DataFrame(np.random.rand(10,2)*5, \n",
-    "                          index=pd.date_range(start=\"2021-01-01\", periods=10),\n",
-    "                          columns=[\"Tokyo\", \"Beijing\"])\n",
-    "\n",
-    "def rain_condition(v): \n",
-    "    if v < 1.75:\n",
-    "        return \"Dry\"\n",
-    "    elif v < 2.75:\n",
-    "        return \"Rain\"\n",
-    "    return \"Heavy Rain\"\n",
-    "\n",
-    "def make_pretty(styler):\n",
-    "    styler.set_caption(\"Weather Conditions\")\n",
-    "    styler.format(rain_condition)\n",
-    "    styler.format_index(lambda v: v.strftime(\"%A\"))\n",
-    "    styler.background_gradient(axis=None, vmin=1, vmax=5, cmap=\"YlGnBu\")\n",
-    "    return styler\n",
-    "\n",
-    "weather_df"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "weather_df.loc[\"2021-01-04\":\"2021-01-08\"].style.pipe(make_pretty)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Hiding Data\n",
-    "\n",
-    "The index and column headers can be completely hidden, as well subselecting rows or columns that one wishes to exclude. Both these options are performed using the same methods.\n",
-    "\n",
-    "The index can be hidden from rendering by calling [.hide()][hideidx] without any arguments, which might be useful if your index is integer based. Similarly column headers can be hidden by calling [.hide(axis=\"columns\")][hideidx] without any further arguments.\n",
-    "\n",
-    "Specific rows or columns can be hidden from rendering by calling the same [.hide()][hideidx] method and passing in a row/column label, a list-like or a slice of row/column labels to for the ``subset`` argument.\n",
-    "\n",
-    "Hiding does not change the integer arrangement of CSS classes, e.g. hiding the first two columns of a DataFrame means the column class indexing will still start at `col2`, since `col0` and `col1` are simply ignored.\n",
-    "\n",
-    "We can update our `Styler` object from before to hide some data and format the values.\n",
+    "The first step we have taken is to create the Styler object from the DataFrame and then select the range of interest by hiding unwanted columns with [.hide()][hideidx].\n",
     "\n",
     "[hideidx]: ../reference/api/pandas.io.formats.style.Styler.hide.rst"
    ]