Commit ffb9a01

DOC: Fix some rendering problems in arrow and io user guide (#52530)

1 parent: 2028d9a

File tree: 2 files changed, +46 −35 lines

doc/source/user_guide/io.rst (+30 −30)
@@ -302,7 +302,7 @@ date_format : str or dict of column -> format, default ``None``
     format. For anything more complex,
     please read in as ``object`` and then apply :func:`to_datetime` as-needed.
 
-  .. versionadded:: 2.0.0
+    .. versionadded:: 2.0.0
 dayfirst : boolean, default ``False``
     DD/MM format dates, international and European format.
 cache_dates : boolean, default True
@@ -385,9 +385,9 @@ on_bad_lines : {{'error', 'warn', 'skip'}}, default 'error'
     Specifies what to do upon encountering a bad line (a line with too many fields).
     Allowed values are :
 
-        - 'error', raise an ParserError when a bad line is encountered.
-        - 'warn', print a warning when a bad line is encountered and skip that line.
-        - 'skip', skip bad lines without raising or warning when they are encountered.
+    - 'error', raise an ParserError when a bad line is encountered.
+    - 'warn', print a warning when a bad line is encountered and skip that line.
+    - 'skip', skip bad lines without raising or warning when they are encountered.
 
     .. versionadded:: 1.3.0
 
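The ``on_bad_lines`` behaviour documented in this hunk can be exercised directly; a minimal sketch, not part of the commit (the sample CSV is invented for illustration):

```python
import io

import pandas as pd

# Two-column file with one malformed row carrying three fields.
csv = "a,b\n1,2\n3,4,5\n6,7\n"

# 'skip' drops the bad line silently; the default 'error' would raise
# a pandas.errors.ParserError, and 'warn' would skip with a warning.
df = pd.read_csv(io.StringIO(csv), on_bad_lines="skip")
print(df)
```

With the default ``on_bad_lines="error"``, the same input raises ``pandas.errors.ParserError``.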
@@ -1998,12 +1998,12 @@ fall back in the following manner:
 * if an object is unsupported it will attempt the following:
 
 
-  * check if the object has defined a ``toDict`` method and call it.
+  - check if the object has defined a ``toDict`` method and call it.
     A ``toDict`` method should return a ``dict`` which will then be JSON serialized.
 
-  * invoke the ``default_handler`` if one was provided.
+  - invoke the ``default_handler`` if one was provided.
 
-  * convert the object to a ``dict`` by traversing its contents. However this will often fail
+  - convert the object to a ``dict`` by traversing its contents. However this will often fail
     with an ``OverflowError`` or give unexpected results.
 
 In general the best approach for unsupported objects or dtypes is to provide a ``default_handler``.
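The fallback chain this hunk re-lists can be seen in action; a small sketch, not part of the commit (the ``Point`` and ``Opaque`` classes are invented for illustration):

```python
import pandas as pd

class Point:
    """An object the JSON serializer does not natively support."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def toDict(self):
        # First fallback: pandas calls toDict() when it is defined.
        return {"x": self.x, "y": self.y}

ser = pd.Series([Point(1, 2)])
print(ser.to_json())

class Opaque:
    """No toDict method defined."""

# Second fallback: the default_handler is consulted instead.
ser2 = pd.Series([Opaque()])
print(ser2.to_json(default_handler=str))
```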
@@ -2092,19 +2092,19 @@ preserve string-like numbers (e.g. '1', '2') in an axes.
 
 Large integer values may be converted to dates if ``convert_dates=True`` and the data and / or column labels appear 'date-like'. The exact threshold depends on the ``date_unit`` specified. 'date-like' means that the column label meets one of the following criteria:
 
-  * it ends with ``'_at'``
-  * it ends with ``'_time'``
-  * it begins with ``'timestamp'``
-  * it is ``'modified'``
-  * it is ``'date'``
+* it ends with ``'_at'``
+* it ends with ``'_time'``
+* it begins with ``'timestamp'``
+* it is ``'modified'``
+* it is ``'date'``
 
 .. warning::
 
    When reading JSON data, automatic coercing into dtypes has some quirks:
 
-     * an index can be reconstructed in a different order from serialization, that is, the returned order is not guaranteed to be the same as before serialization
-     * a column that was ``float`` data will be converted to ``integer`` if it can be done safely, e.g. a column of ``1.``
-     * bool columns will be converted to ``integer`` on reconstruction
+   * an index can be reconstructed in a different order from serialization, that is, the returned order is not guaranteed to be the same as before serialization
+   * a column that was ``float`` data will be converted to ``integer`` if it can be done safely, e.g. a column of ``1.``
+   * bool columns will be converted to ``integer`` on reconstruction
 
 Thus there are times where you may want to specify specific dtypes via the ``dtype`` keyword argument.
 
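The float-to-integer coercion quirk listed in this hunk, and the ``dtype`` escape hatch the surrounding text recommends, can be demonstrated with a minimal sketch (not part of the commit):

```python
from io import StringIO

import pandas as pd

payload = pd.DataFrame({"a": [1.0, 2.0]}).to_json()

# Round-tripped floats that are whole numbers are coerced to integer...
coerced = pd.read_json(StringIO(payload))
print(coerced.dtypes)

# ...unless an explicit dtype is requested via the dtype keyword.
kept = pd.read_json(StringIO(payload), dtype={"a": "float64"})
print(kept.dtypes)
```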
@@ -2370,19 +2370,19 @@ A few notes on the generated table schema:
 
 * The default naming roughly follows these rules:
 
-    * For series, the ``object.name`` is used. If that's none, then the
+  - For series, the ``object.name`` is used. If that's none, then the
     name is ``values``
-    * For ``DataFrames``, the stringified version of the column name is used
-    * For ``Index`` (not ``MultiIndex``), ``index.name`` is used, with a
+  - For ``DataFrames``, the stringified version of the column name is used
+  - For ``Index`` (not ``MultiIndex``), ``index.name`` is used, with a
     fallback to ``index`` if that is None.
-    * For ``MultiIndex``, ``mi.names`` is used. If any level has no name,
+  - For ``MultiIndex``, ``mi.names`` is used. If any level has no name,
     then ``level_<i>`` is used.
 
 ``read_json`` also accepts ``orient='table'`` as an argument. This allows for
 the preservation of metadata such as dtypes and index names in a
 round-trippable manner.
 
-  .. ipython:: python
+.. ipython:: python
 
    df = pd.DataFrame(
        {
@@ -2780,20 +2780,20 @@ parse HTML tables in the top-level pandas io function ``read_html``.
 
 * Benefits
 
-    * |lxml|_ is very fast.
+  - |lxml|_ is very fast.
 
-    * |lxml|_ requires Cython to install correctly.
+  - |lxml|_ requires Cython to install correctly.
 
 * Drawbacks
 
-    * |lxml|_ does *not* make any guarantees about the results of its parse
+  - |lxml|_ does *not* make any guarantees about the results of its parse
     *unless* it is given |svm|_.
 
-    * In light of the above, we have chosen to allow you, the user, to use the
+  - In light of the above, we have chosen to allow you, the user, to use the
     |lxml|_ backend, but **this backend will use** |html5lib|_ if |lxml|_
     fails to parse
 
-    * It is therefore *highly recommended* that you install both
+  - It is therefore *highly recommended* that you install both
     |BeautifulSoup4|_ and |html5lib|_, so that you will still get a valid
     result (provided everything else is valid) even if |lxml|_ fails.
 
@@ -2806,22 +2806,22 @@ parse HTML tables in the top-level pandas io function ``read_html``.
 
 * Benefits
 
-    * |html5lib|_ is far more lenient than |lxml|_ and consequently deals
+  - |html5lib|_ is far more lenient than |lxml|_ and consequently deals
     with *real-life markup* in a much saner way rather than just, e.g.,
     dropping an element without notifying you.
 
-    * |html5lib|_ *generates valid HTML5 markup from invalid markup
+  - |html5lib|_ *generates valid HTML5 markup from invalid markup
     automatically*. This is extremely important for parsing HTML tables,
     since it guarantees a valid document. However, that does NOT mean that
     it is "correct", since the process of fixing markup does not have a
     single definition.
 
-    * |html5lib|_ is pure Python and requires no additional build steps beyond
+  - |html5lib|_ is pure Python and requires no additional build steps beyond
     its own installation.
 
 * Drawbacks
 
-    * The biggest drawback to using |html5lib|_ is that it is slow as
+  - The biggest drawback to using |html5lib|_ is that it is slow as
     molasses. However consider the fact that many tables on the web are not
     big enough for the parsing algorithm runtime to matter. It is more
     likely that the bottleneck will be in the process of reading the raw
@@ -3211,7 +3211,7 @@ supports parsing such sizeable files using `lxml's iterparse`_ and `etree's iterparse`_
 which are memory-efficient methods to iterate through an XML tree and extract specific elements and attributes.
 without holding entire tree in memory.
 
-  .. versionadded:: 1.5.0
+.. versionadded:: 1.5.0
 
 .. _`lxml's iterparse`: https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk
 .. _`etree's iterparse`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse
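The ``iterparse`` feature whose ``versionadded`` note this hunk re-indents can be sketched as follows (not part of the commit; the file contents and element names are invented, and ``parser="etree"`` is chosen here only to avoid the lxml dependency):

```python
import tempfile

import pandas as pd

xml = """<?xml version="1.0" encoding="UTF-8"?>
<data>
  <row shape="square" degrees="360"/>
  <row shape="circle" degrees="0"/>
</data>"""

# iterparse requires a file on disk: elements are processed as they are
# read, without holding the entire tree in memory.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as fh:
    fh.write(xml)
    path = fh.name

df = pd.read_xml(path, iterparse={"row": ["shape", "degrees"]}, parser="etree")
print(df)
```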

doc/source/user_guide/pyarrow.rst (+16 −5)
@@ -37,10 +37,10 @@ which is similar to a NumPy array. To construct these from the main pandas data
 
 .. note::
 
-   The string alias ``"string[pyarrow]"`` maps to ``pd.StringDtype("pyarrow")`` which is not equivalent to
-   specifying ``dtype=pd.ArrowDtype(pa.string())``. Generally, operations on the data will behave similarly
-   except ``pd.StringDtype("pyarrow")`` can return NumPy-backed nullable types while ``pd.ArrowDtype(pa.string())``
-   will return :class:`ArrowDtype`.
+    The string alias ``"string[pyarrow]"`` maps to ``pd.StringDtype("pyarrow")`` which is not equivalent to
+    specifying ``dtype=pd.ArrowDtype(pa.string())``. Generally, operations on the data will behave similarly
+    except ``pd.StringDtype("pyarrow")`` can return NumPy-backed nullable types while ``pd.ArrowDtype(pa.string())``
+    will return :class:`ArrowDtype`.
 
 .. ipython:: python
 
@@ -62,10 +62,14 @@ into :class:`ArrowDtype` to use in the ``dtype`` parameter.
     ser = pd.Series([["hello"], ["there"]], dtype=pd.ArrowDtype(list_str_type))
     ser
 
+.. ipython:: python
+
     from datetime import time
     idx = pd.Index([time(12, 30), None], dtype=pd.ArrowDtype(pa.time64("us")))
     idx
 
+.. ipython:: python
+
     from decimal import Decimal
     decimal_type = pd.ArrowDtype(pa.decimal128(3, scale=2))
     data = [[Decimal("3.19"), None], [None, Decimal("-1.23")]]
@@ -78,7 +82,10 @@ or :class:`DataFrame` object.
 
 .. ipython:: python
 
-    pa_array = pa.array([{"1": "2"}, {"10": "20"}, None])
+    pa_array = pa.array(
+        [{"1": "2"}, {"10": "20"}, None],
+        type=pa.map_(pa.string(), pa.string()),
+    )
     ser = pd.Series(pd.arrays.ArrowExtensionArray(pa_array))
     ser
@@ -133,9 +140,13 @@ The following are just some examples of operations that are accelerated by native
     ser.isna()
     ser.fillna(0)
 
+.. ipython:: python
+
     ser_str = pd.Series(["a", "b", None], dtype=pd.ArrowDtype(pa.string()))
     ser_str.str.startswith("a")
 
+.. ipython:: python
+
     from datetime import datetime
     pa_type = pd.ArrowDtype(pa.timestamp("ns"))
     ser_dt = pd.Series([datetime(2022, 1, 1), None], dtype=pa_type)
