  Specifies what to do upon encountering a bad line (a line with too many fields).
  Allowed values are :

- - 'error', raise an ParserError when a bad line is encountered.
- - 'warn', print a warning when a bad line is encountered and skip that line.
- - 'skip', skip bad lines without raising or warning when they are encountered.
+ - 'error', raise an ParserError when a bad line is encountered.
+ - 'warn', print a warning when a bad line is encountered and skip that line.
+ - 'skip', skip bad lines without raising or warning when they are encountered.

  .. versionadded:: 1.3.0
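The ``on_bad_lines`` values documented in the hunk above can be exercised with a short sketch. It assumes pandas 1.3.0 or later (per the ``versionadded`` note), and the sample CSV text is hypothetical, not taken from the commit.

```python
import io

import pandas as pd

# Hypothetical sample data: the second data row has four fields instead of three.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# 'skip' drops the malformed row without raising or warning.
skipped = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

# 'error' raises ParserError when the malformed row is encountered.
try:
    pd.read_csv(io.StringIO(raw), on_bad_lines="error")
    raised = False
except pd.errors.ParserError:
    raised = True

print(skipped)
print("ParserError raised:", raised)
```

Passing ``on_bad_lines="warn"`` instead would also skip the malformed row, but emit a warning for it.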
@@ -1998,12 +1998,12 @@ fall back in the following manner:
  * if an object is unsupported it will attempt the following:


- * check if the object has defined a ``toDict`` method and call it.
+ - check if the object has defined a ``toDict`` method and call it.
  A ``toDict`` method should return a ``dict`` which will then be JSON serialized.

- * invoke the ``default_handler`` if one was provided.
+ - invoke the ``default_handler`` if one was provided.

- * convert the object to a ``dict`` by traversing its contents. However this will often fail
+ - convert the object to a ``dict`` by traversing its contents. However this will often fail
  with an ``OverflowError`` or give unexpected results.

  In general the best approach for unsupported objects or dtypes is to provide a ``default_handler``.
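As an illustration of the ``default_handler`` recommendation above, here is a minimal sketch. The choice of ``str`` as the handler and the complex-valued sample frame are assumptions for demonstration; any callable that returns a JSON-serializable value could be used instead.

```python
import json

import pandas as pd

# Complex numbers are not natively JSON-serializable, so they take the fallback path.
df = pd.DataFrame([1.0, 2.0, complex(1.0, 2.0)])

# With default_handler=str, each unsupported value is passed through str().
payload = df.to_json(default_handler=str)

# Decode again to inspect what was written.
decoded = json.loads(payload)
print(decoded)
```

The handler keeps serialization from relying on the traversal fallback, which the text above notes can fail or give unexpected results.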
@@ -2092,19 +2092,19 @@ preserve string-like numbers (e.g. '1', '2') in an axes.

  Large integer values may be converted to dates if ``convert_dates=True`` and the data and / or column labels appear 'date-like'. The exact threshold depends on the ``date_unit`` specified. 'date-like' means that the column label meets one of the following criteria:

- * it ends with ``'_at'``
- * it ends with ``'_time'``
- * it begins with ``'timestamp'``
- * it is ``'modified'``
- * it is ``'date'``
+ * it ends with ``'_at'``
+ * it ends with ``'_time'``
+ * it begins with ``'timestamp'``
+ * it is ``'modified'``
+ * it is ``'date'``

  .. warning::

  When reading JSON data, automatic coercing into dtypes has some quirks:

- * an index can be reconstructed in a different order from serialization, that is, the returned order is not guaranteed to be the same as before serialization
- * a column that was ``float`` data will be converted to ``integer`` if it can be done safely, e.g. a column of ``1.``
- * bool columns will be converted to ``integer`` on reconstruction
+ * an index can be reconstructed in a different order from serialization, that is, the returned order is not guaranteed to be the same as before serialization
+ * a column that was ``float`` data will be converted to ``integer`` if it can be done safely, e.g. a column of ``1.``
+ * bool columns will be converted to ``integer`` on reconstruction

  Thus there are times where you may want to specify specific dtypes via the ``dtype`` keyword argument.
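The ``dtype`` keyword mentioned at the end of the hunk above can be sketched as follows. The column name ``a`` and its values are hypothetical, and the inferred dtype is printed rather than assumed, since it depends on the coercion quirks the warning describes.

```python
import io

import pandas as pd

# A float column whose values are all whole numbers.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
payload = df.to_json()

# Default behaviour: dtypes are inferred, so the column may come back as integer.
inferred = pd.read_json(io.StringIO(payload))

# Passing a column-to-dtype mapping pins the dtype explicitly.
explicit = pd.read_json(io.StringIO(payload), dtype={"a": "float64"})

print(inferred["a"].dtype)
print(explicit["a"].dtype)
```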
@@ -2370,19 +2370,19 @@ A few notes on the generated table schema:

  * The default naming roughly follows these rules:

- * For series, the ``object.name`` is used. If that's none, then the
+ - For series, the ``object.name`` is used. If that's none, then the
  name is ``values``
- * For ``DataFrames``, the stringified version of the column name is used
- * For ``Index`` (not ``MultiIndex``), ``index.name`` is used, with a
+ - For ``DataFrames``, the stringified version of the column name is used
+ - For ``Index`` (not ``MultiIndex``), ``index.name`` is used, with a
  fallback to ``index`` if that is None.
- * For ``MultiIndex``, ``mi.names`` is used. If any level has no name,
+ - For ``MultiIndex``, ``mi.names`` is used. If any level has no name,
  then ``level_<i>`` is used.

  ``read_json`` also accepts ``orient='table'`` as an argument. This allows for
  the preservation of metadata such as dtypes and index names in a
  round-trippable manner.

- .. ipython:: python
+ .. ipython:: python

  df = pd.DataFrame(
  {
@@ -2780,20 +2780,20 @@ parse HTML tables in the top-level pandas io function ``read_html``.

  * Benefits

- * |lxml|_ is very fast.
+ - |lxml|_ is very fast.

- * |lxml|_ requires Cython to install correctly.
+ - |lxml|_ requires Cython to install correctly.

  * Drawbacks

- * |lxml|_ does *not* make any guarantees about the results of its parse
+ - |lxml|_ does *not* make any guarantees about the results of its parse
  *unless* it is given |svm|_.

- * In light of the above, we have chosen to allow you, the user, to use the
+ - In light of the above, we have chosen to allow you, the user, to use the
  |lxml|_ backend, but **this backend will use** |html5lib|_ if |lxml|_
  fails to parse

- * It is therefore *highly recommended* that you install both
+ - It is therefore *highly recommended* that you install both
  |BeautifulSoup4|_ and |html5lib|_, so that you will still get a valid
  result (provided everything else is valid) even if |lxml|_ fails.
@@ -2806,22 +2806,22 @@ parse HTML tables in the top-level pandas io function ``read_html``.

  * Benefits

- * |html5lib|_ is far more lenient than |lxml|_ and consequently deals
+ - |html5lib|_ is far more lenient than |lxml|_ and consequently deals
  with *real-life markup* in a much saner way rather than just, e.g.,
  dropping an element without notifying you.

- * |html5lib|_ *generates valid HTML5 markup from invalid markup
+ - |html5lib|_ *generates valid HTML5 markup from invalid markup
  automatically*. This is extremely important for parsing HTML tables,
  since it guarantees a valid document. However, that does NOT mean that
  it is "correct", since the process of fixing markup does not have a
  single definition.

- * |html5lib|_ is pure Python and requires no additional build steps beyond
+ - |html5lib|_ is pure Python and requires no additional build steps beyond
  its own installation.

  * Drawbacks

- * The biggest drawback to using |html5lib|_ is that it is slow as
+ - The biggest drawback to using |html5lib|_ is that it is slow as
  molasses. However consider the fact that many tables on the web are not
  big enough for the parsing algorithm runtime to matter. It is more
  likely that the bottleneck will be in the process of reading the raw
@@ -3211,7 +3211,7 @@ supports parsing such sizeable files using `lxml's iterparse`_ and `etree's iter
  which are memory-efficient methods to iterate through an XML tree and extract specific elements and attributes.