IndexError using converters in read_html #14624
Comments
Looks like this example is no longer reproducible. Happy to reopen this issue if we can get a reproducible example.
changing
makes it work again.
This is a minimal reproduction:

    import pandas as pd

    pd.read_html(
        """
        <table>
        <tbody>
        <tr>
        <td>Foo</td>
        </tr>
        </tbody>
        </table>
        """,
        converters={1: lambda x: x},
    )

The backtrace:
read_html returns a list of DataFrames. Passing a converters parameter (see #13461) applies the converters to each DataFrame. Integer keys in converters cannot be greater than the number of columns of the parsed DataFrame minus 1 (otherwise an IndexError is raised in io.parser.PythonParser._convert_data).

But most of the time, the DataFrames returned by read_html have different sizes. Converters are therefore unusable on every column whose index is at or above min([len(df.columns) for df in pd.read_html(url)]), the column count of the narrowest table.
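One possible workaround, sketched here under assumptions, is to call read_html without converters and apply them to each table afterwards, skipping any integer key the table is too narrow for. The helper name apply_converters and the sample tables are hypothetical and not part of pandas. The sketch also assumes unique column labels, and it converts values after type inference rather than during parsing, so it is not an exact substitute for the converters parameter.

```python
import pandas as pd


def apply_converters(tables, converters):
    """Apply positional converters to each table, skipping out-of-range keys.

    Hypothetical helper: `converters` maps integer column positions to
    callables, mirroring the integer-key form described above.
    """
    converted = []
    for df in tables:
        df = df.copy()
        for position, func in converters.items():
            # Skip positions that this particular table does not have.
            if position >= df.shape[1]:
                continue
            label = df.columns[position]  # assumes unique column labels
            df[label] = df[label].map(func)
        converted.append(df)
    return converted


# Stand-ins for the differently sized tables read_html might return.
tables = [
    pd.DataFrame({"a": ["1", "2"]}),
    pd.DataFrame({"a": ["3"], "b": ["4"]}),
]
result = apply_converters(tables, {0: int, 1: int})
```

Here the key 1 is ignored for the one-column table and applied to the two-column one, so a single converters mapping can be shared across tables of different widths.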
Example
Expected Output
As in
This issue may also prevent changing read_html as proposed in #14608.

Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
pandas: 0.19.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 1.0b8
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None