@@ -938,18 +938,241 @@ Reading HTML Content
.. versionadded:: 0.11.1

- The toplevel :func:`~pandas.io.parsers.read_html` function can accept an HTML
+ The toplevel :func:`~pandas.io.html.read_html` function can accept an HTML
string/file/url and will parse HTML tables into list of pandas DataFrames.
+ Let's look at a few examples.
+
+ Read a URL with no options
+
+ .. ipython:: python
+
+    url = 'http://www.fdic.gov/bank/individual/failed/banklist.html'
+    dfs = read_html(url)
+    dfs
+
+ .. note::
+
+    ``read_html`` returns a ``list`` of ``DataFrame`` objects, even if there is
+    only a single table contained in the HTML content.
+
+ Read a URL and match a table that contains specific text
+
+ .. ipython:: python
+
+    match = 'Metcalf Bank'
+    df_list = read_html(url, match=match)
+    len(df_list)
+    df_list[0]
+
+ Specify a header row (by default ``<th>`` elements are used to form the column
+ index); if specified, the header row is taken from the data minus the parsed
+ header elements (``<th>`` elements).
+
+ .. ipython:: python
+
+    dfs = read_html(url, header=0)
+    len(dfs)
+    dfs[0]
+
+ Specify an index column
+
+ .. ipython:: python
+
+    dfs = read_html(url, index_col=0)
+    len(dfs)
+    dfs[0]
+    dfs[0].index.name
+
+ Specify a number of rows to skip
+
+ .. ipython:: python
+
+    dfs = read_html(url, skiprows=0)
+    len(dfs)
+    dfs[0]
+
+ Specify a number of rows to skip using a list (``xrange`` (Python 2 only) works
+ as well)
+
+ .. ipython:: python
+
+    dfs = read_html(url, skiprows=range(2))
+    len(dfs)
+    dfs[0]
+
+ Don't infer numeric and date types
+
+ .. ipython:: python
+
+    dfs = read_html(url, infer_types=False)
+    len(dfs)
+    dfs[0]
+
+ Specify an HTML attribute
+
+ .. ipython:: python
+
+    dfs = read_html(url)
+    len(dfs)
+    dfs[0]
+
+ Use some combination of the above
+
+ .. ipython:: python
+
+    dfs = read_html(url, match='Metcalf Bank', index_col=0)
+    len(dfs)
+    dfs[0]
+
+ Read in pandas ``to_html`` output (with some loss of floating point precision)
+
+ .. ipython:: python
+
+    df = DataFrame(randn(2, 2))
+    s = df.to_html(float_format='{0:.40g}'.format)
+    dfin = read_html(s, index_col=0)
+    df
+    dfin[0]
+    df.index
+    df.columns
+    dfin[0].index
+    dfin[0].columns
+    np.allclose(df, dfin[0])
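
The ``'{0:.40g}'.format`` callable in the example above writes many more significant digits than a double needs, so little is lost when the text is parsed again. A minimal sketch of that idea using only the standard library follows. It assumes Python 3 syntax (the examples in this diff are Python 2), the variable names are illustrative, and no pandas call is involved.

```python
# Format a float with 40 significant digits, mirroring the float_format
# callable passed to to_html in the example above.
fmt = '{0:.40g}'.format

value = 0.1
text = fmt(value)

# In plain Python, parsing the long decimal string gives back the same double.
recovered = float(text)
print(text)
print(recovered == value)
```

This only shows the behaviour of Python's own ``float`` parsing. The example above compares the round-tripped frame with ``np.allclose`` instead of exact equality.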


Writing to HTML files
~~~~~~~~~~~~~~~~~~~~~~

.. _io.html:

- DataFrame object has an instance method ``to_html`` which renders the contents
- of the DataFrame as an html table. The function arguments are as in the method
- ``to_string`` described above.
+ ``DataFrame`` objects have an instance method ``to_html`` which renders the
+ contents of the ``DataFrame`` as an HTML table. The function arguments are as
+ in the method ``to_string`` described above.
+
+ .. note::
+
+    Not all of the possible options for ``DataFrame.to_html`` are shown here for
+    brevity's sake. See :func:`~pandas.DataFrame.to_html` for the full set of
+    options.
+
+ .. ipython:: python
+    :suppress:
+
+    def write_html(df, filename, *args, **kwargs):
+        static = os.path.abspath(os.path.join('source', '_static'))
+        with open(os.path.join(static, filename + '.html'), 'w') as f:
+            df.to_html(f, *args, **kwargs)
+
+ .. ipython:: python
+
+    df = DataFrame(randn(2, 2))
+    df
+    print df.to_html()  # raw html
+
+ .. ipython:: python
+    :suppress:
+
+    write_html(df, 'basic')
+
+ HTML:
+
+ .. raw:: html
+    :file: _static/basic.html
+
+ The ``columns`` argument will limit the columns shown
+
+ .. ipython:: python
+
+    print df.to_html(columns=[0])
+
+ .. ipython:: python
+    :suppress:
+
+    write_html(df, 'columns', columns=[0])
+
+ HTML:
+
+ .. raw:: html
+    :file: _static/columns.html
+
+ ``float_format`` takes a Python callable to control the precision of floating
+ point values
+
+ .. ipython:: python
+
+    print df.to_html(float_format='{0:.10f}'.format)
+
+ .. ipython:: python
+    :suppress:
+
+    write_html(df, 'float_format', float_format='{0:.10f}'.format)
+
+ HTML:
+
+ .. raw:: html
+    :file: _static/float_format.html
+
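
The ``float_format`` examples above pass ``'{0:.10f}'.format``, which is a bound method of a string and therefore an ordinary one-argument callable. A minimal sketch of what such a callable returns for a single value is given here. It assumes Python 3 syntax, and ``two_places`` is a hypothetical helper named only for illustration.

```python
# A bound str.format method can be called like any one-argument function.
fmt = '{0:.10f}'.format

formatted = fmt(0.5)
print(formatted)  # 0.5000000000

# Any other callable that maps a float to a string has the same shape,
# for example a small named function (hypothetical, for illustration).
def two_places(x):
    return '{0:.2f}'.format(x)

print(two_places(0.5))  # 0.50
```

Either callable has the shape that ``float_format`` is described as taking above: one float in, one string out.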
+ ``bold_rows`` will make the row labels bold by default, but you can turn that
+ off
+
+ .. ipython:: python
+
+    print df.to_html(bold_rows=False)
+
+ .. ipython:: python
+    :suppress:
+
+    write_html(df, 'nobold', bold_rows=False)
+
+ .. raw:: html
+    :file: _static/nobold.html
+
+ The ``classes`` argument provides the ability to give the resulting HTML
+ table CSS classes. Note that these classes are *appended* to the existing
+ ``'dataframe'`` class.
+
+ .. ipython:: python
+
+    print df.to_html(classes=['awesome_table_class', 'even_more_awesome_class'])
+
+ Finally, the ``escape`` argument allows you to control whether the
+ "<", ">" and "&" characters are escaped in the resulting HTML (by default it is
+ ``True``). So to get the HTML without escaped characters pass ``escape=False``
+
+ .. ipython:: python
+
+    df = DataFrame({'a': list('&<>'), 'b': randn(3)})
+
+
+ .. ipython:: python
+    :suppress:
+
+    write_html(df, 'escape')
+    write_html(df, 'noescape', escape=False)
+
+ Escaped:
+
+ .. ipython:: python
+
+    print df.to_html()
+
+ .. raw:: html
+    :file: _static/escape.html
+
+ Not escaped:
+
+ .. ipython:: python
+
+    print df.to_html(escape=False)
+
+ .. raw:: html
+    :file: _static/noescape.html
+
+ .. note::
+
+    Some browsers may not show a difference in the rendering of the previous two
+    HTML tables.
+
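
To make the escaped and unescaped cases above concrete, the sketch below shows what escaping "<", ">" and "&" means for the three cell values used in that example. It uses the standard library's ``html.escape`` as a stand-in and assumes Python 3. It is not pandas' own implementation, which is not shown in this diff.

```python
import html

# The three characters stored in column 'a' of the example frame.
cells = list('&<>')

# quote=False limits escaping to &, < and >, the characters named above.
escaped = [html.escape(c, quote=False) for c in cells]
print(escaped)  # ['&amp;', '&lt;', '&gt;']
```

A browser displays ``&lt;`` as a literal "<", whereas an unescaped "<" in a table cell may be read as the start of a tag.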

Clipboard
---------