@@ -2,39 +2,38 @@ html5lib
  ========

  html5lib is a pure-python library for parsing HTML. It is designed to
- conform to the HTML specification, which has formalized the error
- handling algorithms of legacy web browsers, and is now implemented by
- all major web browsers.
+ conform to the HTML specification, as is implemented by all major web
+ browsers.


  Requirements
  ------------

- Python 2.6 and above (including 3) are supported. Implementations
- known to work are CPython (as the reference implementation) and
- PyPy. Jython is known *not* to work due to various bugs in its
- implementation of the language. Others such as IronPython may or may
- not work; if you wish to try, you are strongly recommended to run the
- testsuite and report back!
+ Python 2.6 and above as well as Python 3.0 and above are
+ supported. Implementations known to work are CPython (as the reference
+ implementation) and PyPy. Jython is known *not* to work due to various
+ bugs in its implementation of the language. Others such as IronPython
+ may or may not work; if you wish to try, you are strongly encouraged
+ to run the testsuite and report back!

  The only required library dependency is ``six``, this can be found
  packaged in PyPi.

  Optionally:

  - ``datrie`` can be used to improve parsing performance (though in
-   almost all cases the improvement is trivial);
+   almost all cases the improvement is marginal);

  - ``lxml`` is supported as a tree format (for both building and
    walking) under CPython (but *not* PyPy where it is known to cause
    segfaults);

  - ``genshi`` has a treewalker (but not builder); and

- - ``chardet`` (note currently this is only packaged on PyPi for
+ - ``chardet`` can be used as a fallback when character encoding cannot
+   be determined (note currently this is only packaged on PyPi for
    Python 2, though several package managers include unofficial ports
-   to Python 3) can be used as a fallback when character encoding
-   cannot be determined.
+   to Python 3).


  Installation
@@ -72,15 +71,15 @@ Please report any bugs on the `issue tracker
  Tests
  -----

- These are nowadays contained in the html5lib-tests repository and
- included as a submodule, thus for git checkouts they must be
- initialized (for release tarballs this is unneeded)::
+ These are contained in the html5lib-tests repository and included as a
+ submodule, thus for git checkouts they must be initialized (for
+ release tarballs this is unneeded)::

    $ git submodule init
    $ git submodule update

- And then they can be run once ``nose`` has been installed with
- ``nosetests``. All should pass.
+ And then they can be run, with ``nose`` installed, using the
+ ``nosetests`` command in the root directory. All should pass.


  Contributing
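
Reviewer note on the reworded ``chardet`` bullet: the "optional fallback when the character encoding cannot be determined" pattern it describes could look roughly like the sketch below. This is only an illustration under stated assumptions, not html5lib's actual code. The helper name `guess_encoding`, its `declared` and `default` parameters, and the `windows-1252` default are all hypothetical choices made for this example. The only third-party call assumed is `chardet.detect`, which returns a dict with an `encoding` key.

```python
import codecs

# Byte-order marks, checked longest-first so that a UTF-32 BOM is not
# mistaken for the UTF-16 BOM it begins with.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data, declared=None, default="windows-1252"):
    """Return an encoding name for raw bytes (hypothetical helper)."""
    # 1. An explicitly declared encoding wins.
    if declared:
        return declared

    # 2. A byte-order mark is the next most reliable signal.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 3. Only now fall back to chardet, and only if it is installed.
    try:
        import chardet  # optional dependency
    except ImportError:
        chardet = None
    if chardet is not None:
        detected = chardet.detect(data).get("encoding")
        if detected:
            return detected

    # 4. Nothing could be determined: use the assumed default.
    return default
```

Importing `chardet` inside the function keeps the dependency optional: installations without it still get the declared-encoding and BOM checks and then the default.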