
[2.3.x] CI: skip lxml encode test on Windows #60238


Merged
mroeschke merged 1 commit into pandas-dev:2.3.x from ci-windows on Nov 8, 2024

Conversation

jorisvandenbossche
Member

The Windows CI has been failing for a couple of days on the 2.3.x branch, with a single failing HTML test, test_encode, when using the lxml backend:

 ================================== FAILURES ===================================
_______________ TestReadHtml.test_encode[letz_latin1.html-lxml] _______________
[gw0] win32 -- Python 3.12.7 C:\Users\runneradmin\micromamba\envs\test\python.exe

self = <pandas.tests.io.test_html.TestReadHtml object at 0x0000022907CCDC30>
html_encoding_file = 'D:\\a\\pandas\\pandas\\pandas\\tests\\io\\data\\html_encoding\\letz_latin1.html'
flavor_read_html = functools.partial(<function read_html at 0x00000228FA7CD6D0>, flavor='lxml')

    @pytest.mark.filterwarnings(
        "ignore:You provided Unicode markup but also provided a value for "
        "from_encoding.*:UserWarning"
    )
    def test_encode(self, html_encoding_file, flavor_read_html):
        base_path = os.path.basename(html_encoding_file)
        root = os.path.splitext(base_path)[0]
        _, encoding = root.split("_")
    
        try:
            with open(html_encoding_file, "rb") as fobj:
>               from_string = flavor_read_html(
                    fobj.read(), encoding=encoding, index_col=0
                ).pop()

pandas\tests\io\test_html.py:1395: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\io\html.py:1240: in read_html
    return _parse(
pandas\io\html.py:983: in _parse
    tables = p.parse_tables()
pandas\io\html.py:249: in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
pandas\io\html.py:791: in _build_doc
    r = parse(self.io, parser=parser)
C:\Users\runneradmin\micromamba\envs\test\Lib\site-packages\lxml\html\__init__.py:914: in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
src\\lxml\\etree.pyx:3589: in lxml.etree.parse
    ???
src\\lxml\\parser.pxi:1958: in lxml.etree._parseDocument
    ???
src\\lxml\\parser.pxi:1984: in lxml.etree._parseDocumentFromURL
    ???
src\\lxml\\parser.pxi:1887: in lxml.etree._parseDocFromFile
    ???
src\\lxml\\parser.pxi:1200: in lxml.etree._BaseParser._parseDocFromFile
    ???
src\\lxml\\parser.pxi:633: in lxml.etree._ParserContext._handleParseResultDoc
    ???
src\\lxml\\parser.pxi:743: in lxml.etree._handleParseResult
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E     File "<string>", line 0
E   lxml.etree.XMLSyntaxError: unknown error

src\\lxml\\parser.pxi:672: XMLSyntaxError

The test is already allowed to fail on Windows when the file uses UTF-16 or UTF-32 encoding, but now it fails with latin1 encoding as well.
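For context, the test derives the encoding from the data file name (as shown in the traceback above), and the existing allowance only covers the UTF-16/UTF-32 files. A minimal sketch of that logic, assuming the allowance is detected by a substring match on the encoding name; `encoding_from_filename` and `is_tolerated_windows_failure` are hypothetical helper names, not functions from the pandas test suite:

```python
import os


def encoding_from_filename(path):
    """Return the encoding part of a name such as 'letz_latin1.html'."""
    base_path = os.path.basename(path)
    root = os.path.splitext(base_path)[0]
    # File names follow the '<prefix>_<encoding>' pattern seen in the traceback.
    _, encoding = root.split("_")
    return encoding


def is_tolerated_windows_failure(encoding, platform):
    """Return True for failures that were already allowed before this PR.

    Assumption: UTF-16/UTF-32 files are recognised by '16' or '32'
    appearing in the encoding name.
    """
    return platform == "win32" and ("16" in encoding or "32" in encoding)
```

With these helpers, `letz_latin1.html` yields the encoding `latin1`, which the existing allowance does not cover, so the failure surfaces in CI.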

Comparing a passing and a failing build on 2.3.x, I notice that libxml2 was updated from 2.12.7 to 2.13.4 when the failures started. But the main branch also uses this latest version, and the test does not fail there.
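To compare environments, the libxml2 version that lxml sees can be printed in each build. A minimal sketch, assuming lxml may or may not be installed where it runs; `libxml2_versions` and `format_version` are hypothetical helper names, while `LIBXML_VERSION` and `LIBXML_COMPILED_VERSION` are attributes exposed by `lxml.etree`:

```python
def format_version(version):
    """Turn a version tuple such as (2, 13, 4) into '2.13.4'."""
    return ".".join(str(part) for part in version)


def libxml2_versions():
    """Return the libxml2 versions lxml reports, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        # Version of the libxml2 library loaded at runtime.
        "runtime": etree.LIBXML_VERSION,
        # Version of libxml2 that lxml was compiled against.
        "compiled": etree.LIBXML_COMPILED_VERSION,
    }


if __name__ == "__main__":
    versions = libxml2_versions()
    if versions is None:
        print("lxml is not installed")
    else:
        for kind, version in versions.items():
            print(kind, format_version(version))
```

A mismatch between the runtime and compiled versions would be one thing to look for when comparing the 2.3.x and main environments, though nothing in this thread confirms that as the cause.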

So I don't have a clear idea of what is going on, and I cannot reproduce this locally. Given that it works on main and on all other platforms (and that we already skip certain encodings as well), I am inclined to just skip the test on Windows for 2.3.x.
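One way such a skip could look is sketched below. This is an illustration only and not the change merged in this PR; `skip_if_lxml_on_windows` is a hypothetical helper, and the `platform` parameter exists only so the behaviour can be exercised on any OS:

```python
import sys

import pytest


def skip_if_lxml_on_windows(flavor, platform=sys.platform):
    """Skip the calling test when the lxml flavor runs on Windows.

    Hypothetical helper: the flavor name 'lxml' and the 'win32' platform
    string match the values shown in the failing CI log.
    """
    if flavor == "lxml" and platform == "win32":
        pytest.skip("lxml raises XMLSyntaxError for this test on Windows CI")
```

Calling the helper at the top of the test body would skip only the lxml parametrization on Windows and leave the other flavors and platforms running.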

cc @mroeschke in case you have seen something similar on main (maybe something we forgot to backport?)

jorisvandenbossche added the "CI Continuous Integration" label on Nov 8, 2024
jorisvandenbossche added this to the 2.3 milestone on Nov 8, 2024
jorisvandenbossche changed the title from "CI: skip lxml encode test on Windows" to "[2.3.x] CI: skip lxml encode test on Windows" on Nov 8, 2024
Member

mroeschke left a comment


Thanks. I haven't seen this failure on main; only on 2.3.x as you mentioned.

mroeschke merged commit 9465bf1 into pandas-dev:2.3.x on Nov 8, 2024
63 of 64 checks passed
jorisvandenbossche deleted the ci-windows branch on November 8, 2024 22:04