annotation-xml in MathML breaks the parser #258

Closed
andreyfedoseev opened this issue May 25, 2016 · 2 comments

@andreyfedoseev

Here's a piece of MathML that breaks the parser:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <semantics>
    <mrow>
      <mrow>
        <mtext>Mass of electron</mtext>
        <mo>=</mo>
        <mn>1.602</mn>
        <mspace width="0.2em" />
        <mo>&#xD7;</mo>
        <mspace width="0.2em" />
        <msup>
          <mrow>
            <mn>10</mn>
          </mrow>
          <mrow>
            <mn>&#x2212;19</mn>
          </mrow>
        </msup>
        <mspace width="0.2em" />
        <mtext>C</mtext>
        <mspace width="0.2em" />
        <mo>&#xD7;</mo>
        <mspace width="0.4em" />
        <mfrac>
          <mrow>
            <mn>1</mn>
            <mspace width="0.2em" />
            <mtext>kg</mtext>
          </mrow>
          <mrow>
            <mn>1.759</mn>
            <mspace width="0.2em" />
            <mo>&#xD7;</mo>
            <mspace width="0.2em" />
            <msup>
              <mrow>
                <mn>10</mn>
              </mrow>
              <mrow>
                <mn>11</mn>
              </mrow>
            </msup>
            <mspace width="0.2em" />
            <mtext>C</mtext>
          </mrow>
        </mfrac>
        <mspace width="0.2em" />
        <mo>=</mo>
        <mn>9.107</mn>
        <mspace width="0.2em" />
        <mo>&#xD7;</mo>
        <mspace width="0.2em" />
        <msup>
          <mrow>
            <mn>10</mn>
          </mrow>
          <mrow>
            <mn>&#x2212;31</mn>
          </mrow>
        </msup>
        <mspace width="0.2em" />
        <mtext>kg</mtext>
      </mrow>
    </mrow>
    <annotation-xml encoding="MathML-Content">
      <mrow><mtext>Mass of electron</mtext><mo>=</mo><mn>1.602</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>−19</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>C</mtext><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.4em"></mspace><mfrac><mrow><mn>1</mn><mspace width="0.2em"></mspace><mtext>kg</mtext></mrow><mrow><mn>1.759</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>11</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>C</mtext></mrow></mfrac><mspace width="0.2em"></mspace><mo>=</mo><mn>9.107</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>−31</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>kg</mtext></mrow>
    </annotation-xml>
  </semantics>
</math>

When I pass it to html5lib.parse I get the following traceback:

.../html5lib/html5parser.pyc in mainLoop(self)
    173                         (currentNodeNamespace == namespaces["mathml"] and
    174                          currentNodeName == "annotation-xml" and
--> 175                          token["name"] == "svg") or
    176                         (self.isHTMLIntegrationPoint(currentNode) and
    177                          type in (StartTagToken, CharactersToken, SpaceCharactersToken))):
@gsnedders
Member

I presume it's a space character following it that breaks it.

gsnedders added this to the 0.99999999 milestone May 25, 2016
@gsnedders
Member

gsnedders commented May 29, 2016

Yeah, <math><annotation-xml> </annotation-xml> is enough to reproduce it.

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 29, 2016
gsnedders added a commit to gsnedders/html5lib-python that referenced this issue Jul 6, 2016
gsnedders added a commit to gsnedders/html5lib-python that referenced this issue Jul 6, 2016
gsnedders added a commit that referenced this issue Jul 6, 2016
Fix #258: annotation-xml branch didn't check tag type
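
For readers who want to see why a lone space is enough to trigger this, here is a rough, self-contained sketch of the failure mode suggested by the traceback and the commit title. It is a simplified model, not html5lib's actual code: the function names, the token type strings, and the token dictionary shapes are assumptions made for illustration. The idea is that a whitespace token carries no `"name"` key, so reading `token["name"]` before checking the token type fails.

```python
# Simplified model of the check seen in the traceback; NOT html5lib's real code.
# Token shapes, type strings and function names are hypothetical.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

START_TAG = "StartTag"
SPACE_CHARACTERS = "SpaceCharacters"


def is_svg_in_annotation_xml_buggy(node_namespace, node_name, token):
    # Same shape as the condition in the traceback: token["name"] is read
    # without first checking that the token is a start tag.
    return (node_namespace == MATHML_NS and
            node_name == "annotation-xml" and
            token["name"] == "svg")


def is_svg_in_annotation_xml_fixed(node_namespace, node_name, token):
    # Check the token type first, so tokens without a "name" key
    # short-circuit before the lookup.
    return (node_namespace == MATHML_NS and
            node_name == "annotation-xml" and
            token["type"] == START_TAG and
            token["name"] == "svg")


# Assumed token shapes: a whitespace token has data but no name.
space_token = {"type": SPACE_CHARACTERS, "data": " "}
svg_token = {"type": START_TAG, "name": "svg", "data": {}}

try:
    is_svg_in_annotation_xml_buggy(MATHML_NS, "annotation-xml", space_token)
except KeyError as exc:
    print("buggy check raised KeyError:", exc)

print(is_svg_in_annotation_xml_fixed(MATHML_NS, "annotation-xml", space_token))
print(is_svg_in_annotation_xml_fixed(MATHML_NS, "annotation-xml", svg_token))
```

Under these assumptions, the whitespace between `<annotation-xml>` and its first child reaches the check as a token with no name. Guarding on the token type first lets the check fall through without error.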