Skip to content

Missing end-of-file line feed duplicates paragraph #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gorango opened this issue Aug 15, 2019 · 7 comments · Fixed by #5
Closed

Missing end-of-file line feed duplicates paragraph #4

gorango opened this issue Aug 15, 2019 · 7 comments · Fixed by #5
Labels
🗄 area/interface This affects the public interface 💪 phase/solved Post is done 👶 semver/patch This is a backwards-compatible fix 🐛 type/bug This is a problem

Comments

@gorango
Copy link

gorango commented Aug 15, 2019

Changing line 147 in test/index.js to "minify" the incoming html with the following line:

var file = vfile(fs.readFileSync(input, 'utf8').replace(/\n*^\s*/gm, ''))

This particular change results in 14 failures (I recognize that a proper minification might avoid a few of these failures).

I wonder if this issue should be better addressed in the rehype-parse package?

@gorango gorango added 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Aug 15, 2019
@ChristianMurphy
Copy link
Member

@gorango can you provide an example of minified HTML that is causing an issue?

@ChristianMurphy ChristianMurphy added 🙉 open/needs-info This needs some more info 🐛 type/bug This is a problem and removed 🐛 type/bug This is a problem 🙉 open/needs-info This needs some more info labels Aug 15, 2019
@wooorm
Copy link
Member

wooorm commented Aug 15, 2019

The regex /\n*^\s*/gm replaces zero or more line feeds, followed by a line feed (important), followed by zero or more white space, with an empty string.
That means that any line feed is removed.

HTML collapses whitespace. That means that a line feed is seen as a single space character.
There’s a difference between nothing and a space.

Sounds like you‘re falling for an XY problem. As Christian mentioned. We need more info.

@gorango
Copy link
Author

gorango commented Aug 15, 2019

Agreed 🤦‍♂️. I also didn't take into account the position attributes and WhiteSpaceNodes that are affected by the minification. But just before returning to close this issue, I did spot a small anomaly...

Running rm test/fixtures/**/*.json && node test to generate new output files, I noticed the following results changed unexpectedly:

  • test/fixtures/overlap-explicit
  • test/fixtures/source

Both outputs contained clones of the last sentence in each.

Adding any space at the end of the vfile.contents prevents the duplicates.

@gorango
Copy link
Author

gorango commented Aug 15, 2019

To replicate this, you can replace test/fixtures/source/index.html, with:

<p>This is <span data-nlcst="source">marked</span>.</p><p data-nlcst="source">Completely marked.</p>

You should get similar results if you minify the overlap-explicit/index.html file.

@wooorm
Copy link
Member

wooorm commented Aug 16, 2019

@gorango I tested these two out and:

  • rm test/fixtures/**/*.json && node test did not produce different output.json files for me.
  • Replacing test/fixtures/source/index.html with that document, removing its output.json file, and running node test, only changes positional information for me.

Removing the EOF EOL (the last \n before the end of the file) in the source fixture did indeed duplicate the last paragraph.

@wooorm wooorm changed the title Unexpected handling of minified HTML Missing end-of-file line feed duplicates paragraph Aug 16, 2019
@wooorm wooorm added 👶 semver/patch This is a backwards-compatible fix 🗄 area/interface This affects the public interface 🙆 yes/confirmed This is confirmed and ready to be worked on and removed 🙉 open/needs-info This needs some more info labels Aug 16, 2019
wooorm added a commit that referenced this issue Aug 16, 2019
@gorango
Copy link
Author

gorango commented Aug 16, 2019

To clarify, I was running node test with my initial minification hack but I'm glad you were able to reproduce the behaviour.

Duplicates in overlap-explicit could be reproduced just by collapsing the closing root tag:

<section>
 <h1>My Cats</h1>
 <p>You can play with my cat simulator.</p>
 <object data="cats.sim">
  <p>To see the cat simulator, use one of the following links:</p>
  <ul>
   <li><a href="cats.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  <p>Alternatively, upgrade to the Mellblom Browser.</p>
 </object>
 <p>I'm quite proud of it.</p></section>

I can confirm that the latest PR handles this case as well.

@wooorm wooorm closed this as completed in #5 Aug 16, 2019
wooorm added a commit that referenced this issue Aug 16, 2019
Closes GH-4.
Closes GH-5.

Reviewed-by: Christian Murphy <[email protected]>
@wooorm wooorm added ⛵️ status/released and removed 🙆 yes/confirmed This is confirmed and ready to be worked on labels Aug 16, 2019
@wooorm
Copy link
Member

wooorm commented Aug 16, 2019

Thanks for reporting @gorango, it’s all fixed and released now!

In the future, please do not remove the issue template, instead, but instead take the time to fill it out correctly. It’s there for a reason!

@wooorm wooorm added the 💪 phase/solved Post is done label Apr 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🗄 area/interface This affects the public interface 💪 phase/solved Post is done 👶 semver/patch This is a backwards-compatible fix 🐛 type/bug This is a problem
Development

Successfully merging a pull request may close this issue.

3 participants