PDF not showing some traditional Chinese characters #6319
Thanks @iruletheworld for reporting this.
What font are you using locally? Please share any information that you consider useful for debugging this. I opened the PDF and I can't really differentiate one character from another, so it's really hard for me to notice whether a character is not rendering properly. That said, could you point us to the specific page, line and column of a wrong character and tell us which one should be there instead? |
Hi @humitos , thanks for replying. I dig into the In here, I would:
Detailed Comparisons
Some examples of missing characters are shown in the screenshots below. The missing characters are rendered as "F" in squares.
Excepted:
Expected:
I uploaded a local PDF build for your reference:
MWEs for reproducing this issue locally
Basically, you need to force the MWE using "錄", "換" and "註" are missing.
You can use
Suspected Cause
I think this is because of the This table basically says, when using The two MWEs above try to force the I think this may be something to do with the implementation of
Possible Solution
I think there may be two solutions.
Progress To Date
I tried method 1 by using
Summary So Far
|
Hi @iruletheworld! Thanks a lot for your amazing report on this issue. Did you change something in your repository regarding these settings? I found that the live PDF does not have the "F" in squares anymore in the title: https://buildmedia.readthedocs.org/media/pdf/latex2img-zh-cn/latest/latex2img-zh-cn.pdf I'd like you to create a branch in your repo with the correct settings that should produce the expected results (I suppose they should be the same as the ones you use to build the PDF locally). That way, I'd be able to trigger some builds on RTD and debug it more efficiently. |
Hi @humitos, about the "F", I think it has to do with the PDF reader. I've tried Adobe, and it doesn't display the "F" (Chrome doesn't show it either), but Foxit Reader does. To make it clearer, I took the screenshots using Foxit. I've done another two things with the repo.
At the moment I am trying to get a virtual machine setup for Ubuntu with texlive to test the local PDF build with |
An (imperfect) Solution
OK, after much trial and error, I've found an acceptable solution, but only for zh-hant (TW) and neither zh-hant (HK) nor zh-hans (therefore imperfect). I now believe this is a font problem on the server, since changing to fonts available in Debian does work (to an extent). The available fonts I found are here. If you go into the repo and use tag
Root Cause of the Problem
I believe the root cause of the problem is the Why
|
Would installing this package allow us to use this font and render all the characters properly? This package does exist in Ubuntu 18.04 (bionic) --which is the one that we use in production: https://packages.ubuntu.com/bionic/fonts/fonts-wqy-zenhei |
The short answer is No. I have made a test repo. You can look into it for details; I will only state the results and conclusions here. When using Note that a missing character in this post is referred to and rendered as a "tofu" (as Google calls it, basically a rectangle).
This picture is a comparison between HK TC and TW TC in serif
You can see that the 2nd char and the 5th to last char of the HK TC are rendered as tofu.
The following are examples of Chinese italic (KaiTi)
The 2nd char is a tofu. The 4th and 5th chars are tofu. The last char is a tofu.
Conclusions
By the way, if the AR family used by |
I'm impressed with all your analysis, thanks! I still want to know if there is something that Read the Docs can do to help here and have a fully working PDF with all the Chinese characters (HK, TW and simplified). I understood that you can build the PDF perfectly on your local computer, so why can't we on RTD?
I understand that this seems to be the preferred way to suggest to our users, is that correct? At least it will have all the characters in their places and the PDF will build completely.
Does this exist? If so, we can install it in our server and make your PDF happy 😄
Would you feel comfortable making these changes yourself and opening a Pull Request? It seems that you have a ton of experience here and I'm sure you will update it way better than I would. However, if this setup is very complex or does not cover most of the cases, we may want to keep the "if the user can accept a sans only PDF" solution by default, but expand the guide with this more specific solution for these particular cases. |
Thanks for the prompt reply!
I found that there is an Ubuntu package
I think for most users, the substance comes before the style, and therefore this may be the preferred solution, though some users may need the serif and italic for their own reasons. Also, going full sans kind of violates the typesetting customs of the Chinese language, but that should be much less of an issue.
Google's distribution of Adobe's
Agreed. I hope the user would at least understand a bit about the difficulty of typesetting CJK. Perhaps it should cover not just Chinese but also Japanese and Korean, which may make it quite complicated. I think Japanese may be even more difficult to get right due to the mixture of Kanji (Chinese characters), Hiragana (consider them as a lowercase phonetic syllabary) and Katakana (consider them as an uppercase phonetic syllabary). Korean should be consistent, since they have made quite an effort to get rid of the Chinese language after Japanese rule. Though they do use Chinese characters in some cases, these cases are quite limited. Anyway, I do think the The thing with For the PR, I would like to test a bit more so that it can be more definitive. I also hope to find a valid solution for all CJK and not just TC and SC (help wanted!) |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Would definitely like to see Noto CJK ( Adding on to the coverage of Noto CJK fonts, it's currently the most uniform and aesthetically appealing open-source solution for a pan-CJK (super-) font family. It covers a decent number of characters used in all the languages, takes care of the subtle design differences in each region, and should be good enough for most common usage of the languages. For an uncommonly extensive coverage of Chinese characters (usually for rare characters in names or for academic research purposes), Hanazono would be a fallback choice that resides in the Debian packages repo as |
@blueset I'm happy to add those font packages to our image (I've already created a PR for that). Also, what LaTeX preamble should we include by default for the Chinese language so that it uses the most accurate font? How do we test it? |
I have tested Simplified Chinese (
zh-hans
latex_elements = {
"preamble": r"""
\usepackage[AutoFallBack=true]{xeCJK}
\setCJKmainfont{Noto Serif CJK SC}[Language=Chinese Simplified, BoldFont={* Bold}, ItalicFont=AR PL KaitiM GB]
\setCJKsansfont{Noto Sans CJK SC}[Language=Chinese Simplified, BoldFont={* Bold}, ItalicFont=AR PL KaitiM GB]
\setCJKmonofont{Noto Sans CJK SC}[Language=Chinese Simplified, BoldFont={* Bold}, ItalicFont=AR PL KaitiM GB]
\setCJKfallbackfamilyfont{\CJKrmdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKsfdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKttdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
"""
}
zh-hant (updated to solve # 2 below)
latex_elements = {
"preamble": r"""
\usepackage[AutoFallBack=true]{xeCJK}
\setCJKmainfont{Noto Serif CJK TC}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKsansfont{Noto Sans CJK TC}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKmonofont{Noto Sans CJK TC}[Language=Chinese Traditional, BoldFont={* Bold}, ItalicFont=AR PL KaitiM Big5]
\setCJKfallbackfamilyfont{\CJKrmdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKsfdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\setCJKfallbackfamilyfont{\CJKttdefault}[AutoFakeBold]{{HanaMinA},{HanaMinB}}
\xeCJKEditPunctStyle{quanjiao}{optimize-kerning=true}
"""
}
About
Since RTD currently doesn’t tell the Hong Kong and Taiwan variants of Traditional Chinese apart, this portion would not contribute much. I would still leave it here in case anyone needs it. (updated to solve # 2 below)
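The two preambles above differ only in the Noto suffix, the fontspec `Language` value and the Kaiti font used for italics, so they could be generated from one helper in `conf.py`. This is a minimal sketch, not something Read the Docs provides: `cjk_preamble` and the `VARIANTS` table are hypothetical names, the font names are copied from the preambles above, and the extra `\xeCJKEditPunctStyle` line of the zh-hant preamble is left out for brevity. It also assumes the project builds with XeLaTeX, which xeCJK requires.

```python
# Hypothetical conf.py helper: builds the xeCJK preamble for one script variant.
# Font names are taken from the preambles posted in this thread; the function
# and table names are illustrative assumptions only.

VARIANTS = {
    # key: (Noto suffix, fontspec Language value, Kaiti font used for italics)
    "zh-hans": ("SC", "Chinese Simplified", "AR PL KaitiM GB"),
    "zh-hant": ("TC", "Chinese Traditional", "AR PL KaitiM Big5"),
}

# Fallback fonts for characters missing from the main fonts.
FALLBACK = r"[AutoFakeBold]{{HanaMinA},{HanaMinB}}"


def cjk_preamble(variant):
    """Return the LaTeX preamble string for the given variant key."""
    suffix, language, kaiti = VARIANTS[variant]
    options = f"[Language={language}, BoldFont={{* Bold}}, ItalicFont={kaiti}]"
    lines = [
        r"\usepackage[AutoFallBack=true]{xeCJK}",
        rf"\setCJKmainfont{{Noto Serif CJK {suffix}}}{options}",
        rf"\setCJKsansfont{{Noto Sans CJK {suffix}}}{options}",
        rf"\setCJKmonofont{{Noto Sans CJK {suffix}}}{options}",
    ]
    # One fallback declaration per family: roman, sans and monospace.
    for family in ("rm", "sf", "tt"):
        lines.append(
            rf"\setCJKfallbackfamilyfont{{\CJK{family}default}}" + FALLBACK
        )
    return "\n".join(lines) + "\n"


latex_engine = "xelatex"  # xeCJK only works with XeLaTeX
latex_elements = {"preamble": cjk_preamble("zh-hant")}
```

Keeping the variants in one table makes it harder for the serif, sans and mono declarations to drift apart when a font name changes.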
|
Preambles and samples for |
@blueset THANKS, this is amazing! I've already opened a PR to install the font packages you mentioned in a previous comment. I can't guarantee that we are going to include these preambles by default in a Read the Docs build because they will probably need a lot of testing (and I'm not enough of an expert on this topic to manage it), but I'd like to add them as a suggestion in our current guide https://docs.readthedocs.io/en/stable/guides/pdf-non-ascii-languages.html or as an appendix to it. I really appreciate the work that all of you have done on this topic and I hope we can handle all of these languages better at Read the Docs 🌏 |
Thank you for the effort! Including the fonts in the building environment is still better than nothing. Looking forward to |
@blueset actually, 7.0 is our current
|
@humitos The full build log is here FYI: https://readthedocs.org/api/v2/build/10643925.txt |
@blueset I left a comment in your commit 😄. See ehForwarderBot/ehForwarderBot@dbb959a#r37923286 |
Oops, I forgot to change the image name after copying. Now everything works and the PDF output looks much better with the new set of fonts. Thank you for the effort! |
@blueset wow! I'm very happy reading that 😄 Thanks a lot for helping us debug this issue, make Read the Docs better and improve our support for other fonts :) If anything is missing here, I'd say that we can improve our Documentation Guide by mentioning how these fonts can be configured, but I think we can close this issue now. |
Details
https://readthedocs.org/projects/latex2img/downloads/pdf/latest/
Sorry to open this issue, but I have read a lot of the related issues and have googled a lot but still cannot get it fixed.
I am using the method in #5453 to build the Latex PDF for zh_TW. The local build is fine, all things as expected, but the remote build has some characters missing, e.g., "換", "佈" (basically a funny PDF).
Here is the setting in conf.py.
I have tried not using ctex but just xeCJK with a few different fonts, but it is still not working.
By the way, the simplified Chinese translation is all correct (I use just xeCJK for it). Also, the HTML is fine with either language.
Expected Result
All traditional Chinese characters display correctly.
Actual Result
Some traditional Chinese characters, e.g., "換", "佈", are not displayed (the font used on the server does not have them?)
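One rough way to guess which characters a charset-limited font might lack is to check whether each character can be encoded in a legacy charset. This is only a proxy and a sketch under assumptions: whether the server font is limited in this way is an open question here, the helper name `not_encodable` is made up for illustration, and real coverage depends on the glyphs the font file actually contains.

```python
def not_encodable(text, charset):
    """Return the characters of `text` that `charset` cannot encode, in order."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Characters reported as missing in this thread.
sample = "換佈錄註"

# GB2312 is a Simplified Chinese charset; Big5 is a Traditional Chinese one.
print("not in gb2312:", not_encodable(sample, "gb2312"))
print("not in big5:  ", not_encodable(sample, "big5"))
```

A character that shows up in the first list but not in the second is one that a Simplified-only font would plausibly render as a missing glyph.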