-
Notifications
You must be signed in to change notification settings - Fork 1.2k
XMLParser fixes (SR-13546, SR-2301) #2920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
These whitespaces were reported as normal character content.
foundElementDeclaration delegate calls were not set.
One case of reentrance is reading an external entity. This was not working so this change implements that, too. The CFXMLInterface callback's parameters mostly missed nullability info. This lead to unexpected crashes in optimised builds even when the parameter was correctly checked for nil.
saxHandler->processingInstruction = (processingInstructionSAXFunc)__CFSwiftXMLParserBridge.processingInstruction; | ||
saxHandler->comment = (commentSAXFunc)__CFSwiftXMLParserBridge.comment; | ||
// saxHandler->warning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to clean up these commented lines or add more context around them please? When the final code is reviewed without any context of this PR, it won't be clear why they are commented out and what the implications of that are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do so.
I sorted the fields as they are in _xmlSAXHandler and added all unused fields as comments to make it more obvious what is (not) used.
Do you think it is better to remove the unused fields or to add a short explanation above the block with “currently not used by CFXMLInterface” at these fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@millenomi WDYT? What kind of clarification in code comments would you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed a minor update that adds some comments.
Comment makes clear why some libxml callbacks are commented out.
@swift-ci test |
Can you trigger it once again? |
@swift-ci please test |
I’m about to give up on this. 😟 |
@swift-ci please test |
Does Build #2526 count for macOS? If so, both platforms have a successful build now. |
@swift-ci please test |
“Swift Test macOS Platform” finished a build yesterday, another try? |
@swift-ci please test macOS platform |
I ran the two testcases on Big Sur using
I think for the first test re-entrancy will need to be blocked to match Darwin. I dont think there is much that can be done with the 2nd one and probably underlies libxml2 differences. It might be worth at some point building swift-corelibs-foundation against the version that Darwin uses (https://opensource.apple.com/source/libxml2/libxml2-34.9/) and seeing if it fixes any of these minor bugs (and in other tests that fail on Darwin as well) |
You are right, the implementation does not mimic Apple/Darwin’s implementation in all aspects. I do not see how XMLParser could support (external) entities without being re-entrance-tolerant. Maybe that’s the reason Apple’s implementation does not support entities? I consider entity handling in NSXMLParser broken: let xml = """
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE root [
<!ENTITY internalEntity "Internal">
<!ELEMENT root ANY >
]>
<root><foo>&internalEntity;</foo></root>
"""
class MyXMLParserDelegate: NSObject, XMLParserDelegate {
func parser(_ parser: XMLParser, foundCharacters string: String) {
print("characters: \(string)")
}
}
let xmlData = xml.data(using: .utf8)!
let parser = XMLParser(data: xmlData)
let del = MyXMLParserDelegate()
parser.delegate = del
_ = parser.parse() Never calls
The difference in One could argue if calling What do you think about the following aproach: If the delegate implements |
Any change this gets merged? Regarding the conflict: Am I supposed to keep the pull request up to date? How is this typically handled? That WASI-changeset seems rather unfinished:
|
The WASI API is partial on purpose, this platform doesn't support streams, multithreading, and file access. As for merging, I assume this needs an approval from @millenomi. |
Closing older PRs after re-core of swift-corelibs-foundation on swift-foundation (#5001). |
Are the underlying bugs fixed in the new implementation? The code does not look any different. I even mentioned that the Darwin implementation is affected, too. |
Can we rebase this on top of the new main so the conflicts are resolved? |
I locally rebased, it was not a big thing. But unfortunately, since even main fails to compile on my machine, I cannot push it due to my “do not release untested code policy”:
It’s probably not a big thing but honestly, I am no longer willing to burn any more of my precious time. Apple has known about these bugs for several years, I even provided a fix for free, making sure it can be merged without conflict for quite a while. Fixing (these) bugs obviously has no priority at Apple. Maybe it would be better to just deprecate or remove XMLParser. IMHO, at its current state, no-one should use it. It will fail for valid input. |
XMLParser on linux was not reentrance safe.
While fixing this, I found that XMLParser also never calls several other delegate methods. (see SR-2301)
Still unfixed: Internal and unparsed external entities might not work as XML expects them to work. Unfortunately I could not find any reference (asides from the XML spec) how entities are actually expected to work.
macOS’s NSXMLParser does not seem to support them at all.