Skip to content

xml rendering issue when encountering string "\x01" #7073

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
pennyan opened this issue Aug 16, 2022 · 10 comments
Open

xml rendering issue when encountering string "\x01" #7073

pennyan opened this issue Aug 16, 2022 · 10 comments
Assignees
Labels
aws Bugs or features of importance to AWS CBMC users Viewer Bugs and features related to CBMC Viewer

Comments

@pennyan
Copy link

pennyan commented Aug 16, 2022

CBMC version: 5.53.1
Operating system: linux
Exact command line resulting in the issue: cbmc issue.c --trace --xml-ui
What behaviour did you expect: Successfully generated xml file
What happened instead: Program aborted with an error

Problem program issue.c:

#include <stdlib.h>

int main(void) {
  const unsigned char *str = (unsigned char *)"\x00";
  assert(*str == "\x01");
  return 0;
}

Last several lines of output when running cbmc issue.c --trace --xml-ui:

--- begin invariant violation report ---
Invariant check failed
File: xml.cpp:154 function: escape_attribute
Condition: ch >= ' '
Reason: XML does not support escaping non-printable character 1
Backtrace:
cbmc() [0x6561cd]
cbmc() [0x6571b7]
cbmc() [0x56a612]
cbmc() [0x6e4f49]
cbmc() [0x6e5052]
cbmc() [0x6e50dd]
cbmc() [0x6e50dd]
cbmc() [0xcf28ed]
cbmc() [0x918641]
cbmc() [0x700058]
cbmc() [0x6ff52a]
cbmc() [0x55ddeb]
cbmc() [0x551e01]
/lib64/libc.so.6(__libc_start_main+0xea) [0x7f8ed496a13a]
cbmc() [0x55ea2a]


--- end invariant violation report ---
[1]    1659 abort      cbmc expr.c --trace --xml-ui
@pennyan
Copy link
Author

pennyan commented Aug 16, 2022

This issue might be related #7038

@pennyan pennyan changed the title xml rendering issue when encountering string "\x01" in the code xml rendering issue when encountering string "\x01" Aug 16, 2022
@jimgrundy
Copy link
Collaborator

Agree. This is likely the same issue as #7038

@jimgrundy jimgrundy added aws Bugs or features of importance to AWS CBMC users aws-medium labels Aug 17, 2022
@tautschnig tautschnig self-assigned this Sep 2, 2022
@thomasspriggs
Copy link
Contributor

This issue is related to #7038 as it is the same invariant. However the issue isn't exactly the same. The other issue relates to character codes greater than or equal to 128, which are printable and non-control code characters. This issue relates to control code characters, which would be those in the range 0-31 inclusive. The codes for \n and \r are currently omitted from the output. This avoids avoids violating the INVARIANT for them, but even this isn't ideal as there is information which is then missing from the XML.

Therefore we need to decide on a solution for printing characters in the 0-31 range. Technically these control codes are permitted in XML as I understand it. However I'd suggest that we use decimal character references in order to make the work of XML readers both human and algorithmic easier. This would encode character code 1 as &#1; for example.

@jimgrundy
Copy link
Collaborator

I would strongly endorse finding some way to continue printing XML (probably best to add in these #nnn; codings) rather than exiting ungracefully as we do now. The current experience is unpleasant for users and gives them few clues of how to move one. I would really like to see the solution you propose implemented.

@jimgrundy
Copy link
Collaborator

@thomasspriggs if you can do this do you want to self-assign?

@thomasspriggs
Copy link
Contributor

@thomasspriggs if you can do this do you want to self-assign?

I can work out how to make the changes needed to raise PR a for this kind of change. However I'd like to discuss the prioritisation for this piece of work with the rest of my team before I commit to it.

@thomasspriggs thomasspriggs self-assigned this Sep 9, 2022
@thomasspriggs
Copy link
Contributor

@jimgrundy I should have time to work on this next week.

@tautschnig tautschnig removed their assignment Oct 9, 2022
@jimgrundy
Copy link
Collaborator

Where are we on this @thomasspriggs ?

@thomasspriggs
Copy link
Contributor

Where are we on this @thomasspriggs ?

We can't print character code '\x01' whether it is the raw character, or encoded using XML standard escaping as this is not allowed by XML standard 1.0. We could use XML escaping for character code '\x01' (but not '\x00') if we moved to XML standard version 1.1. However XML standard 1.1 has not been widely adopted despite having been released over a decade ago. My PR to move to the newer standard was rejected and closed due to the lack of adoption of the newer standard.

This brings us back to the idea of using C style escaping as it is presented in the input source code. I think that idea has legs. Unfortunately there are a combination of implementation/architectural related problems which make implementing the fix unexpectedly time-consuming -

  • The final stage of printing (from XML DOM to string) should really be language agnostic in order to aid support of languages other than C. So we should not just insert C style escaping at this late stage.
  • By the time we reach the printing stage to produce the XML DOM the problem causing character is not stored in a string expression or character expression like data structure. It is embedded into a comment data structure as part of a larger string converted expression. This means that the earlier stage in the pipeline would need to be modified to avoid putting control characters into the comment.
  • As I understand it, the C language scanning/parsing stages of CBMC unescape input strings/characters and discard the original escaped form of the input code. This means that we would either need to alter the input stage to store the original unescaped form somewhere or re-escape later.

None of the above issues are insurmountable they just lead to a surprisingly large amount of effort to be required. We shifted team priorities elsewhere due to the time involved. @NlightNFotis I believe you looked at this after I first started investigating. Do you have anything further to add?

@thomasspriggs thomasspriggs removed their assignment Nov 4, 2022
@NlightNFotis
Copy link
Contributor

@thomasspriggs I think what you've mentioned so far covers everything - I don't have anything more to add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users Viewer Bugs and features related to CBMC Viewer
Projects
None yet
Development

No branches or pull requests

8 participants