Skip to content

Feature filter goto symbol table [blocks: #2649] #2680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

mmuesly
Copy link
Contributor

@mmuesly mmuesly commented Aug 3, 2018

I had the desire to filter the output of a goto symbol table for some certain flags.
This simplifies the process, to determine which symbols are a good choice to pass to the memory-analyzer in #2649.

This PR allows, to use --filter-symbol-table together with --list-symbol-table or --show-symbol-table to filter the printed symbols by flags.

See the content of the HELP_SHOW_SYMBOL_TABLE macro for more context on allowed flags.

mmuesly added 2 commits August 3, 2018 13:23
I had the need, to filter the output of the symbol table for
a certain type of symbols. So, this commit adds an additional flag for goto-instrument
and cbmc to filter the output of the show-symbol-table and list-symbol-table command.
Further, this commit moves the commandflag defintion for show_symbol_table into its own macro.
Then this new macro is included into cbmc_parse_options and goto_instrument_parse_options.
The same is done for the help output.

This commit keeps the --list-symbols flag for backwards compatibility, but also
added --list-symbol-table with the same behavior, to align flag names with
other show/list functions.
@smowton
Copy link
Contributor

smowton commented Aug 4, 2018

Thinking it might be better to do this by using the json-ui and filtering in a script, rather than implement a filtering mini language that will inevitably be less than complete?

@martin-cs
Copy link
Collaborator

Yeah... I'm kind of with @smowton on this one... I'm not sure what value this gives above and beyond filtering things externally.

@mmuesly
Copy link
Contributor Author

mmuesly commented Aug 6, 2018

Well as said, I just want to filter the printed symbols. CBMC already loads an internal representation of the symbols, so if I filter the internal representation before printing, I still keep all the nice output formatting and stuff existing, while I don't have to reload the transformed plaintext or JSON output into a new data model, do my filtering externally and then write formatting code for results. (Yes, I am aware that this is done with a few lines of python code, if using a JSON representation).

Here is the story, I think that evolves, if I do this in external scripts:
For sure, I am able to write grep scripts or any other external filter. Next, my colleague needs to do the same thing, so he writes his external scripts or a derivate of my scripts and soon, we all spend considerable amounts of time writing external scripts and maintaining them, whenever someone changes the output format, instead of using a flag, built-in to CBMC that is updated when output is changed.

Regarding the completeness of the filter language:
Currently, it matches all flags owned by symbolt. It is just inevitably incomplete, if you add flags without caring for the output, but that is a process problem, not a domain problem in my opinion. Anyhow, if a filter option is missing, I would prefer changing probably a few lines of code in an existing approach, than writing one from scratch.

Apart from that, I haven't seen json-ui support for symbol tables, so I would have to write that first and that is not going to happen.
Adding the suggested change set, is the fastest and most reliable way to solve the problem from my point of view in a reusable way. Probably it is still less than perfect, but acceptable from my point of view for basic filter needs.

It is up to you, to scope out the user experience you want to provide for CBMC.
I am happy to address the clang format and lint issues occurring during CI checks, once there is a decision, whether this PR will be merged or not.

@smowton
Copy link
Contributor

smowton commented Aug 7, 2018

If I make a PR later this week for json-ui to do something sensible with the symbol table would that help?

@martin-cs
Copy link
Collaborator

@mmuesly I see your issue and as someone who has written exactly the kind of scripts you're talking about (and then had to maintain them when someone changed the output!) I am sympathetic to your problem. The idea of the XML output (and later, the JSON output) was to give a low-pain, more reliable route for interfacing other tools by giving structured output and so skipping the parsing step. IMHO the text output is really for humans and at "human scale" ( @smowton @tautschnig and others may well disagree with this ). So I'm a little reluctant to support patches that do significant "text UI" work.

Would having example / sample scripts in contrib/ improve things? @smowton : opinions?

@smowton
Copy link
Contributor

smowton commented Aug 7, 2018

That sounds ideal -- proper (tm) JSON / XML / your-serialization-format-here output, and ship (and TEST / CI) some example scripts

@mmuesly
Copy link
Contributor Author

mmuesly commented Aug 7, 2018

As long as there is the filter script somewhere shipped in the code base and still a CI test in place, I am happy to accept that instead.

That said, I still don't see the value for doing this in the post ui processing, if I might filter the central data structure before converting it to a proper ui. This is not really a ui dependent functionality. I just want to cut down the data stream, that ends up in the output.

It just effects the text ui right now, because there is missing ui support for other uis in this functionality in general right now. I would have build this in a cross functional ui filter otherwise following the DRY principal. That said, I still see room in this PR for code reduction and might offer to invest more time into reducing the code duplication in check_commandline_for_show_symbol_flags.

So stepping back from the ui output format discussion, the question might be reduced to:

Do we want one filter in place filtering all uis and anyone can easily use his favorite ui,
or do we want to write a post process filter for every ui?

I would choose option one, but if you want to maintain option two, I am happy with that. As long as I have one out of the box working script for filtering at hand, whenever I need it, my requirements are met.

@smowton
Copy link
Contributor

smowton commented Aug 7, 2018

For me the big difference is if we commit the filter that's set up for the particular needs of your workflow, then some second user comes along and says "I want a slightly different filter," then (a) we have to advise them to edit the cbmc source code to achieve that, which may be difficult if they're not a C++ programmer / are using a binary distribution / want to do something that's a pain in C++ but easy in languages oriented towards string processing, and (b) if we accept their filter too, we end up either committing a cornucopia of different rather specific filters, or implementing what amounts to a query language. It seems to me if a user can throw 5 lines of Python or a jq one-liner together instead to implement their particular filtering needs then we have a much friendlier and more general solution. I don't care what the interchange format is between cbmc and Python, but of the two we have widest support for now (JSON and XML), JSON seems nicest.

@mmuesly
Copy link
Contributor Author

mmuesly commented Aug 7, 2018

Thanks for the explanation. I can follow your argumentation.

But there are two further points I want to make:
First, I am currently trying to contribute my workflow as an extension to the cbmc code base, and you cannot feed non symbol variables to the memory-analyzer at the moment. There are people in favor for merging the gdb interconection.
So user will need some assistance to figure out, what they want to feed in. We might discuss, that this filter should go into the memory-analyzer rather than into cbmc and goto-instrument.

Second, if your main concern is about different people suggesting different filters and you really see that a realistic scenario, why don't you create an api, that allows to load custom binary filter before outputting the goto program? That would be a really nice thing.
There is a lot of structural information in the goto-modelt and symbolt, that I have to reconstruct expensively for more complex filtering, but get close to free, when writing a C++ filter.
I tried to write the memory-analyzer as an independent tool on top of filters, instead of intermingle it with the goto-modelt, but it is much easier to intermingle it as I can use the full support of the goto-modelt data structure. And it was the same experience writing this filter.

@tautschnig tautschnig self-assigned this Nov 10, 2018
@tautschnig tautschnig changed the title Feature filter goto symbol table Feature filter goto symbol table [blocks: #2649] Nov 10, 2018
@tautschnig tautschnig changed the title Feature filter goto symbol table [blocks: #2649] Feature filter goto symbol table [depends-on: #4261, blocks: #2649] Feb 25, 2019
@tautschnig tautschnig changed the title Feature filter goto symbol table [depends-on: #4261, blocks: #2649] Feature filter goto symbol table [blocks: #2649] Nov 9, 2022
@TGWDB
Copy link
Contributor

TGWDB commented May 3, 2023

Closing due to age (no further comment on PR content), please reopen with rebase on develop if you intent to continue this work.

@TGWDB TGWDB closed this May 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants