Skip to content

Add recording symbol table #1446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Oct 5, 2017

This upstreams PR https://github.com/diffblue/security-scanner/pull/213, which gives more detail; already reviewed by @thk123 and @reuk over there.

@smowton smowton requested a review from mgudemann October 5, 2017 14:52
@smowton
Copy link
Contributor Author

smowton commented Oct 5, 2017

@NathanJPhillips suggest giving this the once-over

@tautschnig
Copy link
Collaborator

tautschnig commented Oct 5, 2017

Can people please please PLEASE start writing commit messages with actual contents? Thank you.

Just to add clarification: I'm sure the work done and presented in this PR is very useful and solid, but as-is this is a change impacting a lot of code without a way of understanding why it may be useful or the right thing to do.

symbol_tablet keep_symbols;
std::unordered_set<irep_idt, irep_id_hash> unreachable_symbols;
for(const auto &sym : symbol_table.symbols)
unreachable_symbols.insert(sym.first);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is done, but this change appears to be inverting the logic of book-keeping. Surely that's fine, but which of the commits/commit messages does it fit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will factor out and describe

@smowton smowton force-pushed the smowton/upstream-recording-symtab branch from 62ce860 to abac902 Compare October 5, 2017 16:12
@smowton
Copy link
Contributor Author

smowton commented Oct 5, 2017

@tautschnig updated this adding explanatory commit messages and moving the book-keeping inversion into its own commit.

@smowton
Copy link
Contributor Author

smowton commented Oct 5, 2017

Companion test-gen PR: https://github.com/diffblue/test-gen/pull/1088

@smowton smowton force-pushed the smowton/upstream-recording-symtab branch from abac902 to 57b0b63 Compare October 6, 2017 10:01
Copy link
Contributor

@mgudemann mgudemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@smowton smowton force-pushed the smowton/upstream-recording-symtab branch 2 times, most recently from 602fe34 to 87f36cb Compare October 11, 2017 14:47
@smowton smowton requested a review from reuk October 11, 2017 14:47
@smowton
Copy link
Contributor Author

smowton commented Oct 11, 2017

Added @reuk to review since this now uses his recent symtab changes

Copy link
Contributor

@reuk reuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor comments

{
if(symbolt *sym=get_writeable(id))
return *sym;
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No else after return please

{
if(const symbolt *sym=lookup(id))
return *sym;
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No else after return please


symbolt &get_writeable_ref(const irep_idt &id)
{
if(symbolt *sym=get_writeable(id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementations of lookup_ref and get_writeable_ref should share an implementation, probably using the private static template idiom.

@smowton smowton force-pushed the smowton/upstream-recording-symtab branch from 87f36cb to eb5b253 Compare October 12, 2017 15:54
@smowton smowton requested a review from LAJW October 12, 2017 15:54
@smowton
Copy link
Contributor Author

smowton commented Oct 12, 2017

Updated

Copy link
Contributor

@LAJW LAJW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good enough.

@smowton smowton force-pushed the smowton/upstream-recording-symtab branch from eb5b253 to 769b052 Compare October 12, 2017 16:06
@tautschnig
Copy link
Collaborator

Many thanks for providing guiding commit messages. Wouldn't it have been possible to avoid the mass-renaming? After all, a symbol_tablet is now not actually a table anymore, but instead just an interface. This would seem a lot less intrusive?

@smowton
Copy link
Contributor Author

smowton commented Oct 12, 2017

@tautschnig only the sites that instantiate one have been renamed to concrete_symbol_tablet. There are surprisingly many, the vast majority producing a blank one to satisfy some method that always takes one but won't use it in that particular call.

@smowton
Copy link
Contributor Author

smowton commented Oct 12, 2017

The only change that doesn't fit that pattern is read_object_and_link, which for the time being gets around the new interface since it needs a couple of ops we'd rather not support. We may rewrite it if we need it to support other implementations in the future.

@tautschnig
Copy link
Collaborator

Thanks a lot for the further clarification.

Now my final question remains: having understood each individual commit, I am wondering what the overall use case/rationale is? That is, what are the grounds on which a user (developer) would choose one over the other?

@smowton
Copy link
Contributor Author

smowton commented Oct 13, 2017

Desired action: after the frontend converts a function, perform post-processing for every symbol it has touched. Usually that will just be the single function in question, but sometimes it can create support functions on demand, such as singleton wrappers for Java static initialisers, or built-in String functions.

We thought that rather than make a second pass over the function being converted, or relying on the frontend making a callback for every function it touches, we should make a journalling implementation of the symbol table, such that there's no danger of reported and actual actions becoming unsynchronised.

Example usage:

concrete_symbol_tablet actual_table;
init_table(actual_table);

recording_symbol_tablet journal(actual_table); // Wraps actual_table
alter_table(journal);

for(const auto &added : journal.added())
{
  printf("%s was added\n", added.name);
}

@smowton smowton force-pushed the smowton/upstream-recording-symtab branch from 769b052 to 526386d Compare October 13, 2017 08:12
@tautschnig
Copy link
Collaborator

Desired action: [...]

Would it be possible to add this explanation as doxygen comments? No user/developer should have to guess intentions.

This is useful because in a forthcoming commit symbol_tablet will become an interface,
and therefore lose its `swap` method. In the case where the particular implementation
journals its operations this will also provide a more accurate impression of what has
taken place (many symbols were erased, cf. this table was swapped with another which
experienced many insertions)
@smowton smowton force-pushed the smowton/upstream-recording-symtab branch from 75352a3 to b4f6e18 Compare October 16, 2017 16:39
@smowton
Copy link
Contributor Author

smowton commented Oct 16, 2017

@tautschnig done (see last commit)

@tautschnig
Copy link
Collaborator

I don't know what the timeline/pressure for this work is, but I'd love to see as part of this work: 1) review of all the code location that now got a concrete_symbol_tablet -- it seems that's way too many; 2) use the recording symbol table in all those places that do incremental converts.

@smowton
Copy link
Contributor Author

smowton commented Oct 16, 2017

(1) will do tomorrow morning.
(2) I'm not sure about though -- we introduce the recording table when we're doing incremental lazy function loading (in the other PR pending your review ;)), but I don't see a point in collecting a journal that doesn't have a user.

NathanJPhillips and others added 3 commits October 16, 2017 18:13
This converts symbol_tablet into an interface and moves its current implementation
into concrete_symbol_tablet, permitting the forthcoming interface-compatible
alternative journalling implementation.
This is an implementation of symbol_tablet that wraps another symbol_tablet
and journals additions, erasures and alterations such that a user can find out
what changes have taken place since the recording_symbol_tablet was created.
@smowton smowton force-pushed the smowton/upstream-recording-symtab branch from b4f6e18 to 2c09954 Compare October 16, 2017 17:13
@smowton
Copy link
Contributor Author

smowton commented Oct 17, 2017

@tautschnig reviewed them just now. Breakdown:

  • Around 25% legitimate use in non-testing code -- for example, the goto_model, language_ui and bmc tables are "global" / "master" tables, and many of the uses in language frontends are of the form:
concrete_symbol_tablet local_symbols;
convert_function(..., local_symbols);
rename_and_link(local_symbols, global_symbol_table);

The link/rename operation could be done incrementally, but I don't think there's anything terribly wrong with the current approach.

  • About 25% are of the form:
concrete_symbol_tablet empty_table;
namespacet empty_namespace(empty_table);
// Note this only works because we know this particular expression doesn't contain symbols:
some_op(expr, empty_namespace);

Most of the time when this is sometimes sensible there are already overloads of the relevant functions that don't take a namespacet and create a blank one internally -- for example language_util.cpp has from_expr(const exprt &expr), a convenience (but dangerous!) wrapper on from_expr(const namespacet &ns, const irep_idt &identifier, const exprt &expr).

Of the 20 or so uses of this form, the only functions that could maybe gain an overload are type2java, expr2java and similar, which have 3-4 users each using the blank-namespace pattern. Even then the overloads are a bit dodgy, since they conceal a dangerous practice (using a blank namespace and hoping the expression is symbol-free), so it's arguably good to make the user think about that by forcing them to make their own fake namespace.

  • About 50% are uses in unit tests, which naturally need a root symbol table each.

Conclusion: if you like, we could remove a few instances by factoring the blank namespace pattern into expr2java et al, or converting users to use from_expr instead of directly calling expr2java at the cost of some needless languaget construction. However I think there's little to worry about here.

@tautschnig
Copy link
Collaborator

Even then the overloads are a bit dodgy, since they conceal a dangerous practice (using a blank namespace and hoping the expression is symbol-free), so it's arguably good to make the user think about that by forcing them to make their own fake namespace.

I fully agree with this being dodgy, all of this (the 25% that you estimated) should simply be removed.

@smowton
Copy link
Contributor Author

smowton commented Oct 17, 2017

I disagree-- the fudamental act output(e, blank_namespace) doesn't seem too bad if you can really justify that blank_namespace is appropriate. The worst case scenario if you get that wrong is a fatal no-such-symbol invariant, and it'll be clear why that's happening.

The alternative to blank_namespace uses such as the is_threaded implementation near the top of the diff is to either replace the blank namespace with some other implementation of the optional pattern, which I don't think changes anything, or else introduce a bunch of extra constructor parameters to route some global namespace to a callsite that (we assert) won't use them.

@smowton
Copy link
Contributor Author

smowton commented Oct 17, 2017

(alternatively perhaps I'm missing your intent? Taking src/analyses/is_threaded.cpp (first file in this PR's diff) as a working example, could you say how you want it to look instead?)

@tautschnig
Copy link
Collaborator

Here's how I'd patch it (in a slow-migration-path manner):

diff --git a/src/analyses/is_threaded.cpp b/src/analyses/is_threaded.cpp
index 1ac32d234..ce0769a63 100644
--- a/src/analyses/is_threaded.cpp
+++ b/src/analyses/is_threaded.cpp
@@ -82,12 +82,10 @@ public:
   }
 };

-void is_threadedt::compute(const goto_functionst &goto_functions)
+void is_threadedt::compute(
+  const goto_functionst &goto_functions,
+  const namespacet &ns)
 {
-  // the analysis doesn't actually use the namespace, fake one
-  symbol_tablet symbol_table;
-  const namespacet ns(symbol_table);
-
   ait<is_threaded_domaint> is_threaded_analysis;

   is_threaded_analysis(goto_functions, ns);
diff --git a/src/analyses/is_threaded.h b/src/analyses/is_threaded.h
index c5cf48a55..26968551d 100644
--- a/src/analyses/is_threaded.h
+++ b/src/analyses/is_threaded.h
@@ -21,16 +21,24 @@ Date: October 2012
 class is_threadedt
 {
 public:
+  // __attribute__((deprecated))
   explicit is_threadedt(
     const goto_functionst &goto_functions)
   {
-    compute(goto_functions);
+    // the analysis doesn't actually use the namespace, fake one
+    symbol_tablet symbol_table;
+    const namespacet ns(symbol_table);
+
+    compute(goto_functions, ns);
   }

   explicit is_threadedt(
     const goto_modelt &goto_model)
   {
-    compute(goto_model.goto_functions);
+    const goto_functionst &goto_functions=goto_model.goto_functions;
+    const namespacet ns(goto_model.symbol_table);
+
+    compute(goto_functions, ns);
   }

   bool operator()(const goto_programt::const_targett t) const
@@ -47,8 +55,10 @@ protected:
   typedef std::set<goto_programt::const_targett> is_threaded_sett;
   is_threaded_sett is_threaded_set;

+  // should use goto_modelt once the public API above is cleaned up
   void compute(
-    const goto_functionst &goto_functions);
+    const goto_functionst &goto_functions,
+    const namespacet &ns);
 };

 #endif // CPROVER_ANALYSES_IS_THREADED_H

@smowton
Copy link
Contributor Author

smowton commented Oct 17, 2017

Don't you think adding a spurious dependency like that is troubling? (If the comment is correct) then the code doesn't need a namespace, yet we're writing code that strongly suggests it does. Shouldn't we pass it exactly what it actually needs?

@smowton
Copy link
Contributor Author

smowton commented Oct 17, 2017

(As an alternative, how about a static global empty_namespace, to make it clearer what they're doing?)

@smowton
Copy link
Contributor Author

smowton commented Nov 1, 2017

Replaced by #1496

@smowton smowton closed this Nov 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants