Skip to content

Factor out java-specifics from language-agnostic code #1869

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
peterschrammel opened this issue Feb 21, 2018 · 12 comments
Closed

Factor out java-specifics from language-agnostic code #1869

peterschrammel opened this issue Feb 21, 2018 · 12 comments
Assignees

Comments

@peterschrammel
Copy link
Member

peterschrammel commented Feb 21, 2018

I have created a JBMC repository where all the Java stuff will eventually go: https://github.com/diffblue/jbmc

(I'll populate below as my investigations progress)

Step 1a: Move jbmc and java_bytecode/library into new repo (in progress, see master branch of jbmc repo)

Step 1b: Separate out java-diff from goto-diff

  • TODO: tbd

Step 1c: Separate out java-analyzer from goto-analyzer

  • TODO: tbd

Step 2: Introduce required extension mechanisms into language-agnostic framework.

The following java_bytecode dependencies have been fixed:

The following java_bytecode dependencies must be removed:

  • goto-programs/goto_convert:
    • TODO: handle ID_*_new through separate goto passes

There are non-java_bytecode Java dependencies in (exhaustive list):

  • analyses/goto_check
  • analyses/uncaught_exception_analysis
    • special-casing AssertionError seems wrong and should be removed, otherwise there is nothing java-specific and no java_bytecode dependency
  • goto-analyzer/taint_analysis
    • prefix comparison should be fixed up, but no dep on java_bytecode
  • goto-instrument/call_sequences
    • use pretty name from symbol table instead of string manipulation, but no dep on java_bytecode
  • goto-instrument/cover_basic_blocks
    • requires java_source_location
    • factor out cover_basic_blocks_javat into separate class
    • but no dep on java_bytecode
  • goto-instrument/cover_filter
    • java built-ins should be done via a tag in the symbol table (see show_goto-functions), but no dep on java_bytecode
  • goto-programs/remove_exceptions
    • special-casing AssertionError seems wrong and should be removed, otherwise there is nothing java-specific and no dep on java_bytecode
    • inflight exception identifier can have a language-independent prefix
  • goto-programs/show_goto_functions
    • fix languaget::system_library_symbols for tagging symbols in the symbol table, but no dependency on java_bytecode
  • goto-symex/symex_assign, goto-symex/symex_builtin_functions
    • separate out cpp/java_new_array and unify, get mode from an lhs symbol
    • Ideally replace cpp_new by allocate in frontend
    • but no dependency on java_bytecode
  • pointer-analysis/value_set_dereference
    • silent failure should be replaced by an invariant, but no dependency on java_bytecode
  • util/source_location
    • requires java_source_location, but no dependency on java_bytecode
  • util/config
    • requires a bigger refactoring to make this non-global, non-critical for now
  • util/irep_ids.def
    • requires ability to extend irep_ids.def in a dependent repo, but no dependency on java_bytecode
  • util/unicode
    • used by string solver, seems uncritical

Step 3: Move java specifics into jbmc repo and update submodules in dependent repos

@martin-cs
Copy link
Collaborator

I see value in modularising language dependent things out as much as possible but does it make sense to put this in a separate repository? My experience so far is that keeping things in sync is a pain unless there are some clear, well defined, boundaries / APIs between them.

@martin-cs
Copy link
Collaborator

As owner of a non-trivial number of out-of-tree branches, perhaps @tautschnig could say more...

@tautschnig
Copy link
Collaborator

I think the track record of software kept in different repositories is poor, but maybe DiffBlue now has the resources to have dedicated teams assigned to each project who then take and keep ownership of those systems. Thus I trust that only @peterschrammel will know what's appropriate.

@tautschnig
Copy link
Collaborator

Since I do have an interest in keeping JBMC alive, could that internal strategy that I postulated above be made public? As far as I can tell, the current graveyard of tools-outside-the-repo enlists Deltacheck, FShell, Impara, Musketeer, Satabs, Symex. I think EBMC and 2LS are the ones that are alive.

@tautschnig
Copy link
Collaborator

Maybe @kroening can also comment?

@peterschrammel
Copy link
Member Author

peterschrammel commented Feb 21, 2018

cons:

  • maintenance overhead: Separate repositories are a pain unless there is someone responsible for keeping it up to date. In this case, everything java-related will depend on the jbmc repository, so this won't be a problem.
  • development process overhead: Honestly, I'd prefer to have a single repo for everything to reduce friction in the development process.

pros:

  • added inertia in changing core apis: It becomes much more difficult to bypass APIs and end up with spaghetti code. To push it to the extreme, the 'cbmc' repository should be renamed 'cprover', and 'cbmc', 'goto-cc' etc should be factored out as well.
  • improvement of core APIs: It forces us to clearly define APIs and extension mechanisms to support multiple languages (which has failed in the past as the above list demonstrates).
  • reduction of PR noise in the CBMC repository regarding Java-related stuff.

@tautschnig
Copy link
Collaborator

I believe the situation, also with regard to the graveyard of tools, would be very different if there were a set-up that triggered an automatic build (and notifications back to the originator) of all dependencies. If this were the case, I'd be all up for such a refactoring. In absence of such a set-up any project that is factored out is doomed.

With a private CI such as a Jenkins host it's easy to track those dependencies and do automated builds as needed. Maybe there are solutions to make this work with Travis. If so, go for separate repositories. If it isn't possible: you have been warned.

Other people have discussed the pros and cons as well:
https://gist.github.com/arschles/5d7ba90495eb50fa04fc
https://gist.github.com/technosophos/9c706b1ef10f42014a06

@martin-cs
Copy link
Collaborator

I think at least one of the pros (improvement of core APIs) could be done without splitting repos.

@tautschnig : I'm not sure of the health of hw-cbmc, LoopFrog (and FunFrog?) and I might be able to add a few more to the roll of the lost (Wolverine, Scoot, TAN, https://svn.cprover.org/wiki/doku.php?id=software , cdfpl).

@peterschrammel
Copy link
Member Author

I don't think that it makes sense to keep 20 unused tools alive by placing a burden on everyone who wants to make a PR to the central (i.e. cbmc) repository. If these tools had a relevant user base then there would be someone who took responsibility to maintain them. It's hard to say whether they would get more users by carrying them as zombies around, but sometimes it makes sense to leave things buried in the cemetery until they get resurrected by someone who is really interested. It's important to put tombstones to document where they can be dug up - if all these tools were properly set up with cbmc submodules and CI, the situation would be significantly better already because then we would always know the last configuration (CBMC pointer, compiler, etc) that produced a successful build, even if the CBMC pointer has not been advanced for a couple of years.

@martin-cs
Copy link
Collaborator

@peterschrammel I wasn't suggesting any kind of resurrection (and yes, keeping the pointer would be the right suggestion; that is what we should have done). I guess I was just OCD attempting to complete @tautschnig 's list.

The key point is that without active maintainance, keeping even branches alive and in sync is a real pain. I think we (in the general sense) are getting better at it but even then I'm not sure I'd recommend it if it can be avoided.

@smowton
Copy link
Contributor

smowton commented Feb 27, 2018

Some of the Java-specific stuff mentioned above is a positive thing that we would want to reproduce somehow even if the codebase is split up. For example, goto-check knowing that Java pointers are always guaranteed to be either null or pointing to the head of a valid object is useful.

@thomasspriggs
Copy link
Contributor

The link to the proposed jbmc repository is dead when I try and access it. Therefore I am going to assume the plan is just to keep jbmc in a separate directory within the cbmc repository and close out this issue. Please feel free to re-open this issue if there are alternative plans afoot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants