Skip to content

add replace_all documentation #3478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 25, 2019
Merged

add replace_all documentation #3478

merged 1 commit into from
Feb 25, 2019

Conversation

kroening
Copy link
Member

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like semi-formal arguments.

void replace_all(
std::string &str,
const std::string &from,
const std::string &to)
{
PRECONDITION(from.size() != 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think !from.empty() is more idiomatic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

/// \param to: string to replace with
/// Copyright notice:
/// Attributed to Gauthier Boaglio
/// Source: https://stackoverflow.com/a/24315631/7501486
/// Used under MIT license
/// Note that this has quadratic complexity if from.size() != to.size()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quadratic in ... what? The length of str? That's equally true then of the implementation that triggered this discussion in #3463.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed wording

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫
This PR failed Diffblue compatibility checks (cbmc commit: 4ecf211).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92976712
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

The incompatibility may have been introduced by an earlier PR. In that case merging this
PR should be avoided unless it fixes the current incompatibility.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️
Passed Diffblue compatibility checks (cbmc commit: 366d563).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/92977248

/// \param to: string to replace with
/// Copyright notice:
/// Attributed to Gauthier Boaglio
/// Source: https://stackoverflow.com/a/24315631/7501486
/// Used under MIT license
/// Note that this is quadratic in str.size() if from.size() != to.size().
/// There is no guarantee that it is linear in str.size() if
/// from.size() == to.size().
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

/// \param to: string to replace with
/// Copyright notice:
/// Attributed to Gauthier Boaglio
/// Source: https://stackoverflow.com/a/24315631/7501486
/// Used under MIT license
/// Note that this is quadratic in str.size() if from.size() != to.size().
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... , because each replacement entails a copy of the contents of the string.

Or maybe something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would not change the complexity in str.size() -- unfortunately, std::string::replace does not provide the guarantee that replacing equal-length strings is not independent of the length of the containing string.

@tautschnig
Copy link
Collaborator

The precondition is important, but instead of the documentation dance about complexity we might just fix the implementation?

str result;
std::string::size_type cursor = 0;
for(std::string::size_type replace_at = str.find(from, 0); replace_at != std::string::npos; replace_at = str.find(from, replace_at))
{
  result += str.substr(cursor, replace_at - cursor);
  result += to;
  cursor += from.length();
  replace_at += from.length();
}
if(cursor != str.size())
  result += str.substr(cursor);
str.swap(result);

@kroening
Copy link
Member Author

There are three cases:

  1. replacement by shorter string, 2) by string with same length, and 3) by longer string

On an experiment with strings of length 6000:
There's a visible speedup for case 1), but only 25%, and it really needs a long string to kick in. The new implementation is ca. 3-4x slower for case 2). It's ca. 5x slower for case 3).

Thus, I'd propose to stick to the current one, with a warning that it may be slow for long strings.

@tautschnig
Copy link
Collaborator

@kroening many thanks for the analysis. What about inserting a result.reserve(str.length()); - it might also be a matter of how many replacements are to be done (i.e., is time lost in the append operation or in doing find)? Admittedly, I am surprised by the huge slow-down.

@kroening
Copy link
Member Author

The .reserve() makes hardly any measurable difference (the default enlargement strategy seems to work well).
The example I use replaces 30%.

Copy link
Contributor

@NlightNFotis NlightNFotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, appreciate the performance related comments as well. Would prefer the (performance) data points to be included if it would be possible, but don't mind it either way.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving after sitting on it for way too long, but really I think this function should just be removed:

  1. We do not have a single use in the code base.
  2. It's completely unrelated to program analysis or any other CBMC matter.
  3. If anyone else uses it (AFAIK TG does), they are more than welcome to copy the implementation.

Any user can then optimise it for their use case.

@tautschnig tautschnig merged commit 5c8839f into develop Feb 25, 2019
@tautschnig tautschnig deleted the replace_all_doc branch February 25, 2019 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants