Skip to content

Enhance split_string to support splitting on white space [TG-2922] #2071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 19, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,6 @@ set_target_properties(
mmcc
pointer-analysis
solvers
string_utils
test-bigint
testing-utils
unit
Expand Down
31 changes: 27 additions & 4 deletions src/util/string_utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,16 @@ Author: Daniel Poetzl
\*******************************************************************/

#include "string_utils.h"
#include "invariant.h"

#include <cassert>
#include <cctype>
#include <algorithm>

/// Remove all whitespace characters from either end of a string. Whitespace
/// in the middle of the string is left unchanged
/// \param s: the string to strip
/// \return The stripped string
std::string strip_string(const std::string &s)
{
auto pred=[](char c){ return std::isspace(c); };
Expand All @@ -30,15 +35,29 @@ std::string strip_string(const std::string &s)
return s.substr(i, (j-i+1));
}

/// Given a string s, split into a sequence of substrings when separated by
/// specified delimiter.
/// \param s: The string to split up
/// \param delim: The character to use as the delimiter
/// \param [out] result: The sub strings. Must be empty.
/// \param strip: If true, strip_string will be used on each element, removing
/// whitespace from the beginning and end of each element
/// \param remove_empty: If true, all empty-string elements will be removed.
/// This is applied after strip so whitespace only elements will be removed if
/// both are set to true
void split_string(
const std::string &s,
char delim,
std::vector<std::string> &result,
bool strip,
bool remove_empty)
{
assert(result.empty());
assert(!std::isspace(delim));
PRECONDITION(result.empty());
if(strip && std::isspace(delim))
{
throw std::invalid_argument(
"delim can't be a space character if using strip");
}

if(s.empty())
{
Expand All @@ -47,7 +66,7 @@ void split_string(
}

std::string::size_type n=s.length();
assert(n>0);
INVARIANT(n > 0, "Empty string case should already be handled");

std::string::size_type start=0;
std::string::size_type i;
Expand Down Expand Up @@ -87,7 +106,11 @@ void split_string(
std::string &right,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheeky of me, but as you are touching this function, do you fancy adding some Doxygen? ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will pick this up as part of the follow on tidy up PR as this is blocking another PR #2046

bool strip)
{
assert(!std::isspace(delim));
if(strip && std::isspace(delim))
{
throw std::invalid_argument(
"delim can't be a space character if using strip");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that this condition/throw block can't just be an INVARIANT ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the unit test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - ok, though that smells a little bit of the mechanics of unit testing driving the architecture of the code, but I don't have a hugely strong opinion. I just felt that an INVARIANT, or perhaps actually PRECONDITION now I think more, was more inline with other functions in this file that do things like PRECONDITION(result.empty());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am with @chrisr-diffblue: I think this should be a PRECONDITION. I think invariant.h ought to have the facilities to turn that into something that can be caught, and hence could be used for testing error paths in unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree - I think of all the code we write, util code is the most "library like" and not throwing limits the utility of this function. Imagine Using split_string on a user input: I now have to duplicate the error handling code if I want to allow the user to try again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @thk123 does make a good point about the 'library like' use-case - perhaps the answer would be to have 'CATCHABLE_PRECONDITION' along with @tautschnig suggestion to enable error path testing? - i.e. in the library use-case Thomas is talking about, it could be caught by the caller to avoid duplicating the error handling.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two different "users" to be distinguished: the developer using the library and the end-user feeding input to what the developer has built. The end-user should not be subjected to internal limitations, but for a developer it's reasonable to assume that they have read the documentation.

It is of course perfectly reasonable to use exceptions for either class of "users," but that's not what the code base does right now. If you think this should change, I'd go via a new issue and/or have some further discussion. Because your proposed code is of course very nice and clean I'd keep it so that a future patch to introduce the exceptions is very low effort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all this I do think that PRECONDITION_STRUCTURED should be able to support this. There are a few cases of INVARIANT_STRUCTURED in the code base. (But also invariant.h says something about the future...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not clear on what you propose I do with this code for this PR? Do you mean keep the code in this PR, or change this PR to a precondition (and remove the associated unit tests) and stash this version away for a later day?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is what I'd do if it were my PR: if things can be made to work via PRECONDITION_STRUCTURED, use it. Else stash it away for a later day and drop a note in an issue so that it gets tracked.


std::vector<std::string> result;

Expand Down
6 changes: 3 additions & 3 deletions src/util/string_utils.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,10 @@ std::string strip_string(const std::string &s);

void split_string(
const std::string &s,
char delim, // must not be a whitespace character
char delim,
std::vector<std::string> &result,
bool strip=false, // strip whitespace from elements
bool remove_empty=false); // remove empty elements
bool strip = false,
bool remove_empty = false);

void split_string(
const std::string &s,
Expand Down
1 change: 0 additions & 1 deletion unit/.gitignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
# Unit test binaries
miniBDD
sharing_node
string_utils
unit_tests
12 changes: 0 additions & 12 deletions unit/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ file(GLOB_RECURSE testing_utils "testing-utils/*.cpp" "testing-utils/*.h")
list(REMOVE_ITEM sources
# Used in executables
${CMAKE_CURRENT_SOURCE_DIR}/miniBDD.cpp
${CMAKE_CURRENT_SOURCE_DIR}/string_utils.cpp

# Don't build
${CMAKE_CURRENT_SOURCE_DIR}/sharing_map.cpp
Expand Down Expand Up @@ -63,14 +62,3 @@ target_include_directories(miniBDD
target_link_libraries(miniBDD solvers ansi-c)
add_test(NAME miniBDD COMMAND $<TARGET_FILE:miniBDD>)
set_tests_properties(miniBDD PROPERTIES LABELS "CORE;CBMC")

add_executable(string_utils string_utils.cpp)
target_include_directories(string_utils
PUBLIC
${CBMC_BINARY_DIR}
${CBMC_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}
)
target_link_libraries(string_utils solvers ansi-c)
add_test(NAME string_utils COMMAND $<TARGET_FILE:string_utils>)
set_tests_properties(string_utils PROPERTIES LABELS "CORE;CBMC")
11 changes: 3 additions & 8 deletions unit/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,6 @@
# Source files for test utilities
SRC = unit_tests.cpp \
catch_example.cpp \
util/expr_iterator.cpp \
util/optional.cpp \
analyses/call_graph.cpp \
java_bytecode/java_bytecode_convert_class/convert_abstract_class.cpp \
# Empty last line

# Test source files
Expand Down Expand Up @@ -48,8 +44,11 @@ SRC += unit_tests.cpp \
util/irep.cpp \
util/irep_sharing.cpp \
util/message.cpp \
util/optional.cpp \
util/parameter_indices.cpp \
util/simplify_expr.cpp \
util/string_utils/split_string.cpp \
util/string_utils/strip_string.cpp \
util/symbol_table.cpp \
catch_example.cpp \
java_bytecode/java_virtual_functions/virtual_functions.cpp \
Expand Down Expand Up @@ -88,7 +87,6 @@ OBJ += $(CPROVER_LIBS) testing-utils/testing-utils$(LIBEXT)

TESTS = unit_tests$(EXEEXT) \
miniBDD$(EXEEXT) \
string_utils$(EXEEXT) \
# Empty last line

CLEANFILES = $(TESTS)
Expand All @@ -107,6 +105,3 @@ unit_tests$(EXEEXT): $(OBJ)

miniBDD$(EXEEXT): miniBDD$(OBJEXT) $(CPROVER_LIBS)
$(LINKBIN)

string_utils$(EXEEXT): string_utils$(OBJEXT) $(CPROVER_LIBS)
$(LINKBIN)
50 changes: 0 additions & 50 deletions unit/string_utils.cpp

This file was deleted.

Loading