Skip to content

[TestScript] Enhance the Robustness of the VTR Flow Log Parser Script #2743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ueqri opened this issue Sep 24, 2024 · 0 comments
Closed

[TestScript] Enhance the Robustness of the VTR Flow Log Parser Script #2743

ueqri opened this issue Sep 24, 2024 · 0 comments
Assignees

Comments

@ueqri
Copy link
Contributor

ueqri commented Sep 24, 2024

Current Behaviour

The VTR flow log parser currently only parses/matches the regex pattern starting from:

  • either the beginning of the line, e.g., (my pattern)...
  • or the character right after one or multiple #, e.g., #### (my pattern)...

(Note: The parse function and a brief intro of how the VTR log parser works can be found in the appendix.)

The above assumptions can break the parser in some cases, causing chaotic failures in the regression test.

Problematic Situation

If someone accidentally prints a few characters at the beginning of the line or someone doesn't know the assumptions made in the log parser, the log parser will make a hard life to the regression testing.

PS: This guy is me. I did something stupid causing parsing issues, as mentioned in #2720 (comment) and #2720 (comment).

Possible Solution

Let's say we have the following script (actually derived from the real case in #2720) not capturing the value 0.70:

import re
line='.# Routing took 0.70 seconds (max_rss 81.2 MiB, delta_rss +0.0 MiB)'
regex = re.compile(r'\s*Routing took (.*) seconds')
match = regex.match(line)
if match and match.groups():
    print(match.groups()[0])

We can correct this by adding affixes to the regex pattern string to indicate that the pattern is part of the string/line:

import re
line='.# Routing took 0.70 seconds (max_rss 81.2 MiB, delta_rss +0.0 MiB)'
regex = re.compile(r'^.*\s*Routing took (.*) seconds.*$') # Added a prefix '^.*' and a suffix '.*$'
match = regex.match(line)
if match and match.groups():
    print(match.groups()[0])

^ and $ indicate the beginning and the end of the string/line, respectively. Though the suffix can be removed, I prefer to keep it to explicitly say that our pattern only has to be part of the line.

Performance concerns: compared to the previous version, it is theatrically slower due to more work to do; however, since the regex pattern has been pre-compiled, the overhead should be that significant. Plus, the improved robustness is worth the overhead.

Appendix

The parse function can be found here:

def parse_file_and_update_results(filename, patterns, results):
"""
Find filename, and then look through for the matching patterns, updating results
"""
# We interpret the parse pattern's filename as a glob pattern
filepaths = glob.glob(filename)
if len(filepaths) > 1:
raise vtr.InspectError(
"File pattern '{}' is ambiguous ({} files matched)".format(filename, len(filepaths)),
len(filepaths),
filepaths,
)
if len(filepaths) == 1:
assert Path(filepaths[0]).exists
with open(filepaths[0], "r") as file:
for line in file:
while line[0] == "#":
line = line[1:]
for parse_pattern in patterns:
match = parse_pattern.regex().match(line)
if match and match.groups():
# Extract the first group value
results[parse_pattern] = match.groups()[0]

Regarding the parse_pattern.regex().match(line), the parse_pattern.regex() is just a pre-compiled regex as shown in:

class ParsePattern:
"""Pattern for parsing log files"""
def __init__(self, name, filename, regex_str, default_value=None):
self._name = name
self._filename = filename
self._regex = re.compile(regex_str)
self._default_value = default_value
def name(self):
"""Return name of what is being parsed for"""
return self._name
def filename(self):
"""Log filename to parse"""
return self._filename
def regex(self):
"""Regex expression to use for parsing"""
return self._regex
def default_value(self):
"""Return the default parse value"""
return self._default_value

An example of this ParsePattern object can be ParsePattern('crit_path_route_time', '/path/to/vpr.crit_path.out', '\s*Routing took (.*) seconds', None) based on this rule:

crit_path_route_time;vpr.crit_path.out;\s*Routing took (.*) seconds

@ueqri ueqri self-assigned this Sep 24, 2024
ueqri added a commit to ueqri/vtr-verilog-to-routing that referenced this issue Sep 26, 2024
Added comments for `self._regex = re.compile(f'^.*{regex_str}.*$')` in
vtr_flow/scripts/python_libs/vtr/log_parse.py.

Detailed in GitHub Issue verilog-to-routing#2743.
@ueqri ueqri closed this as completed Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant