[TestScript] Enhanced the Robustness of the VTR Flow Log Parser Script #2744

ueqri · 2024-09-24T01:07:22Z

Added affixes to the regex pattern string, i.e., $.*(my pattern).*$ to ensure that the searched pattern is not strictly required to start from the beginning of the string or line; it only needs to be part of the line.

This makes the definition of the pattern rules easier and makes the VTR flow log parser more robust.

(The following texts are copied from #2743)

Current Behaviour

The VTR flow log parser currently only parses/matches the regex pattern starting from:

either the beginning of the line, e.g., (my pattern)...
or the character right after one or multiple #, e.g., #### (my pattern)...

(Note: The parse function and a brief intro of how the VTR log parser works can be found in the appendix.)

The above assumptions can break the parser in some cases, causing chaotic failures in the regression test.

Problematic Situation

If someone accidentally prints a few characters at the beginning of the line or someone doesn't know the assumptions made in the log parser, the log parser will make a hard life to the regression testing.

PS: This guy is me. I did something stupid causing parsing issues, as mentioned in #2720 (comment) and #2720 (comment).

Possible Solution

Let's say we have the following script (actually derived from the real case in #2720) not capturing the value 0.70:

import re
line='.# Routing took 0.70 seconds (max_rss 81.2 MiB, delta_rss +0.0 MiB)'
regex = re.compile(r'\s*Routing took (.*) seconds')
match = regex.match(line)
if match and match.groups():
    print(match.groups()[0])

We can correct this by adding affixes to the regex pattern string to indicate that the pattern is part of the string/line:

import re
line='.# Routing took 0.70 seconds (max_rss 81.2 MiB, delta_rss +0.0 MiB)'
regex = re.compile(r'^.*\s*Routing took (.*) seconds.*$') # Added a prefix '^.*' and a suffix '.*$'
match = regex.match(line)
if match and match.groups():
    print(match.groups()[0])

^ and $ indicate the beginning and the end of the string/line, respectively. Though the suffix can be removed, I prefer to keep it to explicitly say that our pattern only has to be part of the line.

Performance concerns: compared to the previous version, it is theatrically slower due to more work to do; however, since the regex pattern has been pre-compiled, the overhead should be that significant. Plus, the improved robustness is worth the overhead.

Appendix

The parse function can be found here:

vtr-verilog-to-routing/vtr_flow/scripts/python_libs/vtr/parse_vtr_flow.py

Lines 18 to 46 in 2c2af51

    
           def parse_file_and_update_results(filename, patterns, results): 
        
               """ 
        
               Find filename, and then look through for the matching patterns, updating results 
        
               """ 
        
               # We interpret the parse pattern's filename as a glob pattern 
        
               filepaths = glob.glob(filename) 
        
               if len(filepaths) > 1: 
        
                   raise vtr.InspectError( 
        
                       "File pattern '{}' is ambiguous ({} files matched)".format(filename, len(filepaths)), 
        
                       len(filepaths), 
        
                       filepaths, 
        
                   ) 
        
               if len(filepaths) == 1: 
        
                   assert Path(filepaths[0]).exists 
        
                   with open(filepaths[0], "r") as file: 
        
                       for line in file: 
        
                           while line[0] == "#": 
        
                               line = line[1:] 
        
                           for parse_pattern in patterns: 
        
                               match = parse_pattern.regex().match(line) 
        
                               if match and match.groups(): 
        
                                   # Extract the first group value 
        
                                   results[parse_pattern] = match.groups()[0]

Regarding the parse_pattern.regex().match(line), the parse_pattern.regex() is just a pre-compiled regex as shown in:

vtr-verilog-to-routing/vtr_flow/scripts/python_libs/vtr/log_parse.py

Lines 20 to 43 in 2c2af51

    
           class ParsePattern: 
        
               """Pattern for parsing log files""" 
        
               def __init__(self, name, filename, regex_str, default_value=None): 
        
                   self._name = name 
        
                   self._filename = filename 
        
                   self._regex = re.compile(regex_str) 
        
                   self._default_value = default_value 
        
               def name(self): 
        
                   """Return name of what is being parsed for""" 
        
                   return self._name 
        
               def filename(self): 
        
                   """Log filename to parse""" 
        
                   return self._filename 
        
               def regex(self): 
        
                   """Regex expression to use for parsing""" 
        
                   return self._regex 
        
               def default_value(self): 
        
                   """Return the default parse value""" 
        
                   return self._default_value

An example of this ParsePattern object can be ParsePattern('crit_path_route_time', '/path/to/vpr.crit_path.out', '\s*Routing took (.*) seconds', None) based on this rule:

vtr-verilog-to-routing/vtr_flow/parse/parse_config/timing/vpr.route_relaxed_chan_width.txt

Line 29 in 2c2af51

crit_path_route_time;vpr.crit_path.out;\s*Routing took (.*) seconds

Added affixes to the regex pattern string, i.e., `$.*(my pattern).*$` to ensure that the searched pattern is not strictly required to start from the beginning of the string or line; it only needs to be part of the line. This makes the definition of the pattern rules easier and makes the VTR flow log parser more robust.

ueqri · 2024-09-24T01:13:37Z

There should be no harm to the current infrastructure, since I suppose we don't have, and shouldn't have (e.g., having \s*Routing took (.*) seconds within one log line doesn't make sense), more than two occurrences of the same pattern in the same line of the log file.

vaughnbetz

Thanks Hang! I just suggest adding a comment.

vtr_flow/scripts/python_libs/vtr/log_parse.py

Added comments for `self._regex = re.compile(f'^.*{regex_str}.*$')` in vtr_flow/scripts/python_libs/vtr/log_parse.py. Detailed in GitHub Issue verilog-to-routing#2743.

vaughnbetz

Thanks!

github-actions bot added the lang-python Python code label Sep 24, 2024

vaughnbetz requested changes Sep 26, 2024

View reviewed changes

vtr_flow/scripts/python_libs/vtr/log_parse.py Show resolved Hide resolved

[TestScript] Added Comments for the Regex Pattern in the Log Parser

ee29e88

Added comments for `self._regex = re.compile(f'^.*{regex_str}.*$')` in vtr_flow/scripts/python_libs/vtr/log_parse.py. Detailed in GitHub Issue verilog-to-routing#2743.

ueqri requested a review from vaughnbetz September 26, 2024 17:19

vaughnbetz approved these changes Sep 26, 2024

View reviewed changes

vaughnbetz merged commit 8dd5a3b into verilog-to-routing:master Sep 27, 2024
37 of 53 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[TestScript] Enhanced the Robustness of the VTR Flow Log Parser Script #2744

[TestScript] Enhanced the Robustness of the VTR Flow Log Parser Script #2744

Uh oh!

ueqri commented Sep 24, 2024

Uh oh!

ueqri commented Sep 24, 2024

Uh oh!

vaughnbetz left a comment

Uh oh!

Uh oh!

vaughnbetz left a comment

Uh oh!

Uh oh!

Uh oh!

	def parse_file_and_update_results(filename, patterns, results):

	"""
	Find filename, and then look through for the matching patterns, updating results
	"""

	# We interpret the parse pattern's filename as a glob pattern
	filepaths = glob.glob(filename)

	if len(filepaths) > 1:
	raise vtr.InspectError(
	"File pattern '{}' is ambiguous ({} files matched)".format(filename, len(filepaths)),
	len(filepaths),
	filepaths,
	)

	if len(filepaths) == 1:
	assert Path(filepaths[0]).exists

	with open(filepaths[0], "r") as file:
	for line in file:
	while line[0] == "#":
	line = line[1:]

	for parse_pattern in patterns:
	match = parse_pattern.regex().match(line)
	if match and match.groups():
	# Extract the first group value
	results[parse_pattern] = match.groups()[0]

	class ParsePattern:
	"""Pattern for parsing log files"""

	def __init__(self, name, filename, regex_str, default_value=None):
	self._name = name
	self._filename = filename
	self._regex = re.compile(regex_str)
	self._default_value = default_value

	def name(self):
	"""Return name of what is being parsed for"""
	return self._name

	def filename(self):
	"""Log filename to parse"""
	return self._filename

	def regex(self):
	"""Regex expression to use for parsing"""
	return self._regex

	def default_value(self):
	"""Return the default parse value"""
	return self._default_value

[TestScript] Enhanced the Robustness of the VTR Flow Log Parser Script #2744

[TestScript] Enhanced the Robustness of the VTR Flow Log Parser Script #2744

Uh oh!

Conversation

ueqri commented Sep 24, 2024

Current Behaviour

Problematic Situation

Possible Solution

Appendix

Uh oh!

ueqri commented Sep 24, 2024

Uh oh!

vaughnbetz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vaughnbetz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!