| 1 | +# 591. Tag Validator |
| 2 | +Given a string representing a code snippet, implement a tag validator to parse the code and return whether it is valid. |
| 3 | + |
| 4 | +A code snippet is valid if all the following rules hold: |
| 5 | +1. The code must be wrapped in a **valid closed tag**. Otherwise, the code is invalid. |
| 6 | +2. A **closed tag** (not necessarily valid) has exactly the following format: `<TAG_NAME>TAG_CONTENT</TAG_NAME>`. Here, `<TAG_NAME>` is the start tag and `</TAG_NAME>` is the end tag. The TAG_NAME in the start and end tags must be the same. A closed tag is **valid** if and only if both the TAG_NAME and the TAG_CONTENT are valid. |
| 7 | +3. A **valid** `TAG_NAME` contains only **upper-case letters** and has a length in the range [1,9]. Otherwise, the `TAG_NAME` is **invalid**. |
| 8 | +4. A **valid** `TAG_CONTENT` may contain other **valid closed tags**, **cdata** and any characters (see note1) **EXCEPT** unmatched `<`, unmatched start and end tags, and unmatched or closed tags with an invalid TAG_NAME. Otherwise, the `TAG_CONTENT` is **invalid**. |
| 9 | +5. A start tag is unmatched if no end tag exists with the same TAG_NAME, and vice versa. However, you also need to consider imbalance when tags are nested. |
| 10 | +6. A `<` is unmatched if you cannot find a subsequent `>`. And when you find a `<` or `</`, all the subsequent characters until the next `>` should be parsed as TAG_NAME (not necessarily valid). |
| 11 | +7. The cdata has the following format: `<![CDATA[CDATA_CONTENT]]>`. The range of `CDATA_CONTENT` is defined as the characters between `<![CDATA[` and the **first subsequent** `]]>`. |
| 12 | +8. `CDATA_CONTENT` may contain **any characters**. The purpose of cdata is to prevent the validator from parsing `CDATA_CONTENT`, so even if it contains characters that could be parsed as a tag (whether valid or invalid), you should treat them as **regular characters**. |
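Rules 3 and 6 can be sketched as two small helpers. This is only an illustration: the names `is_valid_tag_name` and `read_tag` are hypothetical and do not appear in the solution further down.

```python
from typing import Optional, Tuple


def is_valid_tag_name(name: str) -> bool:
    """Rule 3: 1 to 9 characters, all of them upper-case English letters."""
    return 1 <= len(name) <= 9 and all("A" <= ch <= "Z" for ch in name)


def read_tag(code: str, i: int) -> Optional[Tuple[str, bool, int]]:
    """Rule 6: code[i] is assumed to be '<'.

    Everything up to the next '>' is taken as the TAG_NAME (not necessarily
    valid). Returns (name, is_end_tag, index just past '>'), or None when
    the '<' is unmatched because no '>' follows it.
    """
    close = code.find(">", i)
    if close == -1:
        return None
    is_end_tag = code.startswith("</", i)
    name_start = i + 2 if is_end_tag else i + 1
    return code[name_start:close], is_end_tag, close + 1
```

For instance, `read_tag("<DIV>>>", 0)` stops at the first `>`, so the name is `"DIV"` and the remaining `>>` is left for the content.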
| 13 | + |
| 14 | +#### Example 1: |
| 15 | +<pre> |
| 16 | +<strong>Input:</strong> code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>" |
| 17 | +<strong>Output:</strong> true |
| 18 | +<strong>Explanation:</strong> |
| 19 | +The code is wrapped in a closed tag: <DIV> and </DIV>. |
| 20 | +The TAG_NAME is valid, the TAG_CONTENT consists of some characters and cdata. |
| 21 | +Although CDATA_CONTENT has an unmatched start tag with invalid TAG_NAME, it should be considered as plain text, not parsed as a tag. |
| 22 | +So TAG_CONTENT is valid, and then the code is valid. Thus return true. |
| 23 | +</pre> |
| 24 | + |
| 25 | +#### Example 2: |
| 26 | +<pre> |
| 27 | +<strong>Input:</strong> code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>" |
| 28 | +<strong>Output:</strong> true |
| 29 | +<strong>Explanation:</strong> |
| 30 | +We first separate the code into: start_tag|tag_content|end_tag. |
| 31 | +start_tag -> "<DIV>" |
| 32 | +end_tag -> "</DIV>" |
| 33 | +tag_content can be further separated into: text1|cdata|text2. |
| 34 | +text1 -> ">> ![cdata[]] " |
| 35 | +cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>" |
| 36 | +text2 -> "]]>>]" |
| 37 | +start_tag is NOT "<DIV>>>" because of rule 6. |
| 38 | +cdata is NOT "<![CDATA[<div>]>]]>]]>" because of rule 7. |
| 39 | +</pre> |
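The split described in Example 2 can be reproduced with plain string searches. The sketch below is specific to this input: it assumes the outer end tag starts at the last `</` in the string, which is a shortcut and not a general parsing rule, and the constant names are made up for the illustration.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"

code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"

# Rule 6: the start tag ends at the first '>' after its '<'.
start_close = code.find(">")
start_tag = code[:start_close + 1]

# Shortcut for this example only: the outer end tag is the last "</".
end_open = code.rfind("</")
end_tag = code[end_open:]
tag_content = code[start_close + 1:end_open]

# Rule 7: the cdata ends at the first "]]>" after its opener.
cdata_open = tag_content.find(CDATA_OPEN)
cdata_close = (
    tag_content.find(CDATA_CLOSE, cdata_open + len(CDATA_OPEN)) + len(CDATA_CLOSE)
)

text1 = tag_content[:cdata_open]
cdata = tag_content[cdata_open:cdata_close]
text2 = tag_content[cdata_close:]
```

Because `find` returns the first match, the search for `>` cannot run past `<DIV>`, and the search for `]]>` cannot run past the first closing sequence, which is why the longer candidates in the explanation are ruled out.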
| 40 | + |
| 41 | +#### Example 3: |
| 42 | +<pre> |
| 43 | +<strong>Input:</strong> code = "<A> <B> </A> </B>" |
| 44 | +<strong>Output:</strong> false |
| 45 | +<strong>Explanation:</strong> Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa. |
| 46 | +</pre> |
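The imbalance in Example 3 is what a stack check catches. In the sketch below, the tags are assumed to be already extracted as `(name, is_end_tag)` pairs; the function name `tags_balanced` and that input shape are hypothetical, chosen only to isolate the nesting rule.

```python
from typing import List, Tuple


def tags_balanced(tags: List[Tuple[str, bool]]) -> bool:
    """Return True if every end tag closes the most recently opened start tag."""
    stack: List[str] = []
    for name, is_end_tag in tags:
        if not is_end_tag:
            stack.append(name)
        elif not stack or stack.pop() != name:
            # An end tag with nothing open, or one that closes the wrong tag.
            return False
    # Any start tag still on the stack was never closed.
    return not stack


# Example 3: <A> <B> </A> </B>
example3 = [("A", False), ("B", False), ("A", True), ("B", True)]
properly_nested = [("A", False), ("B", False), ("B", True), ("A", True)]
```

With `example3`, the end tag `</A>` arrives while `B` is on top of the stack, so the check fails; with `properly_nested` it succeeds.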
| 47 | + |
| 48 | +#### Constraints: |
| 49 | +* `1 <= code.length <= 500` |
| 50 | +* `code` consists of English letters, digits, `'<'`, `'>'`, `'/'`, `'!'`, `'['`, `']'`, `'.'`, and `' '`. |
| 51 | + |
| 52 | +## Solutions (Python) |
| 53 | + |
| 54 | +### 1. Solution |
| 55 | +```Python |
| 56 | +class Solution: |
| 57 | +    def isValid(self, code: str) -> bool: |
| 58 | +        cdata = False  # True while scanning inside a CDATA section |
| 59 | +        tagstack = []  # names of the start tags that are still open |
| 60 | +        i = 0 |
| 61 | + |
| 62 | +        while i < len(code): |
| 63 | +            if cdata: |
| 64 | +                if code[i:i + 3] == "]]>":  # first "]]>" ends the CDATA |
| 65 | +                    cdata = False |
| 66 | +                    i += 2 |
| 67 | +            elif tagstack == [] and (code[i] != '<' or code[i:i + 2] in "</<!"): |
| 68 | +                return False  # outside every tag, only a start tag may appear |
| 69 | +            elif code[i:i + 9] == "<![CDATA[": |
| 70 | +                cdata = True |
| 71 | +                i += 8 |
| 72 | +            elif code[i:i + 2] == "</": |
| 73 | +                for j in range(i + 2, i + 13): |
| 74 | +                    # No '>' found, name longer than 9, or empty name |
| 75 | +                    if j >= len(code) or j == i + 12 or (j == i + 2 and code[j] == '>'): |
| 76 | +                        return False |
| 77 | +                    elif code[j] == '>': |
| 78 | +                        if tagstack.pop() != code[i + 2:j]:  # must close the innermost tag |
| 79 | +                            return False |
| 80 | +                        if tagstack == [] and j != len(code) - 1:  # outer tag closed too early |
| 81 | +                            return False |
| 82 | +                        i = j |
| 83 | +                        break |
| 84 | +                    elif not code[j].isupper(): |
| 85 | +                        return False |
| 86 | +            elif code[i] == '<': |
| 87 | +                for j in range(i + 1, i + 12): |
| 88 | +                    # No '>' found, name longer than 9, or empty name |
| 89 | +                    if j >= len(code) or j == i + 11 or (j == i + 1 and code[j] == '>'): |
| 90 | +                        return False |
| 91 | +                    elif code[j] == '>': |
| 92 | +                        tagstack.append(code[i + 1:j]) |
| 93 | +                        i = j |
| 94 | +                        break |
| 95 | +                    elif not code[j].isupper(): |
| 96 | +                        return False |
| 97 | + |
| 98 | +            i += 1 |
| 99 | + |
| 100 | +        return tagstack == []  # every start tag must have been closed |
| 99 | +``` |
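As a cross-check on the solution above, here is a compact alternative sketch that uses `str.find` and `str.startswith` in place of the character-by-character inner loops. It is not the solution from this file: `is_valid_snippet` is a hypothetical name, and the sketch is checked only against the three examples from the problem statement.

```python
def is_valid_snippet(code: str) -> bool:
    stack = []  # names of start tags that are still open
    i, n = 0, len(code)
    while i < n:
        if i > 0 and not stack:
            return False  # something follows the outermost closed tag
        if code.startswith("<![CDATA[", i):
            if not stack:
                return False  # cdata must sit inside a tag
            end = code.find("]]>", i + 9)
            if end == -1:
                return False  # cdata is never closed
            i = end + 3
        elif code.startswith("</", i):
            end = code.find(">", i + 2)
            if end == -1:
                return False  # unmatched '<'
            # Names on the stack are already validated, so equality suffices.
            if not stack or stack.pop() != code[i + 2:end]:
                return False
            i = end + 1
        elif code[i] == "<":
            end = code.find(">", i + 1)
            if end == -1:
                return False  # unmatched '<'
            name = code[i + 1:end]
            if not (1 <= len(name) <= 9 and all("A" <= c <= "Z" for c in name)):
                return False
            stack.append(name)
            i = end + 1
        else:
            if not stack:
                return False  # plain text outside any tag
            i += 1
    return not stack


examples = [
    ("<DIV>This is the first line <![CDATA[<div>]]></DIV>", True),
    ("<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>", True),
    ("<A> <B> </A> </B>", False),
]
for snippet, expected in examples:
    assert is_valid_snippet(snippet) == expected
```

Each position in `code` is visited a bounded number of times, so the sketch runs in time linear in `code.length`, with the stack bounded by the nesting depth.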