Commit d78118d

Fix a problem where the parse exception message can't be generated for XML with an invalid encoding (#123)
## Why?

If an XML tag contains non-ASCII Unicode characters and an error occurs for that tag, an incompatible encoding error is raised while the parse exception message is being built. This happens because the message is assembled from a UTF-8 part (which includes the target tag information) and an ASCII-8BIT part (which includes the error context input).

Fix GH-29

Reported by DuKewu. Thanks!!!
1 parent 06be5cf commit d78118d
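
The incompatibility described in the commit message can be reproduced outside REXML. The following is a minimal sketch: the strings and variable names (`utf8_part`, `binary_part`, `result`) are illustrative assumptions, not the actual values REXML builds internally. It shows that appending binary-tagged bytes with the high bit set to a UTF-8 string that contains a non-ASCII character raises `Encoding::CompatibilityError`.

```ruby
# Illustrative strings only; these are not REXML's actual internals.
utf8_part = "Invalid attribute name: <:\u200B>"  # UTF-8, contains U+200B
binary_part = "\xE2\x80\x8B>".b                  # same kind of bytes, tagged ASCII-8BIT

message = utf8_part.dup
begin
  # Neither side is ASCII-only and the encodings differ,
  # so Ruby cannot pick a common encoding for the result.
  message << binary_part
  result = "no error"
rescue Encoding::CompatibilityError => e
  result = e.class.name
end

puts result
```

If either side contained only ASCII bytes, the append would succeed, which is why the problem only shows up for tags with non-ASCII characters.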

2 files changed: +14 -0 lines changed

lib/rexml/parseexception.rb

+1
@@ -29,6 +29,7 @@ def to_s
   err << "\nLine: #{line}\n"
   err << "Position: #{position}\n"
   err << "Last 80 unconsumed characters:\n"
+  err.force_encoding("ASCII-8BIT")
   err << @source.buffer[0..80].force_encoding("ASCII-8BIT").gsub(/\n/, ' ')
 end
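
The effect of the added `err.force_encoding("ASCII-8BIT")` line can be sketched in isolation. In this sketch, `err` and `context` are hypothetical stand-ins for the message built in `to_s` and for the slice of `@source.buffer`; only `String#force_encoding`, `String#b`, and `String#<<` are real Ruby APIs. Retagging the accumulated message as binary before appending the binary context means both sides share one encoding, so the append no longer fails.

```ruby
# Hypothetical stand-ins for the parts assembled in ParseException#to_s.
err = +"Invalid attribute name: <:\u200B>"  # UTF-8 part with a non-ASCII character
context = "\xE2\x80\x8B>".b                  # binary part, like the forced buffer slice

err << "\nLast 80 unconsumed characters:\n"
bytes_before = err.bytes

# Retag in place. force_encoding changes only the encoding label,
# not the underlying bytes.
err.force_encoding("ASCII-8BIT")
bytes_unchanged = (err.bytes == bytes_before)

# Both strings are binary now, so this append succeeds.
err << context

puts err.encoding == Encoding::ASCII_8BIT
puts bytes_unchanged
```

One consequence of this design is that `to_s` returns a binary-tagged string, so callers that need UTF-8 would have to retag or transcode the result themselves.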

test/parse/test_element.rb

+13
@@ -47,6 +47,19 @@ def test_empty_namespace_attribute_name
     DETAIL
   end
 
+  def test_empty_namespace_attribute_name_with_utf8_character
+    exception = assert_raise(REXML::ParseException) do
+      parse("<x :\xE2\x80\x8B>") # U+200B ZERO WIDTH SPACE
+    end
+    assert_equal(<<-DETAIL.chomp.force_encoding("ASCII-8BIT"), exception.to_s)
+Invalid attribute name: <:\xE2\x80\x8B>
+Line: 1
+Position: 8
+Last 80 unconsumed characters:
+:\xE2\x80\x8B>
+    DETAIL
+  end
+
   def test_garbage_less_than_before_root_element_at_line_start
     exception = assert_raise(REXML::ParseException) do
       parse("<\n<x/>")
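
The new test calls `force_encoding("ASCII-8BIT")` on the expected heredoc before comparing. A likely reason, sketched below with illustrative variable names that are not part of the test suite, is that Ruby's `String#==` treats two strings with identical bytes as unequal when their encodings differ and neither string is ASCII-only.

```ruby
# Illustrative values; the names here are assumptions, not from the test suite.
expected_utf8 = "Invalid attribute name: <:\u200B>"
actual_binary = expected_utf8.dup.force_encoding("ASCII-8BIT")

# The underlying bytes are identical...
same_bytes = (expected_utf8.bytes == actual_binary.bytes)

# ...but the strings are not comparable across the two encodings.
equal_as_is = (expected_utf8 == actual_binary)

# Retagging the expected value as binary makes the comparison succeed.
equal_after_retag = (expected_utf8.dup.force_encoding("ASCII-8BIT") == actual_binary)

puts same_bytes
puts equal_as_is
puts equal_after_retag
```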
