
Inspect special purpose chars using their unicode representation #13676

Merged
merged 1 commit into from
Jun 19, 2024
30 changes: 28 additions & 2 deletions lib/elixir/lib/code/identifier.ex
@@ -105,8 +105,34 @@ defmodule Code.Identifier do

defp escape_char(0), do: [?\\, ?0]

@escaped_bom :binary.bin_to_list("\\uFEFF")
defp escape_char(65279), do: @escaped_bom
defp escape_char(char)
# Some characters that are confusing (zero-width / alternative spaces) are displayed
# using their unicode representation:
# https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Special-purpose_characters

# BOM
when char == 0xFEFF
# Mathematical invisibles
when char in 0x2061..0x2064
# Bidirectional neutral
when char in [0x061C, 0x200E, 0x200F]
# Bidirectional general (source of vulnerabilities)
when char in 0x202A..0x202E
when char in 0x2066..0x2069
Comment on lines +119 to +121
Contributor Author

I think it's especially good for this one, since we already have something to prevent vulnerabilities around it in the tokenizer:

%% Bidirectional control
%% Retrieved from https://trojansource.codes/trojan-source.pdf
-define(bidi(C), C =:= 16#202A;

# Interlinear annotations
when char in 0xFFF9..0xFFFC
# Zero-width joiners and non-joiners
when char in [0x200C, 0x200D, 0x034F]
# Non-break space / zero-width space
when char in [0x00A0, 0x200B, 0x2060]
# Line/paragraph separators
when char in [0x2028, 0x2029]
# Spaces
when char in 0x2000..0x200A
when char == 0x205F do
<<a::4, b::4, c::4, d::4>> = <<char::16>>
[?\\, ?u, to_hex(a), to_hex(b), to_hex(c), to_hex(d)]
end

defp escape_char(char)
when char in 0x20..0x7E
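
The new clause builds the `\uXXXX` form by splitting the 16-bit code point into four 4-bit nibbles and rendering each as a hex digit. A minimal standalone sketch of that step follows. `EscapeSketch` is a hypothetical module, and its `to_hex/1` is an assumed implementation, since the diff does not show the real helper:

```elixir
defmodule EscapeSketch do
  # Hypothetical helper, not part of Code.Identifier: splits a 16-bit code
  # point into four 4-bit nibbles and renders each one as a hex digit.
  def escape(char) when char in 0..0xFFFF do
    <<a::4, b::4, c::4, d::4>> = <<char::16>>
    [?\\, ?u, to_hex(a), to_hex(b), to_hex(c), to_hex(d)]
  end

  # Assumed implementation of to_hex/1 (uppercase digits, matching the
  # "\\uFEFF" form used in the tests); the real helper is not in the diff.
  defp to_hex(n) when n in 0..9, do: ?0 + n
  defp to_hex(n) when n in 10..15, do: ?A + n - 10
end

# The result is a charlist, so IO.puts/1 can print it directly.
IO.puts(EscapeSketch.escape(0x2063))
# prints \u2063
```

Matching on `<<char::16>>` only works for code points up to `0xFFFF`, which holds for every value listed in the guard.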
2 changes: 2 additions & 0 deletions lib/elixir/test/elixir/inspect_test.exs
@@ -131,6 +131,8 @@ defmodule Inspect.BitStringTest do
assert inspect(" ゆんゆん") == "\" ゆんゆん\""
# BOM
assert inspect("\uFEFFhello world") == "\"\\uFEFFhello world\""
# Invisible characters
assert inspect("\u2063") == "\"\\u2063\""
end

test "infer" do
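
To see which characters in a given string fall into the groups this change escapes, independent of the Elixir version in use, something like the following sketch could help. `SpecialCharSketch` and its functions are hypothetical names; only the code points are taken from the guard in `identifier.ex`:

```elixir
defmodule SpecialCharSketch do
  # Code points copied from the guard in the identifier.ex diff.
  @ranges [0x2061..0x2064, 0x202A..0x202E, 0x2066..0x2069, 0xFFF9..0xFFFC, 0x2000..0x200A]
  @singles [0xFEFF, 0x061C, 0x200E, 0x200F, 0x200C, 0x200D, 0x034F,
            0x00A0, 0x200B, 0x2060, 0x2028, 0x2029, 0x205F]

  # Hypothetical predicate: true when the code point is in one of the groups.
  def special?(char) do
    char in @singles or Enum.any?(@ranges, &(char in &1))
  end

  # Returns the special-purpose code points found in a string, in order.
  def find(string) do
    for <<char::utf8 <- string>>, special?(char), do: char
  end
end

# A zero-width space and an invisible separator hidden in ordinary text.
IO.inspect(SpecialCharSketch.find("a\u200Bb\u2063"), base: :hex)
# prints [0x200B, 0x2063]
```

This is a diagnostic aid only and does not reproduce how `inspect/1` dispatches to `Code.Identifier`.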