(str == <<>>) Results in false for empty strings in comprehensions #13673

Closed
PsychicPlatypus opened this issue Jun 18, 2024 · 3 comments

PsychicPlatypus commented Jun 18, 2024

Elixir and Erlang/OTP versions

elixir 1.14.5-otp-25
erlang 25.0.4

Operating system

MacOS Sonoma 14.5 (23F79)

Current behavior

The following function:

def sanitize_string(text) do
  x = String.replace(text, ~r"\s+", " ")

  for i <- 0..(String.length(x) - 1)//2 do
    current = String.at(x, i) |> String.trim() |> tap(&IO.inspect(&1, label: ""))
    next = String.at(x, i + 1) |> String.trim() |> tap(&IO.inspect(&1, label: ""))

    (current == <<>>)
    |> tap(&IO.inspect(&1, label: ""))

    (next == <<>>)
    |> tap(&IO.inspect(&1, label: ""))
  end
end

For the string:

foo bar foo bar
\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\
n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣\n⁣FII Institute⁣\n#Site,\n##"

Produces results:

...
: ""
: "⁣"
: true
: false
: ""
: "⁣"
: true
: false
: ""
: "⁣"
: true
: false
...

This is not correct: both strings are empty, so both should return true for the check str == <<>>.

Expected behavior

The function should produce:

...
: ""
: "⁣"
: true
: true
: ""
: "⁣"
: true
: true
: ""
: "⁣"
: true
: true
...

That is, both checks should return true for those empty strings.

Contributor

sabiwara commented Jun 18, 2024

These are actually non-empty strings: they contain the invisible separator codepoint (U+2063):

s = "⁣"
String.length(s) # 1
String.to_charlist(s) # [8291]
Integer.to_string(8291, 16)  # "2063"
s == "\u2063"  # true
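
A few more checks along the same lines may help when a string prints as "" but does not compare equal to it. This is only a sketch using standard library calls; it writes the character as the "\u2063" escape so it stays visible in the source:

```elixir
s = "\u2063"

# The string is not empty: it holds one codepoint encoded as three UTF-8 bytes.
IO.inspect(s == "")                    # false
IO.inspect(byte_size(s))               # 3

# Printing the raw bytes sidesteps the invisible rendering.
IO.inspect(s, binaries: :as_binaries)  # <<226, 129, 163>>

# U+2063 is not whitespace, so String.trim/1 leaves it in place.
IO.inspect(String.trim(s) == s)        # true
```

The last check is why the trimmed value in the report still fails the comparison with <<>>.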

@josevalim
Member

Closing this as there is no bug in Elixir, but we may want to improve the printing in Elixir to make these cases clearer. :)

josevalim closed this as not planned on Jun 18, 2024
Contributor

sabiwara commented Jun 18, 2024

[notes] By comparison, in Python:

  • the representation is '\u2063' => we might be able to do the same for inspect
  • when copy-pasting into a Python shell, "⁣" gets replaced by "\u2063"
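
Until inspect escapes these characters, a similar effect can be approximated in user code. The sketch below is an assumption-laden illustration, not how inspect works or will work: the module EscapeDemo and the function escape_invisible/1 are hypothetical names, and the choice to escape only the Unicode "Cf" (format) category, to which U+2063 belongs, is mine.

```elixir
defmodule EscapeDemo do
  # Hypothetical helper (not part of Elixir): rewrites every Unicode format
  # character (general category Cf) as a visible \uXXXX escape.
  def escape_invisible(string) do
    Regex.replace(~r/\p{Cf}/u, string, fn match ->
      # Each match is exactly one codepoint; extract its integer value.
      <<codepoint::utf8>> = match
      hex = Integer.to_string(codepoint, 16)
      "\\u" <> String.pad_leading(hex, 4, "0")
    end)
  end
end

IO.puts(EscapeDemo.escape_invisible("a\u2063b"))
```

Codepoints above U+FFFF would come out with more than four hex digits, which is not valid \uXXXX syntax in Elixir source, so the output is meant for debugging rather than round-tripping.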

sabiwara added a commit to sabiwara/elixir that referenced this issue Jun 19, 2024
This reduces confusion when working with zero-width characters or alternative spaces.

Relates to elixir-lang#13673
sabiwara added a commit that referenced this issue Jun 19, 2024
This reduces confusion when working with zero-width characters or alternative spaces.

Relates to #13673
3 participants