Add Base.valid{n}?/2 functions #14417

whatyouhide · 2025-04-09T09:30:39Z

This is a first PR to add this kind of function. If we like the direction, I'll go ahead and add valid32?/2 and valid64?/2.

josevalim · 2025-04-09T09:39:41Z

LGTM. What are we going to call the uri_base64 one? :D

whatyouhide · 2025-04-09T09:52:17Z

@josevalim I'd just replace decode with valid in function names I think, so url_valid64? and friends.

josevalim · 2025-04-09T10:07:54Z

@whatyouhide perhaps we should go with valid_decode64? and valid_url_decode64? as those are specific to the decoding operation? In theory we don't need decoding in the name, as all encoding is valid, but perhaps this makes it clearer anyway.

whatyouhide · 2025-04-09T10:42:51Z

Okay I’m down to go with valid_decode*. Gimme a sec.

whatyouhide · 2025-04-09T10:53:45Z

Added all bases. The code is a bit repetitive but I’m afraid getting more clever would make it really hard to read. For base64, ignoring whitespace is nasty because we still have to remove it completely and then do a validation pass (so one copy), but I think that's fine as the first step.
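The two-step approach described above (strip whitespace into a new binary, then run a validation pass over it) could be sketched roughly as follows. This is not the code from the PR: `WhitespaceSketch` and its functions are hypothetical names, and the sketch only checks alphabet membership, length, and padding position.

```elixir
defmodule WhitespaceSketch do
  # Hypothetical module for illustration only; not the implementation in base.ex.

  # Step 1: copy the input without ASCII whitespace (this is the "one copy").
  def strip_whitespace(binary) do
    for <<char <- binary>>, char not in [?\s, ?\t, ?\r, ?\n], into: <<>>, do: <<char>>
  end

  # Step 2: walk the stripped binary once, without building a decoded result.
  def valid64?(binary) when is_binary(binary) do
    stripped = strip_whitespace(binary)
    rem(byte_size(stripped), 4) == 0 and validate(stripped)
  end

  # Standard (non-URL) base64 alphabet.
  defguardp is_b64(c)
            when c in ?A..?Z or c in ?a..?z or c in ?0..?9 or c == ?+ or c == ?/

  defp validate(<<>>), do: true

  # Padding is only accepted in the final group of four characters.
  defp validate(<<a, b, ?=, ?=>>) when is_b64(a) and is_b64(b), do: true
  defp validate(<<a, b, c, ?=>>) when is_b64(a) and is_b64(b) and is_b64(c), do: true

  defp validate(<<a, b, c, d, rest::binary>>)
       when is_b64(a) and is_b64(b) and is_b64(c) and is_b64(d),
       do: validate(rest)

  defp validate(_other), do: false
end
```

The only allocation proportional to the input is the stripped copy; the validation pass itself returns just a boolean.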

josevalim · 2025-04-09T11:14:02Z

@whatyouhide how big is the Base module before and after these changes? Should we also benchmark to see if the performance difference is significant?

josevalim · 2025-04-09T11:17:43Z

lib/elixir/lib/base.ex

    {min, decoded} = alphabet |> Enum.with_index() |> to_decode_list.()

-    defp unquote(name)(char) do
+    defp unquote(validate_name)(char) when char in unquote(alphabet), do: char


For validating each char, we should use the same decode/1 function instead of introducing a new validate function. The decode function is a subtraction plus a tuple operation, which is likely faster than checking for char in unquote(alphabet).

This should apply to all of them.

No, this is not done :D My suggestion is to remove defp unquote(validate_name)(char) and use the decoding function instead, which is likely faster?
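To make the two strategies in this review thread concrete, here is a minimal sketch for the base16 uppercase alphabet. The module and function names are hypothetical and the real code in base.ex is structured differently; the sketch only shows a membership guard next to a "subtract the minimum, then index into a tuple" lookup, and makes no claim about which is faster.

```elixir
defmodule LookupSketch do
  # Hypothetical module illustrating the two strategies; not the code in base.ex.
  alphabet = [?0, ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?A, ?B, ?C, ?D, ?E, ?F]
  min = Enum.min(alphabet)
  max = Enum.max(alphabet)

  # Compile-time table covering min..max: the decoded value, or nil for
  # characters inside the range that are not part of the alphabet.
  table =
    min..max
    |> Enum.map(fn char -> Enum.find_index(alphabet, &(&1 == char)) end)
    |> List.to_tuple()

  # Strategy 1: membership guard against the alphabet.
  def valid_by_guard?(char) when char in unquote(alphabet), do: true
  def valid_by_guard?(_char), do: false

  # Strategy 2: one subtraction plus one tuple lookup.
  def valid_by_lookup?(char) when char in unquote(min)..unquote(max) do
    elem(unquote(Macro.escape(table)), char - unquote(min)) != nil
  end

  def valid_by_lookup?(_char), do: false
end
```

Both functions accept exactly the same set of bytes; they differ only in how the check is carried out.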

sabiwara · 2025-04-09T11:44:51Z

> Should we also benchmark to see if the performance difference is significant?

I think this is a good point. I'm also wondering whether replacing the raise/rescue approach by generating and clauses would improve these benchmarks.

whatyouhide · 2025-04-09T11:48:48Z

> to see if the performance difference is significant?

The huge difference here is the memory difference anyway?

whatyouhide · 2025-04-09T12:23:38Z

Byte size of Elixir.Base.beam before: 89548
Byte size of Elixir.Base.beam after: 115360 (~29% bigger)
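The thread does not say how these sizes were obtained. One way to take a comparable measurement, offered here only as an assumption, is to stat the compiled .beam file of the module and then work out the relative growth from the two numbers quoted above:

```elixir
# :code.which/1 returns the path of the module's .beam file as a charlist.
beam_path = :code.which(Base)
%File.Stat{size: size} = File.stat!(beam_path)
IO.puts("Byte size of #{Path.basename(beam_path)}: #{size}")

# Relative growth computed from the two sizes reported in this comment.
before_size = 89_548
after_size = 115_360
growth = Float.round((after_size - before_size) / before_size * 100, 1)
IO.puts("Growth: #{growth}%")
```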

Performance-wise:

Bench script

Mix.install([:benchee, :benchee_html])

defmodule Helpers do
  def decode(16, input), do: Base.decode16!(input)
  def decode(32, input), do: Base.decode32!(input)
  def decode(64, input), do: Base.decode64!(input)

  def validate(16, input), do: Base.valid_decode16?(input)
  def validate(32, input), do: Base.valid_decode32?(input)
  def validate(64, input), do: Base.valid_decode64?(input)
end

inputs = %{
  "small string" => "hello world",
  "big string" => :crypto.strong_rand_bytes(1_000_000)
}

inputs =
  for base <- [16, 32, 64],
      {input_name, input_value} <- inputs,
      into: %{} do
    encoded_value = apply(Base, :"encode#{base}", [input_value])
    {"base#{base} - #{input_name}", {base, encoded_value}}
  end

Benchee.run(
  %{
    "decode" => fn {base, input} -> Helpers.decode(base, input) end,
    "validate" => fn {base, input} -> Helpers.validate(base, input) end
  },
  memory_time: 3,
  reduction_time: 2,
  inputs: inputs,
  formatters: [
    {Benchee.Formatters.HTML, file: "bench.html"},
    Benchee.Formatters.Console
  ]
)

Results

##### With input base16 - big string #####
Name               ips        average  deviation         median         99th %
validate        286.82        3.49 ms    ±12.32%        3.62 ms        4.43 ms
decode          244.58        4.09 ms     ±3.14%        4.09 ms        4.59 ms

Comparison: 
validate        286.82
decode          244.58 - 1.17x slower +0.60 ms

Memory usage statistics:

Name        Memory usage
validate            40 B
decode             104 B - 2.60x memory usage +64 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name     Reduction count
validate          1.00 M
decode            4.25 M - 4.25x reduction count +3.25 M

**All measurements for reduction count were the same**

##### With input base16 - small string #####
Name               ips        average  deviation         median         99th %
validate       13.95 M       71.70 ns  ±7684.41%          83 ns         125 ns
decode         10.26 M       97.50 ns   ±113.52%          84 ns         208 ns

Comparison: 
validate       13.95 M
decode         10.26 M - 1.36x slower +25.80 ns

Memory usage statistics:

Name        Memory usage
validate            40 B
decode             104 B - 2.60x memory usage +64 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name     Reduction count
validate              20
decode                56 - 2.80x reduction count +36

**All measurements for reduction count were the same**

##### With input base32 - big string #####
Name               ips        average  deviation         median         99th %
validate        421.79        2.37 ms     ±8.23%        2.31 ms        3.04 ms
decode          306.89        3.26 ms    ±11.75%        3.01 ms        4.01 ms

Comparison: 
validate        421.79
decode          306.89 - 1.37x slower +0.89 ms

Memory usage statistics:

Name        Memory usage
validate           120 B
decode             184 B - 1.53x memory usage +64 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name     Reduction count
validate          3.40 M
decode            3.40 M - 1.00x reduction count +0.00000 M

**All measurements for reduction count were the same**

##### With input base32 - small string #####
Name               ips        average  deviation         median         99th %
decode          7.17 M      139.56 ns  ±4817.17%         125 ns         250 ns
validate        6.54 M      153.01 ns ±25536.65%          83 ns         167 ns

Comparison: 
decode          7.17 M
validate        6.54 M - 1.10x slower +13.46 ns

Memory usage statistics:

Name        Memory usage
decode             176 B
validate           112 B - 0.64x memory usage -64 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name     Reduction count
decode                51
validate              51 - 1.00x reduction count +0

**All measurements for reduction count were the same**

##### With input base64 - big string #####
Name               ips        average  deviation         median         99th %
validate        513.06        1.95 ms     ±2.97%        1.95 ms        2.05 ms
decode          397.90        2.51 ms     ±6.27%        2.50 ms        2.85 ms

Comparison: 
validate        513.06
decode          397.90 - 1.29x slower +0.56 ms

Memory usage statistics:

Name        Memory usage
validate           120 B
decode             184 B - 1.53x memory usage +64 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name     Reduction count
validate          2.83 M
decode            2.83 M - 1.00x reduction count +0.00000 M

**All measurements for reduction count were the same**

##### With input base64 - small string #####
Name               ips        average  deviation         median         99th %
validate        7.12 M      140.53 ns ±27766.24%          83 ns         167 ns
decode          6.28 M      159.26 ns ±10765.68%         125 ns         209 ns

Comparison: 
validate        7.12 M
decode          6.28 M - 1.13x slower +18.73 ns

Memory usage statistics:

Name        Memory usage
validate           104 B
decode             168 B - 1.62x memory usage +64 B

**All measurements for memory usage were the same**

Reduction count statistics:

Name     Reduction count
validate              47
decode                47 - 1.00x reduction count +0

TL;DR: validating is faster in every benchmark except base32 with the small string (where decode is 1.10x faster), and it always uses less memory than decoding 😄

sleepiecappy · 2025-04-09T16:09:39Z

Hi! Sorry to just chime in, but I'm confused about the naming. Is it because we want to validate that the decode is valid before using it? Or would these functions be used standalone to validate the bases? I take it that we want a naming scheme like valid_[subject]?, so if the decode is the subject then it makes sense, but if the subject is the base I feel it can create confusion when using them. If you could explain the rationale behind the names, it would be appreciated 🙏🏻

whatyouhide · 2025-04-09T18:31:50Z

@sleepiecappy sometimes you want to efficiently validate that something is valid baseN, but without actually decoding it because that causes an output binary to be created and allocated and whatnot. This set of new functions is exactly for that use case.
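As a rough illustration of that use case, the sketch below checks whether a string is valid base16 without keeping a decoded binary around. `HexCheck` is a hypothetical helper; it assumes `Base.valid16?/2` (the function this PR adds, per its final title) accepts an empty option list, and it falls back to decoding and discarding the result on Elixir versions that do not ship the function.

```elixir
defmodule HexCheck do
  # Hypothetical helper, not part of the PR.
  def valid_hex?(input) when is_binary(input) do
    if Code.ensure_loaded?(Base) and function_exported?(Base, :valid16?, 2) do
      # Validation only: no decoded binary is built.
      apply(Base, :valid16?, [input, []])
    else
      # Fallback: decode, then throw the decoded binary away.
      match?({:ok, _decoded}, Base.decode16(input))
    end
  end
end
```

The fallback branch is exactly the pattern the new functions are meant to replace: it gives the same answer but allocates the decoded output only to drop it.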

josevalim

Code wise it looks good to me, but let's decide on the name before merging it.

sabiwara

LGTM!

Add Base.valid16?/1

746432e

sabiwara reviewed Apr 9, 2025

whatyouhide added 5 commits April 9, 2025 12:28

base32

fc86bc3

base32

243266f

base64

48b3722

base64

0c1dc8d

base64

fd281c4

base64 tests

cf2fa1d

whatyouhide requested review from sabiwara and josevalim April 9, 2025 10:52

josevalim reviewed Apr 9, 2025

josevalim reviewed Apr 9, 2025

FIXUP

9f1102d

whatyouhide requested a review from josevalim April 9, 2025 18:31

whatyouhide changed the title ~~Add Base.valid16?/2~~ Add Base.valid_decode{n}?/2 functions Apr 9, 2025

whatyouhide added 2 commits April 9, 2025 21:55

FIXUP

26c8381

FIXUP

50cb58f

josevalim approved these changes Apr 10, 2025

sabiwara reviewed Apr 10, 2025

josevalim approved these changes Apr 10, 2025

Better docs

f45bdd4

sabiwara approved these changes Apr 10, 2025

whatyouhide and others added 2 commits April 10, 2025 15:06

Update lib/elixir/lib/base.ex

0e9e6fe

Co-authored-by: Jean Klingler <[email protected]>

FIXUP

80b700f

josevalim changed the title ~~Add Base.valid_decode{n}?/2 functions~~ Add Base.valid{n}?/2 functions Apr 11, 2025

josevalim approved these changes Apr 11, 2025

whatyouhide merged commit fdbc664 into main Apr 11, 2025
22 checks passed

whatyouhide deleted the al/base-valid16 branch April 11, 2025 06:59

sabiwara mentioned this pull request Apr 12, 2025

Optimize Base.valid16?/2 #14429

Merged
