-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Add Base.valid{n}?/2 functions #14417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LGTM. What are we going to call the uri_base64 one? :D |
@josevalim I'd just replace |
@whatyouhide perhaps we should go with |
Okay I’m down to go with |
Added all bases. The code is a bit repetitive but I’m afraid getting more clever would make it really hard to read. For base64, ignoring whitespace is nasty because we still have to remove it completely and then do a validation pass (so one copy), but I think that's fine as the first step. |
@whatyouhide how big is the Base module before and after these changes? Should we also benchmark to see if the performance difference is significant? |
lib/elixir/lib/base.ex
Outdated
{min, decoded} = alphabet |> Enum.with_index() |> to_decode_list.() | ||
|
||
defp unquote(name)(char) do | ||
defp unquote(validate_name)(char) when char in unquote(alphabet), do: char |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For validating each char, we should use the same decode/1
function instead of introducing a new validate function. The decode function is a subtraction plus a tuple operation, which is likely faster than checking for char in unquote(alphabet)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should apply to all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is not done :D My suggestion is to remove defp unquote(validate_name)(char)
and use the decoding function instead, which is likely faster?
I think this is a good point, also I'm wondering if replacing the raise/rescue approach by generating |
The huge difference here is the memory difference anyway? |
Performance wise. Bench scriptMix.install([:benchee, :benchee_html])
defmodule Helpers do
def decode(16, input), do: Base.decode16!(input)
def decode(32, input), do: Base.decode32!(input)
def decode(64, input), do: Base.decode64!(input)
def validate(16, input), do: Base.valid_decode16?(input)
def validate(32, input), do: Base.valid_decode32?(input)
def validate(64, input), do: Base.valid_decode64?(input)
end
inputs = %{
"small string" => "hello world",
"big string" => :crypto.strong_rand_bytes(1_000_000)
}
inputs =
for base <- [16, 32, 64],
{input_name, input_value} <- inputs,
into: %{} do
encoded_value = apply(Base, :"encode#{base}", [input_value])
{"base#{base} - #{input_name}", {base, encoded_value}}
end
Benchee.run(
%{
"decode" => fn {base, input} -> Helpers.decode(base, input) end,
"validate" => fn {base, input} -> Helpers.validate(base, input) end
},
memory_time: 3,
reduction_time: 2,
inputs: inputs,
formatters: [
{Benchee.Formatters.HTML, file: "bench.html"},
Benchee.Formatters.Console
]
) Results
TL;DR: validating is basically always faster and uses less memory than decoding 😄 |
Hi! Sorry to just chime in, but I'm confused about the naming. Is it because we want to validate if the decode is valid before using it? Or would these functions be used standing alone to validate the bases? I take that we want a naming scheme as: |
@sleepiecappy sometimes you want to efficiently validate that something is valid baseN, but without actually decoding it because that causes an output binary to be created and allocated and whatnot. This set of new functions is exactly for that use case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code wise it looks good to me, but let's decide on the name before merging it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Co-authored-by: Jean Klingler <[email protected]>
This is a first PR to add this kind of functions. If we like the direction, I'll go ahead and add
valid32?/2
andvalid64?/2
.