-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Optimization attempt for union of maps and tuples #14205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Some examples from the Bamboo issue (a lot of similar ones): before #=> [
{:closed,
%{
blocked: :term,
private: %{map: [{:open, %{"o:tag": :term}, []}]},
cc: :term,
__struct__: %{atom: {:union, %{Bamboo.Email => []}}},
from: :term,
to: :term,
headers: :term,
subject: :term,
assigns: :term,
attachments: :term,
bcc: :term,
text_body: :term,
html_body: :term
}, []},
{:closed,
%{
blocked: :term,
private: :term,
cc: :term,
__struct__: %{atom: {:union, %{Bamboo.Email => []}}},
from: :term,
to: :term,
headers: :term,
subject: :term,
assigns: :term,
attachments: :term,
bcc: :term,
text_body: :term,
html_body: :term
}, []}
]
optimized #=> [
{:closed,
%{
blocked: :term,
private: :term,
cc: :term,
__struct__: %{atom: {:union, %{Bamboo.Email => []}}},
from: :term,
to: :term,
headers: :term,
subject: :term,
assigns: :term,
attachments: :term,
bcc: :term,
text_body: :term,
html_body: :term
}, []}
] before #=> [
{:closed,
%{
owner: :term,
port: :term,
private: :term,
scheme: :term,
status: :term,
script_name: :term,
state: :term,
host: :term,
params: :term,
__struct__: %{atom: {:union, %{Plug.Conn => []}}},
halted: %{atom: {:union, %{true: []}}},
cookies: :term,
request_path: :term,
method: :term,
secret_key_base: :term,
path_info: :term,
body_params: :term,
query_params: :term,
req_headers: :term,
adapter: :term,
remote_ip: :term,
query_string: :term,
assigns: :term,
req_cookies: :term,
resp_cookies: :term,
path_params: :term,
resp_body: :term,
resp_headers: :term
}, []},
{:closed,
%{
owner: :term,
port: :term,
private: :term,
scheme: :term,
status: :term,
script_name: :term,
state: :term,
host: :term,
params: :term,
__struct__: %{atom: {:union, %{Plug.Conn => []}}},
halted: :term,
cookies: :term,
request_path: :term,
method: :term,
secret_key_base: :term,
path_info: :term,
body_params: :term,
query_params: :term,
req_headers: :term,
adapter: :term,
remote_ip: :term,
query_string: :term,
assigns: :term,
req_cookies: :term,
resp_cookies: :term,
path_params: :term,
resp_body: :term,
resp_headers: :term
}, []}
]
optimized #=> [
{:closed,
%{
owner: :term,
port: :term,
private: :term,
scheme: :term,
status: :term,
script_name: :term,
state: :term,
host: :term,
params: :term,
__struct__: %{atom: {:union, %{Plug.Conn => []}}},
halted: :term,
cookies: :term,
request_path: :term,
method: :term,
secret_key_base: :term,
path_info: :term,
body_params: :term,
query_params: :term,
req_headers: :term,
adapter: :term,
remote_ip: :term,
query_string: :term,
assigns: :term,
req_cookies: :term,
resp_cookies: :term,
path_params: :term,
resp_body: :term,
resp_headers: :term
}, []}
] |
iex> :tprof.profile(Mix, :install, [[:bamboo], [force: true]], %{type: :call_time, pattern: [{Module.Types.Descr,:_, :_}], timeout: 5000, report: {:total, {:measurement, :descending}}})
...
FUNCTION CALLS TIME (us) PER CALL [ %]
'Elixir.Module.Types.Descr':'empty?'/1 5253 381 0.07 [11.55]
'Elixir.Module.Types.Descr':'gradual?'/1 4236 222 0.05 [ 6.73]
'Elixir.Module.Types.Descr':'compatible?'/2 932 177 0.19 [ 5.36]
'Elixir.Module.Types.Descr':iterator_merge/3 2122 161 0.08 [ 4.88]
'Elixir.Module.Types.Descr':union/2 1469 152 0.10 [ 4.61]
'Elixir.Module.Types.Descr':term/0 3326 143 0.04 [ 4.33]
'Elixir.Module.Types.Descr':intersection/2 2166 133 0.06 [ 4.03]
'Elixir.Module.Types.Descr':'disjoint?'/2 1333 116 0.09 [ 3.52]
'Elixir.Module.Types.Descr':iterator_intersection/4 1160 113 0.10 [ 3.42]
'Elixir.Module.Types.Descr':list_descr/3 536 97 0.18 [ 2.94]
'Elixir.Module.Types.Descr':union/3 1001 96 0.10 [ 2.91]
'Elixir.Module.Types.Descr':tuple_fetch/2 396 82 0.21 [ 2.48]
'Elixir.Module.Types.Descr':atom_new/1 636 63 0.10 [ 1.91]
'Elixir.Module.Types.Descr':binary/0 1434 62 0.04 [ 1.88]
'Elixir.Module.Types.Descr':list_pop_dynamic/1 1072 59 0.06 [ 1.79]
'Elixir.Module.Types.Descr':symmetrical_merge/3 1081 58 0.05 [ 1.76] |
@@ -1278,8 +1278,75 @@ defmodule Module.Types.Descr do | |||
|
|||
defp map_only?(descr), do: empty?(Map.delete(descr, :map)) | |||
|
|||
# Union is list concatenation | |||
defp map_union(dnf1, dnf2), do: dnf1 ++ (dnf2 -- dnf1) | |||
defp map_or_tuple_union(dnf1, dnf2) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still going to change the implementation of maps considerably, so I would suggest keeping those separate.
Something else you can do, which is what we do once we normalize, is to compute the union of two fields, if there is only field that is different. And in this case, we check for structural equality.
So this could be a more general algorithm:
- Do a pass on the dnf checking if they have all the same field and collecting which fields are different
- If any of the fields that is different is the struct field, abort
- If all of the fields are equal, you are done
- If If only one field is different, do its union
- If more than one field is different, check if you have all subtypes on the left or the right
This should require one pass to find fields and another to solve subtyping.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fusible_maps?
function gives you a general idea of the algorithm today, although it returns a boolean, and you want something more complex. We could later on adapt the function to be shared by both union and normalize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still going to change the implementation of maps considerably, so I would suggest keeping those separate.
Makes sense 👍
Something else you can do, which is what we do once we normalize, is to compute the union of two fields, if there is only field that is different. And in this case, we check for structural equality.
Interesting one, I think I see what you're saying. Only if the tags are equal though, correct?
%{a: integer(), b: type_of_b()} or %{a: float(), b: type_of_b()}
is%{a: integer() or float(), b: type_of_b()}
%{..., a: integer(), b: type_of_b()} or %{..., a: float(), b: type_of_b()}
is%{..., a: integer() or float(), b: type_of_b()}
%{..., a: integer(), b: type_of_b()} or %{a: float(), b: type_of_b()}
=> any other field is illegal ifa
is a float?
It would be nice to have a bench for this case, do you have an example in mind where we could measure its impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, only if the tags are equal. In my mind, the most expensive parts are computing the subtyping and traversing the keys. So in my mind, the suggestions above are an improvement to this pull request because:
- It traverses the keys once
- It avoids computing the subtype (which is expensive) for the case of a single key
- It avoids computing the subtype for both left and right sides
In pseudo code:
case compute_keys_with_distinct_values(...) do
:error -> # distinct keys, returns both
[] -> # maps are the same, return left or right
[{key, left, right}] -> # single key diverge, compute the key union and put the result back
[{_key, left, right} | rest] = all ->
if subtype?(left, right) do
check_if_subtype_on_left?(rest)
else
check_if_subtype_on_right?(all)
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The important point I didn't highlight in the PR description is that I'm not using subtype?
(which I understand is quite expensive), but a very lightweight simple_subtype?
(trivial_subtype?
) which only handles trivial cases like integer() subtype_of term()
or {integer(), string()} subtype_of {term(), term()}
(which are the kind of cases I saw in practices in the slow examples).
I don't expect it to be costly, because it only does mostly shallow pattern/guard-based comparisons and bails as soon as something is off (different # of keys, one key missing, one key not being a trivial subtype).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I got it, but it will be slower than the equality check and, given we are traversing keys anyway, providing a more general solution than simple subtyping will yield other benefits. Basically, if we are already traversing the keys to optimize, let's optimize as much as we can?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, if we are already traversing the keys to optimize, let's optimize as much as we can?
Strongly agreed 👍
Will try to prepare some benchee bench as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a new PR #14215
Replaced by #14215 |
Seems to solve #14203
By avoiding building lists of very similar types in unions, which then get multiplied by the cartesian product in intersections, we avoid the explosion of the size of composite types.
Another nice benefit is that nested composite types are simplified, e.g. a nested map might become a
term()
after a union which can be easier to check downstream.