Skip to content

Optimize map unions to avoid building long lists #14215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 24, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
126 changes: 124 additions & 2 deletions lib/elixir/lib/module/types/descr.ex
Original file line number Diff line number Diff line change
Expand Up @@ -1278,8 +1278,130 @@ defmodule Module.Types.Descr do

defp map_only?(descr), do: empty?(Map.delete(descr, :map))

# Union is list concatenation
defp map_union(dnf1, dnf2), do: dnf1 ++ (dnf2 -- dnf1)
defp map_union(dnf1, dnf2) do
# Union is just concatenation, but we rely on some optimization strategies to
# avoid the list to grow when possible

# first pass trying to identify patterns where two maps can be fused as one
with [{tag1, pos1, []}] <- dnf1,
[{tag2, pos2, []}] <- dnf2,
strategy when strategy != nil <- map_union_optimization_strategy(tag1, pos1, tag2, pos2) do
case strategy do
:all_equal ->
dnf1

:any_map ->
[{:open, %{}, []}]

{:one_key_difference, key, v1, v2} ->
new_pos = Map.put(pos1, key, union(v1, v2))
[{tag1, new_pos, []}]

:left_subtype_of_right ->
dnf2

:right_subtype_of_left ->
dnf1
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can encapsulate this and use it on map_normalize too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you mean instead of the map_non_negated_fuse / fusible_maps / map_non_negated_fuse_pair part right?
Seems it's doing very similar stuff, let's see!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I can also be the one giving that a try. We can merge this once the comments above are addressed, then I can work on the map_normalize and you work on tuples? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got it actually! Will push in a sec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ec96574

I didn't spend time on renaming but I think we can probably use the term fuse / fusion etc more for consistency.
Feel free to rename as you see fit!

else
# otherwise we just concatenate and remove structural duplicates
_ -> dnf1 ++ (dnf2 -- dnf1)
end
end

defp map_union_optimization_strategy(tag1, pos1, tag2, pos2)
defp map_union_optimization_strategy(tag, pos, tag, pos), do: :all_equal
defp map_union_optimization_strategy(:open, empty, _, _) when empty == %{}, do: :any_map
defp map_union_optimization_strategy(_, _, :open, empty) when empty == %{}, do: :any_map

defp map_union_optimization_strategy(tag, pos1, tag, pos2)
when map_size(pos1) == map_size(pos2) do
:maps.iterator(pos1)
|> :maps.next()
|> do_map_union_optimization_strategy(pos2, :all_equal)
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not handle cases where the open map on the left has some extra fields that are set to if_set(term()), and the right map is closed.

Example:

assert union(
        open_map(a: if_set(term())),
        closed_map([])
      ) == open_map(a: if_set(term()))

Similarly, what if we have a larger (in size) open map as pos1, but which is a supertype of pos2? Then the only tried strategy will be l.1340 which leads to :left_subtype_of_right.

Example:

 assert union(
        open_map(a: if_set(term()), b: number()),
        open_map(b: integer())
      ) == open_map(a: if_set(term()), b: number())

I don't think those are case that necessarily need to be covered, but adding those tests to highlight it would prevent us discovering this again.

Copy link
Contributor Author

@sabiwara sabiwara Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I was totally forgetting about if_set.

I don't think those are case that necessarily need to be covered

Yeah this is an optimization supposed to deal with some "obvious" cases that happen frequently, so it might be OK not to catch all cases (we're not dealing with negs either).

But in this case it might be possible to implement in the current pass with something like:

  • if one key is only on the side of the supertype and its value is if_set, continue inferring this supertype relation
  • if we can switch to the supertype strategy, do it
  • otherwise bail

The map size issue is a real problem though... Perhaps by changing the internal representation to store if_set as part of a different map, we can easily compute the size of the required map, and have a separate pass for optional keys?
This might be overkill for this particular use case, but if this new representation can simplify a bunch of other places such as subtyping etc it might be worth considering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map size issue is a real problem though...

I think it is fine because our goal is to traverse the smallest map for performance. The full algorithm does require traversing both sides but the point here is precisely to not implement the full algorithm. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

I was more thinking if we get bottlenecks in the future due to if_set and if there was a way to optimize them in other parts too, but it's best to avoid speculation and to wait if real world slow/pathological cases to show up, and iterate then.

defp map_union_optimization_strategy(:open, pos1, _, pos2)
when map_size(pos1) <= map_size(pos2) do
:maps.iterator(pos1)
|> :maps.next()
|> do_map_union_optimization_strategy(pos2, :right_subtype_of_left)
end

defp map_union_optimization_strategy(_, pos1, :open, pos2)
when map_size(pos1) >= map_size(pos2) do
:maps.iterator(pos2)
|> :maps.next()
|> do_map_union_optimization_strategy(pos1, :right_subtype_of_left)
|> case do
:right_subtype_of_left -> :left_subtype_of_right
nil -> nil
end
end

defp map_union_optimization_strategy(_, _, _, _), do: nil

defp do_map_union_optimization_strategy(:none, _, status), do: status

defp do_map_union_optimization_strategy({key, v1, iterator}, pos2, status) do
with %{^key => v2} <- pos2,
next_status when next_status != nil <- map_union_next_strategy(key, v1, v2, status) do
do_map_union_optimization_strategy(:maps.next(iterator), pos2, next_status)
else
_ -> nil
end
end

defp map_union_next_strategy(key, v1, v2, status)

# structurally equal values do not impact the ongoing strategy
defp map_union_next_strategy(_key, same, same, status), do: status

defp map_union_next_strategy(key, v1, v2, :all_equal) do
if key != :__struct__, do: {:one_key_difference, key, v1, v2}
end

defp map_union_next_strategy(_key, v1, v2, {:one_key_difference, _, d1, d2}) do
# we have at least two key differences now, we switch strategy
# if both are subtypes in one direction, keep checking
cond do
trivial_subtype?(d1, d2) and trivial_subtype?(v1, v2) -> :left_subtype_of_right
trivial_subtype?(d2, d1) and trivial_subtype?(v2, v1) -> :right_subtype_of_left
true -> nil
end
end

defp map_union_next_strategy(_key, v1, v2, :left_subtype_of_right) do
if trivial_subtype?(v1, v2), do: :left_subtype_of_right
end

defp map_union_next_strategy(_key, v1, v2, :right_subtype_of_left) do
if trivial_subtype?(v2, v1), do: :right_subtype_of_left
end

# cheap to compute sub-typing
# a trivial subtype is always a subtype, but not all subtypes are subtypes
defp trivial_subtype?(_, :term), do: true
defp trivial_subtype?(same, same), do: true

defp trivial_subtype?(%{} = left, %{} = right)
when map_size(left) == 1 and map_size(right) == 1 do
case {left, right} do
Copy link
Contributor Author

@sabiwara sabiwara Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for an even simpler version with just shallow comparisons (except for the structural comparison on top)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion, let's go with the regular subtyping? We apply this on very few cases and once we start having structs nested inside structs, this will make a difference?

In other words, let's go with the "slow" but more general and less code version and we optimize it again in the future? Especially to avoid getting the logic wrong here... :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes total sense, especially since
a) we'll bail at the first non-match
b) we avoid building lists and make a bunch of stuff cheaper downstream so the extra precision might pay
c) it's easier to optimize based on actually slow projects, so deferring this one seems like it's a good idea

And perhaps we'll be able to make subtype? cheaper in the future and bail early in some cases or something, in which case the whole typing would benefit.

{%{atom: _}, %{atom: {:negation, neg}}} when neg == %{} ->
true

{%{map: _}, %{map: [{:open, pos, []}]}} when pos == %{} ->
true

{%{bitmap: bitmap1}, %{bitmap: bitmap2}} ->
(bitmap1 &&& bitmap2) === bitmap2

_ ->
false
end
end

defp trivial_subtype?(_, _), do: false

# Given two unions of maps, intersects each pair of maps.
defp map_intersection(dnf1, dnf2) do
Expand Down
41 changes: 41 additions & 0 deletions lib/elixir/test/elixir/module/types/descr_test.exs
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,47 @@ defmodule Module.Types.DescrTest do
assert union(difference(list(term()), list(integer())), list(integer()))
|> equal?(list(term()))
end

test "optimizations" do
# The tests are checking the actual implementation, not the semantics.
# This is why we are using structural comparisons.
# It's fine to remove these if the implementation changes, but breaking
# these might have an important impact on compile times.

# Optimization one: same tags, all but one key are structurally equal
assert union(
open_map(a: float(), b: atom()),
open_map(a: integer(), b: atom())
) == open_map(a: union(float(), integer()), b: atom())

assert union(
closed_map(a: float(), b: atom()),
closed_map(a: integer(), b: atom())
) == closed_map(a: union(float(), integer()), b: atom())

# Optimization two: we can tell that one map is a trivial subtype of the other:

assert union(
closed_map(a: term(), b: term()),
closed_map(a: float(), b: binary())
) == closed_map(a: term(), b: term())

assert union(
open_map(a: term()),
closed_map(a: float(), b: binary())
) == open_map(a: term())

assert union(
closed_map(a: float(), b: binary()),
open_map(a: term())
) == open_map(a: term())

# Do we want this want to pass or keep shallow checks only?
# assert union(
# closed_map(a: term(), b: tuple([term(), term()])),
# closed_map(a: float(), b: tuple([atom(), binary()]))
# ) == closed_map(a: term(), b: term())
end
end

describe "intersection" do
Expand Down
Loading