Skip to content
This repository was archived by the owner on Apr 14, 2022. It is now read-only.

expressions: use indexes to speed up #228

Closed
Totktonada opened this issue Sep 14, 2018 · 2 comments
Closed

expressions: use indexes to speed up #228

Totktonada opened this issue Sep 14, 2018 · 2 comments
Labels
archived enhancement New feature or request expressions wontfix This will not be worked on

Comments

@Totktonada
Copy link
Member

  • Get first conjunct (one of && operands) of a root expression, which suitable to perform index lookup (if only &&s are in the root expression).
  • Get all disjuncts (|| operands) of a root expression (if only ||s are in the root expression and all operands are suitable to perform index lookup).
  • Other cases are subject to full scan (if filter is only argument).

It is follow up of #13.

From #13:

If no index is found based on filters, offset or from a connection (see
get_index_name), then it is worth to use the AST to perform index search.
Basically we need to collect all always-checked predicates (maybe it worth to
construct CNF?): ones that are not under OR operator. Then prioritize it:
equality first, then less-great. Then attempt to match each predicate against
existing indexes (by a name of the field it operates on). When a match found,
then rewrite AST to avoid redundant operations in process_tuple (remove the
operation that will be performed by an index).

The approach above is very-very basic and cannot handle many cases. But I
think it is on optimal point between efficiency and simplicity for now.

@Totktonada Totktonada added enhancement New feature or request expressions labels Sep 14, 2018
@Totktonada
Copy link
Member Author

Totktonada commented Sep 26, 2018

After constant propagation we'll get AST of AND, OR and NOT of the Terms of
the following forms:

  1. 'x CMP_OP constant' or 'constant CMP_OP x';
  2. 'x * 2 CMP_OP constant', function call and others.

The first form can be used to perform the index lookup to get a superset of
the final result and then filter it using the entire expression to get the
final result. Such Term can have corresponding index part.

'Term AND Term AND ... AND Term' can have corresponding index parts, we can
match it against existing indexes and choose most selective one. We need to
find BETWEEN case (x > 6 && x <= 7), because it likely a way more selective
then any other non-EQ index lookup.

We should match Terms against TREE indexes available (and only EQ for HASH
indexes) and pay attention to the fact that only a last part can be non-EQ
in the original expression.

The example to illustrate the fact for the expression
p1 > 1 && p2 > 2 && p3 > 2:

box.schema.space.create('test')
box.space.test:create_index('primary', {parts = {1, 'unsigned', 2, 'unsigned',
    3, 'unsigned'}})
s = box.space.test
s:replace{1, 1, 1}
s:replace{1, 2, 2}
s:replace{2, 1, 1}
s:replace{2, 2, 2}
s:select({1, 2, 2}, {iterator = 'GT'})
- [2, 1, 1] // fail (p2 > 2 term)
- [2, 2, 2] // ok

So these are cases we match:

  1. Set of EQ Terms.
  2. One non-EQ Term.
  3. BETWEEN on one part (generate non-EQ index lookup and STOP value).
  4. Set of EQ Terms and one non-EQ Term (generate non-EQ index lookup and STOP
    values using all EQ Terms).

The examples for the case 4:

x == 1 && y > 7 => INDEX {x,y} GE {1,7} STOP {>1,}

x == 1 && y == 2 && z > 7 => INDEX {x,y,z} GE {1,2,7} STOP {>1,,} STOP {,>2,}

'Conjunct OR Conjunct OR ... OR Conjunct' can be fetched with several index
lookups (one for each conjunct) when all conjuncts have an index. Otherwise
one of conjucts requires to do full scan and it is better to perform one full
scan for the entire expression.

Performing several index lookups also does not seem to be profitable in the
case when there are more then one non-EQ and non-BETWEEN lookups.

So we need to construct DNF (disjunctive normal form), match terms against
index parts, find BETWEEN patterns, match parts against indexes (and choose
the best one) for each Conjunct, then choose whether is it worth to perform
several index lookups or a full scan.

There is the alternative approach: don't construct DNF expliticly, but do
corresponding union / cartesian product math on found index parts. This is
harder to describe and possibly hides some details, so I decided to stick with
the approach described above.

@Totktonada
Copy link
Member Author

I'm going to archive the repository. I'll proceed as follows:

  • Mark all open pull requests with the archived label and close.
  • Mark all open issues with archived and wontfix labels and close.
  • Archive the repository.

Consider the following projects:

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
archived enhancement New feature or request expressions wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

1 participant