expressions: use indexes to speed up #228

Totktonada · 2018-09-14T17:01:23Z

Get first conjunct (one of && operands) of a root expression, which suitable to perform index lookup (if only &&s are in the root expression).
Get all disjuncts (|| operands) of a root expression (if only ||s are in the root expression and all operands are suitable to perform index lookup).
Other cases are subject to full scan (if filter is only argument).

It is follow up of #13.

From #13:

If no index is found based on filters, offset or from a connection (see
get_index_name), then it is worth to use the AST to perform index search.
Basically we need to collect all always-checked predicates (maybe it worth to
construct CNF?): ones that are not under OR operator. Then prioritize it:
equality first, then less-great. Then attempt to match each predicate against
existing indexes (by a name of the field it operates on). When a match found,
then rewrite AST to avoid redundant operations in process_tuple (remove the
operation that will be performed by an index).

The approach above is very-very basic and cannot handle many cases. But I
think it is on optimal point between efficiency and simplicity for now.

The text was updated successfully, but these errors were encountered:

Totktonada · 2018-09-26T14:53:57Z

After constant propagation we'll get AST of AND, OR and NOT of the Terms of
the following forms:

'x CMP_OP constant' or 'constant CMP_OP x';
'x * 2 CMP_OP constant', function call and others.

The first form can be used to perform the index lookup to get a superset of
the final result and then filter it using the entire expression to get the
final result. Such Term can have corresponding index part.

'Term AND Term AND ... AND Term' can have corresponding index parts, we can
match it against existing indexes and choose most selective one. We need to
find BETWEEN case (x > 6 && x <= 7), because it likely a way more selective
then any other non-EQ index lookup.

We should match Terms against TREE indexes available (and only EQ for HASH
indexes) and pay attention to the fact that only a last part can be non-EQ
in the original expression.

The example to illustrate the fact for the expression
p1 > 1 && p2 > 2 && p3 > 2:

box.schema.space.create('test')
box.space.test:create_index('primary', {parts = {1, 'unsigned', 2, 'unsigned',
    3, 'unsigned'}})
s = box.space.test
s:replace{1, 1, 1}
s:replace{1, 2, 2}
s:replace{2, 1, 1}
s:replace{2, 2, 2}
s:select({1, 2, 2}, {iterator = 'GT'})
- [2, 1, 1] // fail (p2 > 2 term)
- [2, 2, 2] // ok

So these are cases we match:

Set of EQ Terms.
One non-EQ Term.
BETWEEN on one part (generate non-EQ index lookup and STOP value).
Set of EQ Terms and one non-EQ Term (generate non-EQ index lookup and STOP
values using all EQ Terms).

The examples for the case 4:

x == 1 && y > 7 => INDEX {x,y} GE {1,7} STOP {>1,}

x == 1 && y == 2 && z > 7 => INDEX {x,y,z} GE {1,2,7} STOP {>1,,} STOP {,>2,}

'Conjunct OR Conjunct OR ... OR Conjunct' can be fetched with several index
lookups (one for each conjunct) when all conjuncts have an index. Otherwise
one of conjucts requires to do full scan and it is better to perform one full
scan for the entire expression.

Performing several index lookups also does not seem to be profitable in the
case when there are more then one non-EQ and non-BETWEEN lookups.

So we need to construct DNF (disjunctive normal form), match terms against
index parts, find BETWEEN patterns, match parts against indexes (and choose
the best one) for each Conjunct, then choose whether is it worth to perform
several index lookups or a full scan.

There is the alternative approach: don't construct DNF expliticly, but do
corresponding union / cartesian product math on found index parts. This is
harder to describe and possibly hides some details, so I decided to stick with
the approach described above.

Needed for #228.

Totktonada · 2022-04-13T22:52:08Z

I'm going to archive the repository. I'll proceed as follows:

Mark all open pull requests with the archived label and close.
Mark all open issues with archived and wontfix labels and close.
Archive the repository.

Consider the following projects:

Totktonada added enhancement New feature or request expressions labels Sep 14, 2018

Totktonada added a commit that referenced this issue Feb 4, 2019

expressions: add constant propagation pass

ecbc2be

Needed for #228.

Totktonada mentioned this issue Feb 4, 2019

expressions: add constant propagation pass #271

Merged

Totktonada added a commit that referenced this issue Feb 4, 2019

expressions: add constant propagation pass

95e3cf4

Needed for #228.

coveralls mentioned this issue Feb 4, 2019

WIP: Find index for an expression #251

Closed

Totktonada added wontfix This will not be worked on archived labels Apr 13, 2022

Totktonada closed this as completed Apr 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

expressions: use indexes to speed up #228

expressions: use indexes to speed up #228

Totktonada commented Sep 14, 2018

Totktonada commented Sep 26, 2018 •

edited

Loading

Totktonada commented Apr 13, 2022

expressions: use indexes to speed up #228

expressions: use indexes to speed up #228

Comments

Totktonada commented Sep 14, 2018

Totktonada commented Sep 26, 2018 • edited Loading

Totktonada commented Apr 13, 2022

Totktonada commented Sep 26, 2018 •

edited

Loading