Skip to content
This repository was archived by the owner on Apr 14, 2022. It is now read-only.

Index lookup by a partial key #65

Merged
merged 7 commits into from
Mar 7, 2018
Merged

Conversation

Totktonada
Copy link
Member

@Totktonada Totktonada added enhancement New feature or request prio1 labels Mar 3, 2018
@Totktonada Totktonada self-assigned this Mar 3, 2018
@Totktonada Totktonada force-pushed the index-lookup-by-partial-key branch from 26cefbc to 5ed3cea Compare March 4, 2018 16:28
@Totktonada Totktonada requested a review from Kasen March 5, 2018 16:11
Copy link
Contributor

@Khatskevich Khatskevich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watched all except last two commits

@@ -1,5 +1,7 @@
#!/usr/bin/env tarantool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a typo in commit message (tarantoo) 08529e9

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -1,5 +1,7 @@
#!/usr/bin/env tarantool

shard_initialized = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the variable be local for the module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed.

local variables_1_3 = {user_str = 'user_str_b'}
local result = gql_query_1:execute(variables_1_3)
results = results .. print_and_return(
('RESULT\n%s'):format(yaml.encode(result)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, If you write rests this way, at least give any unique identifiers to its output.
What I see in a result file is a lot of data and I can not find there the result of the particular query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests do not check if new functions work correctly(
Tests on added feature needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Done in the latter commit.
  2. We check here that functionality was not broken after the change. Index lookup check depends on metainfo (Add query meta #71), will be tracked by Test we use index lookup as expected for top-level collections #72.

Copy link
Member Author

@Totktonada Totktonada Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Khatskevich Are you about that here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -231,6 +231,13 @@ function compound_index_testdata.run_queries(gql_wrapper)
('RESULT\n%s'):format(yaml.encode(result)))
end)

utils.show_trace(function()
local variables_1_3 = {user_str = 'user_str_b'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test tests that the result is correct (user_str is user_str_b), however, does it test if the index is really used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be done in the scope of non-prio1 issues, tracked by #72.

get_best_matched_index(successor_node, new_filter)
if branch_index_name ~= nil and branch_len > max_len then
index_name = branch_index_name
max_len = branch_len
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should max_len = max_len + 1 be moved here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. It needed for comparing branches. I’ll rename it to max_branch_len to avoid confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I’ll change description of len return argument as (matched parts + 1) and will describe the reason why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved branch_len = branch_len + 1 to here.

--- `value_list`
---
--- @treturn table `value_list` (optional) is values list from the `filter`
--- argument ordered in the such way that can be passed to the found index (has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in such way that it can be passed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

--- @treturn table `value_list` is values list from the `filter` argument
--- ordered in the such way that can be passed to the found index (has some
--- meaning only when `index_name ~= nil`)
--- @treturn table `new_filter` is the filter value w/o values extracted into
Copy link
Contributor

@Khatskevich Khatskevich Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest name brute_filter (or scan_filter) for more semantics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree that common names are bad for domain-specific things. I’ll consider post_filter names or smth like so when there will less pending PRs.

There are related questions / points:

  1. post_filter (postprocessing filter) name will loose some of its meaning once we’ll support pushdown filtering to sotrage nodes.
  2. There are two other things of the same (at least for now) structure: pivot.filter (slow case of offset) and determinator of union-type connections (Support destination_collection deducible from a parent object #8).

@@ -402,6 +441,34 @@ local function build_lookup_index_name(indexes)
return lookup_index_name
end

--- Build `parts_tree` to use in @{get_index_name} for lookup best matching
--- index.
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It required from me some time to get the return value structure.
I suppose it would be great if you draw small schematic image here. e.g.

transformation example
i1: f1 f2 f3 f4
i2: f1 f2 f4
become
f1--- f2 -- f3 -- f4 --> i1
         `-- f4 --> i2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

--- @tparam table filter map of key-value to filter objects against
---
--- @treturn string `index_name` or `nil` is found index
---
Copy link
Contributor

@Khatskevich Khatskevich Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want some information about its running time here.
I tried to calculate it and I suppose it is something like O(CNT(prefix tree nodes)*SIZE(filter)) in worst case which is good for most use cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -14,6 +14,11 @@ local function print_and_return(...)
return table.concat({...}, ' ') .. '\n'
end

local function format_result(name, query, variables, result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature makes review process much easier)
However to make the previous commit readable, could you please move this commit on top?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I do not want you to waste your time. By now it is not important anymore.

@Totktonada Totktonada force-pushed the index-lookup-by-partial-key branch from 5ed3cea to 5550c3e Compare March 6, 2018 23:54
Copy link
Contributor

@Khatskevich Khatskevich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main points are negotiated verbally.
I have made minor comments, while overall it looks good enough for me.

local variables_1_3 = {user_str = 'user_str_b'}
local result = gql_query_1:execute(variables_1_3)
results = results .. print_and_return(
('RESULT\n%s'):format(yaml.encode(result)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

--- Future optimizations
--- --------------------
---
--- * replace table.copy() with something more light (maybe 'closed set' of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can delete item before recursive call and insert it back after

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that it is not so easy, because of pairs() invalidation.

--- (M * K)). Nodes count (M * K) can be limited upside as count of index
--- parts in all indexes, so we have O(N^2 * COUNT(index parts)).
---
--- We can consider worst case scenario when any node has any of filter keys as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is useful. But it took effort and I do not insist on it's deletion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different views on the complexity problem: dependency of filter size in context of fixed indexes or in context of worst-case indexes. Maybe it is something one should not consider.

local new_filter = table.copy(filter)
new_filter[k] = nil
local branch_index_name, branch_len =
get_best_matched_index(successor_node, new_filter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, move + 1 here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Still don’t see why it is better, but changed because of lack of any opinion about that.

--- Complexity
--- ----------
---
--- In short: O(N^2 * COUNT(index parts)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide here information, that n= len(filter)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

--- \ \
--- \ + --> abc
--- \
--- + ------> abc --> efg --> hij
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please draw also index names as leaf nodes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaf nodes are actually 'abc', 'hij' and so on, but I added index names.

@Totktonada Totktonada force-pushed the index-lookup-by-partial-key branch from 5550c3e to 9243e2c Compare March 7, 2018 19:00
The change is about code readability only, because we don't run `app =
tarantool` default server from existing snapshots. No functional changes
made.
The change is needed because we allow user to provide its own
`get_index` function. Lack of index existence check can lead to an error
that can be hard to debug.
The case described in [1]. The function proposed to including into shard
in [2].

[1]: tarantool/shard#59
[2]: tarantool/shard#60
@Totktonada Totktonada force-pushed the index-lookup-by-partial-key branch from 9243e2c to db859fa Compare March 7, 2018 19:10
@Totktonada Totktonada merged commit 096fead into master Mar 7, 2018
@Totktonada Totktonada deleted the index-lookup-by-partial-key branch March 7, 2018 19:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request prio1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants