
[Showerthought] Use virtual indexes for zero-downtime rebuilds? #75


Open
Grendel7 opened this issue Dec 14, 2017 · 13 comments

Comments

@Grendel7
Contributor

Right now, when you rebuild an index, the index is nuked first and then rebuilt from scratch. During this reindexing process, searches against the index may fail or return incomplete results.

Instead, you could use "virtual indexes" to perform a rebuild without downtime. By that, I mean that you create a real index with a different name, e.g. index_name.<timestamp>, and then create an alias named index_name that points to the real index.

When rebuilding the index, you could create a new index in the background, populate it, then switch the aliases over. That way, the application can still use the old index while the new index is being created.
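A minimal sketch of the alias switch described above, assuming the index_name.<timestamp> naming from this comment. The helper names (timestamped_index_name, build_alias_swap) are hypothetical. The code only builds the request body for Elasticsearch's _aliases endpoint, which applies all of its add/remove actions in a single atomic operation, so the alias is never left without a backing index.

```python
from datetime import datetime
from typing import Iterable


def timestamped_index_name(index_name: str, now: datetime) -> str:
    # Hypothetical naming scheme: <index_name>.<YYYYmmddHHMMSS>
    return f"{index_name}.{now:%Y%m%d%H%M%S}"


def build_alias_swap(alias: str, new_index: str, old_indexes: Iterable[str]) -> dict:
    """Build a POST /_aliases body that moves `alias` from the old real
    indexes to `new_index` in one request."""
    actions = [
        {"remove": {"index": old, "alias": alias}}
        for old in old_indexes
        if old != new_index  # never detach the alias from the new index
    ]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Example: the alias "products" moves from the old real index to the new one.
old = timestamped_index_name("products", datetime(2017, 12, 1, 9, 0, 0))
new = timestamped_index_name("products", datetime(2017, 12, 14, 9, 30, 0))
body = build_alias_swap("products", new, [old])
```

With the 7.x Python client, such a body could be sent as es.indices.update_aliases(body=body); the exact call signature differs between client versions, so check the one you use.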

Most Elasticsearch applications I know use something like this and I'm willing to contribute something similar to this project. However, before I do that, I would like to know whether this is a desirable feature to have or whether it's unnecessary complexity for a generic library.

@Grendel7 Grendel7 changed the title [Showerthought] Use virtual indexing for zero-downtime rebuilds? [Showerthought] Use virtual indexes for zero-downtime rebuilds? Dec 14, 2017
@andreyrusanov

We have been using a self-made feature like this on several projects as well, because migrations happen for both the database and the search engine. We used plain numbers instead of timestamps for simplicity (at least in our case it was simpler).
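Numbered versions can be derived from the indexes that already exist. This is a sketch assuming a hypothetical <base>_<n> naming convention; next_versioned_index is an illustrative name. Versions are compared as integers, so products_10 correctly follows products_9.

```python
import re
from typing import Iterable


def next_versioned_index(base: str, existing: Iterable[str]) -> str:
    """Return the next real index name in the sequence base_1, base_2, ...

    `existing` is the list of index names currently in the cluster;
    names that do not match <base>_<n> are ignored.
    """
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)$")
    versions = []
    for name in existing:
        match = pattern.match(name)
        if match:
            versions.append(int(match.group(1)))
    return f"{base}_{max(versions, default=0) + 1}"
```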

@andreyrusanov

One note: I believe that if this is introduced, it needs to be triggered explicitly, with a management command or something similar.

@ezbc

ezbc commented May 23, 2019

I need this feature for a current project. Is this feature still desired in the library? If so, I can start a PR.

From what I understand, the command accepts two arguments: the name of the new index to build in the background, and the alias name. The command creates and builds the new index. When the new index has finished building, the alias is updated to point to it.
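The create, populate, then switch flow can be sketched against an in-memory stand-in for the cluster, so it runs without Elasticsearch. InMemoryCluster and rebuild are hypothetical names, not this library's API, and the model simplifies by letting each alias point at exactly one index (real Elasticsearch aliases can span several).

```python
from typing import Dict, Iterable, List


class InMemoryCluster:
    """Hypothetical stand-in: real indexes hold documents, and each alias
    maps to exactly one real index."""

    def __init__(self) -> None:
        self.indexes: Dict[str, List[dict]] = {}
        self.aliases: Dict[str, str] = {}

    def create_index(self, name: str) -> None:
        if name in self.indexes or name in self.aliases:
            raise ValueError(f"{name!r} already exists")
        self.indexes[name] = []

    def bulk_index(self, name: str, docs: Iterable[dict]) -> None:
        self.indexes[name].extend(docs)

    def point_alias(self, alias: str, index: str) -> None:
        # One assignment models the atomic switch done by the _aliases API.
        self.aliases[alias] = index

    def search(self, name: str) -> List[dict]:
        # Searches may target either an alias or a real index name.
        return self.indexes[self.aliases.get(name, name)]


def rebuild(cluster: InMemoryCluster, alias: str, new_index: str,
            docs: Iterable[dict]) -> None:
    cluster.create_index(new_index)        # 1. create the new real index
    cluster.bulk_index(new_index, docs)    # 2. populate it; the old one still serves reads
    cluster.point_alias(alias, new_index)  # 3. switch the alias only when complete
```

Readers querying the alias keep hitting the old index until step 3, which is the whole point of the proposal.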

Is this the desired behavior?

@josh-stableprice

josh-stableprice commented Jun 5, 2019

I would suggest that the alias name, and potentially the new index name suffix, be configurable.

For example, adding --alias products and --index_suffix 20190605.

This would then allow people to attach whatever meaning they need to capture to their reindexes. The index alias could default to f'{index_name}_alias' and the indexes to f'{index_name[:64]}_{uuid4()}'.
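A sketch of how these options and defaults might be parsed with the standard argparse module. A Django management command's add_arguments receives an argparse-compatible parser, so the same add_argument calls would apply there. The positional index_name argument and the build_parser/resolve_names helpers are assumptions for illustration; the 64-character truncation follows the suggestion above and keeps names well under Elasticsearch's 255-byte limit.

```python
import argparse
from typing import List, Optional, Tuple
from uuid import uuid4


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI for the rebuild command discussed in this thread.
    parser = argparse.ArgumentParser(prog="rebuild_index")
    parser.add_argument("index_name")
    parser.add_argument("--alias", help="alias to switch (default: <index_name>_alias)")
    parser.add_argument("--index_suffix", help="suffix for the new index (default: random UUID)")
    return parser


def resolve_names(argv: Optional[List[str]] = None) -> Tuple[str, str]:
    """Return (alias, new_index_name), applying the proposed defaults."""
    args = build_parser().parse_args(argv)
    alias = args.alias or f"{args.index_name}_alias"
    suffix = args.index_suffix or str(uuid4())
    # Truncate the base name as suggested, leaving room for the suffix.
    return alias, f"{args.index_name[:64]}_{suffix}"
```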

@ezbc

ezbc commented Jun 5, 2019

Good thinking. Should the user be able to create an alias for each model, so that virtual reindexing can be done for every model in the registry in one command? Building off your suggestion, perhaps the CLI could look like:

--alias_prefix alias_prefix and --index_suffix 20190605, and for each model an alias would be created following {alias_prefix}_{model_name}, or some other naming scheme based on the model name and alias prefix?
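Under the {alias_prefix}_{model_name} scheme, one command invocation could plan an alias and a new real index for every model. This is a sketch; plan_model_indexes and the <alias>_<suffix> form of the real index name are hypothetical. Model names are lowercased because Elasticsearch index and alias names must be lowercase.

```python
from typing import Dict, Iterable


def plan_model_indexes(alias_prefix: str, index_suffix: str,
                       model_names: Iterable[str]) -> Dict[str, str]:
    """Map each model's alias to the new real index it should point at."""
    plan: Dict[str, str] = {}
    for model in model_names:
        alias = f"{alias_prefix}_{model.lower()}"  # names must be lowercase
        plan[alias] = f"{alias}_{index_suffix}"
    return plan
```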

How does this sound?

@josh-stableprice
Copy link

Sounds perfect

@josh-stableprice
Copy link

@ezbc, if you take a look at https://github.com/rtfd/readthedocs.org/pull/4368/files#diff-2859d2a6db2d38d6545b0ecadbae2f61R58, it looks like @safwanrahman has already done all of this in the readthedocs project, along with making it Celery-based. We would probably want to borrow heavily from this.

@safwanrahman
Collaborator

Thanks @ezbc for your interest. Yes, this feature is very much desired.
I implemented this feature in Read the Docs, but did not get time to push it to this package. If you would like to start working on this, I would be very happy to assist you. You can borrow the implementation I did in RTD, as mentioned by @josh-stableprice.

@ezbc

ezbc commented Jun 10, 2019

Thanks for pointing out the PR for RTD. I’ll get started on this feature this week and bring up any issues or questions along the way.

@ezbc

ezbc commented Jun 10, 2019

@josh-stableprice and @safwanrahman, I'm wondering whether we should always delete the old index after the new index has been successfully populated. One use case I can think of for keeping old indexes is a user wanting to verify the new index before switching the alias over.

If we did not delete the old index automatically, that would open a can of worms: users would have to manage existing indexes themselves, e.g. changing aliases and deleting old indexes. One option is to delete the old index automatically for now, and later add an option to keep old indexes along with commands to manage them.
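One way to make "delete automatically now, allow keeping later" concrete is a retention count. This sketch assumes the index_name.<timestamp> scheme from the original proposal (fixed-width timestamps sort chronologically as strings); indexes_to_delete and keep_old are hypothetical names. keep_old=0 gives the automatic-delete behavior, while a larger value keeps the most recent old indexes for verification or rollback. Each returned name could then be deleted, e.g. with es.indices.delete(index=name), only after the alias switch has succeeded.

```python
from typing import Iterable, List


def indexes_to_delete(base: str, existing: Iterable[str], live_index: str,
                      keep_old: int = 0) -> List[str]:
    """Pick old real indexes of `base` to delete once the alias has moved.

    Assumes names of the form '<base>.<YYYYmmddHHMMSS>'. The live index
    (the one the alias now points at) is never returned.
    """
    prefix = f"{base}."
    old = sorted(
        (name for name in existing
         if name.startswith(prefix) and name != live_index),
        reverse=True,  # newest first, so the slice keeps the most recent ones
    )
    return old[keep_old:]
```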

What are your thoughts?

@josh-stableprice

josh-stableprice commented Jun 11, 2019 via email

@ezbc

ezbc commented Sep 5, 2019

@josh-stableprice or @safwanrahman, I'm getting back into this now.

I'm considering whether all the aliases should be updated in a single transaction once every model has a newly rebuilt index. This seems like the safest option to me in case an app with multiple models deploys breaking changes across its model indexes.
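A sketch of the single-transaction idea: build one _aliases request body that covers every model, and send it only after all rebuilds have succeeded. Because Elasticsearch applies the actions of a single _aliases call atomically, all models switch together rather than one by one. build_multi_model_swap and its dict-based inputs are hypothetical.

```python
from typing import Dict, List


def build_multi_model_swap(current: Dict[str, List[str]],
                           new: Dict[str, str]) -> dict:
    """Build ONE _aliases body that switches every alias at once.

    current: alias -> real indexes the alias points at today
    new:     alias -> freshly rebuilt real index for that alias
    """
    actions = []
    for alias, new_index in new.items():
        for old_index in current.get(alias, []):
            if old_index != new_index:
                actions.append({"remove": {"index": old_index, "alias": alias}})
        actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

If any model's rebuild fails, the body is simply never sent, so no alias moves and the app keeps a consistent set of old indexes.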

What do you think?

@josh-stableprice

josh-stableprice commented Sep 11, 2019 via email


6 participants