
Django3: add new django.db.models.JSONField #8868


Merged: 7 commits merged on Feb 14, 2022


@humitos (Member) commented Jan 31, 2022

Create new JSON fields, suffixed with `_json`, for all the JSON fields defined
via the third-party `jsonfield` package.

```
In [2]: Build.objects.filter(_config_json__build__image='readthedocs/build:latest')
Out[2]: <BuildQuerySet [<Build: Build qtile for admin (2)>, <Build: Build qtile for admin (1)>]>

In [3]: Build.objects.filter(_config_json__build__image='readthedocs/build:latest').first()._config
Out[3]:
{'version': '1',
 'formats': ['htmlzip', 'epub', 'pdf'],
 'python': {'version': '3',
  'install': [{'requirements': None}],
  'use_system_site_packages': False},
 'conda': None,
 'build': {'image': 'readthedocs/build:latest', 'apt_packages': []},
 'doctype': 'sphinx',
 'sphinx': {'builder': 'sphinx',
  'configuration': '/usr/src/app/checkouts/readthedocs.org/user_builds/qtile/checkouts/stable/docs/conf.py',
  'fail_on_warning': False},
 'mkdocs': {'configuration': None, 'fail_on_warning': False},
 'submodules': {'include': 'all', 'exclude': [], 'recursive': True},
 'search': {'ranking': {}, 'ignore': []}}

In [4]: Build.objects.filter(_config_json__build__image='readthedocs/build:latest').first()._config_json
Out[4]:
{'build': {'image': 'readthedocs/build:latest', 'apt_packages': []},
 'conda': None,
 'mkdocs': {'configuration': None, 'fail_on_warning': False},
 'python': {'install': [{'requirements': None}],
  'version': '3',
  'use_system_site_packages': False},
 'search': {'ignore': [], 'ranking': {}},
 'sphinx': {'builder': 'sphinx',
  'configuration': '/usr/src/app/checkouts/readthedocs.org/user_builds/qtile/checkouts/stable/docs/conf.py',
  'fail_on_warning': False},
 'doctype': 'sphinx',
 'formats': ['htmlzip', 'epub', 'pdf'],
 'version': '1',
 'submodules': {'exclude': [], 'include': 'all', 'recursive': True}}
```
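The `_config_json` output lists the same data as `_config`, but with the keys in a different order. PostgreSQL's documentation states that `jsonb` does not preserve object key order; in the session above the keys come back shortest first, with keys of equal length in byte order. The plain-Python sketch below reproduces that observed ordering so the two outputs can be compared. Treat the ordering rule as an assumption about an implementation detail, not a guarantee, and never rely on key order in code.

```python
from typing import Any


def jsonb_like_order(value: Any) -> Any:
    """Recursively reorder dict keys: shorter keys first, ties broken by UTF-8 bytes.

    Assumption: this mirrors the key order PostgreSQL's jsonb produced in the
    session above; it is an observation, not a documented guarantee.
    """
    if isinstance(value, dict):
        ordered_keys = sorted(value, key=lambda k: (len(k.encode("utf-8")), k.encode("utf-8")))
        return {k: jsonb_like_order(value[k]) for k in ordered_keys}
    if isinstance(value, list):
        return [jsonb_like_order(item) for item in value]
    return value


# A subset of the `_config` value shown in Out[3].
config = {
    "version": "1",
    "formats": ["htmlzip", "epub", "pdf"],
    "python": {"version": "3", "install": [{"requirements": None}], "use_system_site_packages": False},
    "conda": None,
    "build": {"image": "readthedocs/build:latest", "apt_packages": []},
    "doctype": "sphinx",
    "search": {"ranking": {}, "ignore": []},
}

reordered = jsonb_like_order(config)
print(list(reordered))  # ['build', 'conda', 'python', 'search', 'doctype', 'formats', 'version']
# Only the order changes: dict equality ignores key order.
assert reordered == config
```

This matches the relative order of the same keys in Out[4], including the nested `python` object (`install`, `version`, `use_system_site_packages`).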

@humitos requested a review from a team as a code owner on January 31, 2022 12:19
@ericholscher (Member) left a comment


This looks reasonable. Is the plan to then remove the old fields, once we've converted the data? The migration path wasn't clear to me from the description or code.

Ah, I see in #8869 that this is likely the plan.

```python
# TODO: now that we are using a proper JSONField here, we could
# probably change this field to be a ForeignKey to avoid repeating the
# config file over and over again and re-use them to save db data as
# well
```
Member


Yeah, an FK seems reasonable, but it would be another field, right? I agree that our current implementation is just a hacky FK :)

@humitos (Member, Author) Feb 3, 2022


Yeah. I'm thinking about something like:

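```python
from django.db import models

# builds.models.Build (on_delete is required; PROTECT is an illustrative choice)
config = models.ForeignKey("config.BuildConfig", on_delete=models.PROTECT)

# config.models.BuildConfig (empty-dict default, since None is not allowed)
data = models.JSONField(default=dict)
```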

... and later in the code, I'm expecting to do:

```python
# get_or_create() returns an (object, created) tuple
build.config, _ = BuildConfig.objects.get_or_create(data=user_config)
build.save()
```

and rely on Postgres to decide whether the config matches an existing one or is a new one 😄
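The `get_or_create(data=user_config)` idea depends on Postgres comparing `jsonb` values by content, so the same config written with a different key order should still match the existing row. The plain-Python sketch below imitates that lookup with a hypothetical in-memory table keyed by canonical JSON text (`json.dumps(..., sort_keys=True)`). It only approximates `jsonb` equality and is not how Postgres or Django implement the comparison; `InMemoryConfigTable` is an illustrative name, not part of the codebase.

```python
import json
from typing import Any, Dict, Tuple


class InMemoryConfigTable:
    """Hypothetical stand-in for a BuildConfig table, for illustration only.

    get_or_create() treats two configs as the same row when they are equal
    as JSON values, regardless of key order.
    """

    def __init__(self) -> None:
        self._ids_by_key: Dict[str, int] = {}  # canonical JSON text -> row id

    @staticmethod
    def _canonical(data: Any) -> str:
        # sort_keys applies recursively, so nested dicts are normalized too.
        return json.dumps(data, sort_keys=True, separators=(",", ":"))

    def get_or_create(self, data: Any) -> Tuple[int, bool]:
        key = self._canonical(data)
        if key in self._ids_by_key:
            return self._ids_by_key[key], False  # existing row reused
        row_id = len(self._ids_by_key) + 1
        self._ids_by_key[key] = row_id
        return row_id, True  # new row created


table = InMemoryConfigTable()
first_id, first_created = table.get_or_create(
    {"version": "1", "build": {"image": "readthedocs/build:latest", "apt_packages": []}}
)
# Same content, different key order: the existing row is reused.
second_id, second_created = table.get_or_create(
    {"build": {"apt_packages": [], "image": "readthedocs/build:latest"}, "version": "1"}
)
# Different content: a new row is created.
third_id, third_created = table.get_or_create({"version": "2"})
```

Builds sharing an identical config would then point at a single row instead of each storing its own copy.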

@humitos (Member, Author) commented Feb 3, 2022

> Is the plan to then remove the old fields, once we've converted the data? The migration path wasn't clear to me from the description or code.

I'm sorry I didn't explain this properly.

My idea is to deploy this PR first: it creates a new JSON field and populates it with the data we currently store as strings in our db. This step is non-destructive and should be safe to deploy. The only concern may be how long it will take to migrate the config field of every Build.

The next step is #8869, which removes the old string-based config fields and renames the new JSON fields so that production code uses them.

If we do both in the same deploy, we won't need to resync. However, if we only do the first one, we will need to repeat the copy for new Build objects.

In theory, it should be safe to do both steps in the same deploy, but these things always scare me 😄
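The non-destructive first step boils down to parsing each string-encoded config into the new JSON field. The sketch below shows that copy in plain Python, working in fixed-size batches to bound memory on a large table. The rows are plain dicts with hypothetical `_config` (JSON text) and `_config_json` keys, and turning empty values into `{}` is an assumption. In the PR this work runs as a Django data migration, which is not reproduced here.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def copy_config_to_json(rows: Iterable[Dict], batch_size: int = 500) -> int:
    """Parse the old string field into the new JSON field; return rows updated.

    Hypothetical schema: `_config` holds JSON text (or None) and
    `_config_json` receives the parsed value. Empty values become {}.
    """
    updated = 0
    for batch in batched(rows, batch_size):
        for row in batch:
            raw = row.get("_config")
            row["_config_json"] = json.loads(raw) if raw else {}
            updated += 1
        # A real migration would write each batch back to the database here
        # instead of mutating dicts in place.
    return updated


rows = [
    {"id": 1, "_config": '{"version": "1", "build": {"image": "readthedocs/build:latest"}}'},
    {"id": 2, "_config": None},
]
count = copy_config_to_json(rows, batch_size=1)
```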

@stsewd (Member) commented Feb 3, 2022

> If we do both in the same deploy, we won't need to resync. However, if we only do the first one, we will need to repeat the copy for new Build objects.

This looks like it will still result in some data loss.

  • The migrations adding the new fields and the data migration are run
  • In the meantime, more objects are created (filling only the old JSON fields and leaving the new JSON fields empty)
  • The migration removing the old fields is run
  • Objects created in the meantime end up with empty JSON fields
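The failure mode in this list can be reproduced with a toy timeline. In the plain-Python sketch below, rows are hypothetical dicts with an old `_config` string field and a new `_config_json` field. The sketch runs the data migration, then creates one more object with code that only knows about the old field, then drops the old field. The object created in between is left with an empty config.

```python
import json
from typing import Dict, List


def data_migration(rows: List[Dict]) -> None:
    # Step 1: copy old string values into the new field for rows that exist now.
    for row in rows:
        row["_config_json"] = json.loads(row["_config"])


def create_with_old_code(rows: List[Dict], config: Dict) -> None:
    # Step 2: code still running in production only writes the old field.
    rows.append({"_config": json.dumps(config), "_config_json": {}})


def drop_old_field(rows: List[Dict]) -> None:
    # Step 3: the old field is removed; only the new one survives.
    for row in rows:
        del row["_config"]


rows = [{"_config": json.dumps({"version": "1"}), "_config_json": {}}]
data_migration(rows)
create_with_old_code(rows, {"version": "2"})
drop_old_field(rows)

lost = [row for row in rows if not row["_config_json"]]
print(len(lost))  # 1: the object created between steps 1 and 3 lost its config
```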

I think we should split the migration in two: one that creates the fields and another that performs the data migration. To avoid data loss, we should also override the models' save() methods so that, whenever a model is saved, the new JSON fields are populated from the old ones. The steps would then be:

  • Run the migration that creates the new fields before deploying the webs
  • Deploy the code (new objects now save the same data in both fields)
  • Run the data migration for existing objects
  • Run the migration that removes the old fields and renames the new ones (I think there will be some downtime, since the original fields are deleted before the new ones are renamed)
  • Since the save() override only copies the content of the old JSON fields into the new ones (it sets an attribute rather than reading the field), it can be safely deleted in a later deploy.
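The save() override in these steps is a dual write: every save copies the old field into the new one, so objects created between the data migration and the rename are never left empty. Below is a plain-Python sketch of the pattern. `FakeModel` and `DATABASE` are hypothetical stand-ins for `django.db.models.Model` and the table, and `Build` here is only an illustration of the real model; a Django version would call `super().save(*args, **kwargs)` the same way.

```python
from typing import Any, Dict, List

DATABASE: List[Dict[str, Any]] = []  # hypothetical storage for the sketch


class FakeModel:
    """Stand-in for django.db.models.Model: save() just records the fields."""

    def save(self, *args: Any, **kwargs: Any) -> None:
        DATABASE.append(dict(vars(self)))


class Build(FakeModel):
    def __init__(self, config: Dict[str, Any]) -> None:
        self._config = config  # old field, still read by production code
        self._config_json: Dict[str, Any] = {}  # new field

    def save(self, *args: Any, **kwargs: Any) -> None:
        # Dual write: keep the new field in sync until the old one is dropped.
        # This only assigns an attribute, so it stays harmless after the rename.
        self._config_json = self._config
        super().save(*args, **kwargs)


Build({"version": "1"}).save()
```

Because the copy happens inside save(), it also covers objects created through code paths that the data migration never sees.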

Apply Santos' technique to avoid losing data during the deploy.

See #8868 (comment)
@humitos (Member, Author) commented Feb 3, 2022

Good description of the potential problem 👍

@stsewd I updated this PR to split the migration and override the .save() method. Please let me know whether I understood your comment correctly or if something else is missing.

@stsewd (Member) left a comment


We just need to make sure not to run all the migrations at once during the deploy.

I'm not sure if there is an easy way to avoid the downtime from renaming the fields, but hopefully it will be a quick operation.

@humitos (Member, Author) commented Feb 8, 2022

I'm moving this to next week's deploy.

- cast values to dict to make them serializable
- explicitly declare the fields used for the IntegrationForm model
- use the default empty dict instead of `None` since it's not allowed in JSON fields
Instead of passing JSON as a string, we pass a real dict object.
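These commits normalize what gets assigned to the new JSON fields. The sketch below expresses that normalization as a single helper: parse JSON text into a real object, cast other mapping types to a plain `dict`, and replace `None` with `{}`. The helper name `to_json_value` and its exact rules are assumptions based on the commit messages, not the PR's actual code. `types.MappingProxyType` serves as an example of a mapping that `json.dumps` cannot encode until it is cast to `dict`.

```python
import json
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any, Dict


def to_json_value(value: Any) -> Dict[str, Any]:
    """Hypothetical helper: coerce a config value into a plain, serializable dict."""
    if value is None:
        return {}  # empty dict instead of None
    if isinstance(value, str):
        # JSON text from the old field becomes a real Python object.
        value = json.loads(value) if value.strip() else {}
    if isinstance(value, Mapping):
        return dict(value)  # plain dict copy, e.g. a mappingproxy becomes encodable
    raise TypeError(f"expected a JSON object, got {type(value).__name__}")


frozen = MappingProxyType({"image": "readthedocs/build:latest"})
# json.dumps(frozen) raises TypeError; the cast copy serializes fine.
serialized = json.dumps(to_json_value(frozen))
```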
@humitos merged commit 40bc5b1 into master on Feb 14, 2022
@humitos deleted the humitos/new-json-fields branch on February 14, 2022 19:21
3 participants