Add build metadata collection #3751

Closed
agjohnson opened this issue Mar 7, 2018 · 4 comments
Labels
Improvement (minor improvement to code), Needed: design decision (a core team decision is required)

Comments

@agjohnson
Contributor

As we are pushing projects to use readthedocs.yml, we are losing the ability to query our database for builds that should be using a certain build container, Python version, etc. Having a table that tracks the metadata for each build would be helpful for users, as well as for debugging and for making design decisions on our side.

Some of the metadata we might track:

  • Build image container
  • Build image SHA
  • Python version
  • Requirements file
  • I guess most of the settings in readthedocs.yml, though we also want this for historical build metadata that is in the database

Anything that we might want to show to the user on the build output page could be useful to track.

There are a number of ways to address this from the technical side. Concerning storage:

  • We could make a relation table with fields for each piece of metadata we want to track, though I think we'll want to change the set of tracked metadata fairly frequently
  • We could make an EAV-style table for storing this data, though I'm generally 👎 on that pattern
  • We could store the metadata as a JSON blob

I lean towards JSONField, as we don't have to worry about the schema of what we're storing. The metadata will mostly be used for display. Querying across all builds/projects will be more expensive, but very infrequent. We might, for instance, want to query how many projects are using our latest build image, but that will be rare.
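
A minimal sketch of the JSON blob option, using only the standard library's sqlite3 and json modules rather than Django's JSONField. The table name, column names, and sample values are assumptions made for illustration, not the real Read the Docs schema. It stores one blob per build and then runs the kind of rare, full-scan query mentioned above.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical schema: one JSON text blob per build (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE build_metadata (build_id INTEGER PRIMARY KEY, data TEXT)")


def save_metadata(build_id, metadata):
    """Serialize the metadata dict and store it next to the build id."""
    conn.execute(
        "INSERT INTO build_metadata (build_id, data) VALUES (?, ?)",
        (build_id, json.dumps(metadata)),
    )


def count_by_image():
    """Full scan: decode every blob and tally the build image it recorded."""
    counts = Counter()
    for (raw,) in conn.execute("SELECT data FROM build_metadata"):
        image = json.loads(raw).get("build", {}).get("image", "unknown")
        counts[image] += 1
    return counts


save_metadata(1, {"build": {"image": "readthedocs/build:2.0"}, "python": {"version": 2}})
save_metadata(2, {"build": {"image": "readthedocs/build:latest"}, "python": {"version": 3}})
save_metadata(3, {"python": {"version": 3}})  # legacy-style row with no image recorded

image_counts = count_by_image()
```

No migration is needed when a new key is added to the blob, but every aggregate query has to decode all rows, which is acceptable only because such queries are expected to be rare.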

@agjohnson added the Improvement and Needed: design decision labels Mar 7, 2018
@agjohnson added this to the Build output page milestone Mar 7, 2018
@humitos
Member

humitos commented Mar 9, 2018

I agree with this idea in general.

I also prefer JSONField, for the reason you mentioned and also because it makes it easy to write queries that rely completely on the db, instead of ending up writing hacky chunks of Python to query what we want.

Creating a relation table will involve more maintenance work (adding new fields, removing old ones, making mistakes with the data types, etc.). I prefer to avoid this.

This raised some more questions for me:

  1. Is YAML fully translatable into JSON?
  2. What are we going to do while we support both the Admin UI and YAML for settings? We can't just dump the YAML into the db; we will need to dump the mix of the two (exactly what we are using today for the build).
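
One way to picture "the mix" from question 2 is a recursive merge of the two sources. This is only a sketch: the function name, the sample values, and the assumption that YAML values take precedence over Admin UI values are all hypothetical, not how Read the Docs actually combines them.

```python
import copy


def merge_settings(admin_settings, yaml_settings):
    """Return a new dict where YAML values override Admin UI values, key by key."""
    merged = copy.deepcopy(admin_settings)
    for key, value in yaml_settings.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a nested section: merge it instead of replacing it.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical inputs: settings from the Admin UI and from the YAML file.
admin = {"python": {"version": 2, "install_with_pip": False}, "doctype": "sphinx"}
from_yaml = {"python": {"version": 3}, "formats": ["pdf"]}

# The effective config is what would be dumped into the db for this build.
effective = merge_settings(admin, from_yaml)
```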

Relying on the data from the JSONField to display things to the user on the Build page makes me a little uneasy: we would be relying on data that may not be available, or whose schema has changed, and that won't be easy to notice.

@agjohnson
Contributor Author

YAML -> dict -> JSON should be fine as we don't have any odd types in there like datetime.
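
A small standard-library sketch of the dict -> JSON half of that chain. The sample keys and values are assumptions for illustration. It shows plain types surviving a round trip, and a date value (the kind of "odd type" a YAML loader can produce from an unquoted timestamp) being rejected by the default encoder.

```python
import datetime
import json

# Plain types (str, int, bool, None, list, dict) survive a JSON round trip.
config = {"version": "1", "formats": ["htmlzip", "pdf"], "conda": None}
round_tripped = json.loads(json.dumps(config))

# A date value is rejected by the default JSON encoder with a TypeError.
try:
    json.dumps({"built_on": datetime.date(2018, 3, 7)})
    date_is_serializable = True
except TypeError:
    date_is_serializable = False
```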

Fun unrelated fact: Python's yaml library can decode JSON files. The opposite isn't true, though.

> Relying on the data from the JSONField to display things to the user on the Build page makes me a little uneasy: we would be relying on data that may not be available, or whose schema has changed, and that won't be easy to notice.

All of this will need to be conditional in the templates, yeah. That is a downside of going schemaless here. Writing these templates might be annoying, but we'll also have legacy builds that are missing this data to worry about.
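
The same conditional handling can be sketched on the Python side, before the data reaches a template. The helper name, the dotted-path convention, and the "unknown" fallback are assumptions for illustration only.

```python
def get_setting(config, path, default=None):
    """Walk a dotted path through nested dicts; return default if any step is missing."""
    current = config
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


new_build = {"build": {"image": "readthedocs/build:2.0"}}
legacy_build = None  # hypothetical: an old build with no stored metadata

image_new = get_setting(new_build, "build.image", default="unknown")
image_legacy = get_setting(legacy_build, "build.image", default="unknown")
```

Both the missing-key case and the missing-blob case fall back to the same default, so display code does not need separate branches for legacy builds.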

@stsewd
Member

stsewd commented Dec 12, 2018

We are already saving the config used for each build in the db (#4749), so I guess we can close this.

@humitos
Member

humitos commented Jan 23, 2019

The only thing missing from the original request is the build.image.sha. I suppose we can close it anyway, since we are not going to make decisions based on that (we have the build.image.name).

The Docker image SHA is saved in readthedocs-environment.json, though, in case we need to debug something very specific to the environment.

In [7]: test_builds.builds.last().config
Out[7]: 
{'version': '1',
 'formats': ['htmlzip', 'epub', 'pdf'],
 'python': {'version': 2,
  'requirements': None,
  'install_with_pip': False,
  'install_with_setup': False,
  'extra_requirements': [],
  'use_system_site_packages': False},
 'conda': None,
 'build': {'image': 'readthedocs/build:2.0'},
 'doctype': 'sphinx',
 'sphinx': {'builder': 'sphinx',
  'configuration': '/home/docs/checkouts/readthedocs.org/user_builds/test-builds/checkouts/latest/docs/conf.py',
  'fail_on_warning': False},
 'mkdocs': {'configuration': None, 'fail_on_warning': False},
 'submodules': {'include': 'all', 'exclude': [], 'recursive': True}}

I'm closing this, but feel free to reopen it if you think that's necessary, @agjohnson.

Also, we should open another issue to discuss how to present this information to the user somewhere.

@humitos closed this as completed Jan 23, 2019

3 participants