Design doc: collect data about builds #8124

Collect Data About Builds
=========================

We may need to make some decisions in the future about deprecations and supported versions.
Right now we don't have data about which packages (and which versions of them) are used on Read the Docs,
so we can't make an informed decision.

.. contents::
   :local:
   :depth: 3

Tools
-----

Kibana:
  - https://www.elastic.co/kibana
  - We can import data from ES.
  - Cloud service provided by Elastic.
Superset:
  - https://superset.apache.org/
  - We can import data from several DBs (including Postgres and ES).
  - Easy to set up locally, but there doesn't seem to be a cloud provider for it.
Metabase:
  - https://www.metabase.com/
  - We can import data from several DBs (including Postgres).
  - Cloud service provided by Metabase.

Summary: We have several tools that can inspect data from a Postgres DB,
and we also have ``Kibana``, which works *only* with Elasticsearch.
The data to be collected can be saved in a Postgres or ES database.
We already use Metabase to get other information,
so it's probably the right choice for this task.

Data to be collected
--------------------

The following data can be collected after installing all dependencies.

Configuration file
~~~~~~~~~~~~~~~~~~

We are saving the config file in our database,
but to save some space we only save it if it's different from the one of the previous build
(if it's the same, we save a reference to it).

The config file being saved isn't the original one used by the user,
but the result of merging it with its default values.

We may also want to have the original config file,
so we know which settings users are using.

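Here is a minimal sketch of the "save it only if it changed" check. It assumes configs are plain, JSON-serializable dicts. ``config_fingerprint`` and ``config_to_store`` are hypothetical helpers, not existing Read the Docs code, and the shape of the returned record is also an assumption.

.. code-block:: python

   import hashlib
   import json
   from typing import Optional


   def config_fingerprint(config: dict) -> str:
       """Stable hash of a config dict; key order doesn't change the result."""
       canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
       return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


   def config_to_store(
       new_config: dict,
       previous_build_id: Optional[int] = None,
       previous_config: Optional[dict] = None,
   ) -> dict:
       """Return what to save for a build's config (hypothetical record shape)."""
       if previous_config is not None and (
           config_fingerprint(previous_config) == config_fingerprint(new_config)
       ):
           # Same config as the previous build: only keep a reference to it.
           return {"config_ref": previous_build_id}
       # New or changed config: save the full content.
       return {"config": new_config}

Hashing a canonical JSON dump keeps the comparison independent of key order. The same helper could be applied to the original user config if we decide to keep it too.
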

PIP packages
~~~~~~~~~~~~

We can get a JSON list of all dependencies, or only the root dependencies, with ``pip list``.
This will give us the names and versions of the packages used in the build.

.. code-block::

   $ pip list --pre --local --format json | jq
   # and
   $ pip list --pre --not-required --local --format json | jq
   [
     {
       "name": "requests-mock",
       "version": "1.8.0"
     },
     {
       "name": "requests-toolbelt",
       "version": "0.9.1"
     },
     {
       "name": "rstcheck",
       "version": "3.3.1"
     },
     {
       "name": "selectolax",
       "version": "0.2.10"
     },
     {
       "name": "slumber",
       "version": "0.7.1"
     },
     {
       "name": "sphinx-autobuild",
       "version": "2020.9.1"
     },
     {
       "name": "sphinx-hoverxref",
       "version": "0.5b1"
     }
   ]

With the ``--not-required`` option, pip will list only the root dependencies.

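This sketch shows one way to run the two ``pip list`` calls above and reduce their output to the ``pip`` (root) and ``pip_all`` lists used in the Format section. It assumes ``python`` is the path to the build environment's interpreter, invoked as ``python -m pip``. The helper names are hypothetical.

.. code-block:: python

   import json
   import subprocess
   from typing import Dict, List


   def pip_list_command(python: str, root_only: bool = False) -> List[str]:
       """Build the ``pip list`` command shown above for a given interpreter."""
       cmd = [python, "-m", "pip", "list", "--pre", "--local", "--format", "json"]
       if root_only:
           # Only packages that no other installed package depends on.
           cmd.insert(4, "--not-required")
       return cmd


   def parse_pip_list(output: str) -> List[Dict[str, str]]:
       """Keep only the name and version of each package."""
       return [{"name": p["name"], "version": p["version"]} for p in json.loads(output)]


   def collect_pip_packages(python: str) -> Dict[str, List[Dict[str, str]]]:
       """Run both commands and return the root and full package lists."""

       def run(root_only: bool) -> List[Dict[str, str]]:
           result = subprocess.run(
               pip_list_command(python, root_only),
               capture_output=True,
               text=True,
               check=True,
           )
           return parse_pip_list(result.stdout)

       return {"pip": run(True), "pip_all": run(False)}

Parsing the JSON in Python makes ``jq`` unnecessary inside the build.
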

Conda packages
~~~~~~~~~~~~~~

We can get a JSON list of all dependencies with ``conda list --json``.
That command lists the root dependencies together with their own dependencies
(there is no way to list only the root dependencies),
so we may be collecting some noise, but we can use ``pip list`` as a secondary source.

.. code-block::

   $ conda list --json --name conda-env

   [
     {
       "base_url": "https://conda.anaconda.org/conda-forge",
       "build_number": 0,
       "build_string": "py_0",
       "channel": "conda-forge",
       "dist_name": "alabaster-0.7.12-py_0",
       "name": "alabaster",
       "platform": "noarch",
       "version": "0.7.12"
     },
     {
       "base_url": "https://conda.anaconda.org/conda-forge",
       "build_number": 0,
       "build_string": "pyh9f0ad1d_0",
       "channel": "conda-forge",
       "dist_name": "asn1crypto-1.4.0-pyh9f0ad1d_0",
       "name": "asn1crypto",
       "platform": "noarch",
       "version": "1.4.0"
     },
     {
       "base_url": "https://conda.anaconda.org/conda-forge",
       "build_number": 3,
       "build_string": "3",
       "channel": "conda-forge",
       "dist_name": "python-3.5.4-3",
       "name": "python",
       "platform": "linux-64",
       "version": "3.5.4"
     }
   ]

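Reducing the ``conda list --json`` output to the fields listed in the Format section (name, channel and version) could look like this. The helper names are hypothetical, and the environment name is whatever the build uses.

.. code-block:: python

   import json
   from typing import Dict, List


   def conda_list_command(env_name: str) -> List[str]:
       """The ``conda list`` command shown above for a given environment name."""
       return ["conda", "list", "--json", "--name", env_name]


   def parse_conda_list(output: str) -> List[Dict[str, str]]:
       """Keep the name, channel, and version of each package."""
       return [
           {
               "name": pkg["name"],
               "channel": pkg.get("channel", ""),
               "version": pkg["version"],
           }
           for pkg in json.loads(output)
       ]
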

APT packages
~~~~~~~~~~~~

We can get the list from the config file,
or we can list the installed packages with ``dpkg --get-selections``.
That command would list all pre-installed packages as well, so we may be getting some noise.

.. code-block:: console

   $ dpkg --get-selections

   adduser                         install
   apt                             install
   base-files                      install
   base-passwd                     install
   bash                            install
   binutils                        install
   binutils-common:amd64           install
   binutils-x86-64-linux-gnu       install
   bsdutils                        install
   build-essential                 install

We can get the installed version with:

.. code-block:: console

   $ dpkg --status python3

   Package: python3
   Status: install ok installed
   Priority: optional
   Section: python
   Installed-Size: 189
   Maintainer: Ubuntu Developers <[email protected]>
   Architecture: amd64
   Multi-Arch: allowed
   Source: python3-defaults
   Version: 3.8.2-0ubuntu2
   Replaces: python3-minimal (<< 3.1.2-2)
   Provides: python3-profiler
   Depends: python3.8 (>= 3.8.2-1~), libpython3-stdlib (= 3.8.2-0ubuntu2)
   Pre-Depends: python3-minimal (= 3.8.2-0ubuntu2)
   Suggests: python3-doc (>= 3.8.2-0ubuntu2), python3-tk (>= 3.8.2-1~), python3-venv (>= 3.8.2-0ubuntu2)
   Description: interactive high-level object-oriented language (default python3 version)
    Python, the high-level, interactive object oriented language,
    includes an extensive class library with lots of goodies for
    network programming, system administration, sounds and graphics.
    .
    This package is a dependency package, which depends on Debian's default
    Python 3 version (currently v3.8).
   Homepage: https://www.python.org/
   Original-Maintainer: Matthias Klose <[email protected]>

Or with:

.. code-block:: console

   $ apt-cache policy python3

   python3:
     Installed: 3.8.2-0ubuntu2
     Candidate: 3.8.2-0ubuntu2
     Version table:
    *** 3.8.2-0ubuntu2 500
           500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
           100 /var/lib/dpkg/status

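This sketch parses the ``dpkg`` outputs shown above. To cut the noise from pre-installed packages, it keeps only the packages the user requested in the config file. It assumes that ``requested`` list was already read from the config. All helper names are hypothetical.

.. code-block:: python

   from typing import List, Optional


   def parse_get_selections(output: str) -> List[str]:
       """Package names from ``dpkg --get-selections`` marked as ``install``."""
       names = []
       for line in output.splitlines():
           parts = line.split()
           if len(parts) == 2 and parts[1] == "install":
               names.append(parts[0])
       return names


   def parse_dpkg_status_version(output: str) -> Optional[str]:
       """Return the ``Version:`` field from ``dpkg --status <package>`` output."""
       for line in output.splitlines():
           # Continuation lines (like the long description) start with a space,
           # so they never match here.
           if line.startswith("Version:"):
               return line.split(":", 1)[1].strip()
       return None


   def requested_apt_packages(installed: List[str], requested: List[str]) -> List[str]:
       """Keep only installed packages that the user asked for in the config file."""
       # Drop architecture qualifiers such as ":amd64" before comparing.
       installed_names = {name.split(":", 1)[0] for name in installed}
       return [name for name in requested if name in installed_names]
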

Python
~~~~~~

We can get the Python version from the config file when using a Python environment,
and from the ``conda list`` output when using a Conda environment.

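This sketch implements that choice. It assumes the merged config stores the version under ``python.version``, and that Conda packages were already reduced to ``name``/``version`` dicts. ``python_version`` is a hypothetical helper.

.. code-block:: python

   from typing import Dict, List, Optional


   def python_version(
       final_config: dict,
       conda_packages: Optional[List[Dict[str, str]]] = None,
   ) -> Optional[str]:
       """Python version for a build.

       Conda environments: use the ``conda list`` entry named "python".
       Otherwise: use ``python.version`` from the config (assumed key).
       """
       if conda_packages:
           for pkg in conda_packages:
               if pkg.get("name") == "python":
                   return pkg.get("version")
       version = final_config.get("python", {}).get("version")
       return None if version is None else str(version)
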

OS
~~

We can infer the OS version from the build image set in the config file,
but since the OS of an image can change over time, it's better to get it from the OS itself:

.. code-block::

   $ lsb_release --description
   Description:    Ubuntu 18.04.5 LTS
   # or
   $ cat /etc/issue
   Ubuntu 18.04.5 LTS \n \l

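The Format example stores the OS as ``ubuntu-18.04.5``, so the command output needs some normalization. This sketch assumes Ubuntu-based build images. ``normalize_os`` is a hypothetical helper.

.. code-block:: python

   import re
   from typing import Optional


   def normalize_os(text: str) -> Optional[str]:
       """Turn ``lsb_release --description`` or ``/etc/issue`` output into "ubuntu-X.Y.Z"."""
       match = re.search(r"Ubuntu\s+(\d+(?:\.\d+)*)", text)
       if match is None:
           return None
       return f"ubuntu-{match.group(1)}"
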

Format
~~~~~~

The final information to be saved would consist of:

- organization: the organization id/slug
- project: the project id/slug
- version: the version id/slug
- build: the build id, date, length, status, success, and commit
- user_config: Original user config file
- final_config: Final configuration used (merged with defaults)
- packages.pip: List of root pip packages with name and version
- packages.pip_all: List of all pip packages with name and version
- packages.conda: List of conda packages with name, channel, and version
- packages.apt: List of apt packages with name and version
- python: Python version used
- os: Operating system used

.. code-block:: json

   {
     "organization": {
       "id": 1,
       "slug": "org"
     },
     "project": {
       "id": 2,
       "slug": "docs"
     },
     "version": {
       "id": 1,
       "slug": "latest"
     },
     "build": {
       "id": 3,
       "date/start": "2021-04-20-...",
       "length": "00:06:34",
       "status": "normal",
       "success": true,
       "commit": "abcd1234"
     },
     "config": {
       "user": {},
       "final": {}
     },
     "packages": {
       "pip": [{
         "name": "sphinx",
         "version": "3.4.5"
       }],
       "pip_all": [
         {
           "name": "sphinx",
           "version": "3.4.5"
         },
         {
           "name": "docutils",
           "version": "0.16.0"
         }
       ],
       "conda": [{
         "name": "sphinx",
         "channel": "conda-forge",
         "version": "0.1"
       }],
       "apt": [{
         "name": "python3-dev",
         "version": "3.8.2-0ubuntu2"
       }]
     },
     "python": "3.7",
     "os": "ubuntu-18.04.5"
   }

Storage
-------

All this information can be collected after the build has finished,
and we can store it in a dedicated database (telemetry) using Django's models.

Since this information isn't sensitive,
we should be fine keeping this data even if the project/version is deleted.
As we don't care about historical data,
we can save the information per version, from its latest build only,
and delete old data if it grows too much.

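This plain-Python sketch shows the "latest build per version" rule, with a dict standing in for the telemetry table. In Django this would probably map to something like ``update_or_create`` keyed by the version id. Records are keyed by the version id rather than a foreign key, so they would survive project/version deletion. ``BuildData``, ``save_latest`` and ``prune`` are hypothetical, and the sketch assumes build ids increase over time.

.. code-block:: python

   from dataclasses import dataclass, field
   from datetime import datetime, timedelta
   from typing import Dict


   @dataclass
   class BuildData:
       """One telemetry record (hypothetical shape)."""

       version_id: int
       build_id: int
       date: datetime
       data: dict = field(default_factory=dict)


   def save_latest(store: Dict[int, BuildData], record: BuildData) -> None:
       """Keep a single record per version: the one from its latest build."""
       current = store.get(record.version_id)
       # Assumes build ids grow over time, so a higher id means a newer build.
       if current is None or record.build_id > current.build_id:
           store[record.version_id] = record


   def prune(store: Dict[int, BuildData], now: datetime, max_age: timedelta) -> None:
       """Delete records whose build is older than ``max_age``."""
       stale = [vid for vid, rec in store.items() if now - rec.date > max_age]
       for vid in stale:
           del store[vid]

The cut-off passed to ``prune`` is a policy decision that this sketch leaves open.
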

Should we make heavy use of JSON fields?
Or try to avoid nested structures as much as possible?
For example, ``config.user``/``config.final`` vs ``user_config``/``final_config``.

Or should we have several fields in our model instead of just one big JSON field?

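Whichever layout we choose, we will need queries like "which projects use Sphinx 2.0 or newer?". This sketch runs such a query in plain Python over records shaped like the Format example. It compares versions as tuples of integers, because comparing version strings directly is lexicographic (``"10.0" < "9.0"`` is true for strings). ``version_key`` and ``projects_using`` are hypothetical, and pre-release suffixes such as ``b1`` are simply ignored.

.. code-block:: python

   import re
   from typing import Iterable, List, Tuple


   def version_key(version: str) -> Tuple[int, ...]:
       """Numeric key for dotted versions: "3.4.5" -> (3, 4, 5).

       Each part keeps only its leading digits ("0.5b1" -> (0, 5)), parsing stops
       at the first part without leading digits, and trailing zeros are dropped
       so that "2" and "2.0" compare equal.
       """
       key = []
       for part in version.split("."):
           match = re.match(r"\d+", part)
           if match is None:
               break
           key.append(int(match.group()))
       while key and key[-1] == 0:
           key.pop()
       return tuple(key)


   def projects_using(records: Iterable[dict], package: str, min_version: str) -> List[str]:
       """Slugs of projects whose root pip packages include ``package`` >= ``min_version``."""
       wanted = version_key(min_version)
       return [
           record["project"]["slug"]
           for record in records
           for pkg in record["packages"]["pip"]
           if pkg["name"] == package and version_key(pkg["version"]) >= wanted
       ]
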

I'd like to know if we can query this "list of dictionaries".
I guess we can, but it may be good to double-check it.

We could do basic things like "starts with ``2.``", but I'm not sure about anything more complex
(we could use a ``>`` operator, but that would depend on the ASCII ordering).