Manually define dependency license type metadata not automatically specified #841

Merged (1 commit) on Jan 30, 2025

per1234 commented Jan 30, 2025

Background

Licensed, the dependency license checker tool, uses the licensee gem to automatically determine each dependency's license type from data contained in the dependency's codebase. licensee checks several files for this data, and each source it discovers is recorded in the licenses sequence of the dependency license metadata cache file. It might find multiple sources of licensing data. The Licensed documentation describes how this case is handled:

https://github.com/github/licensed/blob/v5.0.1/docs/commands/status.md#checking-status-with-metadata-loaded-from-cached-files

If license: other is specified and all of the licenses entries match an allowed license a failure will not be logged
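
The quoted rule can be sketched roughly as follows. This is a minimal illustration in Python, not Licensed's actual code (Licensed is a Ruby tool). The function name, its parameters, and the requirement that at least one source exists are assumptions made for the example, and other Licensed settings such as reviewed or ignored dependencies are not modeled.

```python
def passes_status_check(license_id, source_license_ids, allowed):
    """Hypothetical sketch of the documented rule; not Licensed's implementation.

    license_id         -- value of the `license` key in the cache file
    source_license_ids -- license identifier detected for each `licenses` entry
    allowed            -- identifiers listed as allowed in the configuration
    """
    allowed_set = set(allowed)

    # Ordinary case: the recorded license type is itself allowed.
    if license_id in allowed_set:
        return True

    # Special case from the quoted documentation: `other` still passes when
    # every `licenses` entry matches an allowed license.
    # Assumption: an empty list of sources does not pass.
    if license_id == "other":
        return bool(source_license_ids) and all(
            source in allowed_set for source in source_license_ids
        )

    return False
```

Under this sketch, a dependency recorded as other whose sources are all detected as an allowed license passes, which is the behavior that surprises maintainers.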

It is of course correct to treat the dependency as compatible under these conditions. However, the way Licensed handles multiple licensing data sources is not user friendly, for the following reasons:

Lack of Transparency Re: Detected License Type

Even though Licensed knows exactly which license type each of the sources was detected as, it does not record this data in the dependency license metadata cache file.

Ambiguous Special License Type Identifier

When multiple license data sources are found, Licensed sets the license type for the dependency to other in the license key of the dependency license metadata cache file.

Although it is correct for the tool to use a special identifier, unfortunately Licensed uses the same identifier for two significantly different cases:

  • license type was not identifiable from the data (e.g., modifications were made to standard license text)
  • multiple license types were identified from the data

A better approach would be for Licensed to use a separate identifier for each of these situations (e.g., other for the first and multi for the second).

Failure to Set License Type Even when Identified

Even when all of the data sources are identified as the same license type, Licensed still sets the license key of the dependency license metadata cache file to other instead of setting it to the identifier of the identified license.
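
The behavior suggested in the two points above (separate identifiers, and recording the type when all sources agree) could be combined into one resolution rule. The following is a minimal Python sketch under stated assumptions, not something Licensed implements: resolve_license_key is a hypothetical name, multi is only the example identifier proposed above, and treating any unidentified source as other is an assumption.

```python
from typing import Optional, Sequence


def resolve_license_key(detected: Sequence[Optional[str]]) -> str:
    """Hypothetical rule for choosing the `license` key of a cache file.

    `detected` holds one entry per license data source: the detected
    license identifier, or None when the source could not be identified.
    """
    # Assumption: no sources, or any unidentified source, means the type
    # is not machine identifiable and needs manual review.
    if not detected or any(item is None for item in detected):
        return "other"

    unique = set(detected)

    # All sources agree: record the identified license type itself.
    if len(unique) == 1:
        return next(iter(unique))

    # Sources were identified but disagree: use a distinct identifier
    # (`multi` is the example name from the text above).
    return "multi"
```

With a rule like this, other would only ever mean that manual identification is required.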

Problem

Project maintainers expect that a license type of other means that the license type could not be identified, and that they must manually identify and set the license type in order for the compliance check to pass. They will be extremely concerned to find that the check is passing even though a dependency's license type is defined as other, because this appears to be a false negative (meaning that the system is not effectively enforcing compliance).

The ambiguous special license type identifier, combined with the lack of information about the detected type of each license data source, makes it impossible for them to understand why the check is passing despite the lack of a defined compatible license type. As a result, they will waste time troubleshooting what is actually a completely functional system.

Resolution

Always define the license type in the metadata cache, even when doing so is not required to get a passing check.

In this case, every dependency previously assigned the other identifier due to having multiple license data sources actually had a single license type. These dependencies were therefore handled in the same way as a dependency that is assigned the other identifier because its license data is not machine identifiable: the license type was defined manually in the metadata cache.

per1234 added the labels "type: enhancement" (Proposed improvement) and "topic: infrastructure" (Related to project infrastructure) on Jan 30, 2025
per1234 self-assigned this on Jan 30, 2025
per1234 merged commit 6d3d341 into arduino:main on Jan 30, 2025
10 checks passed
per1234 deleted the manual-dep-license-definitions branch on January 30, 2025 22:01