Skip to content

Adds support for dense_vector type #1708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 26, 2021

Conversation

morganlutz
Copy link
Contributor

Hi all. We need to support a dense_vector field for our elastic search implementation. I saw you have an open ticket to support this here #1700. I only included dense_vector since sparse_vector is deprecated and both rank_feature and rank_features appear to already be supported.

Closes #1700

  • You have read the Spring Data contribution guidelines.
  • There is a ticket in the bug tracker for the project in our issue tracker. - Add missing "Document ranking types" #1700
  • You use the code formatters provided here and have them applied to your changes. Don’t submit any formatting related changes.
  • You submit test cases (unit or integration tests) that back your changes.
  • You added yourself as author in the headers of the classes you touched. Amend the date range in the Apache license header if needed. For new types, add the license header (copy from another file and set the current year only).

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Feb 25, 2021
Copy link
Collaborator

@sothawo sothawo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. The first time I get a PR - and that from a first-time contributor - where I have (almost nothing) to remark. Just one thing: check that dims is set for FieldType.DenseVector. I commented that at the corresponding place.

@@ -191,6 +197,8 @@ private MappingParameters(InnerField field) {
|| (maxShingleSize >= 2 && maxShingleSize <= 4), //
"maxShingleSize must be in inclusive range from 2 to 4 for field type search_as_you_type");
positiveScoreImpact = field.positiveScoreImpact();
dims = field.dims();
Assert.isTrue(dims <= 2048, "The maximum number of dimensions that can be in a vector should not exceed 2048.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assert.isTrue(dims <= 2048, "The maximum number of dimensions that can be in a vector should not exceed 2048.");
Assert.isTrue(dims <= 2048, "The maximum number of dimensions that can be in a vector must not exceed 2048.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -153,6 +157,8 @@ private MappingParameters(Field field) {
|| (maxShingleSize >= 2 && maxShingleSize <= 4), //
"maxShingleSize must be in inclusive range from 2 to 4 for field type search_as_you_type");
positiveScoreImpact = field.positiveScoreImpact();
dims = field.dims();
Assert.isTrue(dims <= 2048, "The maximum number of dimensions that can be in a vector should not exceed 2048.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assert.isTrue(dims <= 2048, "The maximum number of dimensions that can be in a vector should not exceed 2048.");
Assert.isTrue(dims <= 2048, "The maximum number of dimensions that can be in a vector must not exceed 2048.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 334 to 341
if (dims >= 1) {
builder.field(FIELD_PARAM_DIMS, dims);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html dims is a required parameter. So it should be checked that if type == FieldType.DenseVector then the dims value must be greater or equal 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense! I combined the two validations for dims here and here and added a new test. I thought it would be best to group it with the other validations, so that malformed data does not make it past the creation of MappingParameters.

@sothawo sothawo removed the status: waiting-for-triage An issue we've not yet triaged label Feb 25, 2021
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Feb 25, 2021
A dense_vector field stores dense vectors of float values. The maximum number of dimensions that can be in a vector should not exceed 2048. A dense_vector field is a single-valued field.

Closes spring-projects#1700
@morganlutz
Copy link
Contributor Author

@sothawo thanks so much for the quick review and help!! 🙏

@sothawo sothawo merged commit 3f2ab4b into spring-projects:master Feb 26, 2021
@morganlutz morganlutz deleted the dense-vector-support branch February 26, 2021 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-triage An issue we've not yet triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add missing "Document ranking types"
3 participants