Document setting with external synonyms file #2647

MobinaPak · 2023-07-25T13:10:41Z

Hi.

I was wondering whether the test case "SynonymRepositoryIntegrationTests" works with the settings file "/synonyms/settings-with-external-file.json"?

I want my synonym file to be part of my source code (instead of using a file in Elasticsearch or listing the synonyms in the analyzer filter definition). So I tried the approach from the test case, but it didn't work.

It would be nice to have a feature that lets you include synonyms in the source code (in a way that the test case would work :))
Thanks

sothawo · 2023-07-25T19:21:26Z

When I look at the Elasticsearch documentation the synonyms are either provided in a file on the server or defined inline in the settings. For Spring Data Elasticsearch this currently has two possibilities:

You either define them in a settings file like in the SynonymRepositoryIntegrationTests, which you do not want to do.
Or you set the createIndex parameter in the @Document annotation to false and make sure that on application startup you create the index yourself with IndexOperations.create(java.util.Map<java.lang.String,java.lang.Object>, org.springframework.data.elasticsearch.core.document.Document), providing the mapping (which you can obtain with IndexOperations.createMapping()) and the settings. The settings you would need to create yourself.

Thinking about this, it might be a solution to add some kind of template resolution into loading a settings or mapping json file, but I am hesitant to introduce some random templating patterns like "#synoms#" into Spring Data Elasticsearch.

But you could try this approach by yourself. Write a settings file that might look like this (using the example from the test):

{
  "index": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "synonym_filter"
          ]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "lenient": true,
          "synonyms": [
            $$MY-SYNONYMS_HERE$$
          ]
        }
      }
    }
  }
}

Read this file into a String. Then let's assume that you have a file of synonym definitions, one per line. Read that file line by line, wrap each line in double quotes and then join the lines with a comma as separator. This joined string would then replace the "$$MY-SYNONYMS_HERE$$" placeholder from the settings file.

After that replacement you can pass the resulting String to the org.springframework.data.elasticsearch.core.index.Settings#parse() method to get a Settings object (which implements Map<String, Object>). With that you can create the index.
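The replacement step described above could be sketched roughly as follows. This is only an illustration using the JDK: the class name SynonymPlaceholder and its methods are hypothetical, skipping blank lines and escaping quotes are assumptions that go beyond what was described, and in a real application the lines would come from a file or classpath resource (for example via Files.readAllLines). The rendered String is what would then be handed to Settings#parse().

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spring Data Elasticsearch.
public class SynonymPlaceholder {

    // Placeholder token as used in the settings template above.
    static final String PLACEHOLDER = "$$MY-SYNONYMS_HERE$$";

    /** Escapes backslashes and double quotes, then wraps the line in double quotes. */
    public static String quote(String line) {
        return "\"" + line.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Replaces the placeholder with the quoted, comma-joined synonym lines. */
    public static String render(String template, List<String> synonymLines) {
        String joined = synonymLines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty()) // assumption: blank lines are skipped
                .map(SynonymPlaceholder::quote)
                .collect(Collectors.joining(", "));
        // String.replace is a literal replacement, so the '$' characters need no escaping.
        return template.replace(PLACEHOLDER, joined);
    }

    public static void main(String[] args) {
        String template = "{ \"synonyms\": [ $$MY-SYNONYMS_HERE$$ ] }";
        List<String> lines = List.of("british, english", "", "queen, monarch");
        // prints: { "synonyms": [ "british, english", "queen, monarch" ] }
        System.out.println(render(template, lines));
    }
}
```

Because the lines are inserted into JSON text, any double quote or backslash inside a synonym line has to be escaped, which is why the sketch does not simply concatenate the raw lines.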

Basically that could be integrated somehow into Spring Data Elasticsearch. But there could be as well something like an include mechanism that might look like this (just writing down an idea):

{
  "index": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "synonym_filter"
          ]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "lenient": true,
          "$include synonyms": "synonyms.txt"
        }
      }
    }
  }
}

That would be valid JSON; after reading it into a map we'd need to iterate through the map and replace each property with a name like "$include <name>" and a value of "<filename>" with a new property named <name> whose value is the content of <filename>. That could be repeated recursively.
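Just to make the idea more concrete, the map traversal could look something like the following sketch. Nothing here exists in Spring Data Elasticsearch: the class IncludeResolver is hypothetical, and how a file name is turned into content is left to a loader function passed in by the caller (an assumption made to keep the example self-contained). There is no protection against include cycles.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the "$include <name>" idea; not an existing API.
public class IncludeResolver {

    private static final String PREFIX = "$include ";

    /**
     * Returns a copy of the source map in which every entry
     * "$include <name>": "<filename>" is replaced by "<name>": <loaded content>.
     * Nested maps, including loaded ones, are processed recursively.
     */
    public static Map<String, Object> resolve(Map<String, Object> source,
                                              Function<String, Object> loader) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            if (key.startsWith(PREFIX) && value instanceof String) {
                String name = key.substring(PREFIX.length()).trim();
                Object included = loader.apply((String) value);
                result.put(name, resolveValue(included, loader));
            } else {
                result.put(key, resolveValue(value, loader));
            }
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static Object resolveValue(Object value, Function<String, Object> loader) {
        if (value instanceof Map) {
            return resolve((Map<String, Object>) value, loader);
        }
        return value; // scalars and lists are kept as they are
    }
}
```

For the synonyms case the loader would return the lines of synonyms.txt as a List of Strings, so that the filter ends up with a regular "synonyms" array.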

Not sure about that, have to think some time about that.

MobinaPak · 2023-07-30T11:46:50Z

Thank you for responding

About the first way you mentioned: I can't include the synonyms in the JSON file, because I have more than a hundred synonyms and it reduces the file's readability.

Your second suggestion is a good alternative to solve the problem, but I don't want to create the index myself. It complicates my code, and the automatic way is actually a great fit.

The idea that you mentioned is awesome. It would be great if we were able to add some extra notations to our settings that Spring Data Elasticsearch would process before creating the index. A similar case that comes to my mind is a stopword file. It might also be useful for other field values. For example, you might want to add a part of your analyzer settings only if some condition is true, or load a field value from configuration properties or your environment variables. I'm not sure if it's a feature that all projects would use. Just thinking out loud here :)

sothawo · 2023-08-01T15:39:54Z

I think providing such an include mechanism could be handy in quite a few situations. I created #2657 as a general improvement for this.

As for the use with synonyms keep in mind what the documentation states:

However, it is recommended to define large synonyms set in a file using synonyms_path, because specifying them inline increases cluster size unnecessarily.

MobinaPak · 2023-08-05T08:31:05Z

Thanks

One part of our conversation was left open, which I think could be a useful feature as well.

It would be useful to have access to config properties in setting.json.

For example :

{
  "index": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "synonym_filter"
          ]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "lenient": "${synonyms.lenient}",
          "synonyms": "${synonyms.file.path}"
        }
      }
    }
  }
}

Here "synonyms.lenient" and "synonyms.file.path" are properties that I've defined in my application.yml. This feature would not only be for synonyms; it would be suitable for all blocks of the settings file. Maybe someone wants to fetch "number_of_shards" from config properties as well!

MobinaPak · 2023-08-05T08:37:12Z

And another thing is still unclear to me (as I mentioned when I first created this issue).

I was wondering whether the test case "SynonymRepositoryIntegrationTests" works with the settings file "/synonyms/settings-with-external-file.json"?

If not, please remove the mentioned JSON file from the project's test files, because it's confusing!

Thanks

sothawo · 2023-08-06T13:50:38Z

As for the use of property values: they would need to be retrieved from the Spring environment. Although this would be possible, a replacement mechanism would be much more complicated. We would not only need to check the keys in a Map parsed from JSON to find the places where another JSON fragment should be included, but would also need to parse the values of these map entries. And this would not work, for example, for numeric parameters, as

"number_of_shards": ${config.number-of-shards}

would not be parsed into a Map as it is not valid JSON.
If you want to configure index settings parameters via the configuration, I'd suggest that you programmatically create a Settings object and set these values; then call the appropriate IndexOperations method.
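A rough sketch of what building the settings in code might look like is shown below. It uses plain JDK maps rather than the Settings class itself (Settings implements Map<String, Object>, so the structure is the same); the class name ConfiguredIndexSettings and the parameter names are hypothetical, and how the values are read from the Spring environment is left out. Numeric values stay real numbers here, which avoids the invalid-JSON problem of an unquoted placeholder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the parameters stand in for values read from the configuration.
public class ConfiguredIndexSettings {

    public static Map<String, Object> build(int shards, int replicas,
                                            boolean lenient, String synonymsPath) {
        Map<String, Object> synonymFilter = new LinkedHashMap<>();
        synonymFilter.put("type", "synonym");
        synonymFilter.put("lenient", lenient);
        // synonyms_path is the Elasticsearch setting for a synonyms file on the server
        synonymFilter.put("synonyms_path", synonymsPath);

        Map<String, Object> analyzer = new LinkedHashMap<>();
        analyzer.put("type", "custom");
        analyzer.put("tokenizer", "whitespace");
        analyzer.put("filter", List.of("synonym_filter"));

        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("analyzer", Map.of("synonym_analyzer", analyzer));
        analysis.put("filter", Map.of("synonym_filter", synonymFilter));

        Map<String, Object> index = new LinkedHashMap<>();
        index.put("number_of_shards", shards);     // stays an Integer, no quoting needed
        index.put("number_of_replicas", replicas);
        index.put("analysis", analysis);

        return Map.of("index", index);
    }
}
```

The resulting map could then be passed to the IndexOperations.create variant that takes the settings map and the mapping document, with createIndex set to false on the entity.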

As for the synonyms/settings-with-external-file.json file: this indeed seems to be a leftover from the times when Spring Data Elasticsearch started its own Elasticsearch instance for the integration tests. I need to check whether we can mount it into the testcontainers instance and reintroduce a test for it, or else remove it.

MobinaPak · 2023-08-08T09:20:27Z

Thank you for your cooperation and response

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jul 25, 2023

sothawo added the status: waiting-for-feedback We need additional information before we can continue label Jul 25, 2023

spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 30, 2023

sothawo mentioned this issue Aug 1, 2023

Add the possibility to include other json files when reading an index settings or mapping file. #2657

Open

sothawo added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Aug 1, 2023

MobinaPak closed this as completed Aug 8, 2023
