Document setting with external synonyms file #2647
Comments
When I look at the Elasticsearch documentation, the synonyms are either provided in a file on the server or defined inline in the settings. For Spring Data Elasticsearch this currently has two possibilities:
Thinking about this, one solution might be to add some kind of template resolution when loading a settings or mapping JSON file, but I am hesitant to introduce some random templating pattern like "#synonyms#" into Spring Data Elasticsearch. But you could try this approach yourself. Write a settings file that might look like this (using the example from the test):
{
  "index": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "synonym_filter"
          ]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "lenient": true,
          "synonyms": [
            $$MY-SYNONYMS_HERE$$
          ]
        }
      }
    }
  }
}
Read this file into a String. Then let's assume that you have a file of synonym definitions, one per line. Read that file line by line, wrap each line in double quotes, and join the lines with a comma as separator. This joined string then replaces the "$$MY-SYNONYMS_HERE$$" in the settings file. After that replacement you can put the resulting String into the
Basically, that could be integrated somehow into Spring Data Elasticsearch. But there could as well be something like an include mechanism that might look like this (just writing down an idea):
{
  "index": {
    "number_of_shards": "1",
    "number_of_replicas": "0",
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "synonym_filter"
          ]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "lenient": true,
          "$include synonyms": "synonyms.txt"
        }
      }
    }
  }
}
That would be valid JSON; after reading it into a map we'd need to iterate through the map and replace each property with a name like "$include <name>" and a value of "<filename>" by a new property named <name> whose value is the content of <filename>. That could be repeated recursively. Not sure about that; I have to think about it for some time.
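Both ideas from this comment can be sketched in plain Java without touching Spring Data Elasticsearch itself. The class below is only an illustration under assumptions: the names `SettingsTemplates`, `fillPlaceholder`, and `resolveIncludes` are hypothetical, the included file is assumed to hold one entry per line and to become a JSON array, and the settings are assumed to be already parsed into a `Map<String, Object>` by whatever JSON library is in use. It recurses into nested objects only, not into lists or into the included content.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class SettingsTemplates {

    static final String PLACEHOLDER = "$$MY-SYNONYMS_HERE$$";
    static final String INCLUDE_PREFIX = "$include ";

    /** Idea 1: quote each synonym line, join with commas, replace the placeholder. */
    static String fillPlaceholder(String settingsTemplate, List<String> synonymLines) {
        String joined = synonymLines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                // escape backslashes and quotes so the result stays valid JSON
                .map(line -> "\"" + line.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(", "));
        // String.replace works on literal text, so the '$' characters are not special
        return settingsTemplate.replace(PLACEHOLDER, joined);
    }

    /** Idea 2: replace every "$include <name>": "<filename>" entry, descending into nested maps. */
    static Map<String, Object> resolveIncludes(Map<String, Object> settings,
                                               Function<String, List<String>> fileLoader) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : settings.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            if (key.startsWith(INCLUDE_PREFIX) && value instanceof String) {
                // "$include synonyms": "synonyms.txt"  becomes  "synonyms": [line1, line2, ...]
                String name = key.substring(INCLUDE_PREFIX.length()).trim();
                resolved.put(name, new ArrayList<>(fileLoader.apply((String) value)));
            } else if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                resolved.put(key, resolveIncludes(nested, fileLoader));
            } else {
                resolved.put(key, value);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> synonyms = List.of("i-pod, i pod => ipod", "universe, cosmos");
        System.out.println(fillPlaceholder("\"synonyms\": [ " + PLACEHOLDER + " ]", synonyms));

        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("type", "synonym");
        filter.put("$include synonyms", "synonyms.txt");
        // in-memory stand-in for reading synonyms.txt
        System.out.println(resolveIncludes(Map.of("synonym_filter", filter), name -> synonyms));
    }
}
```

In a real application the `fileLoader` could read from the classpath or the file system, for example via `Files.readAllLines` with the checked `IOException` wrapped; the in-memory lambda here just keeps the sketch self-contained.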
Thank you for responding.
About the first way you mentioned: I can't include the synonyms in the JSON file, because I have more than a hundred synonyms and it reduces my file's readability. Your second suggestion was a good alternative to solve the problem, but I don't want to create the index myself. It complicates my code, and the automatic way is actually a great fit.
The idea that you have mentioned is awesome. It would be great if we could add some extra notations to our settings that Spring Data Elasticsearch would process before creating the index. A similar case of a stopword file comes to my mind. It might also be useful for other field values. For example, you might want to add a part of your analyzer settings only if some condition is true, or load a field value from configuration properties or your environment variables. I'm not sure if it's a feature that all projects would use. Just thinking out loud here :)
I think such an include mechanism could be handy in quite a few situations; I created #2657 as a general improvement for this. As for the use with synonyms, keep in mind what the documentation states:
Thanks. A part of our conversation was left unanswered, which I think could be a useful feature as well: it would be useful to have access to config properties in setting.json. For example:
Here, "synonyms.lenient" and "synonyms.file.path" are properties that I've defined in my application.yml. This feature would not be only for synonyms; rather, it would be suitable for all settings file blocks. Maybe someone wants to fetch "number_of_shards" from config properties as well!
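To make the request more concrete, here is a minimal sketch of what such a resolution step might do. It is an illustration under assumptions, not an existing Spring Data Elasticsearch feature: the `${...}` notation is assumed (the thread does not fix one), the class and method names are hypothetical, and the lookup function stands in for something like Spring's `Environment.getProperty(String)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SettingsPropertyResolver {

    // Assumed notation: ${property.name}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${key} in the text with the value returned by the lookup. */
    static String resolve(String text, Function<String, String> propertyLookup) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = propertyLookup.apply(key);
            if (value == null) {
                throw new IllegalArgumentException("No value for property '" + key + "'");
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    /** Walks a parsed settings map and resolves placeholders in String values only. */
    static Map<String, Object> resolveAll(Map<String, Object> settings,
                                          Function<String, String> propertyLookup) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        settings.forEach((key, value) -> {
            if (value instanceof String) {
                resolved.put(key, resolve((String) value, propertyLookup));
            } else if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                resolved.put(key, resolveAll(nested, propertyLookup));
            } else {
                resolved.put(key, value);
            }
        });
        return resolved;
    }

    public static void main(String[] args) {
        // in-memory stand-in for the properties from application.yml
        Map<String, String> properties = Map.of(
                "synonyms.lenient", "true",
                "synonyms.file.path", "synonyms.txt");
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("type", "synonym");
        filter.put("lenient", "${synonyms.lenient}");
        filter.put("synonyms_path", "${synonyms.file.path}");
        System.out.println(resolveAll(filter, properties::get));
    }
}
```

Note that in this sketch a resolved value always stays a String: `"lenient": "${synonyms.lenient}"` turns into the string `"true"`, not a JSON boolean, so any type conversion would need extra handling.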
Another thing is still unclear to me (as I mentioned when I first created this issue). I was wondering whether the test case "SynonymRepositoryIntegrationTests" works with the settings file "/synonyms/settings-with-external-file.json". If not, please remove that JSON file from the project's test files, because it's confusing! Thanks
As for the use of property values: they would need to be retrieved from the Spring environment; although this would be possible, a replacement mechanism would be way more complicated. We not only would need to check the keys in a
would not be parsed into a
As for the synonyms/settings-with-external-file.json file: this indeed seems to be a leftover from the times when Spring Data Elasticsearch started its own Elasticsearch instance for the integration tests. I need to check whether we can mount this into the Testcontainers instance and reintroduce a test for that, or remove it.
Thank you for your cooperation and response.
Hi.
I was wondering if test case "SynonymRepositoryIntegrationTests" works with setting file "/synonyms/settings-with-external-file.json"?
I want my synonym file to be part of my source code (instead of using a file in Elasticsearch or listing the synonyms in the analyzer filter definition). So I tried the approach from the test case, but it didn't work.
It would be nice to have a feature that lets you include synonyms in the source code (in a way that the test case would work :))
Thanks