Tuesday, September 20, 2016

Elasticsearch: Levenshtein sorting


I have a query that works well enough, but I want to sort its results by the Levenshtein distance between the query parameter and the field in question.

Right now I run the query in ES and then do the sorting in my application. I'm currently testing a script in the sort clause instead. This is the script:

import org.elasticsearch.common.logging.*;

ESLogger logger = ESLoggerFactory.getLogger('levenshtein_script');

def str1 = '%s'.split(' ').sort().join(' ');
// Needed since the field is analyzed. This will change when I reindex the data.
def str2 = doc['%s'].values.join(' ');

def dist = new int[str1.size() + 1][str2.size() + 1];
(0..str1.size()).each { dist[it][0] = it }
(0..str2.size()).each { dist[0][it] = it }
(1..str1.size()).each { i ->
    (1..str2.size()).each { j ->
        dist[i][j] = [dist[i - 1][j] + 1,
                      dist[i][j - 1] + 1,
                      dist[i - 1][j - 1] + ((str1[i - 1] == str2[j - 1]) ? 0 : 1)].min()
    }
}

def result = dist[str1.size()][str2.size()];
logger.info('Query param: [' + str1 + '] | Term: [' + str2 + '] | Result: [' + result + ']');
return result;
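For comparison, the application-side sort mentioned above can be sketched in Java using the same recurrence as the script. This is only an illustration under assumptions: the class name, the method names and the sample hit strings are hypothetical, and the hits are treated as plain strings rather than real search results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LevenshteinSort {

    // Dynamic-programming edit distance, keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from the empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a new list ordered by ascending distance to the query.
    static List<String> sortByDistance(String query, List<String> hits) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt(h -> levenshtein(query, h)));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical field values standing in for search hits.
        List<String> hits = Arrays.asList("jon smith", "john smith", "joan smyth");
        System.out.println(sortByDistance("john smith", hits));
    }
}
```

The two-row variant uses memory proportional to the shorter dimension instead of the full table, which matters little for names but keeps the sketch cheap per hit.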

Basically this is a template (note the %s placeholders) that I fill in from my application like this:

sortScript = String.format(EDIT_DISTANCE_GROOVY_FUNC, fullname, FULLNAME_FIELD_NAME); 
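One consequence of the String.format call above is worth illustrating: every distinct query value produces a distinct script source, and a value containing a single quote breaks the Groovy string literal. The sketch below uses a shortened, hypothetical stand-in for EDIT_DISTANCE_GROOVY_FUNC and contrasts it with a fixed source whose inputs travel in a separate parameter map. The variable names `query` and `field` are assumptions, as is the idea that the script engine exposes parameters to the script under those names.

```java
import java.util.HashMap;
import java.util.Map;

public class ScriptTemplateDemo {

    // Hypothetical, shortened stand-in for EDIT_DISTANCE_GROOVY_FUNC.
    static final String TEMPLATE =
        "def str1 = '%s'; def str2 = doc['%s'].values.join(' '); return 0;";

    // Hypothetical fixed source that reads its inputs from named parameters.
    static final String PARAM_SOURCE =
        "def str1 = query; def str2 = doc[field].values.join(' '); return 0;";

    // Mirrors the String.format approach: the value is baked into the source.
    static String formatted(String fullname, String field) {
        return String.format(TEMPLATE, fullname, field);
    }

    // Alternative: the source stays constant and only this map changes per request.
    static Map<String, Object> params(String fullname, String field) {
        Map<String, Object> p = new HashMap<>();
        p.put("query", fullname);
        p.put("field", field);
        return p;
    }

    public static void main(String[] args) {
        String a = formatted("john smith", "fullname");
        String b = formatted("jane doe", "fullname");
        // Different query values give different script text.
        System.out.println(a.equals(b));
        // A single quote in the value ends the Groovy string literal early.
        System.out.println(formatted("o'brien", "fullname"));
        // With parameters, PARAM_SOURCE is identical for every request.
        System.out.println(PARAM_SOURCE + " " + params("o'brien", "fullname"));
    }
}
```

If compiled scripts are cached by their source text, the parameterized form gives the cache a single entry to reuse, while the formatted form adds a new entry for every distinct name.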

The problem is that scripts like this are costly, as described in http://code972.com/blog/2015/03/84-elasticsearch-one-tip-a-day-avoid-costly-scripts-at-all-costs, which is understandable.

My question is: how can I do what I need (sort the results by Levenshtein distance) inside Elasticsearch, so that I can avoid the overhead in my application? Can I use Lucene expressions for this? Do you have an example? Is there some other way to accomplish this?

I'm using Elasticsearch 1.7.5 as a hosted service, so native plugins should not be the first solution. (I don't even know whether they are possible; I'd have to check with my provider. If it's the only viable solution, I will do just that.)

UPDATE

So it seems a good solution would be to save the script in the config/scripts folder, since it will then be compiled only once: https://www.elastic.co/blog/running-groovy-scripts-without-dynamic-scripting. The script can also be indexed instead of being saved as a file: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html. Indexing is much more convenient for my use case. Does an indexed script behave the same way with regard to compilation? Will it be compiled only once?

0 Answers
