One of the most common questions we get from our WordPress VIP clients, many of whom are large media companies that publish constantly, is how they can bias their search results towards more recent content when scoring and sorting them. This type of problem is extremely hard to solve with a traditional RDBMS, but we provide most of our VIP clients with their own dedicated Elasticsearch index, and as it happens ES comes with some powerful scoring functions for just this purpose.
With Elasticsearch, adding date-based weighting of results via function scores is pretty straightforward. A query like the following will multiply the TF-IDF textual relevancy score of the content by a date-based score that makes older content less and less important as time progresses:
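A minimal sketch of such a query, built as a Python dictionary that can be serialized to JSON and sent to the search endpoint. The field names (title, content, date), the helper name build_recency_query, and the default values are assumptions for illustration only, not the settings used on any real index:

```python
import json


def build_recency_query(search_text, scale_days=28, offset_days=7, decay=0.5):
    """Build a function_score query that favors recent documents.

    Hypothetical helper: the field names and defaults are illustrative.
    """
    return {
        "query": {
            "function_score": {
                # The text query produces the base relevancy score.
                "query": {
                    "multi_match": {
                        "query": search_text,
                        "fields": ["title", "content"],  # assumed field names
                    }
                },
                # The decay function produces a value in (0, 1] from the
                # document's age, measured from "now".
                "functions": [
                    {
                        "gauss": {
                            "date": {  # assumed date field name
                                "origin": "now",
                                "scale": f"{scale_days}d",
                                "offset": f"{offset_days}d",
                                "decay": decay,
                            }
                        }
                    }
                ],
                # Multiply the text score by the decay value.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_recency_query("elasticsearch"), indent=2))
```

Swapping the "gauss" key for "exp" or "linear" changes the shape of the decay while keeping the same four parameters.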
Adding such a time-based decay is simple; however, it’s not obvious which type of function scoring should be used (gauss, exp, or linear), much less what good values are when configuring the scale, offset, and decay of each function.
While leading a workshop on querying Elasticsearch last week, I struggled to explain exactly how these scoring functions will affect the final ranking of documents. So instead of trying to work out by hand the coefficients we are adding with function scores, I decided to build a simple visualization that makes it easier to play around and see what happens to our scores under various settings.
The CodePen at the top of this post allows you to adjust various settings, then shows how scores decay over a one-year period with each of the three types of function scorers. Note that both offset and scale are specified in days, so 28 is the equivalent of 28d when used within the ES query DSL.
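For readers who prefer numbers to a chart, the same curves can be approximated offline. The sketch below follows the decay formulas as I understand them from the Elasticsearch function score documentation, with the origin assumed to be "now" and document age measured in days; the function names and sample settings are my own and purely illustrative:

```python
import math


def _distance(age_days, offset):
    # Documents within `offset` days of the origin are not decayed at all.
    return max(0.0, abs(age_days) - offset)


def gauss_decay(age_days, scale, offset=0.0, decay=0.5):
    # Bell curve: sigma^2 is chosen so the score equals `decay`
    # at a distance of `scale` beyond the offset.
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(age_days, offset) ** 2 / (2.0 * sigma_sq))


def exp_decay(age_days, scale, offset=0.0, decay=0.5):
    # Exponential: drops fastest right after the offset, long tail.
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(age_days, offset))


def linear_decay(age_days, scale, offset=0.0, decay=0.5):
    # Straight line that reaches exactly zero at distance scale / (1 - decay).
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(age_days, offset)) / s)


if __name__ == "__main__":
    scale, offset, decay = 28, 7, 0.5  # illustrative settings, in days
    print(f"{'age':>5} {'gauss':>8} {'exp':>8} {'linear':>8}")
    for age in (0, 7, 35, 90, 180, 365):
        print(
            f"{age:>5} "
            f"{gauss_decay(age, scale, offset, decay):8.4f} "
            f"{exp_decay(age, scale, offset, decay):8.4f} "
            f"{linear_decay(age, scale, offset, decay):8.4f}"
        )
```

All three functions return 1.0 inside the offset window and exactly the decay value at offset plus scale days; they differ in what happens beyond that point. Linear is the only one that reaches zero, which matters with multiplicative boosting because it wipes out the text score of sufficiently old content entirely.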