solr 打分计算公式
solr 6.3 版本的打分公式
计算公式:
\[score(q,d) = coord(q,d) \cdot queryNorm(q) \cdot \sum_{t\;in\;q} {(\;tf(t\;in\;d) \cdot idf(t)^2 \cdot t.getBoost() \cdot norm(t,d)\;)}\]
coord(q,d):
coordination factorhow many of the query terms are found in the specified document 关键词命中率
queryNorm(q):
query normalizationserves as a normalization factor to attempt to make cores between queries comparable
\[queryNorm(q) = queryNorm(sumOfSquaredWeights) = \frac{1}{\sqrt{sumOfSquaredWeights}}\]$sumOfSquaredWeights = q.getBoost()^2 \cdot (idf(t) \cdot t.getBoost())^2$
sumOfSquaredWeights
等于每个 term
相加。默认 boost = 1
,则为每个 term
的 idf(t)
的平方相加
tf(t in d):
term frequencyit is a measure of how often a particular term appears in a matching document
$tf(t\;in\;d) = frequency^{1/2}$
idf(t):
inverse document frequencyrarer terms give higher contribution to the total score
$idf(t) = 1 + \log \left(\frac{docCount\;+\;1}{docFreq\;+\;1} \right)$
- $\log$ 是以 e 为底数的对数;对应 JAVA 中的 $\log$ 方法
norm(t,d):
normalization factorcombine the matching document’s boost, the matching field’s boost, and a length-normalization factor that penalizes longer documents
$norm(t,d) = d.getBoost() \cdot lengthNorm(f) \cdot f.getBoost()$
$lengthNorm = 1 \; / \; Math.sqrt(numTerms) $
t.getBoost:
即每个词的权重,没给则默认为 1