Loss Functions

Shoelace currently provides three ranking loss functions:

  • ListNet (Top-1 approximation)
  • ListMLE
  • ListPL

Usage

The loss functions plug straightforwardly into a Chainer architecture. Assuming we have a network called predictor, we can define a Chain class that uses the ListNet loss as follows:

from shoelace.loss.listwise import listnet
from chainer import Chain

class Ranker(Chain):
    def __call__(self, x, t):
        # Score the documents with the predictor network, then compute
        # the ListNet loss against the relevance labels t.
        return listnet(self.predictor(x), t)

loss = Ranker(predictor=predictor)

ListNet

The ListNet [CQL+07] loss function is the cross-entropy between the permutation probability distribution induced by the labels and the one induced by the network scores:

\[\mathcal{L}(f(x), y) = - \sum_{\pi \in \Omega} P(\pi \mid y) \log P(\pi \mid f(x))\]

Here \(P(\pi \mid z)\) is the Plackett-Luce probability of permutation \(\pi\) given a set of item scores \(z\). Since the set \(\Omega\) of all permutations has size \(\mathcal{O}(n!)\), the full sum is intractable in practice. Instead, we use the top-1 approximation from the original paper, which reduces the loss to a softmax over the scores followed by a cross-entropy.

This loss function is implemented in shoelace.loss.listwise.listnet.
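As a concrete illustration, the top-1 approximation can be sketched in plain NumPy. This is an illustrative sketch of the math, not the actual shoelace implementation (which operates on Chainer variables):

```python
import numpy as np

def listnet_top1(scores, labels):
    """Top-1 ListNet loss: cross-entropy between the softmax of the
    relevance labels and the softmax of the predicted scores.

    Illustrative NumPy sketch, not the shoelace implementation.
    """
    def softmax(z):
        e = np.exp(z - np.max(z))  # shift for numerical stability
        return e / e.sum()

    p_true = softmax(np.asarray(labels, dtype=float))
    p_pred = softmax(np.asarray(scores, dtype=float))
    return -np.sum(p_true * np.log(p_pred))
```

The loss is minimized exactly when the score distribution matches the label distribution, in which case it equals the entropy of the label distribution.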

ListMLE

The ListMLE [XLW+08] loss function is the negative log-probability of a single permutation \(\pi \in \{\pi \mid y_{\pi_i} \geq y_{\pi_j}; i < j\}\) that is consistent with the ground truth labeling. It is defined as follows:

\[\mathcal{L}(f(x), y) = - \log P(\pi \mid f(x))\]

This loss function is implemented in shoelace.loss.listwise.listmle.
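The Plackett-Luce log-likelihood of a permutation factorizes into a sequence of softmax terms, which makes it cheap to evaluate. The following NumPy sketch illustrates the computation; it is not the shoelace implementation:

```python
import numpy as np

def listmle(scores, labels):
    """ListMLE loss: negative Plackett-Luce log-likelihood of the
    permutation that sorts items by decreasing relevance label.

    Illustrative NumPy sketch, not the shoelace implementation.
    """
    scores = np.asarray(scores, dtype=float)
    # A permutation pi that sorts the labels in decreasing order.
    pi = np.argsort(-np.asarray(labels, dtype=float))
    # The PL likelihood is invariant to a constant shift of the scores,
    # so subtract the max for numerical stability.
    s = scores[pi] - np.max(scores)
    # log P(pi | s) = sum_i [ s_i - log sum_{j >= i} exp(s_j) ],
    # where the inner log-sum is a reversed cumulative sum.
    rev = np.log(np.cumsum(np.exp(s[::-1])))[::-1]
    return -np.sum(s - rev)
```

Scores that rank the items in the same order as the labels yield a lower loss than scores that invert that order.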

ListPL

The ListPL [JKdR17] loss function is an approximation to the cross-entropy loss of ListNet. It can be seen as a stochastic variant of ListMLE where during every update a new permutation \(\pi\) is drawn:

\[\begin{split}\mathcal{L}(f(x), y) = - \log P(\pi \mid f(x)) \\ \pi \sim P(\pi \mid y)\end{split}\]

This loss function is implemented in shoelace.loss.listwise.listpl.
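The sampling step can be sketched as follows: draw a permutation from the Plackett-Luce model over the labels, then evaluate a ListMLE-style loss on that sample. This is an illustrative NumPy sketch of one stochastic update, not the shoelace implementation:

```python
import numpy as np

def sample_pl(z, rng):
    """Draw a permutation from the Plackett-Luce model with scores z:
    repeatedly pick the next item with probability proportional to
    exp(z_i) among the items that remain."""
    z = np.asarray(z, dtype=float)
    remaining = list(range(len(z)))
    pi = []
    while remaining:
        p = np.exp(z[remaining] - np.max(z[remaining]))
        p /= p.sum()
        choice = rng.choice(len(remaining), p=p)
        pi.append(remaining.pop(choice))
    return np.array(pi)

def listpl(scores, labels, rng):
    """One stochastic ListPL evaluation: sample pi ~ P(pi | y), then
    return -log P(pi | f(x)) under the Plackett-Luce model."""
    scores = np.asarray(scores, dtype=float)
    pi = sample_pl(labels, rng)
    s = scores[pi] - np.max(scores)  # shift for numerical stability
    rev = np.log(np.cumsum(np.exp(s[::-1])))[::-1]
    return -np.sum(s - rev)
```

In expectation over the sampled permutations, this recovers the full ListNet cross-entropy without summing over all \(n!\) permutations.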

References

[CQL+07] Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on Machine learning, 129–136. ACM, 2007.
[JKdR17] Rolf Jagerman, Julia Kiseleva, and Maarten de Rijke. Modeling label ambiguity for neural list-wise learning to rank. arXiv preprint arXiv:1707.07493, 2017.
[XLW+08] Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. Listwise approach to learning to rank: theory and algorithm. In Proceedings of the 25th international conference on Machine learning, 1192–1199. ACM, 2008.