Author: 남성욱

Presented: 2020.11.23

References

- maximum likelihood
- variational auto-encoder
- KL-divergence
- Criteo advertising

### Abstract (summary)

The users’ behavior consists of two parts: the sequence of items that they viewed without intervention (the organic part) and the sequences of items recommended to them and their outcome (the bandit part).

1. the organic signal
   - is not always relevant to the recommendation task
   - is typically strong and covers most items
2. the bandit signal
   - gives direct feedback on recommendation performance
   - the signal quality is very uneven

In order to leverage the organic signal to efficiently learn the bandit signal in a Bayesian model, we identify three fundamental types of distances, namely action-history, action-action and history-history distances.

We implement a scalable approximation of the full model using variational auto-encoders and the local re-parameterization trick.
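For context, the local re-parameterization trick samples the pre-activations of a Bayesian linear layer directly instead of sampling a weight matrix per example, which lowers gradient variance. A minimal sketch (the function name, shapes and jitter constant are my own, not the paper's code):

```python
import torch

def local_reparam_linear(x, w_mu, w_logvar):
    """Bayesian linear layer via the local re-parameterization trick.

    Instead of sampling a weight matrix W ~ N(w_mu, diag(exp(w_logvar)))
    and computing x @ W, sample the pre-activation directly from the
    Gaussian it induces, which gives lower-variance gradient estimates.
    """
    act_mu = x @ w_mu                          # mean of x @ W
    act_var = (x ** 2) @ torch.exp(w_logvar)   # variance of x @ W (independent weights)
    eps = torch.randn_like(act_mu)
    return act_mu + torch.sqrt(act_var + 1e-8) * eps
```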

We show, using extensive simulation studies, that our method out-performs or matches the value of both state-of-the-art organic-based recommendation algorithms and of bandit-based methods (both value- and policy-based).

### Modeling

#### the notation

#### the organic part

The user's interest is described by a $K$-dimensional variable $w_n$, which can be interpreted as the user's interest in $K$ topics.

The organic embedding matrix $\Psi$ is $P \times K$ and represents information about how items correlate organically within a user's session.

(We will show that using variational auto-encoders with the re-parameterization trick is an effective way to train the organic model.)
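A rough sketch of what such a linear organic VAE trained with the re-parameterization trick could look like; the layer names, prior and loss below are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearOrganicVAE(nn.Module):
    """Linear VAE over item sessions: encode a bag-of-items vector into a
    K-dimensional user interest w, decode back to a distribution over P items."""

    def __init__(self, P, K):
        super().__init__()
        self.enc_mu = nn.Linear(P, K)       # q(w | session): mean
        self.enc_logvar = nn.Linear(P, K)   # q(w | session): log-variance
        self.psi = nn.Linear(K, P)          # decoder: plays the role of Psi (plus a bias rho)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        eps = torch.randn_like(mu)
        w = mu + torch.exp(0.5 * logvar) * eps   # re-parameterization trick
        return self.psi(w), mu, logvar           # logits over the P items

def elbo_loss(logits, x, mu, logvar):
    """Negative ELBO: multinomial reconstruction term + KL(q(w|x) || N(0, I))."""
    recon = -(x * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon + kl
```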

#### the bandit part

Once the organic session is generated, a recommendation (action) $a_u$ is made to user $u$, and a reward (click) $c_u$ is observed.

The bandit embedding matrix $\beta$ is $P \times K$ and represents information about how to personalize recommendations to a user $u$ with latent user representation $w_n$.
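As a rough illustration of how $\beta$, $\kappa$ and the latent user representation could combine into a click probability, assuming a logistic link (an assumption for illustration, not the paper's exact equation):

```python
import numpy as np

def click_probability(w_u, beta, kappa, a):
    """Plausible bandit reward model (an illustrative assumption):
    P(click | action a, user u) is a logistic function of the dot product
    between the user's latent representation w_u (shape (K,)) and the bandit
    embedding of the recommended item beta[a] (shape (K,)), plus a per-item bias."""
    logit = w_u @ beta[a] + kappa[a]   # beta: (P, K), kappa: (P,)
    return 1.0 / (1.0 + np.exp(-logit))
```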

The approach developed in this paper takes the organic model, estimates $\Psi$ by maximum likelihood and $\omega$ by its posterior mean (denoted $\hat{\omega}$), and then treats $\Psi$ and $\hat{\omega}$ as observed in the bandit model. The graphical model is shown in Figure 2. In this probabilistic model we develop full Bayesian inference of $\beta$, $\kappa$, $w_a$, $w_b$ and $w_c$.

We refer to the organic-only component of the model as the BLO (Bayesian Latent Organic) model (we apply maximum likelihood to $\Psi$ and $\rho$, and integrate out $\omega$). The full model is referred to as BLOB (the Bayesian Latent Organic Bandit model).

Three fundamental types of distances are identified: action-history, action-action and history-history distances.

- action-history similarity: (auto-completion assumption) a recommendation should be similar to the items in the user's history. The mean of the matrix normal prior, $\Psi$, embodies this assumption.
- action-action similarity: if actions (recommendations) $a_1$ and $a_2$ are similar, then we expect the responses of the same (or similar) users to these actions to be correlated. This distance is encoded by the first (low-rank) covariance, $\Psi \Psi^T$, in the matrix normal prior on $\beta$.
- history-history similarity: if users $u_1$ and $u_2$ are similar, then we expect their responses to the same (or similar) action to be correlated. This distance is encoded by the second (low-rank) covariance, $\Psi^T \Psi$, in the matrix normal prior on $\beta$ (a sampling sketch follows this list).
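A small numpy sketch of how these three ingredients combine into a matrix normal prior on $\beta$; the stand-in $\Psi$, the jitter term and the scaling are assumptions added for illustration:

```python
import numpy as np

def sample_matrix_normal(M, U, V, rng):
    """Draw one sample from MN(M, U, V): M is P x K, U is the P x P row
    covariance (action-action), V is the K x K column covariance (history-history)."""
    A = np.linalg.cholesky(U)            # U = A A^T
    B = np.linalg.cholesky(V)            # V = B B^T
    Z = rng.standard_normal(M.shape)     # i.i.d. standard normal entries
    return M + A @ Z @ B.T

rng = np.random.default_rng(0)
P, K = 100, 20
Psi = rng.standard_normal((P, K))        # stand-in for the learned organic embeddings

# Row covariance ~ Psi Psi^T, column covariance ~ Psi^T Psi; a small diagonal
# jitter keeps the Cholesky factors well defined for these low-rank matrices.
U = Psi @ Psi.T + 1e-3 * np.eye(P)
V = Psi.T @ Psi + 1e-3 * np.eye(K)
beta_sample = sample_matrix_normal(M=Psi, U=U, V=V, rng=rng)   # mean Psi: action-history similarity
```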

### Model training

(organic part only)

The process is the usual VAE training procedure (see the sketch after this list):

- we want to maximize the log-likelihood;
- this is hard, so we find a lower bound (the ELBO) and maximize it instead;
- this is also hard, so we use the re-parameterization trick and obtain the optimal parameters by back-propagation.
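Put together, these three steps are the standard ELBO optimisation loop. A minimal sketch reusing the LinearOrganicVAE and elbo_loss pieces sketched in the organic-part section (the toy data and hyper-parameters are assumptions):

```python
import torch

# Hypothetical toy data: 1000 sessions over a catalog of P = 100 items,
# each session represented as a bag-of-items vector.
P, K = 100, 20
sessions = torch.randint(0, 2, (1000, P)).float()
loader = torch.utils.data.DataLoader(sessions, batch_size=64, shuffle=True)

model = LinearOrganicVAE(P, K)                     # from the sketch in the organic-part section
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):
    for x in loader:
        logits, mu, logvar = model(x)
        loss = elbo_loss(logits, x, mu, logvar)    # negative ELBO (lower bound, sign flipped)
        opt.zero_grad()
        loss.backward()                            # back-propagation through the re-parameterized sample
        opt.step()
```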

### Results

Metric

Standard recommendation evaluation metrics.
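Recall@5 and Recall@20 appear in the discussion below; for reference, a minimal sketch of Recall@K, the kind of metric meant here (array names and shapes are assumptions):

```python
import numpy as np

def recall_at_k(scores, true_item, k=5):
    """Fraction of test cases where the held-out next item appears in the top-k
    ranked items. scores: (N, P) predicted item scores, true_item: (N,) item indices."""
    topk = np.argsort(-scores, axis=1)[:, :k]          # indices of the k highest-scoring items
    hits = (topk == true_item[:, None]).any(axis=1)
    return hits.mean()
```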

Benchmark dataset

RecoGym dataset (next-item prediction).

Below, a list of common benchmark datasets.

Models

- Pop: recommend the most popular items to everyone.
- ItemKNN: item-based k-nearest-neighbours baseline.
- RNN: GRU4Rec, a model that predicts the next item from the sequence of past items, independent of the user.
- Bouch/AE: a linear variational auto-encoder using the Bouchard bound. It appears the model can be transformed so that it can be trained without the re-parameterization trick; the appendix gives the derivation and defines a new lower bound.
- RT/AE: an auto-encoder trained with the re-parameterization trick.

The results are shown in Table 2. BLO is much better than the baselines at standard organic recommender systems metrics.

- The proposed BLO performs much better than the baseline models.
- However, it is a pity that a standard dataset was not used.
- The paper says GRU4Rec was used as the RNN model, but it does not explain why its performance is this poor. Is it due to some property of the dataset? Is it because Recall@5 was used instead of Recall@20?

GRU4Rec performance table

Metric

- CTR: click-through rate.

Models

- Logistic Regression:
- Contextual bandit: a policy-based method.
- MultiVAE: a state-of-the-art deep learning recommendation algorithm, similar to the organic portion of the model presented here, except that the model is non-linear and uses some non-standard heuristics such as "beta-annealing" (a small sketch of this heuristic follows this list).
- BLO: the organic portion of the model developed here.
- BLOB-NQ: the complete model developed here (organic and bandit combined), with the normal variational approximation (NQ).
- BLOB-MNQ: the complete model developed here (organic and bandit combined), with the matrix normal variational approximation (MNQ).
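Regarding the "beta-annealing" heuristic mentioned for MultiVAE: it weights the KL term of the ELBO by a factor that is gradually increased during training. A minimal sketch (the linear schedule and names are assumptions):

```python
def annealed_elbo(recon, kl, step, anneal_steps=10_000, beta_max=1.0):
    # "Beta-annealing": weight the KL term by a factor that grows linearly
    # from 0 to beta_max over the first anneal_steps updates, so the model
    # first learns to reconstruct before the prior is fully enforced.
    beta = min(beta_max, step / anneal_steps)
    return recon + beta * kl
```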

The first experiment considers a catalog size of P = 100 and 1000 user sessions; the simulated A/B test is done over 4000 users, and the logging policy is session popularity with epsilon-greedy exploration (epsilon = 0.3).
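A rough sketch of a logging policy of this kind (session popularity with epsilon-greedy exploration, epsilon = 0.3) and of how a CTR would be estimated from it; the session generator and click model here are placeholders, not RecoGym:

```python
import numpy as np

rng = np.random.default_rng(0)
P, EPSILON = 100, 0.3

def epsilon_greedy_recommend(session_counts):
    """Session-popularity logging policy with epsilon-greedy exploration:
    with probability epsilon recommend a uniformly random item, otherwise
    recommend the item seen most often in the current session."""
    if rng.random() < EPSILON:
        return int(rng.integers(P))
    return int(np.argmax(session_counts))

# Toy CTR estimate over 4000 simulated users (the click model is hypothetical).
clicks, shows = 0, 0
for _ in range(4000):
    session_counts = rng.poisson(0.2, size=P)             # stand-in organic session
    a = epsilon_greedy_recommend(session_counts)
    click_prob = 0.01 + 0.04 * (session_counts[a] > 0)    # hypothetical reward model
    clicks += rng.random() < click_prob
    shows += 1
print("CTR:", clicks / shows)
```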

When the experiment is repeated 50 times, we can see that the organic models actually do worse than recommending at random.

### Conclusion

- The work is meaningful in that it models the organic signal and the bandit signal jointly.
- Compared with existing models, it shows better performance on problems that take the bandit signal into account.
- Questions remain about how appropriate the benchmark dataset used in this paper is.