Contextual bandit is useful algorithm for the recommendation task in many applications such as NETFLEX, Amazon Echo, etc. Many algorithms are researched and showed a good result in terms of high total reward or low regret. However, when user wants to receive a recommendation in the new task, these algorithms do not use information that learned from before task.
We suggest new topic, Bandit Parameter Estimation, to solve that inefficient problem. In the same setting with Contextual bandit, we consider as user’s latent profile. And then we propose some algorithms to estimate as fast as possible.
We conducted to experiment to verify algorithms that we proposed in two case by using a synthetic dataset. As a result of experiment, we found that our algorithm estimates parameters faster than other algorithms in Contextual bandit. ⓒ 2017 DGIST
Table Of Contents
Ⅰ. Introduction 1--
1.1 Overview 1--
1.2 Background 2--
1.2.1 Multi-Armed bandit 2--
1.2.2 K-armed (Linear) Contextual bandit 3--
1.3 Related work 4--
1.3.1 algorithm 4--
1.3.2 UCB 5--
1.3.3 LinUCB 6--
Ⅱ. Materials 8--
2.1 Problem setting for Bandit Parameter Estimation 8--
2.2 The uncertainty ellipsoid of 𝚹_(*) 9--
2.2.1 𝑀𝑎𝑥(𝑚𝑖𝑛𝐸𝑖𝑔. 𝑣𝑎𝑙) 10--
2.2.2 𝑀𝑎𝑥(𝑇𝑟(Σ_(t))) 11--
2.2.3 𝑀𝑖𝑛(𝑇𝑟(Σ_(t)^(-1))) 11--
2.2.4 Max(Det(Σ_(t))) 12Ⅲ. Method 13--
3.1 Generating synthetic data 13--
3.2 The experiment process 13--
Ⅳ. Experimental result 14--
4.1 The experiment case 1 : Various k, fixed d 14--
4.2 The experiment case 2 : Various d, fixed k 15--