Contextual bandits are useful algorithms for recommendation tasks in many applications, such as Netflix and Amazon Echo. Many such algorithms have been studied and have shown good results in terms of high total reward or low regret. However, when a user wants to receive recommendations in a new task, these algorithms do not use the information learned from previous tasks.
We propose a new problem, Bandit Parameter Estimation, to address this inefficiency. In the same setting as the contextual bandit, we treat the bandit parameter as the user's latent profile, and we propose algorithms that estimate it as quickly as possible.
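As a rough illustration of what estimating such a parameter can look like, the sketch below assumes a linear reward model (reward = context · theta + noise), which the abstract does not state. It uses a plain ridge-regression estimate with a uniformly random arm choice as a placeholder policy; it is not one of the proposed algorithms. All names (`solve_linear`, `estimate_theta`, `estimation_error`) are hypothetical.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def estimate_theta(contexts, rewards, reg=1.0):
    """Ridge estimate: solve (reg * I + sum x x^T) theta = sum r x."""
    d = len(contexts[0])
    A = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, r in zip(contexts, rewards):
        for i in range(d):
            b[i] += r * x[i]
            for j in range(d):
                A[i][j] += x[i] * x[j]
    return solve_linear(A, b)


def estimation_error(theta_true, n_rounds=500, n_arms=10, noise=0.1, seed=0):
    """Simulate rounds under the assumed linear model; return ||theta_hat - theta_true||."""
    rng = random.Random(seed)
    d = len(theta_true)
    contexts, rewards = [], []
    for _ in range(n_rounds):
        arms = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_arms)]
        x = rng.choice(arms)  # placeholder policy: pick an arm uniformly at random
        r = sum(xi * ti for xi, ti in zip(x, theta_true)) + rng.gauss(0, noise)
        contexts.append(x)
        rewards.append(r)
    theta_hat = estimate_theta(contexts, rewards)
    return sum((a - t) ** 2 for a, t in zip(theta_hat, theta_true)) ** 0.5
```

Calling `estimation_error([1.0, -0.5, 0.3])` returns the Euclidean distance between the estimated and true parameter after the simulated rounds. An arm-selection rule designed for fast estimation would replace the random choice in the loop.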
We conducted experiments on a synthetic dataset to verify the proposed algorithms in two cases. The experiments show that our algorithm estimates the parameters faster than other contextual bandit algorithms. ⓒ 2017 DGIST