Department Seminars & Colloquia

Category SAARC Seminar
Event SAARC Seminar
Title OptiDICE for Offline Reinforcement Learning
Abstract Offline reinforcement learning (RL) refers to the problem setting where the agent aims to optimize its policy solely from pre-collected data, without further interaction with the environment. In offline RL, distributional shift becomes the primary source of difficulty; it arises when the target policy being optimized deviates from the behavior policy used to collect the data. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms have often used sophisticated techniques that encourage underestimation of action values, which introduce an additional set of hyperparameters that must be tuned properly. In this talk, I present OptiDICE, an offline RL algorithm that prevents overestimation in a more principled way. OptiDICE directly estimates the stationary distribution corrections of the optimal policy and, unlike previous offline RL algorithms, does not rely on policy gradients. On an extensive set of benchmark datasets for offline RL, OptiDICE is shown to perform competitively with state-of-the-art methods. This is joint work with Jongmin Lee (UC Berkeley), Wonseok Jeon (Qualcomm), Byung-Jun Lee (Korea U.), and Joelle Pineau (MILA).
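For context on the distribution-correction idea mentioned in the abstract, the following is a minimal, hypothetical PyTorch sketch (not the speaker's implementation) of how estimated stationary distribution correction ratios w(s, a) ≈ d*(s, a) / d_D(s, a) could be turned into a policy by weighted behavior cloning over the offline dataset, so that no bootstrapped action-value estimates are involved. The names TabularSoftmaxPolicy and extract_policy, and the precomputed weights, are illustrative assumptions.

# Illustrative sketch only: assumes correction ratios w(s, a) have already been
# estimated by a DICE-style method; the policy is then fit by weighted behavior
# cloning on the offline dataset, with no bootstrapped value estimates.
import torch
import torch.nn as nn

class TabularSoftmaxPolicy(nn.Module):
    """Softmax policy over discrete states and actions (hypothetical sizes)."""
    def __init__(self, n_states: int, n_actions: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_states, n_actions))

    def log_prob(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # logits[s] has shape (batch, n_actions); pick the log-probability of each taken action
        return torch.log_softmax(self.logits[s], dim=-1).gather(1, a.unsqueeze(1)).squeeze(1)

def extract_policy(states, actions, weights, n_states, n_actions, lr=1e-2, steps=1000):
    """Weighted behavior cloning: maximize E_{(s,a)~D}[ w(s,a) * log pi(a|s) ]."""
    policy = TabularSoftmaxPolicy(n_states, n_actions)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        loss = -(weights * policy.log_prob(states, actions)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# Usage (hypothetical data): states/actions are LongTensors from the offline dataset,
# weights is a FloatTensor of estimated correction ratios of the same batch size.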
Daytime 2022-05-27 (Fri) / 10:00 ~ 11:00 ** Please note the date. **
Place Zoom (ID: 683 181 3833 / PW: saarc)
Language To be announced
Speaker's name Kim Kee-Eung
Speaker's Affiliation Korea Advanced Institute of Science and Technology
Speaker's homepage
Other information
Hosts Stochastic Analysis and Application Research Center (SAARC)
URL
Contact person Stochastic Analysis and Application Research Center (SAARC)
Contact number 042-350-8111/8117