2025-06-04 / 16:00 ~ 17:00
Department Seminar/Colloquium - Other: Second-order learning in confidence bounds, contextual bandits, and regression

by 전광성
|
Confidence sequences provide a way to characterize uncertainty in stochastic environments and are a widely used tool in interactive machine learning and statistics, including A/B testing, Bayesian optimization, reinforcement learning, and offline evaluation/learning. In these problems, constructing confidence sequences that are tight and correct is crucial, since their quality has a significant impact on the performance of downstream tasks. In this talk, I will first show how to derive one of the tightest empirical Bernstein-style confidence bounds, both theoretically and numerically. The derivation proceeds via the existence of regret bounds in online learning, inspired by the seminal work of Rakhlin & Sridharan (2017). Then, I will discuss how our confidence bound extends to unbounded nonnegative random variables with provable tightness. In offline contextual bandits, this leads to the best-known second-order bound in the literature, with promising preliminary empirical results. Finally, I will turn to the $[0,1]$-valued regression problem and show how the intuition behind our confidence bounds extends to a novel betting-based loss function that exhibits variance adaptivity. I will conclude with future work, including some recent LLM-related topics.
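
The talk's own bound is not spelled out in the abstract; as a point of reference, the sketch below implements the classical empirical Bernstein confidence interval of Maurer & Pontil (2009), the standard baseline that empirical Bernstein-style bounds are compared against. Its half-width shrinks with the sample variance rather than the worst-case range, which is the "second-order" behavior the talk refers to. The function name and the Beta-distributed test data are illustrative choices, not from the talk.

import numpy as np

def empirical_bernstein_ci(x, delta=0.05):
    """Two-sided empirical Bernstein interval for the mean of i.i.d.
    samples in [0, 1] (Maurer & Pontil, 2009), via a union bound over
    the two one-sided tails at level delta/2 each."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = x.mean()
    var = x.var(ddof=1)                 # unbiased sample variance
    log_term = np.log(4.0 / delta)      # ln(2 / (delta/2)) per tail
    half_width = (np.sqrt(2.0 * var * log_term / n)
                  + 7.0 * log_term / (3.0 * (n - 1)))
    return mean - half_width, mean + half_width

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.beta(2, 8, size=2000)  # low-variance data in [0, 1]
    lo, hi = empirical_bernstein_ci(samples)
    print(f"95% CI: [{lo:.4f}, {hi:.4f}] (true mean = 0.2)")

For low-variance data such as the Beta(2, 8) draws above, the variance-dependent first term dominates and the interval is much narrower than a Hoeffding-style bound, which pays for the full [0, 1] range regardless of the data.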
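The betting-based loss mentioned at the end builds on the wealth-process view of confidence sequences. To make that mechanism concrete, here is a minimal sketch of a betting-style confidence set for the mean of [0,1]-valued data, in the spirit of Waudby-Smith and Ramdas: for each candidate mean m, a bettor's wealth K_t(m) = prod_s (1 + lam_s (x_s - m)) is a nonnegative martingale when m is the true mean, so by Ville's inequality any m whose wealth exceeds 1/delta can be rejected. The approximate-Kelly bet, the clipping constant, and the grid resolution are illustrative assumptions; this is not the loss function from the talk.

import numpy as np

def betting_confidence_set(x, delta=0.05, grid_size=500, c=0.5):
    """Anytime-valid betting confidence set for the mean of [0,1] data.
    Keeps every candidate m whose wealth process never reached 1/delta
    (Ville's inequality); returns the convex hull of the kept grid."""
    x = np.asarray(x, dtype=float)
    m = np.linspace(1e-3, 1.0 - 1e-3, grid_size)  # candidate means
    log_wealth = np.zeros(grid_size)
    running_max = np.zeros(grid_size)
    mu, var, n = 0.5, 0.25, 0                     # predictable plug-ins
    for xt in x:
        # approximate-Kelly bet against each m, clipped so that the
        # wealth multiplier 1 + lam*(xt - m) stays >= 1 - c > 0
        lam = np.clip((mu - m) / (var + 1e-12), -c / (1.0 - m), c / m)
        log_wealth += np.log1p(lam * (xt - m))
        running_max = np.maximum(running_max, log_wealth)
        n += 1
        mu += (xt - mu) / (n + 1)                 # running mean, shrunk to 1/2
        var += ((xt - mu) ** 2 - var) / (n + 1)   # crude running variance
    kept = m[running_max < np.log(1.0 / delta)]
    if kept.size == 0:
        raise RuntimeError("all candidates rejected (prob. <= delta)")
    return kept.min(), kept.max()

rng = np.random.default_rng(1)
data = rng.beta(2, 8, size=2000)
lo, hi = betting_confidence_set(data)
print(f"95% betting CS: [{lo:.4f}, {hi:.4f}] (true mean = 0.2)")

Because the bet size lam adapts to a running variance estimate, the resulting set shrinks at a variance-dependent rate, the same variance adaptivity the talk attributes to its betting-based loss.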