Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to test the model. While many DA algorithms have demonstrated considerable empirical success, the unavailability of target labels in DA makes it challenging to determine their effectiveness on new datasets without a theoretical basis. Therefore, it is essential to clarify the assumptions required for successful DA algorithms and quantify the corresponding guarantees. In this work, we focus on the assumption that conditionally invariant components (CICs) useful for prediction exist across the source and target data. Under this assumption, we demonstrate that CICs found via the conditional invariant penalty (CIP) play three essential roles in providing guarantees for DA algorithms. First, we introduce a new CIC-based algorithm called importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings like covariate shift and label shift. Second, we show that CICs can be used to identify large discrepancies between the source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm helps to address its known failure scenario caused by label-flipping features. We support our findings via numerical experiments on synthetic data and on the MNIST, CelebA, and Camelyon17 datasets.