AI systems can match or even outperform humans on many tasks, but they are also prone to mistakes on unfamiliar inputs. Unfortunately, AI systems often encounter unfamiliar inputs when deployed in the real world because training data is seldom diverse enough to reflect the diversity of real-world inputs. This not only degrades reliability but can also lead to safety violations. For example,
My research leverages statistical science to improve the reliability and safety of AI in the real world. Towards this goal, we work on:
Despite transformative advances in AI (experts predict that AI will exceed human capabilities in the near future [Bengio et al.]), we still evaluate AI systems the same way we evaluate simpler predictive models (e.g., by accuracy on held-out examples). Such simple evaluation practices cannot assess rich AI outputs holistically, leading to blind spots that ultimately hamper AI safety and reliability. To close this gap between the demands of AI evaluation and current practice, we develop new statistical theory and methods for AI evaluation. Some representative papers are:
We take a statistical approach to algorithmic fairness: algorithmic biases are often caused by distribution shifts (e.g., due to sampling biases in the training data), which suggests they are statistical problems that admit statistical solutions. A key insight from our work is that, contrary to popular belief, aligning AI systems to be fair, safe, and transparent may not be at odds with accuracy. In fact, alignment can improve out-of-distribution performance. Sometimes, we can even exploit distribution shifts to achieve otherwise unachievable fairness objectives (e.g., achieving equality of outcomes with policies that satisfy equality of opportunity or equality of treatment). This motivates some of our work on distribution shifts. Here are some representative papers:
AI systems often encounter adversarial and out-of-distribution inputs in the real world, but the pernicious effects of distribution shifts are poorly understood. This is a form of technical debt that keeps us from anticipating AI risks before they arise. We seek to repay this technical debt. Some representative papers are: