Yuekai Sun

Research

AI can match or even outperform humans on many tasks, but they are also prone to mistakes on unfamiliar inputs. Unfortunately, AI often encounter unfamiliar inputs when deployed in the real world because training data is seldom diverse enough to reflect the diversity of real-world inputs. This not only degrades (AI) reliability but can also lead to safety violations. For example,

  1. polygenic risk scores are many times less accurate for people of non-European ancestry because most subjects in genome-wide association studies are of European ancestry [Martin et al];
  2. word embeddings perpetuate gender stereotypes: “man is to computer programmer as woman is to homemaker” [Bolukbasi et al];
  3. malicious users can circumvent ChatGPT’s safety training with adversarial inputs (“jailbreak” attacks) [Wei et al].

My research leverages statistical science to improve the reliability and safety of AI in the real world. Towards this goal, we work on:

AI evaluation & safety

Despite transformative advances in AI (experts predict that AI will exceed human capabilities in the near future [Bengio et al]), AI remain prone to unexpected behavior, making them unreliable and potentially dangerous in the hands of malicious actors. We develop new theories and methods for AI alignment and evaluation to help us anticipate and manage AI risks. Some representative papers are:

  1. Efficient multi-prompt evaluation of LLMs
    F Maia Polo, R Xu, L Weber, M Silva, O Bhardwaj, L Choshen, A Oliveira, Y Sun, M Yurochkin. To appear in NeurIPS 2024.
  2. A transfer learning framework for weak-to-strong generalization
    S Somerstep, F Maia Polo, M Banerjee, Y Ritov, M Yurochkin, Y Sun.
  3. tinyBenchmarks: evaluating LLMs with few examples
    F Maia Polo, L Weber, L Choshen, Y Sun, G Xu, M Yurochkin. ICML 2024.

Algorithmic fairness

We developed a suite of algorithms to help practitioners implement individual fairness (IF), an intuitive notion of algorithmic fairness that requires algorithms to “treat similar inputs similarly” (formalized in the sketch after the list below). The suite includes algorithms for aligning similarity metrics with user feedback, auditing algorithms for violations of IF, and training individually fair AI. Before our work, IF was dismissed as impractical because there were no similarity metrics for many AI tasks and no practical algorithms for training individually fair AI. The (similarity) metric learning algorithms in the suite address the first issue, and the training algorithms address the second. The algorithms are implemented in IBM’s AIF360 toolkit. Here are some representative papers:

  1. Post-processing for Individual Fairness
    F Petersen, D Mukherjee, Y Sun, M Yurochkin. NeurIPS 2021.
  2. SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness
    M Yurochkin, Y Sun. ICLR 2021.
  3. Two simple ways to learn individual fairness metrics from data
    D Mukherjee, M Yurochkin, M Banerjee, Y Sun. ICML 2020.
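
As a point of reference, here is a minimal formal sketch of IF (one standard formulation, not tied to any single paper above): a predictor h is individually fair with respect to a similarity metric d_X on inputs and a metric d_Y on outputs if

    d_Y(h(x), h(x′)) ≤ L · d_X(x, x′)   for all inputs x, x′,

i.e., h is Lipschitz with respect to the similarity metric, so similar inputs receive similar outputs. The metric learning work above concerns estimating such a d_X from data or user feedback, and the training and post-processing work concerns (approximately) enforcing this constraint.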

More recently, we have adopted a statistical approach to algorithmic fairness: algorithmic biases are often caused by distribution shifts (e.g., due to sampling biases in the training data), which suggests they are statistical problems that admit statistical solutions. Sometimes we can even exploit the distribution shifts to achieve otherwise unachievable fairness objectives (e.g., achieving equality of outcomes with (policies that satisfy) equality of opportunity/treatment). This motivates a large part of our work on distribution shifts.
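
To make the statistical framing concrete, consider a standard textbook example (a sketch, not a summary of any particular paper of ours): under covariate shift, the training and deployment distributions share the same conditional p(y | x) but differ in the input marginal, p_train(x) ≠ p_test(x). Reweighting the training loss by w(x) = p_test(x) / p_train(x) recovers the deployment risk,

    E_test[ ℓ(h(x), y) ] = E_train[ w(x) · ℓ(h(x), y) ],

so estimating and correcting the shift is a statistical problem whose solution directly mitigates the resulting bias.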

Learning under distribution shifts

AI often encounter adversarial and out-of-distribution inputs in the real world, but the pernicious effects of distribution shifts are poorly understood. This is a form of technical debt that hinders us from anticipating AI risks before they arise. We seek to repay this technical debt. A key insight from our work is that, contrary to popular belief, aligning AI so that they are fair/safe/transparent need not be at odds with performance; in fact, alignment can improve (out-of-distribution) performance. Some representative papers are:

  1. Algorithmic Fairness in Performative Policy Learning: Escaping the Impossibility of Group Fairness
    S Somerstep, Y Ritov, Y Sun. FAccT 2024.
  2. An Investigation of Representation and Allocation Harms in Contrastive Learning
    S Maity, M Agarwal, M Yurochkin, Y Sun. ICLR 2024.
  3. Domain Adaptation meets Individual Fairness. And they get along.
    D Mukherjee, F Petersen, M Yurochkin, Y Sun. NeurIPS 2022.