About me

I am a Ph.D. candidate in the RLAI lab at the University of Alberta, supervised by Dr. Martha White and Dr. Adam White. I am passionate about reinforcement learning (RL), with primary research interests in offline RL, offline-to-online learning, and real-world applications.

Education

  1. Doctor of Philosophy in Computing Science

    Sept 2020 - Present
    Expected graduation: 2025

    Supervised by Dr. Martha White and Dr. Adam White
    RLAI lab, University of Alberta
    Reinforcement learning, offline RL, offline-to-online RL

  2. Master of Science in Computing Science

    Sept 2017 - Sept 2020

    Supervised by Dr. Adam White and Dr. Martha White
    RLAI lab, University of Alberta
    Reinforcement learning, Representation learning

  3. Bachelor of Science with Honors in Computing Science

    Sept 2013 - June 2017

    University of Alberta
    Graduated with first-class honors

Work History

  1. Intern

    Nov 2023 - Nov 2024

    Reinforcement learning for a real-world water treatment system
    RL Core

  2. Research Intern

    May 2022 - Dec 2022

    Offline reinforcement learning
    Noah's Ark Lab, Huawei Technologies Canada Co., Ltd.

  3. Teaching Assistant

    Winter 2022

    CMPUT 267 - Basics of Machine Learning
    University of Alberta

    Fall 2018, Fall 2017, Fall 2016

    CMPUT 366 - Intelligent Systems
    University of Alberta

  4. Programmer

    July 2016

    Henan Yufa Property Limited Company

Publications

* denotes equal contribution

  1. Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies

    ICLR, 2025

    Lingwei Zhu*, Han Wang*, Yukie Nagai

    ⮩ An offline RL algorithm that learns a sparse policy.

    ⮩ Code: FtTPO

  2. q-Exponential Family for Policy Optimization

    ICLR, 2025

    Lingwei Zhu*, Haseeb Shah*, Han Wang*, Martha White

    ⮩ An investigation of policy parameterizations.

    ⮩ Student's t-distribution is a strong drop-in replacement for the Gaussian.

    ⮩ Code: qexp

  3. Investigating the Properties of Neural Network Representations in Reinforcement Learning

    Artificial Intelligence, 2024

    Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White

    ⮩ A systematic approach to understanding why some representations transfer better than others.

    ⮩ Code: repprop

  4. A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization

    Reinforcement Learning Journal, 2024

    Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, Pascal Poupart

  5. Exploiting the Replay Memory Before Exploring the Environment: Enhancing Reinforcement Learning Through Empirical MDP Iteration

    NeurIPS, 2024

    Hongming Zhang, Chenjun Xiao, Chao Gao, Han Wang, Bo Xu, Martin Müller

  6. In-sample Offline Reinforcement Learning via Tsallis Regularization

    Transactions on Machine Learning Research, 2024

    Lingwei Zhu, Matthew Kyle Schlegel, Han Wang, Martha White

  7. The In-Sample Softmax for Offline Reinforcement Learning

    ICLR, 2023, notable-top-25%

    Chenjun Xiao*, Han Wang*, Yangchen Pan, Adam White, Martha White

    ⮩ InAC: an offline RL method that uses the in-sample softmax to learn a good policy under insufficient action coverage.

    ⮩ Code: inac

  8. Measuring and Mitigating Interference in Reinforcement Learning

    CoLLAs, 2023

    Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White

  9. Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay

    ICLR, 2023

    Hongming Zhang, Chenjun Xiao, Han Wang, Jun Jin, Bo Xu, Martin Müller

  10. No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

    Transactions on Machine Learning Research, 2022

    Han Wang*, Archit Sakhadeo*, Adam White, James Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White

    ⮩ Hyperparameter tuning from offline data, to fully specify the hyperparameters of an RL agent before it learns online in the real world.

    ⮩ Code: hyper

Preprints

* denotes equal contribution

  1. Fine-Tuning without Performance Degradation

    Under Review

    Han Wang, Adam White, Martha White

    ⮩ A practical fine-tuning algorithm that gradually allows more exploration based on online estimates of performance.

    ⮩ Code: finetune

  2. Avoiding Model Estimation in Robust Markov Decision Processes with a Generative Model

    Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang

Thesis

  1. Emergent Representations in Reinforcement Learning and Their Properties

    M.Sc., 2020