📖 Education

  • Sep. 2019 – Dec. 2025 (expected), Huazhong University of Science and Technology, Wuhan, China. Ph.D. in Control Science and Engineering.

📝 Publications

  1. Preference-CFR Beyond Nash Equilibrium for Better Game Strategies (ICML 2025). Proposes the Preference Counterfactual Regret Minimization (Pref-CFR) algorithm to achieve diverse Nash equilibria, enabling customizable strategies by incorporating preference and vulnerability parameters. Demonstrates distinct play styles in Texas Hold’em without sacrificing strategic strength.
  2. Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play (NeurIPS 2024). Introduces the Monte Carlo Counterfactual Value-Based Fictitious Play (MCCFVFP) algorithm for large-scale games, achieving 20–50% faster convergence than standard MCCFR in complex settings like Texas Hold’em (a sketch of the classical fictitious-play baseline that this and RTWFP build on follows this list).
  3. Real-Time Weighted Fictitious Play: Converging to Equilibrium at the Speed of $O(T^{-1})$ in Games. Presents the Real-Time Weighted Fictitious Play (RTWFP) algorithm with $O(T^{-1})$ convergence in two-player zero-sum games, extending to correlated equilibrium and continuous-time FP. Outperforms existing algorithms in scalability and speed.
  4. ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models. Proposes ELO-Rated Sequence Rewards (ERRL), which uses ordinal preferences and ELO ratings to replace numerical rewards, achieving superior performance in long-term RL tasks like Atari benchmarks.
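
For background, the classical fictitious-play loop that papers 2 and 3 build on has each player repeatedly best-respond to the opponent's empirical action frequencies. Below is a minimal sketch on a toy zero-sum matrix game; the rock-paper-scissors payoff matrix is an illustrative stand-in, and this is the textbook baseline, not the accelerated variants from the papers.

```python
# Minimal sketch of classical fictitious play on a two-player zero-sum
# matrix game; the payoff matrix is an illustrative example, not from the papers.
import numpy as np

def fictitious_play(A, iterations=10000):
    """Row player maximizes x^T A y; column player minimizes it."""
    n, m = A.shape
    row_counts = np.zeros(n)   # empirical action counts of the row player
    col_counts = np.zeros(m)   # empirical action counts of the column player
    row_counts[0] = col_counts[0] = 1.0  # arbitrary initial actions

    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical mixture.
        col_mix = col_counts / col_counts.sum()
        row_mix = row_counts / row_counts.sum()
        best_row = np.argmax(A @ col_mix)   # row player's best response
        best_col = np.argmin(row_mix @ A)   # column player's best response
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

if __name__ == "__main__":
    rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # rock-paper-scissors
    x, y = fictitious_play(rps)
    print("row strategy:", x.round(3), "col strategy:", y.round(3))
```

Roughly speaking, MCCFVFP replaces the best-response targets with counterfactual values and RTWFP reweights the averaging over time, which is where the reported speedups over this plain loop come from.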

💻 Experiences

Fine-Tuning LLMs via Game Theory (Mar. 2025 ~ Now)

Exploring game-theoretic approaches to overcome data bottlenecks in LLM training. By enabling self-play to generate high-quality data, we aim to advance LLM fine-tuning beyond reliance on static datasets.
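
A minimal sketch of the self-play data-generation idea, assuming a pairwise judge; `generate_response` and `judge` are hypothetical placeholders (stubbed here with random outputs), not the project's actual components.

```python
# Hypothetical sketch of self-play data generation for LLM fine-tuning.
# generate_response and judge are placeholder stubs, not real model APIs.
import random
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    response: str

def generate_response(policy_name: str, prompt: str) -> str:
    # Placeholder: in practice, sample from the LLM being fine-tuned.
    return f"[{policy_name}] draft answer to '{prompt}' #{random.randint(0, 999)}"

def judge(prompt: str, a: str, b: str) -> int:
    # Placeholder: in practice, a reward model or stronger LLM compares answers.
    return random.choice([1, -1, 0])

def self_play_round(prompts):
    """Two samples from the same policy compete on each prompt; judged winners
    become new training examples, so the dataset grows without new human data."""
    new_data = []
    for prompt in prompts:
        a = generate_response("policy", prompt)
        b = generate_response("policy", prompt)
        outcome = judge(prompt, a, b)
        if outcome != 0:
            new_data.append(Example(prompt, a if outcome > 0 else b))
    return new_data  # fed back into supervised or preference-based fine-tuning

print(len(self_play_round(["Summarize fictitious play in one sentence."])))
```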

SD Model Fine-Tuning via Reinforcement Learning at Vivo (Feb. 2025 ~ Apr. 2025)

Worked on fine-tuning Stable Diffusion with reinforcement learning. Designed a reward model combining aesthetics, textual relevance, diversity, and human feedback. Early results show improved image quality and alignment, with ongoing efforts to refine reward design and scale distributed training.
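
A minimal sketch of one way the reward components above could be combined into a single scalar for the RL update; the scorer names, weights, and dummy functions are illustrative placeholders, not the actual reward model used in the project.

```python
# Illustrative composite reward for RL fine-tuning of a diffusion model;
# component scorers and weights are placeholders, not the project's reward model.
from typing import Callable, Dict

def make_composite_reward(
    scorers: Dict[str, Callable],   # name -> fn(image, prompt) -> float in [0, 1]
    weights: Dict[str, float],      # relative importance of each component
) -> Callable:
    total = sum(weights.values())

    def reward(image, prompt) -> float:
        # Weighted sum of normalized component scores.
        return sum(weights[k] * scorers[k](image, prompt) for k in scorers) / total

    return reward

# Example wiring with dummy scorers; real ones would be an aesthetic predictor,
# a text-image similarity model, a diversity term, and a human-feedback model.
dummy = lambda image, prompt: 0.5
reward_fn = make_composite_reward(
    scorers={"aesthetics": dummy, "text_match": dummy, "diversity": dummy, "human_pref": dummy},
    weights={"aesthetics": 1.0, "text_match": 2.0, "diversity": 0.5, "human_pref": 2.0},
)
print(reward_fn(image=None, prompt="a red bicycle"))  # scalar reward for the RL update
```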

Texas Hold’em AI Development at Fen AI Lab (Sep. 2023 ~ Jan. 2024)

Contributed to building a poker AI rivaling Pluribus. Implemented and tuned MCCFR and its variants, optimized computation, and developed strategy storage and visualization tools. Helped achieve professional-level play in 2-player games.
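
For context, the core update shared by CFR-family solvers such as MCCFR is regret matching at each information set. The sketch below is a generic textbook version with an illustrative data layout, not the lab's actual implementation.

```python
# Regret matching at a single information set, the core update inside CFR/MCCFR;
# the data layout is illustrative, not the lab's implementation.
import numpy as np

class InfoSetNode:
    def __init__(self, num_actions: int):
        self.regret_sum = np.zeros(num_actions)    # cumulative counterfactual regret
        self.strategy_sum = np.zeros(num_actions)  # cumulative strategy for averaging

    def current_strategy(self) -> np.ndarray:
        """Play actions in proportion to positive cumulative regret."""
        positive = np.maximum(self.regret_sum, 0.0)
        total = positive.sum()
        return positive / total if total > 0 else np.full(len(positive), 1.0 / len(positive))

    def update(self, action_values: np.ndarray, reach_weight: float = 1.0):
        """Accumulate regrets of each action against the node's expected value."""
        strategy = self.current_strategy()
        node_value = strategy @ action_values
        self.regret_sum += action_values - node_value
        self.strategy_sum += reach_weight * strategy

    def average_strategy(self) -> np.ndarray:
        """The average strategy is what converges toward equilibrium."""
        total = self.strategy_sum.sum()
        return self.strategy_sum / total if total > 0 else self.current_strategy()
```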

Reinforcement Learning for Game NPCs at ByteDance Nuverse (Jul. 2021 ~ Mar. 2022)

Served as the main implementer, designing multi-style AI companion NPCs for the game One Piece: Burning Blood. Enhanced the RL framework with a style evolution module, integrating human preference data to create diverse playstyles. Improved key performance metrics by 80–120%, with some models reaching production readiness. Co-authored a paper on human–AI interaction.

Land Auction Strategy Optimization at China Resources Group (Feb. 2021 ~ Jun. 2022)

Developed a bidding algorithm for high-stakes land auctions by applying Fictitious Play to model competitor strategies and optimize bids. Validated in multiple live auctions, the method improved bid accuracy by 3–4× and increased win probability by ~5%. The algorithm was subsequently adopted as the standard tool for large-scale transactions.
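
A simplified illustration of the fictitious-play idea in a sealed-bid setting: estimate the distribution of the top competing bid from past auctions, then pick the bid maximizing expected profit. The first-price utility model and all numbers are made up for illustration and are not the deployed algorithm.

```python
# Simplified fictitious-play-style bidding: best-respond to the empirical
# distribution of past top competing bids. All numbers are made up.
import numpy as np

def best_response_bid(past_top_bids, valuation, candidate_bids):
    """Pick the bid maximizing expected profit against the empirical
    distribution of the top competing bid (first-price; win if bid >= top)."""
    past = np.asarray(past_top_bids, dtype=float)
    best_bid, best_value = None, -np.inf
    for b in candidate_bids:
        win_prob = np.mean(past <= b)            # empirical P(win) at this bid
        expected_profit = win_prob * (valuation - b)
        if expected_profit > best_value:
            best_bid, best_value = b, expected_profit
    return best_bid, best_value

past_top_bids = [92, 105, 98, 110, 101, 97]      # observed winning bids (made up)
bid, value = best_response_bid(past_top_bids, valuation=120,
                               candidate_bids=np.linspace(90, 119, 30))
print(f"bid {bid:.1f}, expected profit {value:.2f}")
```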