Xueqiang (Patrick) Xu

I am currently in the MSCS program at UIUC, where I am advised by Prof. Jiawei Han and work closely with Prof. Jiaxuan You. I completed my undergraduate studies at UIUC (2020–2024), earning a B.S. in Computer Science with Highest Honors.

徐学强  /  Email  /  Google Scholar  /  GitHub  /  LinkedIn

News

[Jan 2026] Our paper on zero-shot entity structure extraction, ZOES, has been accepted to the EACL 2026 Main Conference.
[Dec 2025] We released our paper on Adaptation of Agentic AI, with a public repository here. Hope you enjoy reading it!
[Aug 2025] 🔥 Two papers accepted to the EMNLP 2025 Main Conference: s3 (training a search agent via RL) and LogiCoL (logically-informed contrastive learning for set-based dense retrieval).
[Jan 2025] Our paper on hierarchical text classification, TELEClass, has been accepted to The Web Conference 2025 as a poster.

Research

My research focuses on advancing knowledge-grounded scientific reasoning in large language models. I study how to extract, structure, and leverage scientific knowledge so that LLMs can reason more reliably, more transparently, and in ways that meaningfully support scientific discovery. I approach this goal through three interconnected directions:

  • Structured Knowledge Extraction — developing methods that enable LLMs to extract entities, attributes, relations, and hierarchical schemas from scientific literature under weak or zero supervision. My work aims to transform unstructured papers into machine-interpretable scientific knowledge bases.
  • Knowledge-Augmented LLM Reasoning — integrating structured knowledge into LLMs' reasoning processes through retrieval, schema guidance, control vectors, and multi-hop reasoning. I investigate how explicit knowledge structures can improve LLM factuality, faithfulness, and problem-solving ability in scientific domains.
  • Scientific Agents and Reliability — building LLM-based scientific agents that can reason step-by-step and self-correct. I explore mechanisms for trustworthy, explainable, and robust reasoning pipelines to support real scientific workflows.

These directions reflect a unified goal: combining the strengths of structured knowledge and large language models to build AI systems capable of reliable, interpretable, and scientifically meaningful reasoning. If our interests align, feel free to reach out — I'm always excited to connect and collaborate!

Selected Publications
Zero-Shot Open-Schema Entity Structure Discovery
Xueqiang Xu, Jinfeng Xiao, James Barry, Mohab Elkaref, Jiaru Zou, Pengcheng Jiang, Yunyi Zhang, Max Giammona, Geeth de Mel, Jiawei Han
EACL Main Conference, 2026

Preprint

We introduce ZOES, a novel approach to entity structure extraction that does not require any schema or annotated samples. ZOES operates via a principled mechanism of enrichment, refinement, and unification, based on the insight that an entity and its associated structure are mutually reinforcing.

s3: You Don't Need That Much Data to Train a Search Agent via RL
Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Zifeng Wang, Jimeng Sun, and Jiawei Han
EMNLP Main Conference, 2025

Preprint Code

In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and requires only 2.4k training examples for RL.

Adaptation of Agentic AI
Pengcheng Jiang*, Jiacheng Lin*, Zhiyi Shi*, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, Xueqiang Xu, Hanwen Xu, Pengrui Han, Dylan Zhang, Jiashuo Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, Jiaxuan You, Liyuan Liu, Pan Lu, Yu Zhang, Heng Ji, Yejin Choi, Dawn Song, Jimeng Sun, Jiawei Han (* Equal Contribution)
Preprint, 2025

arXiv Code

Cutting-edge agentic AI systems are built on foundation models that can be adapted to plan, reason, and interact with external tools to perform increasingly complex and specialized tasks. We unify this rapidly expanding research landscape into a systematic framework that spans both agent adaptations and tool adaptations.

TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
Yunyi Zhang, Ruozhen Yang*, Xueqiang Xu*, Rui Li*, Jinfeng Xiao, Jiaming Shen, and Jiawei Han (* Equal Contribution)
The Web Conference (WWW), 2025

Preprint Code

We propose TELEClass, which combines the general knowledge of LLMs and task-specific features mined from an unlabeled corpus. TELEClass automatically enriches the label taxonomy with class-indicative features and utilizes novel LLM-based data annotation and generation methods specifically tailored for hierarchical text classification.

LogiCoL: Logically-Informed Contrastive Learning for Set-based Dense Retrieval
Yanzhen Shen, Sihao Chen, Xueqiang Xu, Yunyi Zhang, Chaitanya Malaviya, and Dan Roth
EMNLP Main Conference, 2025

Preprint

We introduce LogiCoL, a logically-informed contrastive learning objective for dense retrievers that handles queries with logical connectives. LogiCoL learns to respect subset and mutually-exclusive set relations between query results via soft constraints expressed through t-norms, achieving improvements in both retrieval performance and logical consistency.

Awards
  • City Scholar at UIUC
  • Illinois Scholars Undergraduate Research
  • IIDAI Scholar

Template from Jon Barron