Xueqiang (Patrick) Xu

I am currently in the MSCS program at UIUC, where I am advised by Prof. Jiawei Han and work closely with Prof. Jiaxuan You. I completed my undergraduate studies at UIUC (2020–2024), graduating with a B.S. in Computer Science with Highest Honors.

徐学强  /  Email  /  Google Scholar  /  GitHub  /  LinkedIn

News

[Aug 2025] 🔥 Two papers accepted to EMNLP 2025: s3 (training search agents via RL) and LogiCoL (logically-informed contrastive learning for set-based dense retrieval).
[Jan 2025] Our paper on hierarchical text classification, TELEClass, has been accepted to The Web Conference (WWW) 2025 as a poster.

Research

My research focuses on advancing knowledge-grounded scientific reasoning in large language models. I study how to extract, structure, and leverage scientific knowledge so that LLMs can reason more reliably, more transparently, and in ways that meaningfully support scientific discovery. I approach this goal through three interconnected directions:

  • Structured Knowledge Extraction — developing methods that enable LLMs to extract entities, attributes, relations, and hierarchical schemas from scientific literature under weak or zero supervision. My work aims to transform unstructured papers into machine-interpretable scientific knowledge bases.
  • Knowledge-Augmented LLM Reasoning — integrating structured knowledge into LLMs' reasoning processes through retrieval, schema guidance, control vectors, and multi-hop reasoning. I investigate how explicit knowledge structures can improve LLM factuality, faithfulness, and problem-solving ability in scientific domains.
  • Scientific Agents and Reliability — building LLM-based scientific agents that can reason step-by-step and self-correct. I explore mechanisms for trustworthy, explainable, and robust reasoning pipelines to support real scientific workflows.

These directions reflect a unified goal: combining the strengths of structured knowledge and large language models to build AI systems capable of reliable, interpretable, and scientifically meaningful reasoning. If our interests align, feel free to reach out — I'm always excited to connect and collaborate!

Selected Publications
Zero-Shot Open-Schema Entity Structure Discovery
Xueqiang Xu, Jinfeng Xiao, James Barry, Mohab Elkaref, Jiaru Zou, Pengcheng Jiang, Yunyi Zhang, Max Giammona, Geeth de Mel, Jiawei Han

We introduce ZOES, a novel approach to entity structure extraction that does not require any schema or annotated samples. ZOES operates via a principled mechanism of enrichment, refinement, and unification, based on the insight that an entity and its associated structure are mutually reinforcing.

s3: You Don't Need That Much Data to Train a Search Agent via RL
Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Zifeng Wang, Jimeng Sun, and Jiawei Han
EMNLP Main Conference, 2025
Code

In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and requires only 2.4k training examples for RL.

TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
Yunyi Zhang, Ruozhen Yang*, Xueqiang Xu*, Rui Li*, Jinfeng Xiao, Jiaming Shen, and Jiawei Han (* Equal Contribution)
The Web Conference (WWW), 2025
Code

We propose TELEClass, which combines the general knowledge of LLMs and task-specific features mined from an unlabeled corpus. TELEClass automatically enriches the label taxonomy with class-indicative features and utilizes novel LLM-based data annotation and generation methods specifically tailored for hierarchical text classification.

LogiCoL: Logically-Informed Contrastive Learning for Set-based Dense Retrieval
Yanzhen Shen, Sihao Chen, Xueqiang Xu, Yunyi Zhang, Chaitanya Malaviya, and Dan Roth
EMNLP Main Conference, 2025

We introduce LogiCoL, a logically-informed contrastive learning objective for dense retrievers that handles queries with logical connectives. LogiCoL learns to respect subset and mutually exclusive set relations between query results via soft constraints expressed through t-norms, improving both retrieval performance and logical consistency.

Awards
  • City Scholar at UIUC
  • Illinois Scholars Undergraduate Research
  • IIDAI Scholar
Academic Services
  • Conference reviewer: EMNLP 2025.