<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM | Sen He</title><link>https://senhe.ai/tags/llm/</link><atom:link href="https://senhe.ai/tags/llm/index.xml" rel="self" type="application/rss+xml"/><description>LLM</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Thu, 01 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://senhe.ai/media/icon_hu_da05098ef60dc2e7.png</url><title>LLM</title><link>https://senhe.ai/tags/llm/</link></image><item><title>Harnessing large language models for virtual reality exploration testing: a case study</title><link>https://senhe.ai/publications/qi-2026-vr-llm/</link><pubDate>Thu, 01 Jan 2026 00:00:00 +0000</pubDate><guid>https://senhe.ai/publications/qi-2026-vr-llm/</guid><description/></item><item><title>Performance analysis of AI-generated code: A case study of Copilot, Copilot Chat, CodeLlaMa, and DeepSeek-Coder models</title><link>https://senhe.ai/publications/li-2026-ai-codegen/</link><pubDate>Thu, 01 Jan 2026 00:00:00 +0000</pubDate><guid>https://senhe.ai/publications/li-2026-ai-codegen/</guid><description/></item><item><title>AI-Driven Software Testing</title><link>https://senhe.ai/projects/ai-testing/</link><pubDate>Sat, 01 Jun 2024 00:00:00 +0000</pubDate><guid>https://senhe.ai/projects/ai-testing/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Modern software systems — from virtual reality applications to microservice architectures — present unique testing challenges that traditional approaches struggle to address. This project explores how &lt;strong&gt;large language models and AI techniques&lt;/strong&gt; can be applied to automate and improve software testing across these domains.&lt;/p&gt;
&lt;h2 id="research-directions"&gt;Research Directions&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;LLM-Based VR Testing&lt;/strong&gt;: We investigate using LLMs for exploration testing of VR applications, where the state space is large and traditional test generation methods are insufficient. Our work demonstrates how LLMs can generate meaningful interaction sequences that achieve high code coverage.&lt;/p&gt;
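&lt;p&gt;As a minimal sketch of such an exploration loop (the scene representation, action format, and the &lt;code&gt;pick_action&lt;/code&gt; stub standing in for the LLM call are illustrative assumptions, not our actual tooling):&lt;/p&gt;

```python
# Hypothetical sketch of LLM-driven VR exploration testing.
# pick_action stands in for an LLM call that proposes the next
# interaction given the current scene and the actions tried so far.

def pick_action(scene_state, history):
    # Propose an interaction the agent has not tried yet (stubbed:
    # a real system would prompt an LLM with scene context here).
    untried = [a for a in scene_state["actions"] if a not in history]
    return untried[0] if untried else None

def explore(scene_state, budget=10):
    history = []       # interaction sequence generated so far
    covered = set()    # event handlers exercised, as a coverage proxy
    for _ in range(budget):
        action = pick_action(scene_state, history)
        if action is None:
            break      # nothing left to explore within this scene
        history.append(action)
        covered.add(action["handler"])
    return history, covered
```

&lt;p&gt;Coverage here is tracked per event handler purely for illustration; any scene-level coverage signal could drive the loop.&lt;/p&gt;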
&lt;p&gt;&lt;strong&gt;Code Clone Detection in VR Software&lt;/strong&gt;: Empirical studies on code cloning patterns specific to VR software development, identifying unique maintenance and security implications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Software Security in VR&lt;/strong&gt;: Investigating software security weaknesses across VR projects, examining when and why vulnerabilities emerge during the development lifecycle.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Microservice Log Analysis with AI&lt;/strong&gt;: Systematic analysis of how AI techniques can be applied to microservice log data for anomaly detection, root cause analysis, and system health monitoring.&lt;/p&gt;</description></item><item><title>Multi-Agent LLM Orchestration for Software Engineering</title><link>https://senhe.ai/projects/perforchestra/</link><pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate><guid>https://senhe.ai/projects/perforchestra/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Large language models have demonstrated remarkable capabilities in code generation, but individual models often produce code with functional errors or performance inefficiencies. This project investigates &lt;strong&gt;multi-agent orchestration frameworks&lt;/strong&gt; that coordinate multiple LLMs through structured collaboration pipelines — combining specialized agents for categorization, code generation, debugging, and refinement.&lt;/p&gt;
&lt;h2 id="key-contributions"&gt;Key Contributions&lt;/h2&gt;
&lt;p&gt;Our flagship system, &lt;strong&gt;PerfOrch&lt;/strong&gt;, introduces a memory-augmented multi-agent architecture where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Categorizing Agent&lt;/strong&gt; classifies programming tasks using a fixed category vocabulary to enable retrieval of relevant historical solutions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Generator&lt;/strong&gt; and &lt;strong&gt;Debugger Agents&lt;/strong&gt; collaborate through iterative cycles to produce functionally correct code.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Refinement Agent&lt;/strong&gt; optimizes code performance using aggregated insights from the orchestration&amp;rsquo;s memory module.&lt;/li&gt;
&lt;li&gt;The architecture leverages asymmetric aggregation strategies (product vs. sum) across different pipeline stages.&lt;/li&gt;
&lt;/ul&gt;
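&lt;p&gt;Under these assumptions (agent internals stubbed out; all names are illustrative rather than the actual PerfOrch interfaces), the orchestration loop can be sketched as:&lt;/p&gt;

```python
# Hypothetical sketch of a PerfOrch-style pipeline; each agent is a
# stub where a real system would issue an LLM call.

CATEGORIES = ["string", "math", "graph"]  # fixed category vocabulary

def categorize(task):
    return CATEGORIES[0]  # Categorizing Agent (stubbed)

def generate(task, hints):
    return "def solve(): return 42"  # Generator Agent (stubbed)

def debug(code, task):
    return code, True  # Debugger Agent: (patched code, tests passed?)

def refine(code, hints):
    return code  # Refinement Agent: performance rewrite (stubbed)

def orchestrate(task, memory, max_debug_rounds=3):
    category = categorize(task)
    hints = memory.get(category, [])      # retrieve historical solutions
    code = generate(task, hints)
    for _ in range(max_debug_rounds):     # Generator/Debugger cycles
        code, passed = debug(code, task)
        if passed:
            break
    code = refine(code, hints)            # memory-informed refinement
    memory.setdefault(category, []).append(code)  # grow the memory module
    return code
```

&lt;p&gt;The fixed category vocabulary keeps memory retrieval a simple keyed lookup, so accumulated solutions feed back into refinement without any similarity search.&lt;/p&gt;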
&lt;p&gt;We evaluate the framework across five frontier LLMs and demonstrate significant improvements over single-model baselines in both functional correctness and runtime performance.&lt;/p&gt;
&lt;h2 id="status"&gt;Status&lt;/h2&gt;
&lt;p&gt;Manuscript submitted to ACM Transactions on Software Engineering and Methodology (TOSEM).&lt;/p&gt;</description></item></channel></rss>