A Theoretical Framework for Comparing RLHF Methods

Department

Computer Science

Major

Computer Science

Research Advisor

Tripathy, Ardhendu

Advisor's Department

Computer Science

Abstract

Reinforcement Learning from Human Feedback (RLHF) can be used as a means to align AI agents and Large Language Models (LLMs) to better represent human expectations. A myriad of RLHF methods exist; however, it is difficult to benchmark and compare these methods in terms of alignment, training cost, data collection cost, and other metrics. This project aims to create a robust classification of different RLHF methods from a theoretical point of view. Additionally, this project will attempt to propose bounds on the degree of influence that human feedback exerts on LLMs. Open-source LLMs will be the prime focus of this project: they will be fine-tuned using different RLHF methods, and their outputs will be compared using various criteria. This topic sits at the intersection of Computer Science, Math, and Philosophy, as researchers attempt to determine the impact of a human's unconscious bias on AI.
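
For context, one way the fine-tuning step described in the abstract could be carried out in practice is sketched below, using Direct Preference Optimization (DPO), one member of the RLHF family, via Hugging Face's TRL library. The model, dataset, and hyperparameters are illustrative assumptions rather than choices made in this proposal, and the trainer's argument names have varied across TRL releases.

    # A minimal sketch of one candidate RLHF-family fine-tuning run, using
    # Direct Preference Optimization (DPO) via Hugging Face's TRL library.
    # Model and dataset names are illustrative assumptions; check your
    # installed TRL version's docs, since argument names (e.g.,
    # processing_class vs. the older tokenizer) have changed across releases.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    model_name = "Qwen/Qwen2-0.5B-Instruct"  # small open-source model (illustrative)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Human preference data: each row pairs a prompt with a preferred
    # ("chosen") and a dispreferred ("rejected") response.
    dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

    args = DPOConfig(output_dir="dpo-demo", per_device_train_batch_size=2)
    trainer = DPOTrainer(model=model, args=args, train_dataset=dataset,
                         processing_class=tokenizer)
    trainer.train()

Comparing methods would then amount to repeating such a run with a different trainer (e.g., PPO-based RLHF with a learned reward model) on the same base model and preference data, and evaluating the resulting outputs against the criteria named above.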

Biography

Matthew Dominicis is a junior in Computer Science graduating in May 2025. His interests lie in the fields of Artificial Intelligence and Quantum Computing. He participated in the 2023-2024 OURE cohort, where he conducted research under Dr. Avah Banerjee studying the theory of Quantum Computation. In his free time, he conducts research, is active in organizations including the Eta Kappa Nu honor society, the Society of Hispanic Professional Engineers, and the Google Developer Student Club, and works part-time as an intern at Worldwide Technology. He looks forward to branching out and learning more about AI.

Research Category

Sciences

Presentation Type

OURE Fellows Proposal Oral Applicant

Document Type

Poster

Location

Innovation Forum - 1st Floor Innovation Lab

Presentation Date

10 April 2024, 1:00 pm - 4:00 pm
