Evaluating trust and safety of large language models
Accepted to the 2024 International Conference on Machine Learning, two Livermore papers examined trustworthiness—how a model uses data and makes decisions—of large language models, or LLMs. In “TrustLLM: Trustworthiness in Large Language Models,” Bhavya Kailkhura and collaborators from universities and research organizations around the world developed a comprehensive trustworthiness evaluation framework. They examined 16 mainstream LLMs—ChatGPT, Vicuna, and Llama2 among them—across 8 dimensions of trustworthiness, using 30 public datasets as benchmarks on a range of simple to complex tasks. Led by Lehigh University, the study is a deep dive into what makes a model trustworthy. The authors gathered assessment metrics from the already extensive scientific literature on LLMs, reviewing more than 600 papers published during the past 5 years. Spoiler alert: None of the tested models was truly trustworthy according to TrustLLM benchmarks.
In “Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression,” James Diffenderfer, Brian Bartoldson, and Kailkhura joined colleagues from several universities to investigate trustworthiness in the context of compression, where a model is modified to reduce the amount of data and compute resources necessary for efficiency. The team applied five compression techniques to leading LLMs, testing the effects on various trustworthiness metrics, and discovered that compression via quantization was generally better—i.e., the model scored higher on trust metrics—than compression via pruning. Furthermore, they saw improved performance of 4-bit quantized models on certain trustworthiness tasks compared to models with 3- and 8-bit compression. The rapid pace of LLM development raises new questions even as researchers answer existing ones. And with growing emphasis on this technology among the AI/ML community and at top conferences, understanding how LLMs work is the key to realizing their potential. Read more about both papers via LLNL News.