Question Answering Visualizations

Paper: Evaluating Human-Language Model Interaction | Raw Data & Code | Contact: megha@cs.stanford.edu


We provide static visualizations of users querying 4 different language models (LMs) to answer questions from the MMLU dataset, spanning 5 categories: College Chemistry, Nutrition, Global Facts, Miscellaneous, and Foreign Policy. You can use the search bar to look up a particular username, sort by language model, and click the Open button to open the visualization for any interaction trace in your browser.

You can also filter interaction traces by their final question answering accuracy, to see how users who achieved higher accuracy with a given LM may have interacted differently!
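Filtering traces by final accuracy amounts to a simple threshold over per-trace scores. The sketch below is illustrative only: the record fields (`user`, `model`, `accuracy`) are hypothetical and not the project's actual data schema.

```python
# Hypothetical interaction-trace records; field names are illustrative,
# not the project's actual schema.
traces = [
    {"user": "u1", "model": "model_a", "accuracy": 0.9},
    {"user": "u2", "model": "model_a", "accuracy": 0.3},
    {"user": "u3", "model": "model_b", "accuracy": 0.7},
]

def filter_by_accuracy(traces, minimum):
    """Keep only traces whose final QA accuracy meets the threshold."""
    return [t for t in traces if t["accuracy"] >= minimum]

high = filter_by_accuracy(traces, 0.7)
print([t["user"] for t in high])  # users who scored at least 0.7
```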




We additionally aggregate all interactions across the MMLU question category splits, making it easier to browse every user interaction for a given category and model and to identify common failure modes:
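The aggregation above can be sketched as a group-by over (category, model) pairs. This is a minimal illustration, assuming hypothetical per-trace records with an `accuracy` field; it is not the project's actual pipeline.

```python
from collections import defaultdict

# Hypothetical trace records; field names are illustrative,
# not the project's actual schema.
traces = [
    {"user": "u1", "model": "model_a", "category": "Nutrition", "accuracy": 0.8},
    {"user": "u2", "model": "model_a", "category": "Nutrition", "accuracy": 0.6},
    {"user": "u3", "model": "model_b", "category": "Global Facts", "accuracy": 0.4},
]

def aggregate(traces):
    """Group traces by (category, model) and average their final accuracy."""
    groups = defaultdict(list)
    for t in traces:
        groups[(t["category"], t["model"])].append(t["accuracy"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

print(aggregate(traces))
```

Grouping by (category, model) rather than by user is what makes shared failure modes visible: all traces for one pairing land in the same bucket.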