Gemma Scope: helping the safety community shed light on the inner workings of language models
Language Model Interpretability team

Announcing a comprehensive, open suite of sparse autoencoders for language model interpretability.

To create an artificial intelligence (AI) language model, researchers build a system that learns from vast amounts of data without human guidance. As a result, the inner workings of language models are often a mystery, even to the researchers who train them. Mechanistic interpretability is a research field focused on deciphering these inner workings.
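For readers unfamiliar with the tool being announced, here is a minimal conceptual sketch of a sparse autoencoder: a small network trained to compress a model's internal activations into a wider, mostly-zero feature vector and then reconstruct them. This sketch assumes a standard single-layer ReLU autoencoder trained with an L1 sparsity penalty; the class, function names, and dimensions are illustrative, and the actual Gemma Scope autoencoders may differ in architecture and training details.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """A single-layer sparse autoencoder over model activations.

    Maps a d_model-dimensional activation vector to a wider,
    mostly-zero feature vector, then reconstructs the original.
    """

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; the L1 term
        # in the loss pushes most of them to exactly zero.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Trade off faithful reconstruction against feature sparsity.
    mse = (reconstruction - x).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

# Toy usage on random vectors standing in for real model activations.
sae = SparseAutoencoder(d_model=256, d_features=4096)
x = torch.randn(8, 256)
recon, feats = sae(x)
loss = sae_loss(x, recon, feats)
loss.backward()
```

The hope behind this design is that each learned feature corresponds to a more human-interpretable concept than the raw activation dimensions do, which is what makes such autoencoders useful for mechanistic interpretability.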