-
Gradient Similarity in CNNs: A New Perspective on Class Concept Representation
We rethink class similarity in deep learning models through the lens of gradient analysis. Our experiments on ResNet variants reveal that gradient-based similarity captures class-level features better than latent representation similarity does, providing a fresh perspective on representation learning and model interpretability.
-
Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations
Vision language models (VLMs), such as GPT-4o, have rapidly evolved, demonstrating impressive capabilities across diverse tasks. However, much of the progress in this field has been driven by engineering efforts, with a limited understanding of how these models work. The lack of scientific insight poses challenges to further enhancing their robustness, generalization, and interpretability, especially in high-stakes settings. In this work, we systematically review the use of mechanistic interpretability methods to foster a more scientific and transparent understanding of VLMs. Specifically, we examine five prominent techniques: probing, activation patching, logit lens, sparse autoencoders, and automated explanation. We summarize the key insights these methods provide into how VLMs process information and make decisions. We also discuss critical challenges and limitations that must be addressed to further advance the field.