Improving Language Model Reasoning with Multimodal Chain-of-Thought Technique
This paper presents an approach to improving the complex reasoning ability of language models with a Multimodal Chain-of-Thought (CoT) technique. While Large Language Models (LLMs) have shown impressive reasoning performance, existing CoT studies have focused almost exclusively on the language modality. The proposed Multimodal-CoT framework incorporates both textual and visual information through a two-stage process: the model first generates a rationale, then infers the answer conditioned on that rationale. By grounding the rationale in both modalities, the method aims to produce more accurate and better-supported answers than language-only CoT.
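To make the two-stage decomposition concrete, the sketch below shows how a rationale-generation step can feed an answer-inference step. This is a minimal illustration, not the paper's released implementation: the `MultimodalExample` fields and the `generate_rationale` / `generate_answer` functions are hypothetical stand-ins for a fused text-and-vision seq2seq model.

```python
# Minimal sketch of the two-stage Multimodal-CoT inference flow.
# The model calls below are hypothetical placeholders, not the paper's code.

from dataclasses import dataclass


@dataclass
class MultimodalExample:
    question: str
    context: str
    image_features: list[float]  # e.g., a pooled vision feature vector (assumed)


def generate_rationale(example: MultimodalExample) -> str:
    """Stage 1: a (hypothetical) model fusing text and vision produces a rationale."""
    # In practice this would be a seq2seq model conditioned on both the textual
    # input and the image features; here it returns a canned string.
    return "The image shows two metal objects; both are hard."


def generate_answer(example: MultimodalExample, rationale: str) -> str:
    """Stage 2: the answer model consumes the original input plus the rationale."""
    # The rationale is appended to the textual input before decoding the answer.
    return "(A)"


def multimodal_cot(example: MultimodalExample) -> tuple[str, str]:
    rationale = generate_rationale(example)        # stage 1: rationale generation
    answer = generate_answer(example, rationale)   # stage 2: answer inference
    return rationale, answer


if __name__ == "__main__":
    ex = MultimodalExample(
        question="Which property do these objects share?",
        context="Options: (A) hard (B) soft",
        image_features=[0.0] * 512,  # placeholder vector
    )
    rationale, answer = multimodal_cot(ex)
    print("Rationale:", rationale)
    print("Answer:", answer)
```

The key design point the sketch captures is that the answer model sees the generated rationale as extra input, so errors or gains in stage 1 propagate directly to stage 2.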