How to Improve ChatGPT's Ability to Handle Ambiguous Questions

Prompt engineering techniques help large language models handle complex coreferences, such as pronouns, in retrieval-augmented generation systems.

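Translated from "Improving ChatGPT's Ability to Understand Ambiguous Prompts." The author, Cheney Zhang, is an outstanding algorithm engineer at Zilliz. He has a deep passion for and expertise in cutting-edge AI technologies such as LLMs and Retrieval-Augmented Generation (RAG), and actively contributes to many innovative AI projects, including Towhee and Akcio.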

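In the ever-expanding field of AI, large language models (LLMs) such as ChatGPT are driving innovative research and applications at an unprecedented pace. A key development is the emergence of retrieval-augmented generation (RAG). This technique combines the power of LLMs with a vector database that acts as long-term memory, improving the accuracy of generated responses. A typical embodiment of the RAG approach is Akcio, an open-source project that provides a powerful question-answering system.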

Akcio's Architecture

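In Akcio's architecture, a data loader seamlessly integrates domain-specific knowledge into a vector store such as Milvus or Zilliz (fully managed Milvus). The vector store retrieves the top-K results most relevant to the user's query and sends them to the LLM, giving it context about the user's question. The LLM then refines its response based on this external knowledge.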
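The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not Akcio's code: the bag-of-words `embed` function and the cosine ranking are stand-ins (assumptions) for a real embedding model and a vector database such as Milvus, and `top_k` is a hypothetical helper name.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy embedding: a bag-of-words count vector.
    A real system would use an embedding model and a vector database."""
    tokens = [t.strip(".,?!:").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "Milvus is an open-source vector database.",
    "Large language models can be used for generation AI and decision-making.",
    "PyTorch is a machine learning library for Python.",
]
print(top_k("What are large language models used for?", docs, k=1))
```

The ranked passages would then be handed to the LLM as context; swapping `embed` and `top_k` for a real embedding model and a vector-database search leaves the overall flow unchanged.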

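For example, when a user asks "What are the use cases of large language models in 2023?" about an article titled "Large Language Model Progress Insights Report 2023" that has been imported into Akcio, the system intelligently retrieves the three most relevant paragraphs from the report: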

 1. In 2023, the LLM use cases can be divided into two categories: generation AI and decision-making. Decision-making scenarios are expected to have higher business value.
 2. The generation AI scenario mainly includes dialogue interaction, code development, intelligent agents, etc.
 3. NLP applications include text classification, machine translation, sentiment analysis, automatic summarization, etc.

Akcio combines these paragraphs with the original query and forwards them to the LLM, which generates a detailed and accurate response:
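A minimal sketch of how the retrieved paragraphs and the original query might be combined into a single prompt. The template wording and the `build_rag_prompt` name are hypothetical; Akcio's actual prompt is not shown in this article.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt.
    The template wording is a hypothetical example, not Akcio's actual prompt."""
    # Number the passages so the model (and a reader) can tell them apart.
    context = "\n".join(f"{i}. {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


passages = [
    "In 2023, the LLM use cases can be divided into two categories: generation AI and decision-making.",
    "The generation AI scenario mainly includes dialogue interaction, code development, intelligent agents, etc.",
    "NLP applications include text classification, machine translation, sentiment analysis, automatic summarization, etc.",
]
prompt = build_rag_prompt("What are the use cases of large language models in 2023?", passages)
print(prompt)
```

Ending the prompt with "Answer:" nudges the model to respond directly, while the "only the context below" instruction keeps it anchored to the retrieved knowledge.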

 The application scenarios of the large model industry can be divided into generation and decision-making scenarios.

The Challenge of Coreference Resolution in RAG

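Despite this progress, implementing RAG systems still poses challenges, especially in multi-turn conversations that involve coreference resolution. Consider the following sequence of questions: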

 Q1: What are the use cases of the large language model in 2023?
 A1: The use cases of large language models can be divided into generation AI and decision-making.
 Q2: What are their differences, and can you provide examples?

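The pronoun "their" in Q2 refers to "generation AI and decision-making." However, Q2 on its own never names those referents, so the system may return results unrelated to the question, undermining the coherence of the conversation: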

 1. BERT and GPT are important models in the field of NLP, but they have significant differences in their design and application scenarios.
 2. The difference between large models and small models lies in their scale and complexity. Large models usually have more parameters and more complex structures, requiring more computational resources and time for training and inference. Small models, on the other hand, are relatively simple with fewer parameters and faster training and inference speeds.
 3. However, there is no additional information to distinguish between these two products because they appear very similar.
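To see why the unresolved pronoun hurts retrieval, the sketch below counts the content words a query shares with the relevant passage. It is an illustrative assumption: word overlap stands in for vector similarity, and both the stop-word list and the rewritten query are hypothetical.

```python
def content_words(text: str) -> set[str]:
    """Lower-cased words minus a tiny, hypothetical stop-word list."""
    stop = {"what", "are", "the", "of", "and", "can", "you", "their", "in", "a", "is", "between"}
    words = {w.strip(".,?!:").lower() for w in text.split()}
    return {w for w in words if w and w not in stop}


def overlap(query: str, passage: str) -> int:
    """Number of content words shared by the query and the passage."""
    return len(content_words(query) & content_words(passage))


relevant = "The LLM use cases can be divided into two categories: generation AI and decision-making."
ambiguous = "What are their differences, and can you provide examples?"
rewritten = "What are the differences between generation AI and decision-making, and can you provide examples?"

print(overlap(ambiguous, relevant))  # 0
print(overlap(rewritten, relevant))  # 3
```

The ambiguous Q2 shares nothing with the passage it depends on, while the resolved version matches on "generation", "ai", and "decision-making". A similarity search over embeddings can suffer from the same gap.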

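Traditional methods, such as keyword replacement using tokenization, lemmatization, and recurrent neural networks, often fall short when resolving complex references. As a result, researchers have turned to LLMs such as ChatGPT for coreference resolution tasks. This approach instructs ChatGPT to either replace pronouns or keep the original question, depending on the provided context. Although promising, ChatGPT sometimes answers the question directly instead of following the prompt's instructions, which suggests that a more refined strategy is needed.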
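For contrast, here is a deliberately naive rule-based substitution of the kind that tends to fall short. It is a hypothetical baseline, not a method from the article: it knows only a fixed entity list and handles only the word "it", so expressions like "this year" or "them" pass through untouched.

```python
import re

# Hypothetical, hand-maintained entity list for this illustration only.
KNOWN_ENTITIES = ["NLP", "Milvus", "PyTorch", "GPT-3", "GPT-4"]


def naive_resolve(history: str, question: str) -> str:
    """Replace 'it' with the entity mentioned last in the history (hypothetical baseline)."""
    found = [e for e in KNOWN_ENTITIES if e in history]
    if not found:
        return question
    # Pick the entity whose last occurrence is furthest into the history.
    last = max(found, key=history.rfind)
    return re.sub(r"\bit\b", last, question, flags=re.IGNORECASE)


print(naive_resolve("What is NLP? NLP stands for Natural Language Processing.",
                    "What problem does it solve?"))
print(naive_resolve("When was GPT-4 released? GPT-4 was released in 2023",
                    "What progress has been made in computer vision this year?"))
```

The first call works, but the second leaves "this year" unresolved because the rule has no notion of temporal references. Covering such cases by hand quickly becomes unmanageable, which is what motivates handing the task to an LLM.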

Test Examples

We experimented with instructing ChatGPT directly to replace pronouns, using the following prompt format:

 prompt = f'''Please generate a new question adhering to the following guidelines:
 1. Complete any missing pronouns or conditions in the question based on the context.
 2. Retain the original question if it's already complete.
 {history}
 Question: {question}'''
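The same direct-instruction prompt can be wrapped in a small helper so it is easy to fill for each test. The function name is hypothetical, and the call that sends the prompt to ChatGPT is deliberately left out.

```python
def build_direct_prompt(history: str, question: str) -> str:
    """Fill the direct-instruction prompt with a conversation history and a new question."""
    return f'''Please generate a new question adhering to the following guidelines:
1. Complete any missing pronouns or conditions in the question based on the context.
2. Retain the original question if it's already complete.
{history}
Question: {question}'''


p = build_direct_prompt(
    "What is NLP? NLP stands for Natural Language Processing.",
    "What problem does it solve?",
)
print(p)
```

A function fills the f-string at call time, so a single definition can serve every test case instead of relying on `history` and `question` already existing when the prompt is defined.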

Example 1

Query:

 history = '''
 What is NLP?
 NLP stands for Natural Language Processing. It is a field of computer science and artificial intelligence that focuses on the interaction between computers and humans using natural language.
 '''
 question = 'What problem does it solve?'

ChatGPT's response:

 What problem does Natural Language Processing (NLP) solve?

In this case, ChatGPT does an excellent job of replacing "it" with "Natural Language Processing (NLP)."

Example 2

Query:

 history = '''When was GPT4 released?
 GPT4 was released in 2023'''
 question = 'What progress has been made in computer vision this year?'

ChatGPT's response:

 What progress has been made in computer vision this year?

In this case, ChatGPT failed to replace "this year" with "2023," leaving the coreference resolution incomplete.

Example 3

Query:

 history = '''What is GPT3?
 GPT-3 (Generative Pre-trained Transformer 3) is a state-of-the-art language processing model developed by OpenAI.
 When was GPT4 released?
 GPT4 was released in 2023'''
 question = 'What is the difference between them? What are the advantages of the latter?'

ChatGPT's response:

 Sorry, the information provided about GPT-4's release date is not accurate as it has not been officially announced by OpenAI. Therefore, it is not possible to answer the question about the difference between GPT-3 and GPT-4 or the advantages of the latter.

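Unfortunately, ChatGPT deviates from the instructions here because of its complex decision-making process: instead of rewriting the question, it tries to answer it and disputes the GPT-4 release date given in the history. Despite our efforts to strengthen the prompt, ChatGPT sometimes prioritizes answering directly, which complicates the coreference resolution task.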
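One pragmatic safeguard against this failure mode, assuming the rewritten question is used only for retrieval, is to discard any output that no longer looks like a question and fall back to the original. The heuristic and its length threshold below are hypothetical. They protect the pipeline but do not make ChatGPT follow the instructions.

```python
def accept_rewrite(original: str, candidate: str, max_ratio: float = 3.0) -> str:
    """Hypothetical guard: keep the model's rewrite only if it still looks like a question.
    Otherwise fall back to the user's original question."""
    c = candidate.strip()
    looks_like_question = c.endswith("?")
    reasonable_length = len(c) <= max_ratio * max(len(original), 1)
    return c if looks_like_question and reasonable_length else original


# Example 1: a genuine rewrite is kept.
good = accept_rewrite("What problem does it solve?",
                      "What problem does Natural Language Processing (NLP) solve?")

# Example 3: the model answered instead of rewriting, so the original is kept.
orig3 = "What is the difference between them? What are the advantages of the latter?"
answer3 = ("Sorry, the information provided about GPT-4's release date is not accurate as it has "
           "not been officially announced by OpenAI. Therefore, it is not possible to answer the "
           "question about the difference between GPT-3 and GPT-4 or the advantages of the latter.")
fallback = accept_rewrite(orig3, answer3)
print(good)
print(fallback)
```

Falling back to the unresolved question keeps retrieval no worse than it would have been without a rewrite step, which is a safer default than passing an answer to the vector store.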

Few-Shot Prompting and Chain of Thought: An Improved Approach

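Prompt engineering plays a crucial role in using LLMs effectively. We decided to test a promising strategy that combines few-shot prompting with the Chain of Thought (CoT) approach. Few-shot prompting presents the LLM with several reference examples and guides it to imitate those examples in its answers. CoT improves the LLM's performance on complex reasoning tasks by encouraging it to reason toward an answer step by step.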

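By integrating these techniques, we developed a prompt format that guides ChatGPT's coreference resolution. The revised prompt includes examples covering an empty conversation history, a basic substitution, a pronoun that cannot be substituted, and a pronoun that refers to multiple entities, giving ChatGPT clear instructions and reference examples. A response of NEED COREFERENCE RESOLUTION: Yes is significant because it signals that the question contains a pronoun or ambiguous reference that ChatGPT must resolve to obtain a coherent question. As the third example shows, the question is still left unchanged when the history offers no suitable referent.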
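The structure described above can be generated from structured data: worked examples separated by a divider, a THOUGHT step before each OUTPUT QUESTION, and an open-ended final block. This is a sketch under assumptions. The `RewriteExample` dataclass and `build_few_shot_prompt` are hypothetical names, and only two of the article's examples are included for brevity.

```python
from dataclasses import dataclass


@dataclass
class RewriteExample:
    history: str            # "Q: ... A: ..." turns, or "" for an empty history
    question: str
    needs_resolution: bool
    thought: str            # the chain-of-thought step shown to the model
    output: str


SEPARATOR = "-------------------"


def format_example(ex: RewriteExample) -> str:
    flag = "Yes" if ex.needs_resolution else "No"
    return (
        f"HISTORY: [{ex.history}]\n"
        f"NOW QUESTION: {ex.question}\n"
        f"NEED COREFERENCE RESOLUTION: {flag} => THOUGHT: {ex.thought} => OUTPUT QUESTION: {ex.output}"
    )


def build_few_shot_prompt(examples: list[RewriteExample], history: str, question: str) -> str:
    shots = f"\n{SEPARATOR}\n".join(format_example(ex) for ex in examples)
    # The final block leaves the flag open so the model continues with Yes/No, THOUGHT, OUTPUT.
    query = f"HISTORY: [{history}]\nNOW QUESTION: {question}\nNEED COREFERENCE RESOLUTION: "
    return f"{shots}\n{SEPARATOR}\n{query}"


examples = [
    RewriteExample("", "Hello, how are you?", False,
                   "The output question mirrors the current query.", "Hello, how are you?"),
    RewriteExample("Q: Is Milvus a vector database? A: Yes, Milvus is a vector database.",
                   "How to use it?", True,
                   "I must substitute 'it' with 'Milvus' in the current question.", "How to use Milvus?"),
]
prompt = build_few_shot_prompt(
    examples,
    "Q: When was GPT-4 released? A: GPT-4 was released in 2023",
    "What progress has been made in computer vision this year?",
)
print(prompt)
```

Keeping the examples as data makes it easy to add a new case, for example another failure pattern found in testing, without hand-editing one long string.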

The optimized prompt format is as follows:

 # Plain string template (no f prefix), filled later with REWRITE_TEMP.format(history=..., question=...)
 REWRITE_TEMP = '''
 HISTORY: []
 NOW QUESTION: Hello, how are you?
 NEED COREFERENCE RESOLUTION: No => THOUGHT: Consequently, the output question mirrors the current query. => OUTPUT QUESTION: Hello, how are you?
 -------------------
 HISTORY: [Q: Is Milvus a vector database? A: Yes, Milvus is a vector database.]
 NOW QUESTION: How to use it?
 NEED COREFERENCE RESOLUTION: Yes => THOUGHT: I must substitute 'it' with 'Milvus' in the current question. => OUTPUT QUESTION: How to use Milvus?
 -------------------
 HISTORY: []
 NOW QUESTION: What are its features?
 NEED COREFERENCE RESOLUTION: Yes => THOUGHT: Although 'its' requires substitution, there's no suitable reference in the history. Thus, the output question remains unchanged. => OUTPUT QUESTION: What are its features?
 -------------------
 HISTORY: [Q: What is PyTorch? A: PyTorch is an open-source machine learning library for Python. It provides a flexible and efficient framework for building and training deep neural networks. Q: What is Tensorflow? A: TensorFlow is an open-source machine learning framework. It provides a comprehensive set of tools, libraries, and resources for building and deploying machine learning models.]
 NOW QUESTION: What is the difference between them?
 NEED COREFERENCE RESOLUTION: Yes => THOUGHT: 'Them' should be replaced with 'PyTorch and Tensorflow' in the current question. => OUTPUT QUESTION: What is the difference between PyTorch and Tensorflow?
 -------------------
 HISTORY: [{history}]
 NOW QUESTION: {question}
 NEED COREFERENCE RESOLUTION: '''
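The `{history}` slot expects the conversation rendered in the same `Q: ... A: ...` style as the examples. Assuming the template is kept as a plain string and filled with `str.format` rather than evaluated as an f-string, a hypothetical helper could look like this. `TAIL` reproduces only the final block of the template to keep the example short.

```python
def format_history(turns: list[tuple[str, str]]) -> str:
    """Render (question, answer) pairs in the 'Q: ... A: ...' style used by the few-shot examples."""
    return " ".join(f"Q: {q} A: {a}" for q, a in turns)


# Stand-in for the last block of the template; the full template is filled the same way.
TAIL = "HISTORY: [{history}]\nNOW QUESTION: {question}\nNEED COREFERENCE RESOLUTION: "

turns = [
    ("What is GPT-3?",
     "GPT-3 (Generative Pre-trained Transformer 3) is a state-of-the-art language "
     "processing model developed by OpenAI."),
    ("When was GPT-4 released?", "GPT-4 was released in 2023"),
]
filled = TAIL.format(
    history=format_history(turns),
    question="What is the difference between the two? What are the advantages of the latter?",
)
print(filled)
```

Filling the template at call time lets one template serve every conversation. Braces inside the history text are safe, because `str.format` does not re-interpret the substituted values.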

Tested Examples and Optimized Responses

Here are some results from our experiments with the optimized prompt:

Example 1

Query:

 history = '''When was GPT-4 released?
 GPT-4 was released in 2023'''
 question = 'What progress has been made in computer vision this year?'

ChatGPT's optimized response:

 Yes => THOUGHT: I need to replace "this year" with "2023" in the now question. => OUTPUT QUESTION: What progress has been made in computer vision in 2023?

Example 2

Query:

 history = '''What is GPT-3?
 GPT-3 (Generative Pre-trained Transformer 3) is a state-of-the-art language processing model developed by OpenAI.
 When was GPT-4 released?
 GPT-4 was released in 2023'''
 question = 'What is the difference between the two? What are the advantages of the latter?'

ChatGPT's optimized response:

 Yes => THOUGHT: I need to replace 'the two' with 'GPT-3 and GPT-4' and 'the latter' with 'GPT-4' in the now question. => OUTPUT QUESTION: What is the difference between GPT-3 and GPT-4? What are the advantages of GPT-4?
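Because each completion follows the `Yes => THOUGHT: ... => OUTPUT QUESTION: ...` pattern, the rewritten question can be extracted with a regular expression before it is sent to the vector store. The `parse_rewrite` helper and its fallback to the original question are assumptions, not part of the article.

```python
import re


def parse_rewrite(response: str, original: str) -> tuple[bool, str]:
    """Extract the Yes/No flag and the rewritten question from a completion.
    Falls back to the original question if no OUTPUT QUESTION is found (assumption)."""
    flag = response.strip().lower().startswith("yes")
    match = re.search(r"OUTPUT QUESTION:\s*(.+)", response, flags=re.DOTALL)
    if not match:
        return flag, original
    return flag, match.group(1).strip()


r2 = ("Yes => THOUGHT: I need to replace 'the two' with 'GPT-3 and GPT-4' and 'the latter' "
      "with 'GPT-4' in the now question. => OUTPUT QUESTION: What is the difference between "
      "GPT-3 and GPT-4? What are the advantages of GPT-4?")
print(parse_rewrite(r2, "What is the difference between the two? What are the advantages of the latter?"))
```

Everything after "OUTPUT QUESTION:" is captured, so multi-sentence rewrites like the one above survive intact. If the model appends extra text after the question, a stricter pattern or a stop sequence would be needed.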

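In our tests, the optimized prompt format significantly improved ChatGPT's ability to handle complex coreference resolution tasks. Questions involving multiple entities, which were previously challenging, are now resolved effectively. ChatGPT deftly replaces pronouns and ambiguous references, producing accurate, context-appropriate questions.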

Conclusion

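Prompt engineering plays a key role in solving the coreference resolution problem in LLM-based RAG systems. Integrating techniques such as few-shot prompting and CoT significantly improves a RAG system's ability to handle complex references, enabling LLMs such as ChatGPT to accurately replace pronouns and ambiguous references and generate coherent responses.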
