OpenAI's Official Tutorial: How to Build a Meeting Minutes Generator with GPT-4

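This tutorial shows how to build an automated meeting minutes generator using OpenAI's Whisper and GPT-4 models. The application transcribes the meeting audio, summarizes the discussion, extracts key points and action items, and performs sentiment analysis.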

Prerequisites

This tutorial assumes basic familiarity with Python and that you have an OpenAI API key. You can use the audio file provided with this tutorial or your own.

You also need to install the python-docx and OpenAI libraries. The following commands create a new Python environment and install the required packages.

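    python -m venv env
    source env/bin/activate
    # The code in this tutorial uses the pre-1.0 interface of the openai package
    # (openai.Audio, openai.ChatCompletion), so pin the version accordingly.
    pip install "openai<1.0"
    pip install python-docx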

Transcribing audio with Whisper

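The first step in transcribing the meeting audio is to pass the audio file to OpenAI's /v1/audio API. Whisper is the model that powers the audio API and converts spoken language into text. To start, we use the defaults and do not pass a prompt or a temperature (optional parameters that control the model's output).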

Next, import the required packages and define a function that uses Whisper to read and transcribe the audio file.

    import openai
    from docx import Document

    def transcribe_audio(audio_file_path):
        with open(audio_file_path, 'rb') as audio_file:
            transcription = openai.Audio.transcribe("whisper-1", audio_file)
        return transcription['text']

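In this function, audio_file_path is the path to the audio file to transcribe. The function opens the file and passes it to the Whisper ASR model (whisper-1) for transcription. The result is returned as raw text. Note that openai.Audio.transcribe requires the actual audio file, not just a path to a file on a local or remote server. If you run the code on a server where the audio file is not stored, you may need a preprocessing step that first downloads the file to that machine.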
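The preprocessing step mentioned above could look roughly like the following. This is a minimal sketch using only the standard library; the helper name download_audio, the default ".wav" suffix, and the choice of a temporary file are assumptions made for illustration and are not part of the original tutorial.

```python
import os
import shutil
import tempfile
import urllib.request


def download_audio(url, suffix=".wav"):
    """Copy the audio at `url` into a local temporary file and return its path.

    Hypothetical helper: the name, the default suffix, and the use of a
    temporary file are assumptions made for this sketch.
    """
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    # Stream the remote content to disk instead of loading it all into memory.
    with os.fdopen(fd, "wb") as local_file, urllib.request.urlopen(url) as remote:
        shutil.copyfileobj(remote, local_file)
    return local_path
```

The returned path can then be handed to transcribe_audio, for example transcribe_audio(download_audio(audio_url)), where audio_url is wherever your recording is hosted. The caller is responsible for deleting the temporary file afterwards.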

Summarizing and analyzing the transcript with GPT-4

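Once you have the transcript, pass it to GPT-4 through the Chat Completions API. GPT-4 is OpenAI's most capable large language model at the time of writing, and here it generates the summary, extracts the key points and action items, and performs the sentiment analysis.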

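This tutorial uses a separate function for each task we want GPT-4 to perform. This is not the most efficient approach (you could put all of these instructions into a single function), but separating the tasks tends to produce higher-quality summaries.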

To separate these tasks, define a function meeting_minutes that serves as the main function of the application.

    def meeting_minutes(transcription):
        abstract_summary = abstract_summary_extraction(transcription)
        key_points = key_points_extraction(transcription)
        action_items = action_item_extraction(transcription)
        sentiment = sentiment_analysis(transcription)
        return {
            'abstract_summary': abstract_summary,
            'key_points': key_points,
            'action_items': action_items,
            'sentiment': sentiment
        }

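In this function, transcription is the text obtained from Whisper. It is passed to four other functions, each of which performs a specific task: abstract_summary_extraction (generates a summary of the meeting), key_points_extraction (extracts the key points), action_item_extraction (identifies the action items), and sentiment_analysis (performs sentiment analysis). To add more capabilities, follow the same pattern.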
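One way to make that pattern easier to extend is to keep the task functions in a dictionary, so that adding a capability means adding one entry. The sketch below is an illustration rather than the tutorial's own code: generate_minutes and the stand-in word_count task are hypothetical names, and the stand-in exists only so the example runs without an API call.

```python
def generate_minutes(transcription, tasks):
    """Run every task function on the transcript and collect the results by name."""
    return {name: task(transcription) for name, task in tasks.items()}


# Stand-in task used only to keep this sketch runnable without calling the API.
def word_count(transcription):
    return str(len(transcription.split()))


tasks = {"word_count": word_count}
minutes = generate_minutes("We agreed to ship on Friday.", tasks)
print(minutes)  # {'word_count': '6'}
```

In the real application the dictionary would map names such as 'abstract_summary' and 'key_points' to the four GPT-4 functions described in this tutorial.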

Each function works as follows.

Summary extraction

The abstract_summary_extraction function condenses the transcript into a concise summary, aiming to retain the most important points while avoiding unnecessary details and tangents. The main mechanism behind this is the system message shown below. Prompt engineering offers many different ways to achieve similar results. For detailed advice on doing this effectively, see OpenAI's GPT best practices guide: https://platform.openai.com/docs/guides/gpt-best-practices

    def abstract_summary_extraction(transcription):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=0,
            messages=[
                {
                    "role": "system",
                    "content": "You are a highly skilled AI trained in language comprehension and summarization. I would like you to read the following text and summarize it into a concise abstract paragraph. Aim to retain the most important points, providing a coherent and readable summary that could help a person understand the main points of the discussion without needing to read the entire text. Please avoid unnecessary details or tangential points."
                },
                {
                    "role": "user",
                    "content": transcription
                }
            ]
        )
        return response['choices'][0]['message']['content']

Key points extraction

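The key_points_extraction function identifies and lists the main points discussed in the meeting. These should be the most important ideas, findings, or topics that are central to the discussion. Again, the main mechanism that controls how these points are identified is the system message. Here you may want to provide additional context about how your project or company operates, for example "We are a company that sells race cars to consumers," followed by a short description of what the company does and what its goals are. This additional context can significantly improve the model's ability to extract relevant information.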

    def key_points_extraction(transcription):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=0,
            messages=[
                {
                    "role": "system",
                    "content": "You are a proficient AI with a specialty in distilling information into key points. Based on the following text, identify and list the main points that were discussed or brought up. These should be the most important ideas, findings, or topics that are crucial to the essence of the discussion. Your goal is to provide a list that someone could read to quickly understand what was talked about."
                },
                {
                    "role": "user",
                    "content": transcription
                }
            ]
        )
        return response['choices'][0]['message']['content']
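The idea of giving the model extra context about the company could be sketched as a small helper that prepends that context to the system message. The helper name build_key_points_messages, the company_context parameter, and the abbreviated prompt constant are assumptions introduced for illustration; the tutorial itself does not prescribe how the context should be added.

```python
# Abbreviated version of the system prompt used in key_points_extraction.
KEY_POINTS_PROMPT = (
    "You are a proficient AI with a specialty in distilling information into "
    "key points. Based on the following text, identify and list the main "
    "points that were discussed or brought up."
)


def build_key_points_messages(transcription, company_context=None):
    """Build the messages list, optionally prefixing the system prompt with context."""
    system_content = KEY_POINTS_PROMPT
    if company_context:
        # Hypothetical design choice: put the context first, then the instructions.
        system_content = f"{company_context}\n\n{KEY_POINTS_PROMPT}"
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": transcription},
    ]
```

The returned list can be passed as the messages argument of openai.ChatCompletion.create in place of the hard-coded list.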

Action item extraction

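The action_item_extraction function identifies tasks, assignments, or actions that were agreed upon or mentioned during the meeting. These may be tasks assigned to specific individuals or actions that the group decided to take. Although this tutorial does not cover it in detail, the Chat Completions API provides function calling, which lets you build in the ability to automatically create tasks in your task management software and assign them to the relevant people.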

    def action_item_extraction(transcription):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=0,
            messages=[
                {
                    "role": "system",
                    "content": "You are an AI expert in analyzing conversations and extracting action items. Please review the text and identify any tasks, assignments, or actions that were agreed upon or mentioned as needing to be done. These could be tasks assigned to specific individuals, or general actions that the group has decided to take. Please list these action items clearly and concisely."
                },
                {
                    "role": "user",
                    "content": transcription
                }
            ]
        )
        return response['choices'][0]['message']['content']
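To give a feel for the task-creation idea, the sketch below shows a function description of the kind passed through the functions parameter of the Chat Completions API version used in this tutorial, together with local code that handles a function_call message. Everything specific here is hypothetical: the create_task function, its fields, and the simulated assistant message are assumptions for illustration, and no real task management software is contacted.

```python
import json

# Hypothetical function description (JSON Schema) that could be passed via
# the `functions` parameter of openai.ChatCompletion.create.
CREATE_TASK_SPEC = {
    "name": "create_task",
    "description": "Create a task in the team's task tracker.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Short task description"},
            "assignee": {"type": "string", "description": "Person responsible"},
        },
        "required": ["title"],
    },
}

created_tasks = []


def create_task(title, assignee=None):
    """Stand-in for a real task-tracker call; it only records the task locally."""
    task = {"title": title, "assignee": assignee}
    created_tasks.append(task)
    return task


def handle_function_call(message):
    """Dispatch a `function_call` found in an assistant message to local code."""
    call = message.get("function_call")
    if not call or call["name"] != "create_task":
        return None
    arguments = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return create_task(**arguments)


# Simulated assistant message, shaped like one returned when the model
# decides to call the function.
example_message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "create_task",
        "arguments": '{"title": "Send revised budget", "assignee": "Dana"}',
    },
}
print(handle_function_call(example_message))
```

In a real integration, create_task would call your task tracker's API, and the model's arguments should be validated before use, since the model is not guaranteed to produce valid JSON or to respect the schema.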

Sentiment analysis

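The sentiment_analysis function analyzes the overall sentiment of the meeting discussion. It considers the tone of the language used, the emotions conveyed, and the context in which words and phrases appear. For less complex tasks like this one, it may be worth trying gpt-3.5-turbo in addition to gpt-4 to see whether it reaches a similar level of performance. It may also be useful to pass the result of sentiment_analysis to the other functions to see how the sentiment of the conversation affects the other attributes.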

    def sentiment_analysis(transcription):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            temperature=0,
            messages=[
                {
                    "role": "system",
                    "content": "As an AI with expertise in language and emotion analysis, your task is to analyze the sentiment of the following text. Please consider the overall tone of the discussion, the emotion conveyed by the language used, and the context in which words and phrases are used. Indicate whether the sentiment is generally positive, negative, or neutral, and provide brief explanations for your analysis where possible."
                },
                {
                    "role": "user",
                    "content": transcription
                }
            ]
        )
        return response['choices'][0]['message']['content']

Exporting the meeting minutes

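After generating the minutes, you usually need to save them in a format that is easy for people to read and distribute. A common format for this kind of report is Microsoft Word. python-docx is a widely used open-source library for creating Word documents. If you are building an end-to-end meeting minutes application, consider dropping this export step and sending the summary in a follow-up email instead.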

To implement the export, define a function save_as_docx that converts the raw text into a Word document.

    def save_as_docx(minutes, filename):
        doc = Document()
        for key, value in minutes.items():
            # Replace underscores with spaces and capitalize each word for the heading
            heading = ' '.join(word.capitalize() for word in key.split('_'))
            doc.add_heading(heading, level=1)
            doc.add_paragraph(value)
            # Add a line break between sections
            doc.add_paragraph()
        doc.save(filename)

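In this function, minutes is a dictionary containing the abstract summary, key points, action items, and sentiment analysis of the meeting, and filename is the name of the Word document to create. The function creates a new Word document, adds a heading and the content for each section of the minutes, and saves the document under that name (in the current working directory when filename is a relative path).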
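For the follow-up email alternative mentioned earlier, the same dictionary could be turned into a plain-text message with the standard library. This is only a sketch: the helper name minutes_to_email, its parameters, and the default subject line are assumptions, and actually sending the message (for example with smtplib) depends on a mail server that is outside the scope of this tutorial.

```python
from email.message import EmailMessage


def minutes_to_email(minutes, sender, recipients, subject="Meeting minutes"):
    """Build a plain-text email whose body lists each section of the minutes."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    sections = []
    for key, value in minutes.items():
        # Same heading rule as save_as_docx: 'key_points' becomes 'Key Points'.
        heading = " ".join(word.capitalize() for word in key.split("_"))
        sections.append(f"{heading}\n{value}")
    message.set_content("\n\n".join(sections))
    return message
```

Reusing the heading rule from save_as_docx keeps the email and the Word export consistent if you decide to offer both.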

Finally, put everything together to generate the meeting minutes from an audio file.

    audio_file_path = "Earningscall.wav"
    transcription = transcribe_audio(audio_file_path)
    minutes = meeting_minutes(transcription)
    print(minutes)

    save_as_docx(minutes, 'meeting_minutes.docx')

This code first transcribes the audio file Earningscall.wav, then generates and prints the meeting minutes, and finally saves them as a Word document named meeting_minutes.docx.

That completes the basic meeting minutes pipeline. From here, try optimizing its performance through prompt engineering, or build an end-to-end system using function calling.
