✌️ジョブの作成方法

モデルの選択

モデルハブからモデルを選択する方法は2通りあります：

モデルカタログ：DeepSeek、Gemma、Llama、Qwenなど、様々なプロバイダーからのモデルソースを保管するリポジトリ。
プライベートモデル：ユーザー所有モデルおよび微調整済みモデルのリポジトリ。これらのモデルには必要なファイルがすべてアップロード済みである必要があります。互換性と準備完了を示す特定のタグを含める必要があります。

注記：テストジョブで使用するには、モデルは以下の条件を満たす必要があります：

モデルタイプ：
- LLM: テキスト入力のみ対応
- VLM: テキストと画像の両方の入力に対応
モデルサイズ > 0
学習段階: 必ずインストラクション調整済みであること

例:

モデル名

モデルバージョン

モデルファミリー

モデルタイプ

モデルサイズ

学習ステージ

ft_Llama-3.1-8B_20250508124054_samples-15d5e2f6fe7

15d5e2f6fe7

Llama

LLM (→ 利用可能)

8B (→ 利用可能)

Instruction-tuned (→ 利用可能)

テストスイートの選択

モデルをテストする適切なテストスイートを選択してください。

以下のテストスイートを提供します：

テストスイート

目的

最適対象

標準

独自のデータセットを使用したモデル評価。

内部ベンチマーク、特定分野のタスク（例：金融、医療...）

Nejumi Leaderboard 3

特に日本語タスク向けのLLMをベンチマーク。参照: Nejumi Leaderboard 3

日本語タスクにおけるLLMの比較。

LM 評価ハーネス

多くの標準的なNLPベンチマークで言語モデルを評価するための汎用フレームワーク。参照: LM Evaluation Harness

英語中心のLLMを評価し、研究文献との比較可能性を確保

VLM 評価キッ

マルチモデルタスクにおけるVLMs（視覚言語モデル）の評価。参照: VLMEvalKit

マルチモーダルモデルのテスト

データ形式の選択

テストスイート = 標準を選択する場合にのみデータ形式を選択してください

サポートされているデータ形式

サポートされているファイル形式

サポートされているファイルサイズ

Alpaca

- CSV - JSON - JSONLINES - ZIP - PARQUET

制限100MB

ShareGPT

- JSON - JSONLINES - ZIP - PARQUET

制限 100MB

ShareGPT_Image

- ZIP - PARQUET

制限100MB

現在テスト用にサポートしているデータ形式は以下の通りです：

a/ Alpaca

Alpaca、教師あり微調整タスク向けに、入力と出力のペアを用いた指示順守形式でモデルを微調整するために、非常にシンプルな構造を採用しています。基本構造は以下の通りです：

Instruction:モデルが実行すべき特定のタスクや要求を含む文字列。
Input: モデルがタスクを実行するために処理する必要がある情報を含む文字列。
Output: 指示と入力を処理して生成される、モデルが返すべき結果を表す文字列。

詳細な構造:

[
   {
      “instruction”: “string”,
      “input”: “string”,
      “output” “string”
  }
]

例:

[
    {
        "instruction": "In this task, you are given Wikipedia articles on a range of topics as passages and a question from the passage. We ask you to answer the question by classifying the answer as 0 (False) or 1 (True)\n\nNow complete the following instance -\nInput: Passage: Missouri River -- Tonnage of goods shipped by barges on the Missouri River has seen a serious decline from the 1960s to the present. In the 1960s, the USACE predicted an increase to 12 million short tons (11 Mt) per year by 2000, but instead the opposite has happened. The amount of goods plunged from 3.3 million short tons (3.0 Mt) in 1977 to just 1.3 million short tons (1.2 Mt) in 2000. One of the largest drops has been in agricultural products, especially wheat. Part of the reason is that irrigated land along the Missouri has only been developed to a fraction of its potential. In 2006, barges on the Missouri hauled only 200,000 short tons (180,000 t) of products which is equal to the amount of daily freight traffic on the Mississippi.\nQuestion: is there barge traffic on the missouri river\nOutput:",
        "input": "",
        "output": "1"
    },
    {
        "instruction": "CARAÏBES Communauté des Caraïbes (CARICOM) Aperçu Les 15 membres de la Communauté des Caraïbes (CARICOM) sont Antigua et Barbuda, les Bahamas, la Barbade, le Belize, la République dominicaine, la Grenade, la Guyane, Haïti, la Jamaïque, Saint-Kitts-et-Nevis, Sainte-Lucie, Saint Vincent et les Grenadines, le Suriname, Trinité-et-Tobago, ainsi que Montserrat (sous la dépendance du Royaume-Uni).\n\nWhich language is this?",
        "input": "",
        "output": "French"
    }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/alpaca

サポートされているファイル形式: .csv, .json, .jsonlines, .zip, .parquet

b/ ShareGPT

ShareGPTは、ユーザーとAIアシスタント間の複数ターンにわたる会話（やり取りのあるチャット）を表現するために設計されています。これは、複数ターンにわたる文脈のある会話を処理する必要がある対話システムやチャットボットのモデルをトレーニングまたは微調整する際に一般的に使用されます。

Each data sample consists of a conversations array, where each turn in the chat includes:

from: 発言者 — 通常"human" または"system".
value: その話者からの実際のメッセージテキスト。

詳細な構造:

[
   {
      “conversations”: [
         {
            “from”: “string”,
            “value”: “string”
         }
      ]
   }
]

例:

[
    {
        "conversations": [
            {
                "from": "system",
                "value": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
            },
            {
                "from": "human",
                "value": "Select the latest conversation state of the provided conversation history. Response only with the conversation state index.\n\n# Context\n- Hôm nay: Thứ tư, 19/02\n- Họ tên khách hàng: LÊ HOÀNG LOAN\n- Sản phẩm vay: Điện thoại Samsung\n- Số tiền cần thanh toán kỳ này: 1515000\n- Ngày đến hạn thanh toán kỳ này: 16/02\n- Số ngày đã trễ hạn thanh toán: 3\n- Hạn thanh toán cuối: 21/02\n\n# Conversation history\nTổng đài viên: Dạ có phải Anh LOAN đang nghe máy không ạ?\nNgười nhận cuộc gọi: có gì không em\n\n# Conversation states\n1) Xác nhận danh tính của mình là khách hàng - Ví dụ: \"ừ ai đấy\", \"phải chị\", \"đúng rồi ai gọi thế\", \"ờ vâng\", \"dạ vâng\", \"Anh LOAN đây\", \"sao thế\", \"mình nghe ạ\", \"có chuyện gì\", \"nói đi\"\n2) Hỏi danh tính của tổng đài viên - Ví dụ: \"ai đấy\", \"bạn tên gì\", \"bạn là ai\", \"công ty gì\", \"lừa đảo à\"\n3) Xác nhận là người thân (như ông, bà, bố, mẹ, cô, dì, chú, bác, anh, chị, em, vợ, con) của khách hàng hoặc có quen biết khách hàng - Ví dụ: \"anh là anh nó\", \"mình là chị của LOAN\", \"bác là mẹ nó nghe máy hộ nó đây\", \"cô là dì nó\", \"em là người nhà của Anh LOAN\", \"bạn ấy ra ngoài rồi mình là bạn của bạn ấy\", \"à Anh làm cùng cơ quan với nó\", \"mình là bạn nó\", \"quen biết qua thôi\"\n4) Thông báo không quen biết khách hàng hoặc thông báo nhầm số / nhầm người hoặc thông báo khách hàng đi vắng / chỉ nghe máy hộ và không cung cấp thông tin về mối quan hệ với khách hàng - Ví dụ: \"có quen biết gì đâu\", \"mình không biết bạn đấy\", \"ủa số này của tôi mà có biết đấy là ai đâu\", \"tôi nghe hộ bạn ấy ra ngoài rồi\", \"không nhầm rồi em ơi\", \"LOAN nào nhỉ\", \"LOAN á không phải rồi\", \"ai cơ\"\n5) Đồng ý thanh toán chung chung hoặc thông báo thời điểm hoặc số tiền sẽ thanh toán - Ví dụ: \"à ờ\", \"đúng rồi\", \"biết rồi nhé\", \"lát nữa nhé\", \"tý nữa giả nhé\", \"mai Anh đóng\", \"để tôi sắp xếp\", \"để hỏi nhờ người nhà xem sao\", \"giờ chưa có nhưng mai sẽ trả\", \"ngày kia trả được không nhỉ\"\n6) Đặt câu hỏi hoặc thắc mắc về khoản nợ, số hotline, phương thức thanh toán, ngày tháng - Ví dụ: \"tôi vay sản phẩm nào ấy nhỉ\", \"Anh cần trả bao nhiêu tiền ấy nhỉ\", \"Anh chưa biết cấch thanh toán như thế nào ấy\", \"hợp đồng của tôi còn bao nhiêu kỳ nữa\", \"ơ hạn thanh toán là ngày mùng sáu hàng tháng mà nhỉ\", \"sao lại nhiều tiền thế nhỉ\", \"sao tháng này cần đóng nhiều tiền hơn tháng trước\", \"lãi suất quá hạn là bao nhiêu hả em\", \"Anh đóng luôn mấy kỳ còn lại được không\", \"hôm nay ngày mấy nhỉ\", \"số điện thoại phản ánh là số mấy\", \"không biết đóng ở cửa hàng nào\"\n7) Báo bận hoặc đang không tiện nghe máy hoặc hẹn gọi lại sau - Ví dụ: \"Anh đang bận nhé\", \"tôi đang đi làm bạn ơi\", \"giờ không tiện lắm gọi lại sau đi\", \"tôi đang ngoài đường gọi lại sau nhé\"\n8) Phàn nàn hoặc khiếu nại do bị gọi nhiều - Ví dụ: \"bên em gọi nhiều thế\", \"sao một ngày gọi chục cuộc thế hả em\", \"đừng có gọi nữa mệt lắm rồi\"\n9) Không nghe rõ hoặc yêu cầu nhắc lại thông tin - Ví dụ: \"alo em nói gì thế\", \"Anh chưa nghe được em ơi\", \"ừ cái gì ấy nhỉ\", \"bạn nói lại giúp mình với\", \"ồn quá mình không nghe được ấy\", \"không nói gì à\"\n10) Thông báo khách hàng đã qua đời - Ví dụ: \"bạn ấy bị tại nạn mới mất rồi\", \"nó qua đời rồi em ạ\", \"Anh LOAN chết rồi bạn ơi\"\n11) Muốn gặp tổng đài viên - Ví dụ: \"cho tôi nói chuyện với tổng đài bên bạn nào\", \"Anh muốn gặp tổng đài viên\", \"chuyển máy cho tổng đài xử lý được không\"\n12) Trả lời sai trọng tâm của câu hỏi trước đó hoặc cung cấp thông tin chưa rõ ràng hoặc cần xác nhận lại thông tin hoặc đặt yêu cầu ngoài phạm vi hỗ trợ\n\n# Answer:"
            }
        ]
    }
]
[
    {
        "conversations": [
            {
                "from": "human",
                "value": "Summarize the main ideas of Jeff Walker's Product Launch Formula into bullet points as it pertains to a growth marketing agency implementing these strategies and tactics for their clients..."
            },
            {
                "from": "gpt",
                "value": "Here are the main ideas of Jeff Walker's Product Launch Formula that can be applied by a growth marketing agency for their clients:\n\n1. Identify the target audience and their needs: Understand the ideal customer for the product or service, and create a messaging that resonates with them.\n2. Pre-launch: Build anticipation and excitement for the launch by creating buzz, gathering testimonials and case studies, and using social media to create awareness.\n3. Launch: Use a well-crafted launch sequence to maximize sales and conversions. This can include offering bonuses, creating scarcity, and using a deadline to create urgency.\n4. Post-launch: Follow up with customers, gather feedback, and continue to provide value to keep them engaged and loyal.\n5. Create a product suite: Once the initial product is successful, expand the offering by creating additional products or services that address other needs of the same audience.\n6. Continual optimization: Continually monitor and optimize the launch process and product suite to improve results.\n7. Build a community: Use the launch process to build a community of customers who are passionate about the product and can help spread the word.\n8. Use automation: Use technology and automation to streamline the launch process and improve efficiency."
            }
        ]
    }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/sharegpt

サポートされているファイル形式: .json, .jsonlines, .zip, .parquet

c/ ShareGPT_Image

ShareGPT_Image は、ShareGPT マルチターンチャット形式の拡張機能です。マルチモーダル学習、すなわち会話においてテキストと画像の両方を扱うモデルの学習に特化して設計されています。

これは、画像と自然言語を同時に処理する必要がある視覚言語モデル（VLMs）の微調整に使用されます。

構造は以下の通りです：

下の会話ターン一覧 "message" (ShareGPTと同様).
A field called "image" or "image_path" that points to the image used in the conversation (using format png, jpg, jpeg)

注意:

画像を表示させる必要があるチャットコンテンツには、必ず image トークンを含める必要があります。
複数の画像がある場合：
- 画像パスはimages 配列内で定義する必要があります。
- チャットフロー内の画像の位置は、 imageトークンで示されます
- チャット内の image トークンの数は、 images 配列内のアイテム数と一致する必要があります。
- 画像は出現順にマッピングされ、各 image トークンは images 配列の対応する画像に置き換えられます。

詳細な構造:

[
   {
      “message”: [
         {
            “role”: “string”, “content”: “<image>string”
         }
      ],
      “images”: [
         “images/0.jpg”
      ]
   }
]

例:

[
    {
        "messages": [
            {
                "role": "user",
                "content": "<image>How many baseball players are visible in the image?"
            },
            {
                "role": "assistant",
                "content": "There are three baseball players visible in the image."
            }
        ],
        "images": [
            "images/0.jpg"
        ]
    },
    {
        "messages": [
            {
                "role": "user",
                "content": "What is the man wearing in the image?<image>"
            },
            {
                "role": "assistant",
                "content": "The man in the image is wearing chaps, a cowboy hat, and is dressed like a cowboy."
            }
        ],
        "images": [
            "images/1.jpg"
        ]
    },
    {
        "messages": [
            {
                "role": "user",
                "content": "<image>Explain the visual content of the image in great detail."
            },
            {
                "role": "assistant",
                "content": "The image presents a view of a hotel room, showing both the bathroom and bedroom areas. In the bathroom, there is a white sink under a mirror, accompanied by several bottles placed on the counter, and two cups nearby. \n\nLooking through a doorway into the bedroom area, a large bed can be seen dominating the space. A chair is also visible within the room, situated near the foot of the bed. The hair drier is located in the foreground, implying that it likely belongs to the bathroom area as well. The arrangement of the objects and the furniture suggests a comfortable and well-equipped hotel room."
            }
        ],
        "images": [
            "images/2.jpg"
        ]
    }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/sharegpt-image

サポートされているファイル形式：.zip, .parquet

Select Test Data

You have two ways to transfer the Test data:

Connect to Data Hub
1. Click Data Hub
2. Select a connection or dataset from the Data Hub. Notice: Ensure the dataset is compatible with the selected format.
3. (Optional) Click Open Data Hub to preview or manage datasets.
4. (Optional) Click Reload icon to update connection and dataset list.
5. Follow the detailed guide Data Hub

Supported data format

Supported file format

Supported file size

Alpaca

- CSV - JSON - JSONLINES - ZIP - PARQUET

Limit 100MB

ShareGPT

- JSON - JSONLINES - ZIP - PARQUET

Limit 100MB

ShareGPT_Image

- ZIP - PARQUET

Limit 100MB

Notice: Ensure the file matches the selected data format

Upload a file
1. Default value Upload file
2. Choose a local file from your computer.
3. (Optional) Click Download sample to see an example of the expected format.

Test Criteria

Click Add & update button
The Tasks window appears. Select the task type:
- Text similarity: Measures similarity metrics between model outputs and reference texts.
Click Next to open the list of available metrics
Select one or more metrics
Click Update to apply changes

The following metrics of Text similarity are available:

Test criteria / Metric

How it tests

Best for

BLEU

Measures how precisely a model’s output matches reference text using n-gram overlap.

Evaluating translation and short text similarity.

Fuzzy Match

Measures how closely the model’s output resembles the reference text, allowing for minor differences in wording or order.

Checking approximate correctness.

ROUGE-1

Measures unigram (single word) overlap between model output and reference text.

Short summarization and text generation tasks.

ROUGE-2

Measures bigram (two-word sequence) overlap between model output and reference text.

Evaluating contextual accuracy.

ROUGE-L

Measures the longest common subsequence (LCS) between model output and reference text to capture fluency and word order similarity.

Longer text evaluation where structure matters.

ROUGE-LSUM

Measures LCS-based similarity across multiple sentences, suitable for evaluating longer summaries.

Summarization tasks.

パラメータの設定

パラメータを使用すると、テスト中のモデルの動作を調整できます。以下に各パラメータとその目的を説明します：

名称

説明

型

サポートされる値

ログサンプル

モデルの出力とモデルに入力されたテキストが保存されます

bool

真 / 偽

最大トークン数

生成するトークンの最大数

整数

(0, +∞)

少数のショット数

コンテキストに配置する少数のショット例の数。整数で指定する必要があります。

整数

[0, +∞)

温度

サンプリングの温度

浮動小数点数

[0, +∞)

繰り返しペナルティ

プロンプトと生成されたテキストに現れるかどうかに基づいて新しいトークンにペナルティを課す浮動小数点数。

値が 1 より大きい場合新しいトークンを奨励する。
値 < 1繰り返しを奨励する。

浮動小数点数

(0, 5)

シード

再現性のための乱数シード

整数

[0, +∞)

上位K

考慮する上位トークンの累積確率を制御する整数。すべてのトークンを考慮するには-1を設定します。

整数

-1 or (0, +∞)

上位P

考慮対象となる上位トークンの累積確率を制御する浮動小数点数。すべてのトークンを考慮するには1に設定します。

浮動小数点数

(0, 1]

完了

最後に、以下の条件を満たす一意のジョブ名（例: testj_20250919145022）を入力する必要があります:

文字または数字で始まる
アルファベット（a-z, A-Z）、数字（0-9）、アンダースコア「_」、ハイフン「-」のみ使用
最大100文字
重複しない名前

最大200文字のオプションのメモ付きジョブ記述を行い、パイプラインが成功または失敗した際にメール通知（[email protected]）を受け取ります。

Previousチュートリアル Nextジョブの管理方法

Last updated 22 days ago