データセット形式の選択

データセットのフォーマットは、選択したトレーナーによって異なります。

トレーナー

サポートされているデータ形式data format

サポートされているファイル形式

サポートされているファイルサイズ

SFT

Alpaca

CSV JSON JSONLINES ZIP PARQUET

制限100MB

SFT

ShareGPT

JSON JSONLINES ZIP PARQUET

制限100MB

SFT

ShareGPT_Image

ZIP PARQUET

制限100MB

DPO

ShareGPT

JSON JSONLINES ZIP PARQUET

制限100MB

事前学習

Corpus

TXT JSON JSONLINES ZIP PARQUET

制限100MB

現在、微調整用のデータ形式として以下の形式をサポートしています:

a/ Alpaca

Alpacaは、教師あり微調整タスク向けに、入力と出力のペアを用いた指示順守形式でモデルを微調整する非常にシンプルな構造を採用しています。基本構造は以下の通りです：

Instruction: モデルが実行すべき特定のタスクまたは要求を含む文字列。
Input: モデルがタスクを実行するために処理する必要がある情報を含む文字列。
Output: 指示と入力を処理して生成される、モデルが返すべき結果を表す文字列。

詳細な構造:

[
   {
      “instruction”: “string”,
      “input”: “string”,
      “output” “string”
  }
]

例:

[
    {
        "instruction": "In this task, you are given Wikipedia articles on a range of topics as passages and a question from the passage. We ask you to answer the question by classifying the answer as 0 (False) or 1 (True)\\n\\nNow complete the following instance -\\nInput: Passage: Missouri River -- Tonnage of goods shipped by barges on the Missouri River has seen a serious decline from the 1960s to the present. In the 1960s, the USACE predicted an increase to 12 million short tons (11 Mt) per year by 2000, but instead the opposite has happened. The amount of goods plunged from 3.3 million short tons (3.0 Mt) in 1977 to just 1.3 million short tons (1.2 Mt) in 2000. One of the largest drops has been in agricultural products, especially wheat. Part of the reason is that irrigated land along the Missouri has only been developed to a fraction of its potential. In 2006, barges on the Missouri hauled only 200,000 short tons (180,000 t) of products which is equal to the amount of daily freight traffic on the Mississippi.\\nQuestion: is there barge traffic on the missouri river\\nOutput:",
        "input": "",
        "output": "1"
    },
    {
        "instruction": "CARAÏBES Communauté des Caraïbes (CARICOM) Aperçu Les 15 membres de la Communauté des Caraïbes (CARICOM) sont Antigua et Barbuda, les Bahamas, la Barbade, le Belize, la République dominicaine, la Grenade, la Guyane, Haïti, la Jamaïque, Saint-Kitts-et-Nevis, Sainte-Lucie, Saint Vincent et les Grenadines, le Suriname, Trinité-et-Tobago, ainsi que Montserrat (sous la dépendance du Royaume-Uni).\\n\\nWhich language is this?",
        "input": "",
        "output": "French"
    }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/alpaca

サポートされているファイル形式: .csv, .json, .jsonlines, .zip, .parquet

b/ ShareGPT

b.1/ トレーナー= SFT

ShareGPTは、ユーザーとAIアシスタント間の複数ターンにわたる会話（やり取りのあるチャット）を表現するために設計されています。これは、複数ターンにわたる文脈のある会話を処理する必要がある対話システムやチャットボットのモデルをトレーニングまたは微調整する際に一般的に使用されます。

各データサンプルは conversations配列で構成され、チャットの各ターンには以下が含まれます：

から: 発言者 — 通常"human" または"gpt".
価値: その話者からの実際のメッセージテキスト。

詳細な構造:

[
   {
      “conversations”: [
         {
            “from”: “string”,
            “value”: “string”
         }
      ]
   }
]

例:

[
  {
    "conversations": [
      {
        "from": "human",
        "value": "You are a powerful Ai that will detail Character Biographies in succinct and detailed manner. You WILL cover their Personality, Age, Physical Appearance, Attire, Background, and Abilities to the highest degree of accuracy. Use explicit and visceral and graphic descriptions for appearances, clothes, sight, smell, texture, taste, feelings, actions.\nGenerate a profile of “Sona Sitri” from “Highschool DxD”"
      },
      {
        "from": "gpt",
        "value": "Name: Sona Sitri\nFranchise: Highschool DxD\nSpecies: Devil\n\nPersonality:\nSona Sitri is a highly intelligent, poised, and reserved individual. As the heir to the Sitri clan, she carries herself with an air of noble grace and exhibits impeccable manners. She is diligent, resourceful, and has an unwavering dedication to duty, remaining steadfast in her decisions.\n\nAlthough Sona is often perceived as cold due to her formal demeanor, she is shown to care deeply for her peers and possesses a strong sense of responsibility towards her clan and fellow devils. She is often seen mentoring her classmates, demonstrating her willingness to provide guidance and assistance.\n\nAge:\nSona is in her late teens; however, as a devil, her lifespan greatly exceeds that of a human.\n\nPhysical Appearance:\nSona is a strikingly beautiful young woman. Her ethereal beauty is embodied in her delicate features: large, expressive, violet-blue eyes framed by soft, charcoal-black hair that cascades to her waist in gentle waves. Her complexion is flawless and porcelain-like, accentuating her alluring and dignified aura.\n\nHer curved figure is slender, yet well-toned, an indication of the physical strength and durability embedded within her. Sona stands at an average height, with legs that seem to stretch for miles, adding to her overall allure.\n\nAttire:\nSona's attire exudes elegance, reflecting her noble background and personal tastes. She dons the Kuoh Academy girls’ uniform with perfection, consisting of a crisp, white collared blouse, a black ribbon tied elegantly around her slender neck, and a dark maroon blazer with gold trimmings.\n\nHer ensemble is completed with a modest, black knee-length skirt that both accents and mirrors the mystery and beauty of her eyes. Sona's attire is impeccably well-maintained, embodying her impressive discipline and attention to detail.\n\nBackground:\nSona Sitri is the next in line to inherit the governance of the Sitri clan, one of the 72 Pillars of the Devil society. Born into a life of wealth, power, and nobility, Sona has been groomed from a young age to one day lead the clan. She enrolls as a student at Kuoh Academy, a formerly all-girls school that has recently begun to admit male students.\n\nDespite her noble heritage, Sona strives to maintain a sense of normalcy in her life, even if only at school. While attending Kuoh Academy, she forms the Student Council, further showcasing her leadership capabilities.\n\nAbilities:\nAs a high-ranking devil of the Sitri clan, Sona possesses impressive powers and abilities. Her most unique ability is the control and manipulation of vast amounts of water. She is capable of bending water at her will, wielding it as both a weapon and a shield. This ability to control water also grants her the power to create barriers that can imprison her enemies or protect her allies.\n\nSona has demonstrated tremendous magical aptitude, able to create intricate and potent spells with ease. Her intelligence and strategic prowess make her adaptive to any situation, allowing her to design detailed plans and strategies to defeat her enemies.\n\nIn terms of physical capabilities, Sona possesses strength and agility that surpasses those of a human. As a devil, she also has heightened senses and the ability to regenerate from wounds more rapidly than a human.\n\nIn conclusion, Sona Sitri is a captivating devil with a multitude of talents within her arsenal. She is fiercely intelligent, incredibly disciplined, and possesses a near-flawless balance of sophistication and power. Sona ultimately defies expectations through her enigmatic blend of elegance and ferocity, ensuring her place as an unforgettable character within the Highschool DxD universe."
      }
    ]
  },
  {
    "conversations": [
      {
        "from": "human",
        "value": "Write a speech Donnel Udina would have given to justify his coup in Mass Effect 3 that he would have broadcast on the extranet in an attempt to get the people to support his coup against the Council and legitimize his new regime. Have him be vague in his goals and plans but make sure that he would make clear a new order will be set up under his rule."
      },
      {
        "from": "gpt",
        "value": "Ladies and gentlemen, citizens of the galaxy, I am <PRESIDIO_ANONYMIZED_PERSON>. Today, I stand before you to address a critical matter - a decision I have had to make to ensure the survival and future of our galaxy. It is no secret that the galaxy is under grave threat posed by the Reapers, and it is in these trying times that we must evaluate our past, present, and future. It is clear that our past and present courses have led us to the precipice of total annihilation.\n\nFor years, the current Council has made decisions that, though well-intended, have resulted in the weakening of our galactic community. Under their leadership, we have seen power centralized among a few select species, while others have been left to fend for themselves; we have seen bureaucracy take precedence over swift and decisive action; we have seen our defenses languish due to backroom politics and the preservation of personal interest.\n\nThe Reapers have arrived, and all the Council has done so far is to cling to old, failing ways. Now is the time to look to our future - a future where we must rise above our divisions, look beyond petty differences and unite as one to face the greatest darkness that has ever descended upon our galaxy.\n\nSo today, I have made the difficult decision to break free of the Council's stagnation and inertia. I want to be clear: this is not a coup; this is a reformation of our leadership to save our galaxy, and to ensure that our collective strength and potential are used to their fullest. I have decided to establish a new order, one that will emphasize unity, efficiency and strength in our response to the Reaper threat. An order that understands the need for unity above all, and that is willing to take strong, decisive steps.\n\nI know that this decision may be met with skepticism and even hostility. But I implore you, citizens of the galaxy, to consider this - isn't it important to give ourselves the best chance at survival? Would you rather go down fighting, knowing that we pushed ourselves to the limit, or would you perish in bitter resignation, cursing the Council for their indecision and lack of foresight?\n\nI understand that change is frightening, especially change of this magnitude; however, we must face our fears and move forward. I am not asking for immediate, blind loyalty - all I ask is for your consideration, for a chance to show you that this new order will be better equipped to face the challenges ahead.\n\nTogether, the galaxy will stand united, and in unity we will find the power to end this war. Under this new order, I pledge to do everything in my power to ensure that we stand strong in the face of the Reapers, united in our purpose and undivided in our resolve.\n\nToday, we face the greatest challenge our galaxy has ever known, but we are not defeated; we have simply begun a new chapter in our evolution. A chapter where we reject the old ways of division, bureaucracy, and weakened defences.\n\nJoin me, citizens of the galaxy, as we embrace this new, unified purpose. Together, we will emerge victorious - not only against the darkness that threatens us this day, but against all challenges that come to pass, for as long as our light shines bright. Stand with me, and together, we shall create a new legacy for our generations to come.\n\nThank you."
      },
      {
        "from": "human",
        "value": "Write down a Modern AU news article for Game of Thrones which talks about the Green Trial (including the purge of the High Sparrows and Tyrells) and the Coronation of Queen Cersei from the perspective of a modern Westeros, modernizing the events of Game of Thrones to a war torn Westeros inspired by modern civil wars in the process. Have there be a rigged referendum where Cersei seizing control of the throne after Tommen’s death was “approved” by an “overwhelming majority” of voters and “unanimously approved” by a (purged and puppeted) legislature which also approved other amendments to the constitution, with one of said measures an amendment which formalized the one-party nature of the Westerosi state under the Lannisters (even though the country had de facto functioned as such for years). Mention other measures that could be taken by the Lannister regime in such a scenario. Finally, feel free to name the Party the one-party state has and its (official) ideology. "
      },
      {
        "from": "gpt",
        "value": "Headline: Queen Cersei Seizes Powers in Green Trial, Westeros Moves to One-Party State\n\nKING'S LANDING - The Green Trial marked a significant turning point in Westeros's modern civil wars as it saw the Queen Mother, Cersei Lannister, purge the High Sparrows and effectively wipe out House Tyrell. Dramatic events unfolded in the capital, culminating in the Coronation of Queen Cersei after the tragic death of King Tommen Baratheon in the course of the purge.\n\nIn the days following the Green Trial, a nationwide referendum was hastily organized, where citizens of Westeros were asked to approve the seizure of control by Queen Cersei. This referendum saw an overwhelming majority vote in favor of Cersei's rule, with the subsequent unanimous approval by a purged and puppeted legislature. Critics argue that the referendum and parliamentary proceedings were rigged, voicing concerns about the swift and uncompromising nature of these events.\n\nThe same legislative session also approved amendments to the Westerosi constitution, most notably transforming the nation into a one-party state under the control of the Lannister Party. This move further cements the Lannister grip on power, while any remaining opposition will be forced underground, despite the fact that Westeros had been functioning de facto as a one-party state led by the Lannisters for years.\n\nThe new one-party state has adopted the ideology of Lannisterism, which entails the preservation of the Lannister rule, with a focus on stability and economy industry, albeit with a disregard for individual freedoms and political dissent. \n\nOther measures enacted under this new regime include the consolidation of Lannister loyalists in the military and the centralization of political and economic control, all in the name of stability and security. It is said that spies and informants have been deployed across Westeros, ensuring that any form of dissent will be rooted out swiftly and effectively. Cultural measures, such as the promotion of the Lannister lion as a symbol of national pride and unity, have also been introduced to solidify the Lannister's dominance.\n\nInternational reactions have been mixed, with some neighboring nations voicing concerns about the lack of transparency in Cersei's recent power grab, and others remaining silent, perhaps out of fear or a desire to maintain political alliances with the increasingly powerful Lannister regime.\n\nWesteros's future remains uncertain, as the specter of civil wars and violence looms large, and opposition to Cersei's rule simmers beneath the surface. Regime opponents are likely to consist of a fragile alliance of various factions, all with different agendas - from democrats seeking greater political pluralism to regional separatists looking for increased autonomy or independence. The Lannisters, however, stand stronger than ever, with their regime poised to further entrench its power across the realm.\n\nWill the Lannister hegemony bring peace and stability to the war-torn Westeros, or will it only fan the flames of division and conflict? The answer remains to be seen, but it is clear that the world will be watching Westeros closely in the coming months and years, as the machinations of Queen Cersei and her Lannister regime continue to unfold."
      }
    ]
  }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/sharegpt

サポートされているファイル形式: .json, .jsonlines, .zip, .parquet

b.2 / トレーナー= DPO

ShareGPT_DPOは、ShareGPTから収集した会話（プロンプト＋応答）と、人間がどちらが優れているかに基づいてランク付けした応答ペアで構成されるデータセットです。以下の目的で使用されます：

GPTなどの言語モデルを訓練し、人間の好みに沿った応答を生成させる。
DPO（Direct Preference Optimization）手法を用いた応答品質の最適化。

詳細な構造：

[
   {
      “conversations”: [
         {
            “from”: “string”,
            “value”: “string”
         }
],
      “chosen”: {
         “from”: “string”,
         “value”: “string”
},
      “rejected”: {
         “from”: “string”,
         “value”: “string”
      }
   }
]

例:

[
    {
        "conversations": [
            {
                "from": "human",
                "value": "Q: Based on this review, would the user recommend this product? === Review: Recommended by a friend that I will never again trust, I quit on this book half-way through. It is utterly pointless garbage. What is most offensive about it is that it isn't even well-written. This guy sounds like a frat-boy braggart, and his stories are no more creative than those you might overhear in a bar. Answer:\\nA:"
            }
        ],
        "chosen": {
            "from": "gpt",
            "value": "Based on this review, the user would not recommend this product. The review is overwhelmingly negative, describing the book as \\"utterly pointless garbage\\" and stating that it is not well-written. The user also mentions that they quit reading the book halfway through, which suggests that they did not find it engaging or worthwhile. Overall, this review would likely deter potential buyers from purchasing the product."
        },
        "rejected": {
            "from": "gpt",
            "value": "No, based on this review, the user would not recommend this product."
        }
    },
    {
        "conversations": [
            {
                "from": "human",
                "value": "What Swift code can be used to display a UIAlertController with a text field and a \\"reset\\" button to reset a specific setting in a mobile app?\\nHere's the current implementation:\\nfunc resetSetting() {\\n    let alert = UIAlertController(title: \\"Reset Setting\\", message: \\"Enter the setting you want to reset:\\", preferredStyle: .alert)\\n    alert.addTextField(configurationHandler: { textField in\\n        textField.placeholder = \\"Setting Name\\"\\n    })\\n    alert.addAction(UIAlertAction(title: \\"Cancel\\", style: .cancel, handler: nil))\\n    alert.addAction(UIAlertAction(title: \\"Reset\\", style: .destructive, handler: { action in\\n        // Code to reset setting goes here\\n    }))\\n    present(alert, animated: true, completion: nil)\\n}\\nHow can this be modified to allow for the user to enter a specific setting and then reset it without losing any other customizations or preferences in the app?"
            }
        ],
        "chosen": {
            "from": "gpt",
            "value": "To modify the `resetSetting` function to allow the user to enter a specific setting and then reset it without losing other customizations or preferences in the app, you can use the following code:\\n```swift\\nimport UIKit\\n\\nfunc resetSetting(_ settingIdentifier: String) {\\n    let alert = UIAlertController(title: \\"Reset Setting\\", message: \\"Enter the setting you want to reset:\\", preferredStyle: .alert)\\n    alert.addTextField(configurationHandler: { textField in\\n        textField.placeholder = \\"Setting Name\\"\\n        textField.text = settingIdentifier\\n    })\\n    alert.addAction(UIAlertAction(title: \\"Cancel\\", style: .cancel, handler: nil))\\n    alert.addAction(UIAlertAction(title: \\"Reset\\", style: .destructive, handler: { action in\\n        if let settingName = alert.textFields?.first?.text {\\n            // Code to reset the specific setting goes here, using settingName as the identifier\\n        }\\n    }))\\n    present(alert, animated: true, completion: nil)\\n}\\n```\\nThis modified function now takes a `settingIdentifier` string parameter, which represents the specific setting you want to reset. The text field in the alert controller will now pre-fill with this setting identifier. When the user taps the \\"Reset\\" button, the code inside the `handler` for the `UIAlertAction` checks if the text field contains a setting name and resets the specified setting accordingly.\\n\\nMake sure to replace the comment inside the \\"Reset\\" action's handler with the actual code to reset the specific setting based on the `settingName` variable."
        },
        "rejected": {
            "from": "gpt",
            "value": "To add a text field to a UIAlertController in Swift, you can use the following code:\\n\\n```\\nlet alert = UIAlertController(title: \\"Reset Setting\\", message: \\"Enter the setting you want to reset:\\", preferredStyle:.alert)\\nlet textField = UITextField()\\nalert.addTextField(textField)\\n```\\n\\nTo add a \\"Reset\\" button to the UIAlertController that resets the specified setting, you can use the following code:\\n\\n```\\nlet alert = UIAlertController(title: \\"Reset Setting\\", message: \\"Enter the setting you want to reset:\\", preferredStyle:.alert)\\nlet textField = UITextField()\\nalert.addTextField(textField)\\n\\nlet resetAction = UIAlertAction(title: \\"Reset\\", style:.destructive, handler: { (action) in\\n   // Code to reset the specified setting goes here\\n})\\n\\nalert.addAction(resetAction)\\n```\\n\\nYou can modify the alert message and button text as needed. You can also add additional actions to the alert, such as a \\"Cancel\\" button or an \\"OK\\" button to apply the changes.\\n\\nCan you provide a basic example of the entire modified function?"
        }
    }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/sharegpt-dpo

サポートされているファイル形式: .json, .jsonlines, .zip, .parquet

c/ ShareGPT_Image

ShareGPT_Imageは、ShareGPTマルチターンチャット形式の拡張版であり、マルチモーダルトレーニング、すなわち会話においてテキストと画像の両方を扱うモデルのトレーニングを特に目的として設計されています。

これは、自然言語と並行して画像を処理する必要があるビジョン言語モデル（VLMs）の微調整に使用されます。

構造は以下の通りです：

以下のチャットターン一覧"message" （ShareGPTと同様）。
会話で使用される画像を指す "image" または"image_path" というフィールド（形式は png、jpg、jpeg を使用）

注意:

画像を表示させる必要があるチャットコンテンツには、必ず image トークンを含める必要があります。
画像が複数ある場合:
- 画像パスは images配列内で定義する必要があります。
- チャットフロー内の画像の位置は、 imageトークンで示されます
- チャット内の image トークンの数は、 images 配列内のアイテム数と一致する必要があります。
- 画像は出現順にマッピングされ、各 imageトークンはimages 配列の対応する画像に置き換えられます。

詳細な構造：

[
   {
      “message”: [
         {
            “role”: “string”, “content”: “<image>string”
         }
      ],
      “images”: [
         “images/0.jpg”
      ]
   }
]

例:

[
    {
        "messages": [
            {
                "role": "user",
                "content": "<image>How many baseball players are visible in the image?"
            },
            {
                "role": "assistant",
                "content": "There are three baseball players visible in the image."
            }
        ],
        "images": [
            "images/0.jpg"
        ]
    },
    {
        "messages": [
            {
                "role": "user",
                "content": "What is the man wearing in the image?<image>"
            },
            {
                "role": "assistant",
                "content": "The man in the image is wearing chaps, a cowboy hat, and is dressed like a cowboy."
            }
        ],
        "images": [
            "images/1.jpg"
        ]
    },
    {
        "messages": [
            {
                "role": "user",
                "content": "<image>Explain the visual content of the image in great detail."
            },
            {
                "role": "assistant",
                "content": "The image presents a view of a hotel room, showing both the bathroom and bedroom areas. In the bathroom, there is a white sink under a mirror, accompanied by several bottles placed on the counter, and two cups nearby. \\n\\nLooking through a doorway into the bedroom area, a large bed can be seen dominating the space. A chair is also visible within the room, situated near the foot of the bed. The hair drier is located in the foreground, implying that it likely belongs to the bathroom area as well. The arrangement of the objects and the furniture suggests a comfortable and well-equipped hotel room."
            }
        ],
        "images": [
            "images/2.jpg"
        ]
    }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/sharegpt-image

サポートされているファイル形式: .zip, .parquet

d/ Corpus

Corpusとは、言語モデルの学習や微調整に使用されるテキストの集合体である。

コーパスの各データポイントには、テキストの文字列を含む "text" フィールドが含まれます。この形式は、指示と出力を区別する必要がなく、モデルが学習するための生のテキストデータを提供したい場合に一般的に使用されます。

詳細な構造：

[
   {
      “text”: “string”
   }
]

例:

[
    {
        "text": "Film colorization (American English; or colourisation [British English], or colourization [Canadian English and Oxford English]) is any process that adds color to black-and-white, sepia, or other monochrome moving-picture images. It may be done as a special effect, to \\"modernize\\" black-and-white films, or to restore color films. The first examples date from the early 20th century, but colorization has become common with the advent of digital image processing. Early techniques Hand colorization The first film colorization methods were hand done by individuals. For example, at least 4% of George Méliès' output, including some prints of A Trip to the Moon from 1902 and other major films such as The Kingdom of the Fairies, The Impossible Voyage, and The Barber of Seville were individually hand-colored by Elisabeth Thuillier's coloring lab in Paris. Thuillier, a former colorist of glass and celluloid products, directed a studio of two hundred people painting directly on film stock with brushes, in the colors she chose and specified; each worker was assigned a different color in assembly line style, with more than twenty separate colors often used for a single film. Thuillier's lab produced about sixty hand-colored copies of A Trip to the Moon, but only one copy is known to still exist today. The first full-length feature film made by a hand-colored process was The Miracle of 1912. The process was always done by hand, sometimes using a stencil cut from a second print of the film, such as the Pathécolor process. As late as the 1920s, hand coloring processes were used for individual shots in Greed (1924) and The Phantom of the Opera (1925) (both utilizing the Handschiegl color process); and rarely, an entire feature-length movie such as Cyrano de Bergerac (1925) and The Last Days of Pompeii (1926). These colorization methods were employed until effective color film processes were developed. Around 1968-1972, black-and-white Betty Boop, Krazy Kat and Looney Tunes cartoons and among others were redistributed in color. Supervised by Fred Ladd, color was added by tracing the original black-and-white frames onto new animation cels, and then adding color to the new cels in South Korea. To cut time and expense, Ladd's process skipped every other frame, cutting the frame rate in half; this technique considerably degraded the quality and timing of the original animation, to the extent that some animation was not carried over or mistakenly altered. The most recent redrawn colorized black-and-white cartoons"
    },
    {
        "text": "examples where the doubling does affect the meaning in most accents: ten nails versus ten ales this sin versus this inn five valleys versus five alleys his zone versus his own mead day versus me-day unnamed versus unaimed forerunner versus foreigner (only in some varieties of General American) In some dialects gemination is also found for some words when the suffix -ly follows a root ending in -l or -ll, as in: solely but not usually In some varieties of Welsh English, the process takes place indiscriminately between vowels, e.g. in money but it also applies with graphemic duplication (thus, orthographically dictated), e.g. butter French In French, gemination is usually not phonologically relevant and therefore does not allow words to be distinguished: it mostly corresponds to an accent of insistence (\\"c'est terrifiant\\" realised [ˈtɛʁ.ʁi.fjɑ̃]), or meets hyper-correction criteria: one \\"corrects\\" one's pronunciation, despite the usual phonology, to be closer to a realization that one imagines to be more correct: thus, the word illusion is sometimes pronounced [il.lyˈzjɔ̃] by influence of the spelling. However, gemination is distinctive in a few cases. Statements such as She said ~ She said it /ɛl a di/ ~ /ɛl l‿a di/ can commonly be distinguished by gemination. In a more sustained pronunciation, gemination distinguishes the conditional (and possibly the future tense) from the imperfect: courrai (will run) /kuʁ.ʁɛ/ vs. courais (ran) /ku.ʁɛ/, or the indicative from the subjunctive, as in croyons (we believe) /kʁwa.jɔ̃ / vs. croyions (we believed) /kʁwaj.jɔ̃ /. Greek In Ancient Greek, consonant length was distinctive, e.g., 'I am of interest' vs. 'I am going to'. The distinction has been lost in the standard and most other varieties, with the exception of Cypriot (where it might carry over from Ancient Greek or arise from a number of synchronic and diachronic assimilatory processes, or even spontaneously), some varieties of the southeastern Aegean, and Italy. Hindustani Gemination is common in both Hindi and Urdu. It does not occur after long vowels and is found in words of both Indic and Arabic origin, but not in those of Persian origin. In Urdu, gemination is represented by the Shadda diacritic, which is usually omitted from writings, and mainly written to clear ambiguity. In Hindi, gemination is represented by doubling the geminated consonant, enjoined with the Virama diacritic. Aspirated consonants Gemination of aspirated consonants in Hindi are formed by combining the corresponding non-aspirated consonant followed by its"
    }
]

サンプル: https://github.com/fpt-corp/ai-studio-samples/tree/main/sample-datasets/corpus

サポートされているファイル形式: .txt, .json, .jsonlines, .zip, .parquet

Previous選択ボリューム Nextデータセットの選択

Last updated 27 days ago