# Overview

**Data Hub** is the centralized data management module within **AI Studio**.\
It lets users **store, organize, version, and prepare datasets** used across the AI model lifecycle, including fine-tuning, testing, and benchmarking.

By integrating seamlessly with other AI Studio services such as **Model Fine-tuning** and **Model Testing**, Data Hub ensures that your datasets remain consistent, traceable, and reusable.

### **Key Capabilities**

| Feature                     | Description                                                                            |
| --------------------------- | -------------------------------------------------------------------------------------- |
| **Dataset Management**      | Upload, list, and organize datasets with metadata (name, description and data format). |
| **Secure Storage**          | Provides scalable, encrypted storage for structured and unstructured data.             |
| **Data Access Integration** | Supports direct linkage to fine-tuning and testing jobs without manual file handling.  |
| **Presigned URL Uploads**   | Uploads large datasets efficiently through presigned URLs or API endpoints.            |
| **Search & Filtering**      | Quickly find datasets by name or creation date using flexible filters.                 |
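
To make the presigned URL row more concrete, here is a minimal sketch of how such an upload is commonly done. It assumes the presigned URL accepts an HTTP `PUT` carrying the raw file bytes, as is typical for S3-compatible storage; this page does not confirm that behavior for Data Hub. The URL, the helper name `build_presigned_put`, and the sample record are all hypothetical.

```python
import urllib.request


def build_presigned_put(
    presigned_url: str,
    payload: bytes,
    content_type: str = "application/octet-stream",
) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT request for a presigned URL."""
    return urllib.request.Request(
        url=presigned_url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Hypothetical URL: a real presigned URL would be issued by the service.
request = build_presigned_put(
    "https://storage.example.com/bucket/train.jsonl?signature=PLACEHOLDER",
    b'{"instruction": "Say hi", "input": "", "output": "Hi"}\n',
)
print(request.get_method(), request.full_url)

# To perform the upload, pass the request to urllib.request.urlopen(request).
```

The sketch only builds the request, so it runs without network access. The presigned URL carries its own authorization, so no API token is attached here.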

### **Supported Data Types**

Data Hub supports the dataset formats and file types commonly used in machine learning workflows:

* **Dataset formats:** Alpaca, ShareGPT, ShareGPT\_Image, Corpus
* **Structured data:** CSV, JSON, Parquet
* **Text data:** TXT, JSONL
* **Unstructured data (optional):** Images or documents used for multimodal fine-tuning

Each dataset must include a defined **schema or format** compatible with your chosen trainer.
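
As an illustration of what a format check might look like before uploading, the sketch below validates a JSONL file against the widely used Alpaca convention of `instruction`, `input`, and `output` fields. The exact schema that Data Hub enforces is not specified on this page, so treat the field set and the helper `invalid_alpaca_lines` as assumptions.

```python
import json

# Assumed field names, following the common Alpaca convention.
ALPACA_KEYS = {"instruction", "input", "output"}


def invalid_alpaca_lines(jsonl_text: str) -> list[int]:
    """Return 1-based line numbers that are not valid Alpaca-style records."""
    bad = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        # A valid record is a JSON object containing every expected key.
        if not isinstance(record, dict) or not ALPACA_KEYS.issubset(record):
            bad.append(number)
    return bad


# Made-up sample: one valid record, one missing "output", one non-JSON line.
sample = "\n".join([
    json.dumps({"instruction": "Translate to French", "input": "Hello", "output": "Bonjour"}),
    json.dumps({"instruction": "Missing output field", "input": ""}),
    "not json",
])
print(invalid_alpaca_lines(sample))  # [2, 3]
```

Running a check like this locally can catch malformed records before a fine-tuning job rejects the dataset.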

### **Integration Across AI Studio**

Data Hub serves as the **data foundation** for all modules in AI Studio:

| Module                | How It Uses Data Hub                                       |
| --------------------- | ---------------------------------------------------------- |
| **Model Fine-tuning** | Accesses datasets to train or adapt pretrained models.     |
| **Model Testing**     | Retrieves evaluation or benchmark datasets for validation. |

This tight integration ensures complete lineage tracking, from dataset to model to deployed endpoint.

### **Access Methods**

You can interact with Data Hub through multiple interfaces:

1. **AI Studio Console** – Web-based interface for uploading and managing datasets.
2. **AI Studio API** – RESTful API for programmatic dataset operations (upload, list, delete, etc.).
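
For programmatic access, a request to list datasets might be assembled as in the sketch below. The base URL, the `/datasets` path, the `name` query parameter, and bearer-token authentication are all hypothetical placeholders; consult the AI Studio API reference for the actual endpoints and authentication scheme.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL, not the real AI Studio API address.
API_BASE = "https://api.example.com/data-hub/v1"


def build_list_datasets_request(token: str, name_contains: str = "") -> urllib.request.Request:
    """Build (but do not send) a GET request that lists datasets, optionally filtered by name."""
    query = urllib.parse.urlencode({"name": name_contains}) if name_contains else ""
    url = f"{API_BASE}/datasets" + (f"?{query}" if query else "")
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_list_datasets_request("YOUR_API_TOKEN", name_contains="customer support")
print(req.get_method(), req.full_url)
```

The request is only constructed here, not sent. Keeping request construction separate from the network call makes the client easy to test.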

### **Typical Workflow**

1. Upload your dataset to **Data Hub**.
2. Add a name and description so the dataset is easy to identify.
3. Reference it when creating a fine-tuning or testing job.
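
The three steps above can be sketched with a toy in-memory model. `InMemoryDataHub`, `Dataset`, `fine_tuning_job_spec`, and every field name are hypothetical stand-ins that illustrate the order of operations and the idea of referencing a dataset by ID. They do not reflect the actual AI Studio API.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Dataset:
    dataset_id: str
    name: str
    data_format: str
    description: str = ""


class InMemoryDataHub:
    """Toy stand-in for Data Hub; it only illustrates the order of the steps."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}
        self._next_id = count(1)

    def upload(self, name: str, data_format: str) -> Dataset:
        # Step 1: register the uploaded dataset and hand back its identifier.
        dataset = Dataset(f"ds-{next(self._next_id)}", name, data_format)
        self._datasets[dataset.dataset_id] = dataset
        return dataset

    def describe(self, dataset_id: str, description: str) -> None:
        # Step 2: attach a human-readable description.
        self._datasets[dataset_id].description = description

    def get(self, dataset_id: str) -> Dataset:
        return self._datasets[dataset_id]  # KeyError if the dataset is unknown


def fine_tuning_job_spec(hub: InMemoryDataHub, dataset_id: str, base_model: str) -> dict:
    # Step 3: the job refers to the dataset by ID instead of carrying the file.
    dataset = hub.get(dataset_id)
    return {
        "job_type": "fine-tuning",
        "base_model": base_model,
        "dataset_id": dataset.dataset_id,
        "data_format": dataset.data_format,
    }


hub = InMemoryDataHub()
dataset = hub.upload("support-chats", data_format="ShareGPT")
hub.describe(dataset.dataset_id, "Customer support conversations, English only")
job = fine_tuning_job_spec(hub, dataset.dataset_id, base_model="example-base-model")
print(job)
```

Because the job stores only the dataset ID, the same dataset can be reused by several fine-tuning or testing jobs without being uploaded again.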

### **Benefits**

* Centralized and secure data management
* Automated dataset versioning and lineage tracking
* Faster access for model training and testing
* Reduced duplication across teams and projects
* Simplified compliance and reproducibility

### **Next Steps**

* Learn how to **upload and organize datasets** in the Data Hub Tutorial.
* Continue on to **fine-tune a model** with your dataset in the Quickstart Guide.

