Overview
Data Hub is the centralized data management module within AI Studio. It enables users to store, organize, version, and prepare datasets used across the AI model lifecycle — including fine-tuning, testing, and benchmarking.
By integrating seamlessly with other AI Studio services such as Model Fine-tuning and Model Testing, Data Hub ensures that your datasets remain consistent, traceable, and reusable.
Key Capabilities
Dataset Management
Upload, list, and organize datasets with metadata (name, description and data format).
Secure Storage
Provides scalable, encrypted storage for structured and unstructured data.
Data Access Integration
Supports direct linkage to fine-tuning and testing jobs without manual file handling.
Presigned URL Uploads
Enables large dataset uploads efficiently through presigned URLs or API endpoints.
Search & Filtering
Quickly find datasets by name or creation date using flexible filters.
Supported Data Types
Data Hub supports a wide range of file types commonly used in machine learning workflows:
Data format: Alpaca, ShareGPT, ShareGPT_Image, Corpus
Structured data: CSV, JSON, Parquet
Text data: TXT, JSONL
Unstructured data (optional): Images or documents used for multimodal fine-tuning
Each dataset must include a defined schema or format compatible with your chosen trainer.
Integration Across AI Studio
Data Hub serves as the data foundation for all modules in AI Studio:
Model Fine-tuning
Accesses datasets to train or adapt pretrained models.
Model Testing
Retrieves evaluation or benchmark datasets for validation.
This tight integration ensures complete lineage tracking — from dataset to model to deployed endpoint.
Access Methods
You can interact with Data Hub through multiple interfaces:
AI Studio Console – Web-based interface for uploading and managing datasets.
AI Studio API – RESTful API for programmatic dataset operations (upload, list, delete, etc.).
Typical Workflow
Upload your dataset to Data Hub.
Describe it for easy identification.
Reference it when creating a fine-tuning or testing job.
Benefits
Centralized and secure data management
Automated dataset versioning and lineage tracking
Faster access for model training and testing
Reduced duplication across teams and projects
Simplified compliance and reproducibility
Next Steps
Learn how to upload and organize datasets in Data Hub Tutorial.
Continue to Fine-tune a model using your dataset in the Quickstart Guide
Last updated
