For Data Science Teams -- RosettaInsight¶
Managed notebooks, GPU compute, and collaborative ML workflows across clouds.
Overview¶
Data science teams need fast access to powerful compute, reproducible environments, and easy collaboration -- without waiting on infrastructure tickets. RosettaHub provides a multi-cloud Data Science Workbench with GPU support, spot instance hibernation, and shared datasets, all governed by real-time budget controls.
Key Capabilities¶
Data Science Workbench¶
Launch production-ready tools from pre-configured Docker formations:
| Tool | Use Case |
|---|---|
| Jupyter Notebook / JupyterLab | Interactive Python, R, Julia development |
| RStudio | Statistical analysis and visualization |
| VS Code | Full IDE with terminal, extensions, and Git |
| Shiny | Interactive R web applications |
| Apache Superset | SQL dashboards and data exploration |
Each tool runs on dedicated cloud compute -- not shared multi-tenant infrastructure. You control the instance type, region, and cloud provider.
GPU Instances Across Clouds¶
Access NVIDIA GPU instances on AWS, Azure, GCP, Alibaba Cloud, OVH, and OpenStack from a single interface. Common configurations:
| Provider | GPU Options | Use Case |
|---|---|---|
| AWS | P4d, P5, G5 (A100, H100, A10G) | Deep learning training, inference |
| Azure | NC, ND series (A100, V100, T4) | Large model training, rendering |
| GCP | A2, G2 (A100, L4) | ML training, batch inference |
| Alibaba Cloud | GN6, GN7 (V100, A10) | ML training, GPU-accelerated computing |
Formations abstract away provider-specific APIs -- switch between clouds by changing a cloud key, not rewriting infrastructure.
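As a rough illustration of that idea, the same formation spec can be retargeted to another provider by swapping the cloud key and the provider-specific instance type. The field names below are assumptions for the sketch, not the actual RosettaHub API.

```python
# Hypothetical formation spec: only the cloud key and the
# provider-specific instance type change when switching clouds.
aws_formation = {
    "cloud_key": "aws-research-key",   # credentials for AWS (placeholder name)
    "instance_type": "g5.xlarge",      # A10G GPU
    "image": "jupyterlab-cuda:latest", # same Docker image on every provider
}

# Retarget to GCP: the Docker image and the rest of the spec are untouched.
gcp_formation = dict(
    aws_formation,
    cloud_key="gcp-research-key",      # placeholder name
    instance_type="g2-standard-4",     # L4 GPU
)

assert aws_formation["image"] == gcp_formation["image"]
```

The environment travels with the Docker image; only credentials and machine sizing are provider-specific.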
Spot Instance Hibernation¶
Run GPU and CPU workloads on spot/preemptible instances for 60-90% cost savings. RosettaHub adds:
- Hibernate -- suspend a spot instance to disk and resume later with full state
- Snapshot on Termination -- automatically capture machine state if reclaimed
- Fallback to On-Demand -- seamlessly switch pricing model for critical jobs
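The three policies above can be sketched as a simple decision rule. The function and return values here are illustrative only, not the actual RosettaHub API.

```python
# Hypothetical sketch of what happens when the cloud reclaims a spot
# instance, combining the hibernate / snapshot / fallback policies above.
def handle_spot_reclaim(job_critical: bool, hibernate_enabled: bool) -> str:
    """Decide the response to a spot reclaim notice."""
    if hibernate_enabled:
        return "hibernate"           # suspend to disk, resume later with full state
    if job_critical:
        return "relaunch-on-demand"  # fall back to on-demand pricing
    return "snapshot-and-stop"       # capture machine state for later

assert handle_spot_reclaim(job_critical=True, hibernate_enabled=False) == "relaunch-on-demand"
```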
Cost Impact
A team running four p3.2xlarge GPU instances around the clock pays roughly $8,900 per month at on-demand list price (us-east-1); running the same workload on spot with hibernation can cut that bill by 60-90%, saving several thousand dollars per month.
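Actual savings depend on instance price, spot discount, and utilization. A back-of-the-envelope check, using assumed us-east-1 list prices (verify against current pricing):

```python
# Illustrative spot-savings arithmetic; rates are assumptions, not quotes.
ON_DEMAND_HOURLY = 3.06   # p3.2xlarge on-demand, USD/hour (assumed)
SPOT_DISCOUNT = 0.70      # spot is typically 60-90% cheaper; assume 70%
INSTANCES = 4
HOURS_PER_MONTH = 730

on_demand_monthly = ON_DEMAND_HOURLY * INSTANCES * HOURS_PER_MONTH
savings = on_demand_monthly * SPOT_DISCOUNT
print(f"on-demand: ${on_demand_monthly:,.0f}/mo, spot savings: ${savings:,.0f}/mo")
```

At these assumed rates the team pays about $8,935/month on-demand and saves about $6,255/month on spot.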
Reproducible ML Environments¶
Docker formations capture the complete software stack -- OS, CUDA drivers, Python packages, model code -- in a single portable template.
- Clone a team-standard formation to guarantee consistent environments
- Snapshot a running instance after installing new packages
- Share the updated formation with the team instantly
- Publish to a private marketplace for organization-wide reuse
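The clone-then-customize workflow above can be sketched with a stand-in class; the `Formation` type and its methods are illustrative only, not the RosettaHub API.

```python
# Hypothetical stand-in for a formation: clone the team standard,
# customize the clone, leave the shared template untouched.
class Formation:
    def __init__(self, name, packages=()):
        self.name = name
        self.packages = list(packages)

    def clone(self, name):
        """New copy with the same software stack."""
        return Formation(name, self.packages)

    def install(self, package):
        """Stand-in for installing a package and snapshotting the instance."""
        self.packages.append(package)

team_standard = Formation("team-cuda-jupyter", ["torch", "jupyterlab"])
mine = team_standard.clone("alice-experiments")
mine.install("transformers")

# The team template is unchanged; only the clone picked up the new package.
assert team_standard.packages == ["torch", "jupyterlab"]
```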
Shared Datasets and Model Artifacts¶
Use RosettaHub Storages to manage data across clouds:
- Mount AWS S3 buckets, Azure Blob containers, or GCP Cloud Storage on any instance
- Cross-cloud data access -- use data stored on one provider in an environment running on another
- Fine-grained sharing -- grant read or read-write access to specific team members
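Fine-grained sharing amounts to mapping storage paths to per-member permissions. A minimal sketch of that model, with placeholder bucket and user names (not the RosettaHub API):

```python
# Hypothetical grant table: storage prefix -> {member: permission}.
grants = {
    "s3://team-shared-datasets/train/": {"alice": "read-write", "bob": "read"},
}

def can_write(user: str, path: str) -> bool:
    """True if some grant covering `path` gives `user` read-write access."""
    return any(
        path.startswith(prefix) and acl.get(user) == "read-write"
        for prefix, acl in grants.items()
    )

assert can_write("alice", "s3://team-shared-datasets/train/features.parquet")
assert not can_write("bob", "s3://team-shared-datasets/train/features.parquet")
```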
Container and Kubernetes Support¶
For production ML pipelines and batch inference:
- Docker formations -- single-container deployments with GPU passthrough
- Kubernetes clusters -- orchestrate multi-container workloads
- EMR / Dataproc clusters -- Spark-based data processing at scale
Next Steps¶
- Quick Start -- connect your cloud accounts
- Formations -- create your first data science environment
- Cloud Keys -- set up credentials for GPU instances
- Tutorials -- step-by-step walkthroughs