For Data Science Teams -- RosettaInsight¶
Managed notebooks, GPU compute, and collaborative ML workflows across clouds.
Overview¶
Data science teams need fast access to powerful compute, reproducible environments, and easy collaboration -- without waiting on infrastructure tickets. RosettaHub provides a multi-cloud Data Science Workbench with GPU support, spot instance hibernation, and shared datasets, all governed by real-time budget controls.
Key Capabilities¶
Data Science Workbench¶
Launch production-ready tools from pre-configured Docker formations:
| Tool | Use Case |
|---|---|
| Jupyter Notebook / JupyterLab | Interactive Python, R, Julia development |
| RStudio | Statistical analysis and visualization |
| VS Code | Full IDE with terminal, extensions, and Git |
| Shiny | Interactive R web applications |
| Apache Superset | SQL dashboards and data exploration |
Each tool runs on dedicated cloud compute -- not shared multi-tenant infrastructure. You control the instance type, region, and cloud provider.
GPU Instances Across Clouds¶
Access NVIDIA GPU instances on AWS, Azure, GCP, Alibaba Cloud, OVH, and OpenStack from a single interface. Common configurations:
| Provider | GPU Options | Use Case |
|---|---|---|
| AWS | P4d, P5, G5 (A100, H100, A10G) | Deep learning training, inference |
| Azure | NC, ND series (A100, V100, T4) | Large model training, rendering |
| GCP | A2, G2 (A100, L4) | ML training, batch inference |
| Alibaba Cloud | GN6, GN7 (V100, A10) | ML training, GPU-accelerated computing |
Formations abstract away provider-specific APIs -- switch between clouds by changing a cloud key, not rewriting infrastructure.
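As a rough illustration of that idea, the same formation spec can be retargeted to another provider by swapping the cloud key and the provider-specific instance type. The field names below are assumptions for the sketch, not the actual RosettaHub API.

```python
# Hypothetical formation spec: only the cloud key and the
# provider-specific instance type change when switching clouds.
aws_formation = {
    "cloud_key": "aws-research-key",   # credentials for AWS (placeholder name)
    "instance_type": "g5.xlarge",      # A10G GPU
    "image": "jupyterlab-cuda:latest", # same Docker image on every provider
}

# Retarget to GCP: the Docker image and the rest of the spec are untouched.
gcp_formation = dict(
    aws_formation,
    cloud_key="gcp-research-key",      # placeholder name
    instance_type="g2-standard-4",     # L4 GPU
)

assert aws_formation["image"] == gcp_formation["image"]
```

The environment travels with the Docker image; only credentials and machine sizing are provider-specific.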
Spot Instance Hibernation¶
Run GPU and CPU workloads on spot/preemptible instances for 60-90% cost savings. RosettaHub adds:
- Hibernate -- suspend a spot instance to disk and resume later with full state
- Snapshot on Termination -- automatically capture machine state if reclaimed
- Fallback to On-Demand -- seamlessly switch pricing model for critical jobs
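The three policies above can be sketched as a simple decision rule. The function and return values here are illustrative only, not the actual RosettaHub API.

```python
# Hypothetical sketch of what happens when the cloud reclaims a spot
# instance, combining the hibernate / snapshot / fallback policies above.
def handle_spot_reclaim(job_critical: bool, hibernate_enabled: bool) -> str:
    """Decide the response to a spot reclaim notice."""
    if hibernate_enabled:
        return "hibernate"           # suspend to disk, resume later with full state
    if job_critical:
        return "relaunch-on-demand"  # fall back to on-demand pricing
    return "snapshot-and-stop"       # capture machine state for later

assert handle_spot_reclaim(job_critical=True, hibernate_enabled=False) == "relaunch-on-demand"
```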
Cost Impact
A team running four p3.2xlarge GPU instances around the clock pays roughly $8,900 per month at on-demand list price (us-east-1); running the same workload on spot with hibernation can cut that bill by 60-90%, saving several thousand dollars per month.
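Actual savings depend on instance price, spot discount, and utilization. A back-of-the-envelope check, using assumed us-east-1 list prices (verify against current pricing):

```python
# Illustrative spot-savings arithmetic; rates are assumptions, not quotes.
ON_DEMAND_HOURLY = 3.06   # p3.2xlarge on-demand, USD/hour (assumed)
SPOT_DISCOUNT = 0.70      # spot is typically 60-90% cheaper; assume 70%
INSTANCES = 4
HOURS_PER_MONTH = 730

on_demand_monthly = ON_DEMAND_HOURLY * INSTANCES * HOURS_PER_MONTH
savings = on_demand_monthly * SPOT_DISCOUNT
print(f"on-demand: ${on_demand_monthly:,.0f}/mo, spot savings: ${savings:,.0f}/mo")
```

At these assumed rates the team pays about $8,935/month on-demand and saves about $6,255/month on spot.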
Reproducible ML Environments¶
Docker formations capture the complete software stack -- OS, CUDA drivers, Python packages, model code -- in a single portable template.
- Clone a team-standard formation to guarantee consistent environments
- Snapshot a running instance after installing new packages
- Share the updated formation with the team instantly
- Publish to a private marketplace for organization-wide reuse
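The clone-then-customize workflow above can be sketched with a stand-in class; the `Formation` type and its methods are illustrative only, not the RosettaHub API.

```python
# Hypothetical stand-in for a formation: clone the team standard,
# customize the clone, leave the shared template untouched.
class Formation:
    def __init__(self, name, packages=()):
        self.name = name
        self.packages = list(packages)

    def clone(self, name):
        """New copy with the same software stack."""
        return Formation(name, self.packages)

    def install(self, package):
        """Stand-in for installing a package and snapshotting the instance."""
        self.packages.append(package)

team_standard = Formation("team-cuda-jupyter", ["torch", "jupyterlab"])
mine = team_standard.clone("alice-experiments")
mine.install("transformers")

# The team template is unchanged; only the clone picked up the new package.
assert team_standard.packages == ["torch", "jupyterlab"]
```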
Shared Datasets and Model Artifacts¶
Use RosettaHub Storages to manage data across clouds:
- Mount AWS S3 buckets, Azure Blob containers, or GCP Cloud Storage on any instance
- Cross-cloud data access -- use data stored on one provider in an environment running on another
- Fine-grained sharing -- grant read or read-write access to specific team members
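Fine-grained sharing amounts to mapping storage paths to per-member permissions. A minimal sketch of that model, with placeholder bucket and user names (not the RosettaHub API):

```python
# Hypothetical grant table: storage prefix -> {member: permission}.
grants = {
    "s3://team-shared-datasets/train/": {"alice": "read-write", "bob": "read"},
}

def can_write(user: str, path: str) -> bool:
    """True if some grant covering `path` gives `user` read-write access."""
    return any(
        path.startswith(prefix) and acl.get(user) == "read-write"
        for prefix, acl in grants.items()
    )

assert can_write("alice", "s3://team-shared-datasets/train/features.parquet")
assert not can_write("bob", "s3://team-shared-datasets/train/features.parquet")
```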
Container and Kubernetes Support¶
For production ML pipelines and batch inference:
- Docker formations -- single-container deployments with GPU passthrough
- Kubernetes clusters -- orchestrate multi-container workloads
- EMR / Dataproc clusters -- Spark-based data processing at scale
Next Steps¶
- Quick Start -- connect your cloud accounts
- Formations -- create your first data science environment
- Cloud Keys -- set up credentials for GPU instances
- Tutorials -- step-by-step walkthroughs