For Data Science Teams — RosettaInsight

Managed notebooks, GPU compute, and collaborative ML workflows across clouds.

Overview

Data science teams need fast access to powerful compute, reproducible environments, and easy collaboration -- without waiting on infrastructure tickets. RosettaHub provides a multi-cloud Data Science Workbench with GPU support, spot instance hibernation, and shared datasets, all governed by real-time budget controls.

Key Capabilities

Data Science Workbench

Launch production-ready tools from pre-configured Docker formations:

Tool                           Use Case
Jupyter Notebook / JupyterLab  Interactive Python, R, and Julia development
RStudio                        Statistical analysis and visualization
VS Code                        Full IDE with terminal, extensions, and Git
Shiny                          Interactive R web applications
Apache Superset                SQL dashboards and data exploration

Each tool runs on dedicated cloud compute -- not shared multi-tenant infrastructure. You control the instance type, region, and cloud provider.
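To make the launch parameters concrete, here is a minimal Python sketch of assembling and validating such a request. The RosettaHub API is not documented here, so the payload shape, field names, and allowed values are illustrative assumptions, not the real interface.

```python
# Hypothetical sketch: building a formation launch request.
# Field names and allowed values are assumptions, not RosettaHub's real API.

SUPPORTED_PROVIDERS = {"aws", "azure", "gcp", "alibaba", "ovh", "openstack"}

def build_launch_request(tool: str, provider: str, region: str, instance_type: str) -> dict:
    """Assemble a launch payload, validating the cloud provider."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    return {
        "formation": tool,          # e.g. "jupyterlab", "rstudio", "vscode"
        "provider": provider,
        "region": region,
        "instance_type": instance_type,
    }

req = build_launch_request("jupyterlab", "aws", "eu-west-1", "g5.xlarge")
```

The point of the sketch is the shape of the choice you control: tool, provider, region, and instance type are all explicit, per instance.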

GPU Instances Across Clouds

Access NVIDIA GPU instances on AWS, Azure, GCP, Alibaba Cloud, OVH, and OpenStack from a single interface. Common configurations:

Provider       GPU Options                     Use Case
AWS            P4d, P5, G5 (A100, H100, A10G)  Deep learning training, inference
Azure          NC, ND series (A100, V100, T4)  Large model training, rendering
GCP            A2, G2 (A100, L4)               ML training, batch inference
Alibaba Cloud  GN6, GN7 (V100, A10)            ML training, GPU-accelerated computing

Formations abstract away provider-specific APIs -- switch between clouds by changing a cloud key, not rewriting infrastructure.
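The "change a cloud key" idea can be sketched as follows. The spec format is an assumption; the instance types are real 8×A100 offerings on each provider, but the mapping between them is illustrative.

```python
# Hypothetical sketch: re-targeting a formation by swapping its cloud key.
# The spec format is an assumption; the instance types are real 8x A100
# offerings, but the equivalence mapping is illustrative.

A100_EQUIVALENTS = {
    "aws": "p4d.24xlarge",           # 8x A100
    "azure": "Standard_ND96asr_v4",  # 8x A100
    "gcp": "a2-highgpu-8g",          # 8x A100
}

def retarget(spec: dict, new_provider: str) -> dict:
    """Return a copy of the formation spec pointed at another cloud."""
    if new_provider not in A100_EQUIVALENTS:
        raise ValueError(f"no mapping for provider: {new_provider}")
    out = dict(spec)
    out["provider"] = new_provider
    out["instance_type"] = A100_EQUIVALENTS[new_provider]
    return out

spec = {"formation": "jupyterlab", "provider": "aws", "instance_type": "p4d.24xlarge"}
gcp_spec = retarget(spec, "gcp")
```

Everything else in the spec (the formation, the software stack) is untouched; only the provider-facing fields change.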

Spot Instance Hibernation

Run GPU and CPU workloads on spot/preemptible instances for 60-90% cost savings. RosettaHub adds:

  • Hibernate -- suspend a spot instance to disk and resume later with full state
  • Snapshot on Termination -- automatically capture machine state if reclaimed
  • Fallback to On-Demand -- seamlessly switch pricing model for critical jobs
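The fallback behavior follows a common spot-first pattern, sketched below. This is an illustration of the pattern, not RosettaHub's implementation; `request_spot` and `request_on_demand` stand in for real provisioning calls.

```python
# Pattern sketch: spot-first provisioning with on-demand fallback.
# `request_spot` / `request_on_demand` stand in for real provisioning calls;
# the capacity failure is simulated, not a real cloud API error.

class SpotCapacityError(Exception):
    """Raised when the spot market cannot fill the request."""

def provision(request_spot, request_on_demand, critical: bool):
    """Try spot first; for critical jobs, fall back to on-demand on failure."""
    try:
        return request_spot()
    except SpotCapacityError:
        if critical:
            return request_on_demand()
        raise

def no_spot():
    raise SpotCapacityError()

# A critical job survives a spot shortage by switching pricing model.
instance = provision(no_spot, lambda: "on-demand-instance", critical=True)
```

Non-critical jobs simply surface the capacity error, so they never silently pay on-demand rates.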

Cost Impact

A team running four p3.2xlarge GPU instances around the clock pays roughly $9,000 per month at on-demand rates (about $3 per instance-hour in typical US regions). Running the same workload on spot with hibernation, at typical 60-90% discounts, saves $5,000-$8,000 of that every month.

Reproducible ML Environments

Docker formations capture the complete software stack -- OS, CUDA drivers, Python packages, model code -- in a single portable template.

  • Clone a team-standard formation to guarantee consistent environments
  • Snapshot a running instance after installing new packages
  • Share the updated formation with the team instantly
  • Publish to a private marketplace for organization-wide reuse
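Formation snapshots capture the full machine image. As a lightweight complement when verifying that a cloned environment matches the original, you can pin the Python package versions inside it with only the standard library; this sketch is not RosettaHub's snapshot mechanism.

```python
# Complementary sketch: record 'name==version' pins for every installed
# Python distribution, so a rebuilt formation can be checked against them.
# Standard library only; this is not RosettaHub's snapshot mechanism,
# which captures the full machine image.
from importlib import metadata

def freeze() -> list:
    """Return sorted 'name==version' pins for installed distributions."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    )

pins = freeze()
```

Comparing the `pins` list from two instances is a quick consistency check before (or after) sharing a snapshot.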

Shared Datasets and Model Artifacts

Use RosettaHub Storages to manage data across clouds:

  • Mount AWS S3 buckets, Azure Blob containers, or GCP Cloud Storage on any instance
  • Cross-cloud data access -- use data stored on one provider in an environment running on another
  • Fine-grained sharing -- grant read or read-write access to specific team members
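Mounting happens through the platform, but the underlying idea is the same as FUSE-based tools such as s3fs-fuse, which expose an S3 bucket as a local filesystem. The sketch below composes such a mount command; RosettaHub's own mount mechanism may differ.

```python
# Sketch: composing an s3fs-fuse command that mounts an S3 bucket as a
# local directory. s3fs-fuse is one common way to mount object storage;
# RosettaHub's own mount mechanism may differ.
import shlex

def s3fs_mount_cmd(bucket: str, mountpoint: str, read_only: bool = True) -> str:
    """Build the shell command that mounts `bucket` at `mountpoint`."""
    cmd = ["s3fs", bucket, mountpoint]
    if read_only:
        cmd += ["-o", "ro"]  # standard FUSE read-only mount option
    return shlex.join(cmd)

cmd = s3fs_mount_cmd("team-datasets", "/mnt/datasets")
```

Defaulting to read-only mirrors the fine-grained sharing model: grant read-write only where a collaborator actually needs it.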

Container and Kubernetes Support

For production ML pipelines and batch inference:

  • Docker formations -- single-container deployments with GPU passthrough
  • Kubernetes clusters -- orchestrate multi-container workloads
  • EMR / Dataproc clusters -- Spark-based data processing at scale
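On Kubernetes, GPU access is requested through the standard `nvidia.com/gpu` resource exposed by the NVIDIA device plugin. A minimal pod spec looks like the following; the names and image tag are placeholders, not RosettaHub defaults.

```yaml
# Minimal GPU pod spec; names and image tag are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: batch-inference
spec:
  restartPolicy: Never
  containers:
    - name: worker
      image: nvcr.io/nvidia/pytorch:24.01-py3   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # requires the NVIDIA device plugin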

Next Steps