
Research Workflows

Concrete examples of how research teams use RosettaHub to compose data, compute, and collaboration into governed workflows.

Overview

RosettaHub's value for research comes from composability -- the ability to combine cloud compute, data, security, and sharing into workflows that serve specific research needs, without requiring cloud expertise. This page walks through concrete scenarios showing how different research personas use the platform.


Scenario 1: A New PhD Student Joins a Research Group

Personas: PhD student (Emily), Principal Investigator (Dr. James), IT Admin (Sarah)

The Problem

Emily is starting her PhD in environmental data science. She needs a computational environment with Python, GDAL, satellite imagery libraries, and access to shared datasets. She has never used a cloud platform.

The Workflow

Step 1 -- Onboarding (IT Admin)

Sarah, the department's IT admin, has already configured RosettaHub with institutional SSO. She doesn't need to do anything for Emily specifically -- Emily's department is already set up as an organization with budget allocations and approved formations.

Step 2 -- First Login (Emily)

Emily logs in with her university credentials. She sees a self-service portal with formations shared by her research group:

  • Geospatial Jupyter -- Python + GDAL + rasterio + xarray
  • R for Ecology -- RStudio with ecology packages
  • Linux Workstation -- Full desktop with pre-installed tools

Step 3 -- Launch and Work (Emily)

Emily clicks Launch on the Geospatial Jupyter formation. In two minutes she has a running Jupyter environment with:

  • Her group's shared S3 dataset bucket automatically mounted
  • A personal EFS storage for her own work
  • All required Python packages pre-installed

She never sees the AWS Console, doesn't create credentials, and doesn't know which region her instance runs in.

Step 4 -- Save and Share (Emily)

After installing additional packages for her specific research, Emily snapshots her environment. She can share this customized formation with her supervisor, who can review and launch an identical copy.

Step 5 -- Budget Governance (Dr. James)

Dr. James monitors his group's spending through real-time cost tracking. Emily's allocation is $50/month -- when it's exhausted, new launches are blocked automatically. No surprise bills against the NERC grant.
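The hard-cap rule above can be sketched as a small pure function. This is an illustrative helper, not RosettaHub's actual API: the names and signature are assumptions.

```python
def can_launch(spent: float, allocation: float, estimated_cost: float) -> bool:
    """Return True if a new launch fits within the remaining allocation.

    Hypothetical helper illustrating the hard-cap rule described above:
    once an allocation is exhausted, new launches are blocked.
    """
    return spent + estimated_cost <= allocation

# Emily's $50/month allocation
print(can_launch(spent=12.40, allocation=50.0, estimated_cost=5.0))  # True: launch allowed
print(can_launch(spent=48.00, allocation=50.0, estimated_cost=5.0))  # False: launch blocked
```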

What Made This Possible

| Capability | How It Helped |
| --- | --- |
| SSO integration | Emily logged in with university credentials -- no cloud account setup |
| Shared formations | Pre-built environments were ready for Emily on day one |
| Auto-mounted storage | Shared datasets and personal storage were configured in the formation |
| Budget enforcement | Real-time limits protected the grant from overspend |
| Snapshots and sharing | Emily's customizations are preserved and shareable |

Scenario 2: A PI Sets Up a Multi-Grant Research Group

Personas: Principal Investigator (Dr. James), Department Administrator (Rachel)

The Problem

Dr. James runs a research group with three active grants, each with different budgets, team members, and compute requirements. He needs to track spending per grant, delegate management, and ensure researchers can't accidentally spend from the wrong grant.

The Workflow

Step 1 -- Organization Structure (Rachel)

Rachel, the department administrator, sets up the organization hierarchy in RosettaHub to mirror the group's structure:

Department of Environmental Science ($100,000)
 ├── James Group ($40,000)
 │    ├── NERC Climate Grant ($15,000)
 │    │    ├── Emily -- $50/month
 │    │    └── Tom -- $50/month
 │    ├── UKRI Biodiversity Grant ($20,000)
 │    │    ├── Sarah -- $100/month
 │    │    └── Ahmed -- $100/month
 │    └── Unallocated ($5,000)
 └── Other Groups (...)

Each grant is a project with its own cloud accounts, budget, and membership. Researchers assigned to a project can only spend from that project's budget.
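The isolation rule above -- membership in a project is what authorizes spending from its budget -- can be modelled in a few lines. A minimal sketch only; the class and field names are assumptions, not RosettaHub's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    """A grant mapped to a RosettaHub-style project (illustrative model)."""
    name: str
    budget: float
    members: set = field(default_factory=set)

    def may_spend(self, user: str) -> bool:
        # Researchers can only spend from projects they are assigned to.
        return user in self.members

nerc = Project("NERC Climate Grant", 15_000, {"Emily", "Tom"})
ukri = Project("UKRI Biodiversity Grant", 20_000, {"Sarah", "Ahmed"})

print(nerc.may_spend("Emily"))  # True
print(ukri.may_spend("Emily"))  # False: Emily cannot charge the UKRI grant
```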

Step 2 -- Formation Templates (Dr. James)

Dr. James creates formation templates tailored to each grant's needs:

  • Climate modelling -- HPC cluster formation with AWS ParallelCluster, shared climate datasets mounted
  • Biodiversity analysis -- Jupyter + R formation with species databases, configured for spot instances to maximize the budget
  • General purpose -- Linux workstation for exploratory work

He shares each formation with the appropriate project organization.

Step 3 -- Delegation (Dr. James)

Dr. James is set as ADMIN on his group's sub-organization. He can:

  • Transfer budget between grants when one is underspent
  • Add or remove researchers from projects
  • View real-time spending across all grants
  • Launch batch environments for workshops using Launch on Sharees

Step 4 -- Cross-Institutional Collaboration

A collaborator at another university needs to reproduce Dr. James's analysis. Dr. James shares the formation via URL. The collaborator clicks the link, selects their own cloud account, and launches an identical environment -- on their budget, in their region, governed by their organization's policies.

What Made This Possible

| Capability | How It Helped |
| --- | --- |
| Organization hierarchy | Grants map to projects with isolated budgets |
| Budget delegation | Transfer rights let Dr. James reallocate funds between grants |
| Project isolation | Researchers can't accidentally spend from the wrong grant |
| Formation sharing | Templates shared per project, URL sharing for collaborators |
| Batch operations | Deploy environments for an entire team in one action |

Scenario 3: A Research Software Engineer Builds a Domain Portal

Personas: Research Software Engineer (Alex), Researchers (various)

The Problem

Alex is a research software engineer tasked with creating a computational platform for the department's ecology researchers. The platform needs to offer pre-built analysis environments, shared datasets, and the ability for researchers to customize and share their own workflows.

The Workflow

Step 1 -- Build the Foundation (Alex)

Alex uses federated AWS console access for advanced infrastructure setup -- configuring VPCs, setting up shared S3 buckets with curated datasets, and testing IAM configurations. RosettaOps governance ensures Alex stays within the department's sandbox.

Step 2 -- Create the Service Catalog

Alex builds a library of formations covering common research workflows:

| Formation | Type | Contents |
| --- | --- | --- |
| Species Distribution Modelling | Docker Formation | R + MaxEnt + ENMeval + biodiversity databases |
| Remote Sensing Pipeline | Cloud Formation | Python + GDAL + Sentinel-2 data mount |
| Statistical Analysis | Docker Formation | RStudio + tidyverse + ecology packages |
| Genomics Workflow | Docker Formation | Nextflow + nf-core + reference genomes |
| Big Data Processing | EMR Cluster | Spark + PySpark + shared data lake |

Each formation includes:

  • Pre-mounted shared datasets (read-only S3 mounts)
  • Per-user writable storage (personal EFS)
  • Spot instance configuration for cost optimization
  • Documentation in the formation description
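The recurring shape of a catalog entry -- storage mounts bundled with compute configuration -- can be sketched as a plain data structure. Field names, mount URIs, and the formation spec shown here are all illustrative assumptions, not RosettaHub's schema.

```python
# Hypothetical formation spec: each catalog entry bundles storage mounts
# with compute configuration (all field names are illustrative).
remote_sensing = {
    "name": "Remote Sensing Pipeline",
    "type": "cloud",
    "mounts": [
        {"source": "s3://dept-curated/sentinel-2", "mode": "ro"},  # shared dataset, read-only
        {"source": "efs://home/{user}", "mode": "rw"},             # per-user writable storage
    ],
    "compute": {"instance": "spot", "packages": ["python", "gdal"]},
    "docs": "See the formation description for usage notes.",
}

# Every entry follows the same pattern: read-only shared data,
# writable personal storage, spot instances for cost optimization.
read_only_mounts = [m for m in remote_sensing["mounts"] if m["mode"] == "ro"]
print(len(read_only_mounts))  # 1
```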

Step 3 -- Publish to Marketplace

Alex publishes the formations to the department's private marketplace -- a curated catalog accessible to all researchers in the organization. Researchers browse the catalog, find the formation that fits their workflow, clone it, and launch.

Step 4 -- Researchers Self-Serve

Researchers browse the catalog, launch environments with a click, and customize as needed. When they build something useful, they snapshot it and share it back -- growing the catalog organically.

Step 5 -- Cross-Cloud Flexibility

The department's AWS credits are running low, but it has Azure credits from a Microsoft partnership. Alex switches the formations' deployment target to Azure -- the same formation templates work on both clouds without modification.

What Made This Possible

| Capability | How It Helped |
| --- | --- |
| Federated console access | Alex used native AWS tools for advanced setup |
| Formation types | Docker, Cloud, and EMR formations for different workloads |
| Marketplace | Private catalog for curated research environments |
| Cross-cloud storage | Shared datasets mounted from any cloud |
| Cloud-agnostic formations | Same templates deploy on AWS, Azure, or GCP |

Scenario 4: Running a Workshop for 50 Researchers

Personas: Workshop Organizer (Dr. Priya), IT Admin (Sarah), Attendees (50 researchers)

The Problem

Dr. Priya is running a week-long computational ecology workshop. She needs 50 identical Jupyter environments, each with pre-loaded datasets, accessible to researchers from multiple institutions.

The Workflow

Step 1 -- Prepare (Dr. Priya)

Dr. Priya creates a Docker formation with Jupyter, ecology packages, and shared datasets. She tests it, snapshots the final state, and shares it with the workshop organization.

Step 2 -- Onboard Attendees (Sarah)

Sarah registers all 50 attendees via Excel batch upload. Each attendee receives:

  • A RosettaHub account linked to their institutional email
  • Automatic assignment to the workshop project
  • A dedicated cloud account with $25 budget
  • The workshop formation shared to their dashboard
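The batch-onboarding step above amounts to turning one spreadsheet row into one provisioned account. A self-contained sketch of that transformation, using CSV in place of Excel; the project name and field names are illustrative assumptions.

```python
import csv
import io

# Hypothetical attendee sheet -- RosettaHub ingests an Excel upload,
# but CSV keeps this sketch self-contained.
sheet = io.StringIO(
    "name,email\n"
    "Ana,ana@uni-a.ac.uk\n"
    "Ben,ben@uni-b.ac.uk\n"
)

attendees = [
    {
        "name": row["name"],
        "email": row["email"],          # account linked to institutional email
        "project": "eco-workshop",      # illustrative workshop project name
        "budget_usd": 25,               # per-attendee hard cap
    }
    for row in csv.DictReader(sheet)
]

print(len(attendees))              # 2
print(attendees[0]["budget_usd"])  # 25
```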

Step 3 -- Deploy (Dr. Priya)

On workshop day, Dr. Priya uses Launch on Sharees to deploy all 50 environments simultaneously. Each attendee gets their own isolated instance with:

  • Personal compute (no noisy-neighbor issues)
  • Shared read-only dataset mount
  • Personal writable storage
  • Spot instances at 70% savings

Step 4 -- During the Workshop

Attendees work in their environments. If a spot instance is reclaimed, RosettaHub automatically preserves the attendee's work and launches a replacement. Dr. Priya monitors the class from her dashboard -- she can see who's running, who's idle, and total spending.
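The recovery flow on spot reclamation -- snapshot the work, relaunch from the snapshot -- can be simulated as a pure function. This is a sketch of the logic described above, not RosettaHub's implementation; the instance fields are assumptions.

```python
def handle_spot_interruption(instance: dict) -> dict:
    """Simulate spot-interruption recovery: snapshot the attendee's work,
    then launch a replacement instance from that snapshot.

    Illustrative logic only -- not the platform's actual mechanism.
    """
    snapshot = {"owner": instance["owner"], "data": instance["work"]}
    replacement = {
        "owner": instance["owner"],
        "work": snapshot["data"],                 # work restored from snapshot
        "generation": instance["generation"] + 1, # replacement instance
    }
    return replacement

vm = {"owner": "Ana", "work": ["notebook.ipynb"], "generation": 0}
new_vm = handle_spot_interruption(vm)
print(new_vm["work"])        # ['notebook.ipynb'] -- nothing lost
print(new_vm["generation"])  # 1
```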

Step 5 -- Cleanup

After the workshop, Dr. Priya uses Delete on Sharees to tear down all environments in one action. Attendees' personal storage is preserved for 30 days for follow-up work.

What Made This Possible

| Capability | How It Helped |
| --- | --- |
| Batch registration | 50 users onboarded via Excel upload |
| Batch launch/stop/delete | Deploy and manage 50 environments from a single menu |
| Dedicated cloud accounts | Each attendee gets independent quotas |
| Spot instance recovery | Automatic snapshot and replacement on interruption |
| Per-attendee budgets | $25 hard cap per attendee |

Scenario 5: Sensitive Data Analysis in a Trusted Research Environment

Personas: Data Custodian (NHS Trust), PI (Dr. Chen), Researcher (Maria)

The Problem

Dr. Chen's group has been granted access to anonymized NHS patient data for a health outcomes study. The data custodian requires a Trusted Research Environment aligned with the Five Safes framework -- the data cannot leave the secure boundary.

The Workflow

Step 1 -- Safe Projects and Safe People

The study is set up as an isolated project with:

  • Dedicated cloud accounts, network-isolated from other projects
  • Only approved researchers (Maria, Dr. Chen) are assigned to the project
  • SSO authentication with the university's SAML 2.0 identity provider
  • Role-based access: Maria as researcher, Dr. Chen as project manager

Step 2 -- Safe Settings and Safe Data

The TRE is configured with:

  • Private engine -- compute isolated from shared infrastructure
  • VPC isolation -- no internet access from research VMs
  • Encrypted storage -- S3 with managed KMS keys for the NHS dataset
  • Cloud Custodian policies -- automated compliance enforcement
  • Approved formations only -- researchers cannot create arbitrary environments
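Cloud Custodian policies like the one enforcing the settings above are expressed in YAML. A minimal sketch of one such policy, removing public access grants from S3 buckets in the boundary; the policy name is illustrative, and the real policy set for a TRE would be broader (see the Cloud Custodian documentation for the full schema).

```yaml
policies:
  # Illustrative policy: strip any public grants from buckets in the TRE
  - name: tre-s3-no-public-access
    resource: aws.s3
    filters:
      - type: global-grants
    actions:
      - type: delete-global-grants
```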

Step 3 -- Researcher Workflow (Maria)

Maria logs in and sees only the approved RStudio formation. She launches it into the secure boundary, where the NHS dataset is pre-mounted as a read-only encrypted volume. She can:

  • Run analysis scripts in RStudio
  • Save intermediate results to her project storage

She cannot copy data to her personal machine or email results outside the boundary.

Step 4 -- Safe Output

When Maria completes her analysis, the results go through an egress review:

  • Maria submits outputs for approval via the sharing workflow
  • Dr. Chen reviews the outputs for disclosure risk
  • Approved outputs are exported; everything else stays in the boundary
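The egress gate above reduces to a simple rule: only outputs the reviewer judges safe leave the boundary. A pure-Python sketch of that review step; the function, field names, and risk labels are all illustrative assumptions.

```python
def review_egress(outputs: list[dict], approver: str) -> list[dict]:
    """Sketch of an egress review: outputs the reviewer marks low-risk
    are exported; everything else stays inside the secure boundary.

    Illustrative logic only -- field names are assumptions.
    """
    exported = []
    for item in outputs:
        if item["disclosure_risk"] == "low":
            item["approved_by"] = approver
            exported.append(item)
        # high-risk outputs remain in the boundary for rework or suppression
    return exported

submitted = [
    {"name": "model_coefficients.csv", "disclosure_risk": "low"},
    {"name": "row_level_residuals.csv", "disclosure_risk": "high"},
]
print([o["name"] for o in review_egress(submitted, "Dr. Chen")])
# ['model_coefficients.csv']
```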

What Made This Possible

| Capability | How It Helped |
| --- | --- |
| Five Safes alignment | Platform maps to the TRE governance framework |
| Project isolation | Dedicated cloud accounts with network isolation |
| SSO + RBAC | Institutional identity provider, role-based permissions |
| Encrypted storage | KMS-managed encryption for sensitive data |
| Compliance policies | Cloud Custodian enforcement for automated compliance |

Common Patterns

Across all scenarios, the same platform capabilities compose into different workflows:

| Pattern | How RosettaHub Implements It |
| --- | --- |
| Data + compute bundled | Formations combine storage mounts with compute configuration |
| Self-service with guardrails | Users launch freely within administrator-defined boundaries |
| Reproduce and share | Snapshot → Share → Clone cycle for any environment |
| Budget-per-grant | Organization hierarchy maps to funding structure |
| Scale instantly | Batch operations deploy environments for entire teams |
| Cross-cloud flexibility | Same formations deploy on any connected cloud |
| Progressive trust | Start with MetaCloud simplicity, graduate to federated console access |

Next Steps