Research Workflows¶
Concrete examples of how research teams use RosettaHub to compose data, compute, and collaboration into governed workflows.
Overview¶
RosettaHub's value for research comes from composability -- the ability to combine cloud compute, data, security, and sharing into workflows that serve specific research needs, without requiring cloud expertise. This page walks through concrete scenarios showing how different research personas use the platform.
Scenario 1: A New PhD Student Joins a Research Group¶
Personas: PhD student (Emily), Principal Investigator (Dr. James), IT Admin (Sarah)
The Problem¶
Emily is starting her PhD in environmental data science. She needs a computational environment with Python, GDAL, satellite imagery libraries, and access to shared datasets. She has never used a cloud platform.
The Workflow¶
Step 1 -- Onboarding (IT Admin)
Sarah, the department's IT admin, has already configured RosettaHub with institutional SSO. She doesn't need to do anything for Emily specifically -- Emily's department is already set up as an organization with budget allocations and approved formations.
Step 2 -- First Login (Emily)
Emily logs in with her university credentials. She sees a self-service portal with formations shared by her research group:
- Geospatial Jupyter -- Python + GDAL + rasterio + xarray
- R for Ecology -- RStudio with ecology packages
- Linux Workstation -- Full desktop with pre-installed tools
Step 3 -- Launch and Work (Emily)
Emily clicks Launch on the Geospatial Jupyter formation. In two minutes she has a running Jupyter environment with:
- Her group's shared S3 dataset bucket automatically mounted
- Personal EFS storage for her own work
- All required Python packages pre-installed
She never sees the AWS Console, doesn't create credentials, and doesn't know which region her instance runs in.
Step 4 -- Save and Share (Emily)
After installing additional packages for her specific research, Emily snapshots her environment. She can share this customized formation with her supervisor, who can review and launch an identical copy.
Step 5 -- Budget Governance (Dr. James)
Dr. James monitors his group's spending through real-time cost tracking. Emily's allocation is $50/month -- when it's exhausted, new launches are blocked automatically. No surprise bills against the NERC grant.
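The budget guard described above can be sketched as a simple pre-launch check. The `Allocation` class and its fields are illustrative assumptions, not RosettaHub's actual API:

```python
from dataclasses import dataclass

@dataclass
class Allocation:
    """Hypothetical per-researcher monthly allocation (illustrative only)."""
    monthly_limit: float    # e.g. 50.0 USD for Emily
    spent_this_month: float

    def can_launch(self) -> bool:
        # New launches are blocked once the allocation is exhausted;
        # this check does not forcibly terminate running workloads.
        return self.spent_this_month < self.monthly_limit

emily = Allocation(monthly_limit=50.0, spent_this_month=48.75)
print(emily.can_launch())   # under budget: launch allowed

emily.spent_this_month = 50.0
print(emily.can_launch())   # exhausted: new launches blocked
```

The key design point is that enforcement happens at launch time, before any spend occurs, which is what prevents surprise bills against the grant.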
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| SSO integration | Emily logged in with university credentials -- no cloud account setup |
| Shared formations | Pre-built environments were ready for Emily on day one |
| Auto-mounted storage | Shared datasets and personal storage were configured in the formation |
| Budget enforcement | Real-time limits protected the grant from overspend |
| Snapshots and sharing | Emily's customizations are preserved and shareable |
Scenario 2: A PI Sets Up a Multi-Grant Research Group¶
Personas: Principal Investigator (Dr. James), Department Administrator (Rachel)
The Problem¶
Dr. James runs a research group with three active grants, each with different budgets, team members, and compute requirements. He needs to track spending per grant, delegate management, and ensure researchers can't accidentally spend from the wrong grant.
The Workflow¶
Step 1 -- Organization Structure (Rachel)
Rachel, the department administrator, sets up the organization hierarchy in RosettaHub to mirror the group's structure:
Department of Environmental Science ($100,000)
├── James Group ($40,000)
│ ├── NERC Climate Grant ($15,000)
│ │ ├── Emily -- $50/month
│ │ └── Tom -- $50/month
│ ├── UKRI Biodiversity Grant ($20,000)
│ │ ├── Sarah -- $100/month
│ │ └── Ahmed -- $100/month
│ └── Unallocated ($5,000)
└── Other Groups (...)
Each grant is a project with its own cloud accounts, budget, and membership. Researchers assigned to a project can only spend from that project's budget.
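The hierarchy above can be modeled as a tree with one invariant: the allocations of a node's children may never exceed the node's own budget. This is a minimal sketch of that idea; `BudgetNode` is a hypothetical illustration, not platform code:

```python
class BudgetNode:
    """Hypothetical model of the organization hierarchy shown above."""

    def __init__(self, name: str, budget: float):
        self.name = name
        self.budget = budget
        self.children = []

    def add_child(self, child: "BudgetNode") -> "BudgetNode":
        # Invariant: child allocations may not exceed the parent's budget.
        allocated = sum(c.budget for c in self.children) + child.budget
        if allocated > self.budget:
            raise ValueError(f"{child.name} would overallocate {self.name}")
        self.children.append(child)
        return child

dept = BudgetNode("Department of Environmental Science", 100_000)
group = dept.add_child(BudgetNode("James Group", 40_000))
group.add_child(BudgetNode("NERC Climate Grant", 15_000))
group.add_child(BudgetNode("UKRI Biodiversity Grant", 20_000))
group.add_child(BudgetNode("Unallocated", 5_000))
```

Because the invariant is checked at allocation time, a grant can never be funded beyond what its parent organization actually holds.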
Step 2 -- Formation Templates (Dr. James)
Dr. James creates formation templates tailored to each grant's needs:
- Climate modelling -- HPC cluster formation with AWS ParallelCluster, shared climate datasets mounted
- Biodiversity analysis -- Jupyter + R formation with species databases, configured for spot instances to maximize the budget
- General purpose -- Linux workstation for exploratory work
He shares each formation with the appropriate project organization.
Step 3 -- Delegation (Dr. James)
Dr. James is set as ADMIN on his group's sub-organization. He can:
- Transfer budget between grants when one is underspent
- Add or remove researchers from projects
- View real-time spending across all grants
- Launch batch environments for workshops using Launch on Sharees
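The first delegation right -- moving budget between an underspent and an overspent grant -- amounts to a conserving transfer. This sketch uses plain dicts and a hypothetical `transfer` helper; the figures are illustrative, not from the source:

```python
def transfer(source: dict, dest: dict, amount: float) -> None:
    """Hypothetical budget transfer between sibling grants.

    The group's total stays constant; only the split changes, and a
    transfer can never exceed the source's remaining balance.
    """
    remaining = source["budget"] - source["spent"]
    if amount > remaining:
        raise ValueError("cannot transfer more than the source's remaining balance")
    source["budget"] -= amount
    dest["budget"] += amount

nerc = {"name": "NERC Climate Grant", "budget": 15_000, "spent": 6_000}
ukri = {"name": "UKRI Biodiversity Grant", "budget": 20_000, "spent": 19_500}

# NERC is underspent; top up the nearly exhausted UKRI grant.
transfer(nerc, ukri, 2_000)
```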
Step 4 -- Cross-Institutional Collaboration
A collaborator at another university needs to reproduce Dr. James's analysis. Dr. James shares the formation via URL. The collaborator clicks the link, selects their own cloud account, and launches an identical environment -- on their budget, in their region, governed by their organization's policies.
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Organization hierarchy | Grants map to projects with isolated budgets |
| Budget delegation | Transfer rights let Dr. James reallocate funds between grants |
| Project isolation | Researchers can't accidentally spend from the wrong grant |
| Formation sharing | Templates shared per project, URL sharing for collaborators |
| Batch operations | Deploy environments for an entire team in one action |
Scenario 3: A Research Software Engineer Builds a Domain Portal¶
Personas: Research Software Engineer (Alex), Researchers (various)
The Problem¶
Alex is a research software engineer tasked with creating a computational platform for the department's ecology researchers. The platform needs to offer pre-built analysis environments, shared datasets, and the ability for researchers to customize and share their own workflows.
The Workflow¶
Step 1 -- Build the Foundation (Alex)
Alex uses federated AWS console access for advanced infrastructure setup -- configuring VPCs, setting up shared S3 buckets with curated datasets, and testing IAM configurations. RosettaOps governance ensures Alex stays within the department's sandbox.
Step 2 -- Create the Service Catalog
Alex builds a library of formations covering common research workflows:
| Formation | Type | Contents |
|---|---|---|
| Species Distribution Modelling | Docker Formation | R + MaxEnt + ENMeval + biodiversity databases |
| Remote Sensing Pipeline | Cloud Formation | Python + GDAL + Sentinel-2 data mount |
| Statistical Analysis | Docker Formation | RStudio + tidyverse + ecology packages |
| Genomics Workflow | Docker Formation | Nextflow + nf-core + reference genomes |
| Big Data Processing | EMR Cluster | Spark + PySpark + shared data lake |
Each formation includes:
- Pre-mounted shared datasets (read-only S3 mounts)
- Per-user writable storage (personal EFS)
- Spot instance configuration for cost optimization
- Documentation in the formation description
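One way to picture such a catalog entry is as declarative data. The field names, bucket paths, and instance type below are assumed for illustration -- this is not RosettaHub's actual formation schema:

```python
# Hypothetical declarative sketch of one catalog entry.
remote_sensing_pipeline = {
    "name": "Remote Sensing Pipeline",
    "type": "cloud_formation",
    "mounts": [
        # Shared dataset, read-only, so researchers cannot corrupt it.
        {"source": "s3://dept-sentinel2-data", "mode": "ro"},
        # Per-user writable storage for each researcher's own work.
        {"source": "efs://users/{username}", "mode": "rw"},
    ],
    "compute": {
        "purchase_option": "spot",       # cost optimization
        "instance_type": "m5.xlarge",    # assumed default size
    },
    "description": "Python + GDAL + Sentinel-2 data mount",
}
```

Keeping the entry as data rather than code is what makes cloning, sharing, and later retargeting to another cloud straightforward.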
Step 3 -- Publish to Marketplace
Alex publishes the formations to the department's private marketplace -- a curated catalog accessible to all researchers in the organization. Researchers browse the catalog, find the formation that fits their workflow, clone it, and launch.
Step 4 -- Researchers Self-Serve
Researchers browse the catalog, launch environments with a click, and customize as needed. When they build something useful, they snapshot it and share it back -- growing the catalog organically.
Step 5 -- Cross-Cloud Flexibility
The department's AWS credits are running low, but they have Azure credits from a Microsoft partnership. Alex retargets the formations to deploy on Azure -- the same formation templates work on both clouds without modification.
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Federated console access | Alex used native AWS tools for advanced setup |
| Formation types | Docker, Cloud, EMR formations for different workloads |
| Marketplace | Private catalog for curated research environments |
| Cross-cloud storage | Shared datasets mounted from any cloud |
| Cloud-agnostic formations | Same templates deploy on AWS, Azure, or GCP |
Scenario 4: Running a Workshop for 50 Researchers¶
Personas: Workshop Organizer (Dr. Priya), IT Admin (Sarah), Attendees (50 researchers)
The Problem¶
Dr. Priya is running a week-long computational ecology workshop. She needs 50 identical Jupyter environments, each with pre-loaded datasets, accessible to researchers from multiple institutions.
The Workflow¶
Step 1 -- Prepare (Dr. Priya)
Dr. Priya creates a Docker formation with Jupyter, ecology packages, and shared datasets. She tests it, snapshots the final state, and shares it with the workshop organization.
Step 2 -- Onboard Attendees (Sarah)
Sarah registers all 50 attendees via Excel batch upload. Each attendee receives:
- A RosettaHub account linked to their institutional email
- Automatic assignment to the workshop project
- A dedicated cloud account with $25 budget
- The workshop formation shared to their dashboard
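Batch onboarding from a spreadsheet export can be sketched as a loop over rows. `register_attendee`, the project name, and the roster contents below are all hypothetical placeholders, not real accounts or a real API:

```python
import csv
import io

# Illustrative roster, standing in for the Excel export.
roster_csv = """name,email
Ana Silva,ana.silva@uni-a.example.ac.uk
Ben Okoro,b.okoro@uni-b.example.ac.uk
"""

def register_attendee(row: dict, project: str = "eco-workshop", budget: float = 25.0) -> dict:
    # In the real platform this step would create the account, assign the
    # workshop project, provision a dedicated cloud account, and share the
    # formation; here it just records the resulting state.
    return {
        "email": row["email"],
        "project": project,
        "budget": budget,
        "formations": ["workshop-jupyter"],
    }

attendees = [register_attendee(r) for r in csv.DictReader(io.StringIO(roster_csv))]
```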
Step 3 -- Deploy (Dr. Priya)
On workshop day, Dr. Priya uses Launch on Sharees to deploy all 50 environments simultaneously. Each attendee gets their own isolated instance with:
- Personal compute (no noisy-neighbor issues)
- Shared read-only dataset mount
- Personal writable storage
- Spot instances at up to 70% savings over on-demand pricing
Step 4 -- During the Workshop
Attendees work in their environments. If a spot instance is reclaimed, RosettaHub automatically preserves the attendee's work and launches a replacement. Dr. Priya monitors the class from her dashboard -- she can see who's running, who's idle, and total spending.
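The recovery behavior on spot reclamation follows a snapshot-then-replace order. This sketch shows the sequencing only; `snapshot` and `launch_replacement` are stand-ins for internal platform operations, not a public API:

```python
def handle_interruption(instance, snapshot, launch_replacement):
    """Hedged sketch of spot-interruption recovery: preserve work, then replace."""
    state_id = snapshot(instance)                         # preserve work first
    return launch_replacement(from_snapshot=state_id)     # then relaunch from it

# Exercise the sequencing with stub operations that log what happened.
events = []
new_instance = handle_interruption(
    "i-0abc",
    snapshot=lambda inst: events.append(("snapshot", inst)) or "snap-1",
    launch_replacement=lambda from_snapshot: events.append(("launch", from_snapshot)) or "i-0def",
)
```

The ordering is the point: the snapshot completes before the replacement launches, so an interruption can never lose work that existed at reclaim time.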
Step 5 -- Cleanup
After the workshop, Dr. Priya uses Delete on Sharees to tear down all environments in one action. Attendees' personal storage is preserved for 30 days for follow-up work.
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Batch registration | 50 users onboarded via Excel upload |
| Batch launch/stop/delete | Deploy and manage 50 environments from a single menu |
| Dedicated cloud accounts | Each attendee gets independent quotas |
| Spot instance recovery | Automatic snapshot and replacement on interruption |
| Per-student budgets | $25 hard cap per attendee |
Scenario 5: Sensitive Data Analysis in a Trusted Research Environment¶
Personas: Data Custodian (NHS Trust), PI (Dr. Chen), Researcher (Maria)
The Problem¶
Dr. Chen's group has been granted access to anonymized NHS patient data for a health outcomes study. The data custodian requires a Trusted Research Environment aligned with the Five Safes framework -- the data cannot leave the secure boundary.
The Workflow¶
Step 1 -- Safe Projects and Safe People
The study is set up as an isolated project with:
- Dedicated cloud accounts, network-isolated from other projects
- Membership limited to approved researchers (Maria and Dr. Chen)
- SSO authentication with the university's SAML 2.0 identity provider
- Role-based access: Maria as researcher, Dr. Chen as project manager
Step 2 -- Safe Settings and Safe Data
The TRE is configured with:
- Private engine -- compute isolated from shared infrastructure
- VPC isolation -- no internet access from research VMs
- Encrypted storage -- S3 with managed KMS keys for the NHS dataset
- Cloud Custodian policies -- automated compliance enforcement
- Approved formations only -- researchers cannot create arbitrary environments
Step 3 -- Researcher Workflow (Maria)
Maria logs in and sees only the approved RStudio formation. She launches it into the secure boundary, where the NHS dataset is pre-mounted as a read-only encrypted volume. Inside the boundary she can:
- Run analysis scripts in RStudio
- Save intermediate results to her project storage
She cannot copy data to her personal machine or email results outside the boundary.
Step 4 -- Safe Output
When Maria completes her analysis, the results go through an egress review:
- Maria submits outputs for approval via the sharing workflow
- Dr. Chen reviews the outputs for disclosure risk
- Approved outputs are exported; everything else stays in the boundary
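The egress review above is effectively a small state machine: outputs move from inside the boundary to pending review, and only an explicit approval moves them out. The states and transitions here are illustrative, not the platform's actual workflow engine:

```python
# Hedged sketch of the egress review as a tiny state machine.
TRANSITIONS = {
    ("in_boundary", "submit"):     "pending_review",  # Maria submits outputs
    ("pending_review", "approve"): "exported",        # Dr. Chen approves for export
    ("pending_review", "reject"):  "in_boundary",     # stays inside the boundary
}

def review_step(state: str, action: str) -> str:
    # Any transition not listed above (e.g. exporting without review)
    # is rejected outright.
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} not allowed from state {state!r}")

state = review_step("in_boundary", "submit")
state = review_step(state, "approve")   # state is now "exported"
```

Note there is no path from `in_boundary` directly to `exported` -- the disclosure-risk review cannot be skipped.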
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Five Safes alignment | Platform maps to the TRE governance framework |
| Project isolation | Dedicated cloud accounts with network isolation |
| SSO + RBAC | Institutional identity provider, role-based permissions |
| Encrypted storage | KMS-managed encryption for sensitive data |
| Compliance policies | Cloud Custodian enforcement for automated compliance |
Common Patterns¶
Across all scenarios, the same platform capabilities compose into different workflows:
| Pattern | How RosettaHub Implements It |
|---|---|
| Data + Compute bundled | Formations combine storage mounts with compute configuration |
| Self-service with guardrails | Users launch freely within administrator-defined boundaries |
| Reproduce and share | Snapshot → Share → Clone cycle for any environment |
| Budget-per-grant | Organization hierarchy maps to funding structure |
| Scale instantly | Batch operations deploy environments for entire teams |
| Cross-cloud flexibility | Same formations deploy on any connected cloud |
| Progressive trust | Start with MetaCloud simplicity, graduate to federated console access |
Next Steps¶
- For Research Teams -- full overview of research capabilities
- Formations -- how to build your own workflows
- Trusted Research Environments -- Five Safes framework alignment
- Tutorials -- step-by-step walkthroughs