Research Workflows¶
Concrete examples of how research teams use RosettaHub to compose data, compute, and collaboration into governed workflows.
Overview¶
RosettaHub's value for research comes from composability -- the ability to combine cloud compute, data, security, and sharing into workflows that serve specific research needs, without requiring cloud expertise. This page walks through concrete scenarios showing how different research personas use the platform.
Scenario 1: A New PhD Student Joins a Research Group¶
Personas: PhD student (Emily), Principal Investigator (Dr. James), IT Admin (Sarah)
The Problem¶
Emily is starting her PhD in environmental data science. She needs a computational environment with Python, GDAL, satellite imagery libraries, and access to shared datasets. She has never used a cloud platform.
The Workflow¶
Step 1 -- Onboarding (IT Admin)
Sarah, the department's IT admin, has already configured RosettaHub with institutional SSO. She doesn't need to do anything for Emily specifically -- Emily's department is already set up as an organization with budget allocations and approved formations.
Step 2 -- First Login (Emily)
Emily logs in with her university credentials. She sees a self-service portal with formations shared by her research group:
- Geospatial Jupyter -- Python + GDAL + rasterio + xarray
- R for Ecology -- RStudio with ecology packages
- Linux Workstation -- Full desktop with pre-installed tools
Step 3 -- Launch and Work (Emily)
Emily clicks Launch on the Geospatial Jupyter formation. In two minutes she has a running Jupyter environment with:
- Her group's shared S3 dataset bucket automatically mounted
- Personal EFS storage for her own work
- All required Python packages pre-installed
She never sees the AWS Console, doesn't create credentials, and doesn't know which region her instance runs in.
Step 4 -- Save and Share (Emily)
After installing additional packages for her specific research, Emily snapshots her environment. She can share this customized formation with her supervisor, who can review and launch an identical copy.
Step 5 -- Budget Governance (Dr. James)
Dr. James monitors his group's spending through real-time cost tracking. Emily's allocation is $50/month -- when it's exhausted, new launches are blocked automatically. No surprise bills against the NERC grant.
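The budget guard described above can be sketched as a simple pre-launch check. The `Allocation` class and its fields are illustrative assumptions, not RosettaHub's actual API:

```python
from dataclasses import dataclass

@dataclass
class Allocation:
    """Hypothetical per-researcher monthly allocation (illustrative only)."""
    monthly_limit: float    # e.g. 50.0 USD for Emily
    spent_this_month: float

    def can_launch(self) -> bool:
        # New launches are blocked once the allocation is exhausted;
        # this check does not forcibly terminate running workloads.
        return self.spent_this_month < self.monthly_limit

emily = Allocation(monthly_limit=50.0, spent_this_month=48.75)
print(emily.can_launch())   # under budget: launch allowed

emily.spent_this_month = 50.0
print(emily.can_launch())   # exhausted: new launches blocked
```

The key design point is that enforcement happens at launch time, before any spend occurs, which is what prevents surprise bills against the grant.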
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| SSO integration | Emily logged in with university credentials -- no cloud account setup |
| Shared formations | Pre-built environments were ready for Emily on day one |
| Auto-mounted storage | Shared datasets and personal storage were configured in the formation |
| Budget enforcement | Real-time limits protected the grant from overspend |
| Snapshots and sharing | Emily's customizations are preserved and shareable |
Scenario 2: A PI Sets Up a Multi-Grant Research Group¶
Personas: Principal Investigator (Dr. James), Department Administrator (Rachel)
The Problem¶
Dr. James runs a research group with three active grants, each with different budgets, team members, and compute requirements. He needs to track spending per grant, delegate management, and ensure researchers can't accidentally spend from the wrong grant.
The Workflow¶
Step 1 -- Organization Structure (Rachel)
Rachel, the department administrator, sets up the organization hierarchy in RosettaHub to mirror the group's structure:
Department of Environmental Science ($100,000)
├── James Group ($40,000)
│ ├── NERC Climate Grant ($15,000)
│ │ ├── Emily -- $50/month
│ │ └── Tom -- $50/month
│ ├── UKRI Biodiversity Grant ($20,000)
│ │ ├── Sarah -- $100/month
│ │ └── Ahmed -- $100/month
│ └── Unallocated ($5,000)
└── Other Groups (...)
Each grant is a project with its own cloud accounts, budget, and membership. Researchers assigned to a project can only spend from that project's budget.
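The hierarchy above can be modeled as a tree with one invariant: the allocations of a node's children may never exceed the node's own budget. This is a minimal sketch of that idea; `BudgetNode` is a hypothetical illustration, not platform code:

```python
class BudgetNode:
    """Hypothetical model of the organization hierarchy shown above."""

    def __init__(self, name: str, budget: float):
        self.name = name
        self.budget = budget
        self.children = []

    def add_child(self, child: "BudgetNode") -> "BudgetNode":
        # Invariant: child allocations may not exceed the parent's budget.
        allocated = sum(c.budget for c in self.children) + child.budget
        if allocated > self.budget:
            raise ValueError(f"{child.name} would overallocate {self.name}")
        self.children.append(child)
        return child

dept = BudgetNode("Department of Environmental Science", 100_000)
group = dept.add_child(BudgetNode("James Group", 40_000))
group.add_child(BudgetNode("NERC Climate Grant", 15_000))
group.add_child(BudgetNode("UKRI Biodiversity Grant", 20_000))
group.add_child(BudgetNode("Unallocated", 5_000))
```

Because the invariant is checked at allocation time, a grant can never be funded beyond what its parent organization actually holds.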
Step 2 -- Formation Templates (Dr. James)
Dr. James creates formation templates tailored to each grant's needs:
- Climate modelling -- HPC cluster formation with AWS ParallelCluster, shared climate datasets mounted
- Biodiversity analysis -- Jupyter + R formation with species databases, configured for spot instances to maximize the budget
- General purpose -- Linux workstation for exploratory work
He shares each formation with the appropriate project organization.
Step 3 -- Delegation (Dr. James)
Dr. James is set as ADMIN on his group's sub-organization. He can:
- Transfer budget between grants when one is underspent
- Add or remove researchers from projects
- View real-time spending across all grants
- Launch batch environments for workshops using Launch on Sharees
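The first delegation right -- moving budget between an underspent and an overspent grant -- amounts to a conserving transfer. This sketch uses plain dicts and a hypothetical `transfer` helper; the figures are illustrative, not from the source:

```python
def transfer(source: dict, dest: dict, amount: float) -> None:
    """Hypothetical budget transfer between sibling grants.

    The group's total stays constant; only the split changes, and a
    transfer can never exceed the source's remaining balance.
    """
    remaining = source["budget"] - source["spent"]
    if amount > remaining:
        raise ValueError("cannot transfer more than the source's remaining balance")
    source["budget"] -= amount
    dest["budget"] += amount

nerc = {"name": "NERC Climate Grant", "budget": 15_000, "spent": 6_000}
ukri = {"name": "UKRI Biodiversity Grant", "budget": 20_000, "spent": 19_500}

# NERC is underspent; top up the nearly exhausted UKRI grant.
transfer(nerc, ukri, 2_000)
```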
Step 4 -- Cross-Institutional Collaboration
A collaborator at another university needs to reproduce Dr. James's analysis. Dr. James shares the formation via URL. The collaborator clicks the link, selects their own cloud account, and launches an identical environment -- on their budget, in their region, governed by their organization's policies.
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Organization hierarchy | Grants map to projects with isolated budgets |
| Budget delegation | Transfer rights let Dr. James reallocate funds between grants |
| Project isolation | Researchers can't accidentally spend from the wrong grant |
| Formation sharing | Templates shared per project, URL sharing for collaborators |
| Batch operations | Deploy environments for an entire team in one action |
Scenario 3: A Research Software Engineer Builds a Domain Portal¶
Personas: Research Software Engineer (Alex), Researchers (various)
The Problem¶
Alex is a research software engineer tasked with creating a computational platform for the department's ecology researchers. The platform needs to offer pre-built analysis environments, shared datasets, and the ability for researchers to customize and share their own workflows.
The Workflow¶
Step 1 -- Build the Foundation (Alex)
Alex uses federated AWS console access for advanced infrastructure setup -- configuring VPCs, setting up shared S3 buckets with curated datasets, and testing IAM configurations. RosettaOps governance ensures Alex stays within the department's sandbox.
Step 2 -- Create the Service Catalog
Alex builds a library of formations covering common research workflows:
| Formation | Type | Contents |
|---|---|---|
| Species Distribution Modelling | Docker Formation | R + MaxEnt + ENMeval + biodiversity databases |
| Remote Sensing Pipeline | Cloud Formation | Python + GDAL + Sentinel-2 data mount |
| Statistical Analysis | Docker Formation | RStudio + tidyverse + ecology packages |
| Genomics Workflow | Docker Formation | Nextflow + nf-core + reference genomes |
| Big Data Processing | EMR Cluster | Spark + PySpark + shared data lake |
Each formation includes:
- Pre-mounted shared datasets (read-only S3 mounts)
- Per-user writable storage (personal EFS)
- Spot instance configuration for cost optimization
- Documentation in the formation description
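One way to picture such a catalog entry is as declarative data. The field names, bucket paths, and instance type below are assumed for illustration -- this is not RosettaHub's actual formation schema:

```python
# Hypothetical declarative sketch of one catalog entry.
remote_sensing_pipeline = {
    "name": "Remote Sensing Pipeline",
    "type": "cloud_formation",
    "mounts": [
        # Shared dataset, read-only, so researchers cannot corrupt it.
        {"source": "s3://dept-sentinel2-data", "mode": "ro"},
        # Per-user writable storage for each researcher's own work.
        {"source": "efs://users/{username}", "mode": "rw"},
    ],
    "compute": {
        "purchase_option": "spot",       # cost optimization
        "instance_type": "m5.xlarge",    # assumed default size
    },
    "description": "Python + GDAL + Sentinel-2 data mount",
}
```

Keeping the entry as data rather than code is what makes cloning, sharing, and later retargeting to another cloud straightforward.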
Step 3 -- Publish to Marketplace
Alex publishes the formations to the department's private marketplace -- a curated catalog accessible to all researchers in the organization. Researchers browse the catalog, find the formation that fits their workflow, clone it, and launch.
Step 4 -- Researchers Self-Serve
Researchers browse the catalog, launch environments with a click, and customize as needed. When they build something useful, they snapshot it and share it back -- growing the catalog organically.
Step 5 -- Cross-Cloud Flexibility
The department's AWS credits are running low, but they have Azure credits from a Microsoft partnership. Alex retargets the formations to deploy on Azure -- the same formation templates work on both clouds without modification.
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Federated console access | Alex used native AWS tools for advanced setup |
| Formation types | Docker, Cloud, EMR formations for different workloads |
| Marketplace | Private catalog for curated research environments |
| Cross-cloud storage | Shared datasets mounted from any cloud |
| Cloud-agnostic formations | Same templates deploy on AWS, Azure, or GCP |
Scenario 4: Running a Workshop for 50 Researchers¶
Personas: Workshop Organizer (Dr. Priya), IT Admin (Sarah), Attendees (50 researchers)
The Problem¶
Dr. Priya is running a week-long computational ecology workshop. She needs 50 identical Jupyter environments, each with pre-loaded datasets, accessible to researchers from multiple institutions.
The Workflow¶
Step 1 -- Prepare (Dr. Priya)
Dr. Priya creates a Docker formation with Jupyter, ecology packages, and shared datasets. She tests it, snapshots the final state, and shares it with the workshop organization.
Step 2 -- Onboard Attendees (Sarah)
Sarah registers all 50 attendees via Excel batch upload. Each attendee receives:
- A RosettaHub account linked to their institutional email
- Automatic assignment to the workshop project
- A dedicated cloud account with $25 budget
- The workshop formation shared to their dashboard
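Batch onboarding from a spreadsheet export can be sketched as a loop over rows. `register_attendee`, the project name, and the roster contents below are all hypothetical placeholders, not real accounts or a real API:

```python
import csv
import io

# Illustrative roster, standing in for the Excel export.
roster_csv = """name,email
Ana Silva,ana.silva@uni-a.example.ac.uk
Ben Okoro,b.okoro@uni-b.example.ac.uk
"""

def register_attendee(row: dict, project: str = "eco-workshop", budget: float = 25.0) -> dict:
    # In the real platform this step would create the account, assign the
    # workshop project, provision a dedicated cloud account, and share the
    # formation; here it just records the resulting state.
    return {
        "email": row["email"],
        "project": project,
        "budget": budget,
        "formations": ["workshop-jupyter"],
    }

attendees = [register_attendee(r) for r in csv.DictReader(io.StringIO(roster_csv))]
```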
Step 3 -- Deploy (Dr. Priya)
On workshop day, Dr. Priya uses Launch on Sharees to deploy all 50 environments simultaneously. Each attendee gets their own isolated instance with:
- Personal compute (no noisy-neighbor issues)
- Shared read-only dataset mount
- Personal writable storage
- Spot instances at up to 70% savings over on-demand pricing
Step 4 -- During the Workshop
Attendees work in their environments. If a spot instance is reclaimed, RosettaHub automatically preserves the attendee's work and launches a replacement. Dr. Priya monitors the class from her dashboard -- she can see who's running, who's idle, and total spending.
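The recovery behavior on spot reclamation follows a snapshot-then-replace order. This sketch shows the sequencing only; `snapshot` and `launch_replacement` are stand-ins for internal platform operations, not a public API:

```python
def handle_interruption(instance, snapshot, launch_replacement):
    """Hedged sketch of spot-interruption recovery: preserve work, then replace."""
    state_id = snapshot(instance)                         # preserve work first
    return launch_replacement(from_snapshot=state_id)     # then relaunch from it

# Exercise the sequencing with stub operations that log what happened.
events = []
new_instance = handle_interruption(
    "i-0abc",
    snapshot=lambda inst: events.append(("snapshot", inst)) or "snap-1",
    launch_replacement=lambda from_snapshot: events.append(("launch", from_snapshot)) or "i-0def",
)
```

The ordering is the point: the snapshot completes before the replacement launches, so an interruption can never lose work that existed at reclaim time.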
Step 5 -- Cleanup
After the workshop, Dr. Priya uses Delete on Sharees to tear down all environments in one action. Attendees' personal storage is preserved for 30 days for follow-up work.
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Batch registration | 50 users onboarded via Excel upload |
| Batch launch/stop/delete | Deploy and manage 50 environments from a single menu |
| Dedicated cloud accounts | Each attendee gets independent quotas |
| Spot instance recovery | Automatic snapshot and replacement on interruption |
| Per-student budgets | $25 hard cap per attendee |
Scenario 5: Sensitive Data Analysis in a Trusted Research Environment¶
Personas: Data Custodian (NHS Trust), PI (Dr. Chen), Researcher (Maria)
The Problem¶
Dr. Chen's group has been granted access to anonymized NHS patient data for a health outcomes study. The data custodian requires a Trusted Research Environment aligned with the Five Safes framework -- the data cannot leave the secure boundary.
The Workflow¶
Step 1 -- Safe Projects and Safe People
The study is set up as an isolated project with:
- Dedicated cloud accounts, network-isolated from other projects
- Membership limited to approved researchers (Maria and Dr. Chen)
- SSO authentication with the university's SAML 2.0 identity provider
- Role-based access: Maria as researcher, Dr. Chen as project manager
Step 2 -- Safe Settings and Safe Data
The TRE is configured with:
- Private engine -- compute isolated from shared infrastructure
- VPC isolation -- no internet access from research VMs
- Encrypted storage -- S3 with managed KMS keys for the NHS dataset
- Cloud Custodian policies -- automated compliance enforcement
- Approved formations only -- researchers cannot create arbitrary environments
Step 3 -- Researcher Workflow (Maria)
Maria logs in and sees only the approved RStudio formation. She launches it into the secure boundary, where the NHS dataset is pre-mounted as a read-only encrypted volume. Inside the boundary she can:
- Run analysis scripts in RStudio
- Save intermediate results to her project storage
She cannot copy data to her personal machine or email results outside the boundary.
Step 4 -- Safe Output
When Maria completes her analysis, the results go through an egress review:
- Maria submits outputs for approval via the sharing workflow
- Dr. Chen reviews the outputs for disclosure risk
- Approved outputs are exported; everything else stays in the boundary
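The egress review above is effectively a small state machine: outputs move from inside the boundary to pending review, and only an explicit approval moves them out. The states and transitions here are illustrative, not the platform's actual workflow engine:

```python
# Hedged sketch of the egress review as a tiny state machine.
TRANSITIONS = {
    ("in_boundary", "submit"):     "pending_review",  # Maria submits outputs
    ("pending_review", "approve"): "exported",        # Dr. Chen approves for export
    ("pending_review", "reject"):  "in_boundary",     # stays inside the boundary
}

def review_step(state: str, action: str) -> str:
    # Any transition not listed above (e.g. exporting without review)
    # is rejected outright.
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} not allowed from state {state!r}")

state = review_step("in_boundary", "submit")
state = review_step(state, "approve")   # state is now "exported"
```

Note there is no path from `in_boundary` directly to `exported` -- the disclosure-risk review cannot be skipped.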
What Made This Possible¶
| Capability | How It Helped |
|---|---|
| Five Safes alignment | Platform maps to the TRE governance framework |
| Project isolation | Dedicated cloud accounts with network isolation |
| SSO + RBAC | Institutional identity provider, role-based permissions |
| Encrypted storage | KMS-managed encryption for sensitive data |
| Compliance policies | Cloud Custodian enforcement for automated compliance |
Common Patterns¶
Across all scenarios, the same platform capabilities compose into different workflows:
| Pattern | How RosettaHub Implements It |
|---|---|
| Data + Compute bundled | Formations combine storage mounts with compute configuration |
| Self-service with guardrails | Users launch freely within administrator-defined boundaries |
| Reproduce and share | Snapshot → Share → Clone cycle for any environment |
| Budget-per-grant | Organization hierarchy maps to funding structure |
| Scale instantly | Batch operations deploy environments for entire teams |
| Cross-cloud flexibility | Same formations deploy on any connected cloud |
| Progressive trust | Start with MetaCloud simplicity, graduate to federated console access |
Next Steps¶
- For Research Teams -- full overview of research capabilities
- Formations -- how to build your own workflows
- Trusted Research Environments -- Five Safes framework alignment
- Tutorials -- step-by-step walkthroughs