Trusted Research Environments

Secure computing for sensitive research data, aligned to the Five Safes framework.

What Is a TRE?

A Trusted Research Environment (TRE) is a secure computing environment designed to allow researchers to analyse sensitive data -- such as patient health records, genomic data, or confidential government datasets -- without the data ever leaving a controlled boundary.

TREs are increasingly required as a condition of data access by funding bodies (such as UKRI), data custodians (such as NHS Digital), and regulatory frameworks such as GDPR.

The Five Safes Framework

RosettaHub maps its platform capabilities to the Five Safes -- the widely adopted model for governing access to sensitive data:

Safe	Principle	RosettaHub Capability
Safe People	Only authorized users access the environment	IAM integration, SSO (SAML 2.0, LDAP, OAuth), role-based access control
Safe Projects	Work is governed by approved project scope	Project isolation with dedicated accounts and budgets
Safe Settings	Compute runs in a secure, controlled environment	Private engines, VPC-isolated formations, encrypted networking
Safe Data	Data is protected at rest and in transit	Full encryption key lifecycle, encrypted storages, access-controlled mounts
Safe Output	Results are reviewed before leaving the environment	Egress controls, output review workflows
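One way to read the table is as a conjunction: access is granted only when every applicable Safe holds, and Safe Output is checked only when results try to leave the environment. The sketch below illustrates that reading under stated assumptions; `AccessRequest` and `failed_safes` are hypothetical names and not part of RosettaHub's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    # Hypothetical flags, one per Safe, that an access check might collect.
    user_authenticated: bool   # Safe People: signed in through SSO / IAM with a valid role
    project_approved: bool     # Safe Projects: request falls within an approved project
    in_private_engine: bool    # Safe Settings: compute runs inside the isolated boundary
    storage_encrypted: bool    # Safe Data: data sits on encrypted storage
    output_reviewed: bool      # Safe Output: results passed review (checked only on export)


def failed_safes(req: AccessRequest, exporting: bool = False) -> list[str]:
    """Return the Safes that are not satisfied; an empty list means access is allowed."""
    checks = {
        "Safe People": req.user_authenticated,
        "Safe Projects": req.project_approved,
        "Safe Settings": req.in_private_engine,
        "Safe Data": req.storage_encrypted,
    }
    if exporting:
        checks["Safe Output"] = req.output_reviewed
    return [name for name, ok in checks.items() if not ok]


req = AccessRequest(True, True, True, True, output_reviewed=False)
print(failed_safes(req))                  # [] : analysis inside the TRE is allowed
print(failed_safes(req, exporting=True))  # ['Safe Output'] : export must wait for review
```

Returning the list of failed Safes, rather than a single boolean, makes a denial explainable in an audit log.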

RosettaHub VRE Capabilities

RosettaHub's Virtual Research Environment (VRE) provides the compute and collaboration layer within a TRE architecture:

Formations -- reproducible, auditable environment templates
Private Engines -- dedicated compute isolated from shared infrastructure
Workbenches -- Jupyter, RStudio, VS Code running inside the secure boundary
Docker Formations -- containerized analysis pipelines with pinned dependencies
Kubernetes Clusters -- orchestrated multi-container workloads for large-scale analysis
Encrypted Storages -- object, file, and block storage with managed encryption keys

Formations as TRE Workspace Delivery

RosettaHub Formations are a natural mechanism for TRE-centric workspace delivery. A Formation can deliver a variety of infrastructure patterns, so a workspace can control a container, a machine, a cluster, or any other infrastructure built using CloudFormation, Terraform, CDK, or Pulumi.

This means a TRE workspace is not limited to a single VM -- it can be a fully orchestrated environment with multiple components, all defined as code and deployed consistently.
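Because a workspace is defined as code, "deployed consistently" can be checked mechanically by comparing a digest of the definition used for each deployment. The following is a minimal sketch, assuming a Formation can be modelled as an IaC tool, a template version, and a parameter map; `FormationSpec` and `fingerprint` are hypothetical and do not reflect RosettaHub's actual Formation schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

SUPPORTED_TOOLS = {"cloudformation", "terraform", "cdk", "pulumi"}


@dataclass(frozen=True)
class FormationSpec:
    # Hypothetical model: an IaC template, the version it was taken from, and its parameters.
    name: str
    iac_tool: str               # one of SUPPORTED_TOOLS
    template_version: str       # e.g. a git tag or commit of the template
    parameters: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.iac_tool not in SUPPORTED_TOOLS:
            raise ValueError(f"unsupported IaC tool: {self.iac_tool}")

    def fingerprint(self) -> str:
        """SHA-256 of a canonical JSON encoding; equal specs always give equal digests."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = FormationSpec("rstudio-gpu", "terraform", "v1.4.0", {"instance_type": "g5.xlarge", "disk_gb": 200})
b = FormationSpec("rstudio-gpu", "terraform", "v1.4.0", {"disk_gb": 200, "instance_type": "g5.xlarge"})
print(a.fingerprint() == b.fingerprint())  # True: parameter order does not matter
```

Sorting keys before hashing means the digest depends only on content, so two launches with the same fingerprint were built from the same definition.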

Machine Agents

RosettaHub machine agents provide fine-grained access management within workspaces and advanced container orchestration for delivering complex research environments. Agents enable:

Session management -- manage researcher sessions centrally across all TRE workspaces
Collaboration controls -- allow or restrict collaboration between researchers within the secure boundary
Container orchestration -- deliver multi-container workspaces with precise resource and access controls
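The session and collaboration controls listed above can be pictured as a central registry of who is active in which workspace, plus a per-workspace collaboration switch. This is an illustrative sketch only; `SessionRegistry` and its methods are hypothetical and are not the machine agents' real interface.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRegistry:
    """Hypothetical central view of researcher sessions across TRE workspaces."""
    # workspace id -> user ids with an active session there
    sessions: dict[str, set[str]] = field(default_factory=dict)
    # workspaces where researchers are allowed to join each other's sessions
    collaboration_enabled: set[str] = field(default_factory=set)

    def open(self, workspace: str, user: str) -> None:
        self.sessions.setdefault(workspace, set()).add(user)

    def close(self, workspace: str, user: str) -> None:
        self.sessions.get(workspace, set()).discard(user)

    def close_all(self, user: str) -> int:
        """End a user's sessions in every workspace; return how many were closed."""
        closed = 0
        for users in self.sessions.values():
            if user in users:
                users.discard(user)
                closed += 1
        return closed

    def may_join(self, workspace: str, host: str, guest: str) -> bool:
        """A guest may join only if collaboration is enabled for the workspace
        and the host currently has an active session there."""
        return (
            workspace in self.collaboration_enabled
            and host in self.sessions.get(workspace, set())
            and guest != host
        )


reg = SessionRegistry(collaboration_enabled={"ws-genomics"})
reg.open("ws-genomics", "alice")
reg.open("ws-cohort", "alice")
print(reg.may_join("ws-genomics", "alice", "bob"))  # True
print(reg.may_join("ws-cohort", "alice", "bob"))    # False: collaboration disabled here
```

Keeping all sessions in one place is what makes a single `close_all` call possible, for example when a researcher's access is revoked.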

Architecture

A RosettaHub-based TRE is organized into three zones:

┌─────────────────────────────────────────────────┐
│                  Management Zone                 │
│  Organization admin, user onboarding, budgets,   │
│  compliance policies, audit logs                 │
├─────────────────────────────────────────────────┤
│                   Portal Zone (VRE)              │
│  Formations, workbenches, containers,            │
│  Kubernetes, encrypted storage                   │
├─────────────────────────────────────────────────┤
│                  Airlock Zone (Egress)           │
│  Output review, data classification,             │
│  approved export workflows                       │
└─────────────────────────────────────────────────┘

Management Zone -- administrators manage users, cloud accounts, budgets, and compliance via Cloud Operations
Portal Zone -- researchers work within governed compute environments using the MetaCloud
Airlock Zone -- results pass through review and classification before leaving the environment

RosettaHub-Operated Airlock

The Airlock Zone implements controlled egress using RosettaHub's platform components. The architecture is modelled on the DRTC (Data-Return Transfer Controller) pattern:

Gitea-based output review -- researchers submit results via pull requests; data custodians review and approve before data leaves the boundary
Amazon Macie integration -- automated classification scans output for sensitive data (PII, PHI, credentials) before release
Audit trail -- every export request, review decision, and data transfer is logged
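The airlock steps can be sketched as a small state machine: an export request is scanned, reviewed by someone other than the submitter, and released only once approved, with each step appended to an audit log. This is a minimal illustration under stated assumptions: the regex scan is a crude stand-in for Amazon Macie (it neither calls Macie nor reproduces its detectors), the Gitea pull-request review is reduced to a `review()` call, and all names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    FLAGGED = "flagged"      # the scan found possibly sensitive content
    APPROVED = "approved"
    REJECTED = "rejected"
    RELEASED = "released"


# Crude stand-in for automated classification. These illustrative regexes
# are NOT Amazon Macie's managed data identifiers.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "nhs_number": re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass
class ExportRequest:
    request_id: str
    researcher: str
    content: str
    state: State = State.SUBMITTED
    scanned: bool = False
    findings: list[str] = field(default_factory=list)
    audit: list[tuple[str, str, str]] = field(default_factory=list)  # (UTC time, actor, event)

    def _log(self, actor: str, event: str) -> None:
        self.audit.append((datetime.now(timezone.utc).isoformat(), actor, event))

    def scan(self) -> None:
        """Classify the output before any human review."""
        self.findings = [name for name, rx in PATTERNS.items() if rx.search(self.content)]
        self.scanned = True
        if self.findings:
            self.state = State.FLAGGED
        self._log("classifier", f"findings={self.findings}")

    def review(self, custodian: str, approve: bool) -> None:
        """Record the custodian's decision (assumed to correspond to merging
        or closing the Gitea pull request)."""
        if not self.scanned:
            raise RuntimeError("scan the output before review")
        if custodian == self.researcher:
            raise PermissionError("researchers cannot approve their own exports")
        if self.state not in (State.SUBMITTED, State.FLAGGED):
            raise RuntimeError(f"cannot review a request in state {self.state.value}")
        self.state = State.APPROVED if approve else State.REJECTED
        self._log(custodian, self.state.value)

    def release(self) -> str:
        """Let the content cross the boundary only after approval."""
        if self.state is not State.APPROVED:
            raise RuntimeError("only approved requests may leave the boundary")
        self.state = State.RELEASED
        self._log("airlock", "released")
        return self.content
```

A flagged request is not rejected automatically here; the findings inform the custodian, who still makes the final decision.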

Airlock Roadmap

Gitea airlock workflows -- March 2026
Amazon Macie data detection -- May 2026

The TRE Trilogy

RosettaHub's TRE approach combines three code-driven pillars:

Pillar	What It Delivers
Compliance-as-Code	207 Cloud Custodian policies continuously enforcing HIPAA (564 controls), ISO 27001 (138 controls), CIS, and NIST
Infrastructure-as-Code	Formations define reproducible, auditable environments using CloudFormation, Terraform, CDK, or Pulumi
Frontend-as-Code	RosettaHub dashboard perspectives, views, and marketplace are configurable per institution

Together, these ensure that security, infrastructure, and user experience are all version-controlled, auditable, and repeatable.
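Compliance-as-code means rules are data that can be versioned and re-evaluated against every resource. Cloud Custodian itself uses YAML policies (resource type, filters, actions); the sketch below is only a Python analogue of that shape to show the evaluation loop, and `Policy`, `evaluate`, and the example rules are hypothetical rather than Custodian's API or RosettaHub's policy set.

```python
from dataclasses import dataclass
from typing import Callable

Resource = dict  # a resource description, e.g. taken from a cloud inventory


@dataclass(frozen=True)
class Policy:
    # Loosely modelled on a Custodian policy's name / resource / filter / action.
    name: str
    resource_type: str
    violates: Callable[[Resource], bool]
    action: str  # e.g. "notify", "encrypt", "block-public-access"


def evaluate(policies: list[Policy], inventory: list[Resource]) -> list[tuple[str, str, str]]:
    """Return (policy name, resource id, action) for every violation found."""
    results = []
    for policy in policies:
        for res in inventory:
            if res["type"] == policy.resource_type and policy.violates(res):
                results.append((policy.name, res["id"], policy.action))
    return results


POLICIES = [
    Policy("bucket-must-be-encrypted", "bucket", lambda r: not r.get("encrypted", False), "encrypt"),
    Policy("no-public-buckets", "bucket", lambda r: r.get("public", False), "block-public-access"),
    Policy("volumes-must-be-encrypted", "volume", lambda r: not r.get("encrypted", False), "notify"),
]

inventory = [
    {"id": "b-1", "type": "bucket", "encrypted": True, "public": False},
    {"id": "b-2", "type": "bucket", "encrypted": False, "public": True},
    {"id": "v-1", "type": "volume", "encrypted": True},
]
for violation in evaluate(POLICIES, inventory):
    print(violation)
```

Running the same evaluation on a schedule or on resource-change events is what turns the rules into continuous enforcement.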

Compliance and Data Protection

Capability	Description
Cloud Custodian	Automated policy enforcement across all cloud accounts
GDPR Alignment	Anonymous cloud accounts decouple researcher identity from cloud-level billing
Encryption Key Lifecycle	Full key creation, rotation, and revocation managed through the platform
Audit Logging	All user actions, launches, and data access events are recorded
Budget Governance	Real-time enforcement prevents uncontrolled resource creation
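The difference between real-time enforcement and billing-lag reporting is that each cost event is applied as it arrives, so a launch can be refused before the budget is overrun. A minimal sketch, assuming cost events arrive as non-negative amounts; `BudgetGuard` is a hypothetical name and not RosettaHub's budget engine.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Hypothetical event-driven budget check for one project or account."""
    budget: float
    spent: float = 0.0

    def on_cost_event(self, amount: float) -> bool:
        """Apply a cost event immediately. Returns True once the budget is
        exhausted, i.e. running resources should be stopped or suspended."""
        if amount < 0:
            raise ValueError("cost events must be non-negative")
        self.spent += amount
        return self.spent >= self.budget

    def remaining(self) -> float:
        return max(self.budget - self.spent, 0.0)

    def may_launch(self, estimated_cost: float) -> bool:
        """Allow a launch only if its estimated cost fits in what is left."""
        return estimated_cost <= self.remaining()


guard = BudgetGuard(budget=500.0)
guard.on_cost_event(120.0)
guard.on_cost_event(300.0)
print(guard.remaining())        # 80.0
print(guard.may_launch(50.0))   # True
print(guard.may_launch(100.0))  # False
```

A billing-report approach would only notice the overrun after the next report; here the check runs on every event.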

Platform Strengths for TRE

Why RosettaHub

Mature platform -- 7+ years in production with research institutions
User onboarding in seconds -- SSO integration, no manual account setup
Real-time cost governance -- event-driven budget enforcement rather than lagging billing reports
Flexible hierarchy -- organizations, sub-organizations, projects map to any institutional structure
Multi-cloud -- deploy TREs on AWS, Azure, or GCP without changing the workflow

Competitive Advantages

Differentiator	RosettaHub TRE	Typical TRE Providers
Multi-cloud support	AWS, Azure, GCP, Alibaba Cloud, OVH, OpenStack	Usually single-cloud
Governance + compute	Closed-loop in one platform	Separate tools, manual integration
Formation-based environments	Cloud-agnostic, reproducible, shareable	Provider-specific templates
Real-time cost control	Event-driven enforcement	Billing-lag reporting
User onboarding	Seconds via SSO	Days/weeks with manual provisioning
Organization hierarchy	Unlimited nesting, budget delegation	Flat or two-level structures
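Nested organizations with budget delegation can be modelled as a tree in which a parent may hand down only the budget it has not already delegated. The sketch below illustrates that rule; `OrgUnit` and the example unit names are hypothetical and not RosettaHub's data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OrgUnit:
    """Hypothetical organization node that delegates part of its budget to children."""
    name: str
    budget: float
    children: list[OrgUnit] = field(default_factory=list)

    def delegated(self) -> float:
        return sum(child.budget for child in self.children)

    def add_child(self, name: str, budget: float) -> OrgUnit:
        """Create a sub-unit, refusing to delegate more than the parent has left."""
        available = self.budget - self.delegated()
        if budget > available:
            raise ValueError(f"{self.name}: cannot delegate {budget}, only {available} left")
        child = OrgUnit(name, budget)
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Yield (depth, unit) for this unit and every descendant, at any nesting depth."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


uni = OrgUnit("University", 100_000)
med = uni.add_child("Medical School", 60_000)
med.add_child("Cohort Study (Grant A)", 25_000)
uni.add_child("Physics", 40_000)
for depth, unit in uni.walk():
    print("  " * depth + f"{unit.name}: {unit.budget}")
```

Checking delegation at creation time keeps the invariant that no sub-tree can be allocated more than its parent.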

Next Steps

The RosettaOps Model -- understand tiered governance
Formations -- learn how environments are defined
Projects -- isolate work by study or grant
Enterprise & SMB -- governance capabilities in depth