Blue Origin BlueGPT Turned Lunar Hardware Design into a Multi-Agent Loop

Blue Origin built an internal AI ecosystem called BlueGPT to speed up the design of hardware that can survive the 14-day lunar night. BlueGPT is not just a chatbot. It is an operating layer that combines a secure LLM gateway, an agent marketplace, and a multi-agent orchestration platform. On top of this platform, Blue Origin's In-Space Systems team designed the Thermal Energy Advanced Regolith Exchanger (TEAREx). TEAREx is a thermal battery concept that uses lunar soil, or regolith, as a heat storage medium so systems can survive the harsh lunar night.

The core point is not that AI replaced engineers. The important shift is that a team of agents can run requirements interpretation, internal knowledge retrieval, design generation, physics simulation, result evaluation, and iterative redesign in parallel. According to the public case study, Blue Origin reduced TEAREx hardware development from years to days, reported as a 90% reduction, and cut an analysis workflow from four days to four hours, reported as a 6x acceleration. Two to three human engineers worked with an agent team composed of supervisor, librarian, requirements, design, and analysis agents.

1. AI Technologies Used

Secure LLM gateway and agent marketplace: Employees can create and reuse specialized agents connected to internal knowledge and tools. Based on the public case study, BlueGPT had more than 2,700 agents deployed, 3.5 million monthly interactions, and 70% company-wide adoption.
Model access through Amazon Bedrock: Foundation models can be called with security and governance controls, and agentic applications can be deployed on that foundation.
Amazon Bedrock Knowledge Bases + OpenSearch RAG: Internal expertise that public models cannot know, such as aerospace, manufacturing, and thermal design knowledge, is retrieved and added to agent context.
Amazon Bedrock AgentCore memory: Session memory and persistent memory are separated so agents can keep short-term task context and long-term design knowledge. The AWS case study mentions hierarchical memory, insight extraction, and namespace-based security and access control.
Strands Agents SDK: Agent orchestration is built around model reasoning rather than a fixed workflow. A supervisor agent decomposes work, while domain agents handle requirements, design, and analysis tasks.
Amazon EKS-based agent runtime: Containerized agent runtimes, MCP servers, manufacturing execution system integrations, data extraction, and risk management microservices can operate together.
AWS Lambda workflow automation: Serverless automation and service-to-service glue logic can be expanded for agent tool calls.
Amazon EC2 P5/G5 GPU simulation: Designs generated by agents are validated through complex physics simulation and topology optimization loops. The design, analysis, and revision loop repeats until requirements are met.
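The split between session memory and persistent, namespace-scoped memory described in the list above can be illustrated with a minimal sketch. It uses only the Python standard library and does not call the AgentCore API; the `AgentMemory` class, the `thermal/design` namespace, the agent names, and the stored strings are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy memory layer: per-session context plus namespaced long-term notes."""
    # session_id -> short-term messages, discarded when the session ends
    sessions: dict[str, list[str]] = field(default_factory=dict)
    # namespace -> long-term insights, kept across sessions
    persistent: dict[str, list[str]] = field(default_factory=dict)
    # agent name -> namespaces that agent may read (hypothetical access policy)
    grants: dict[str, set[str]] = field(default_factory=dict)

    def remember_turn(self, session_id: str, message: str) -> None:
        self.sessions.setdefault(session_id, []).append(message)

    def store_insight(self, namespace: str, insight: str) -> None:
        self.persistent.setdefault(namespace, []).append(insight)

    def recall(self, agent: str, namespace: str) -> list[str]:
        # Namespace-based access control: refuse reads outside the agent's grants.
        if namespace not in self.grants.get(agent, set()):
            raise PermissionError(f"{agent} may not read {namespace}")
        return list(self.persistent.get(namespace, []))

    def end_session(self, session_id: str) -> None:
        self.sessions.pop(session_id, None)


memory = AgentMemory(grants={"design_agent": {"thermal/design"}})
memory.remember_turn("s1", "Target: survive the lunar night")
memory.store_insight("thermal/design", "Prior regolith test notes")
memory.end_session("s1")  # short-term context is gone, the insight remains
print(memory.recall("design_agent", "thermal/design"))
```

The point of the separation is that ending a session drops task context without touching long-term knowledge, and that an agent without a grant cannot read another team's namespace.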

2. Implementable System Architecture

The structure below is an implementation pattern based on the Blue Origin case published by AWS and a typical enterprise manufacturing and aerospace AI environment. The original source explicitly mentions Amazon Bedrock, Bedrock Knowledge Bases, Bedrock AgentCore, Strands Agents SDK, Amazon EKS, OpenSearch RAG, Amazon RDS, AWS Lambda, and Amazon EC2 P5/G5. Details such as portal delivery, design artifact storage, and validation queues should be treated as practical implementation examples needed in a real operating environment.

Connect requirements and internal knowledge

BlueGPT connects thermal design requirements, manufacturing constraints, material and test data, and prior design knowledge as searchable context.

Assemble a specialized agent team

Supervisor, librarian, requirements, design, and analysis agents split the work by role and are coordinated through the Strands Agents SDK.

Generate designs and retrieve evidence

Agents use Bedrock models and OpenSearch RAG to produce design candidates and supporting evidence that match the requirements.

Run GPU simulation and evaluation

Physics simulation and topology optimization run on EC2 P5/G5, and the results are compared against the requirements.

Iterate the design and require human approval

The agentic loop repeats until requirements are met, and human engineers review the design rationale and analysis results.
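The steps above can be sketched as a small design, simulate, evaluate, and revise loop. This is a toy illustration under stated assumptions, not Blue Origin's implementation: `simulate_heat_loss` is a made-up formula standing in for a GPU physics simulation, `revise` stands in for the design agent, and `insulation_mm` and the 25 W limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    insulation_mm: float  # hypothetical design variable


def simulate_heat_loss(design: Design) -> float:
    """Toy stand-in for a physics simulation; returns heat loss in watts."""
    return 100.0 / (1.0 + design.insulation_mm / 10.0)


def revise(design: Design) -> Design:
    """Toy stand-in for the design agent's revision step."""
    return Design(insulation_mm=design.insulation_mm + 10.0)


def design_loop(start: Design, max_heat_loss_w: float, max_iters: int = 20):
    """Repeat simulate -> evaluate -> revise until the requirement is met."""
    design = start
    history = []
    for _ in range(max_iters):
        loss = simulate_heat_loss(design)
        history.append((design.insulation_mm, loss))
        if loss <= max_heat_loss_w:
            # Requirement met: stop and hand the candidate to human review.
            return design, history, "awaiting_human_review"
        design = revise(design)
    # Budget exhausted: the last candidate is unverified, so flag it instead.
    return design, history, "iteration_budget_exhausted"


final, history, status = design_loop(Design(insulation_mm=0.0), max_heat_loss_w=25.0)
print(final, status, len(history))
```

The loop never marks a design as finished on its own. It ends either in a state that waits for human review or in an explicit failure state when the iteration budget runs out.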

If Built as a Real Service: AWS Reference Architecture

If this logical flow is deployed on AWS, the architecture can look like the pattern below. The agent runtime sits on EKS, while knowledge retrieval, memory, model calls, and simulation are loosely coupled so the design loop can repeat.

Figure: Blue Origin BlueGPT AWS reference architecture. The architecture connects internal requirements and knowledge retrieval, multi-agent orchestration, Bedrock and AgentCore inference and memory, EC2 P5/G5 simulation, design review, and manufacturing artifact generation. The bronze path highlights the design-simulation loop where cost, latency, and accuracy concentrate.

BlueGPT portal / gateway - CloudFront + S3, ALB, Amazon EKS: Provides the internal portal and gateway where employees access the agent marketplace and design workflows.
Agent runtime - Amazon EKS: Runs Strands Agents SDK-based supervisor and domain agents, MCP servers, and connectors for manufacturing and analysis tools.
Foundation models & memory - Amazon Bedrock, Amazon Bedrock AgentCore: Handles foundation model calls, agent deployment, session and persistent memory, and hierarchical context management.
Knowledge retrieval - Amazon Bedrock Knowledge Bases, Amazon OpenSearch Service: Retrieves internal technical documents, requirements, test data, and prior design rationale through RAG.
Operational data - Amazon RDS: Stores relational data such as requirements, agent metadata, task state, and design review records.
Design artifacts - Amazon S3: Stores CAD exports, simulation input and output, reports, and manufacturing artifacts.
Workflow automation - AWS Lambda: Runs automated steps such as agent tool calls, data extraction, simulation submission, and result collection.
Simulation compute - Amazon EC2 P5/G5: Runs GPU-accelerated physics simulation and topology optimization.
Review & manufacturing handoff - Internal review UI/API and manufacturing execution system integration: Human engineers approve evidence and analysis results before handoff to 3D printing, manufacturing execution, and supplier communication.
Security & Audit - IAM, KMS, CloudTrail, CloudWatch: Aerospace design data may involve export controls and confidential engineering data, so permissions, encryption, audit logs, and monitoring must apply across the whole system.

The bronze path marks the core route where cost, latency, and accuracy concentrate: Bedrock and AgentCore inference, RAG retrieval, EC2 P5/G5 simulation, and result evaluation. Cross-cutting security and audit services such as IAM, KMS, CloudTrail, and CloudWatch are not drawn in the diagram and are summarized in the list above. AWS is an example implementation; the same pattern can move to another cloud or on-premises environment if security requirements are met.
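One way to keep retrieval and simulation loosely coupled, and therefore portable across clouds, is to have the agent code depend on small interfaces rather than on a specific service. The sketch below assumes hypothetical `Retriever` and `Simulator` protocols with in-memory stubs; none of these names come from the case study, and the mass formula is invented for illustration.

```python
from typing import Protocol


class Retriever(Protocol):
    def search(self, query: str) -> list[str]: ...


class Simulator(Protocol):
    def run(self, design: dict) -> dict: ...


class InMemoryRetriever:
    """Stub retriever; a real one might wrap a managed search service."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def search(self, query: str) -> list[str]:
        return [d for d in self.docs if query.lower() in d.lower()]


class StubSimulator:
    """Stub simulator; a real one might submit a GPU job and wait for results."""

    def run(self, design: dict) -> dict:
        return {"mass_kg": 2.0 * design["volume_l"]}  # invented toy relation


def one_pass(retriever: Retriever, simulator: Simulator, query: str, design: dict) -> dict:
    """One retrieval plus simulation pass that only sees the two interfaces."""
    return {"evidence": retriever.search(query), "result": simulator.run(design)}


docs = ["Regolith thermal test summary", "Harness routing guideline"]
out = one_pass(InMemoryRetriever(docs), StubSimulator(), "regolith", {"volume_l": 1.5})
print(out)
```

Swapping the stubs for cloud-backed or on-premises implementations would not change `one_pass`, which is the property the loosely coupled architecture relies on.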

3. Workflow

Enter the design goal

An engineer enters performance requirements, constraints, and manufacturability criteria for hardware such as TEAREx.

Retrieve knowledge and decompose requirements

The librarian agent retrieves internal knowledge, and the requirements agent turns it into verifiable design conditions.

Generate design candidates

The design agent creates candidates that reflect thermal, structural, and mass constraints, while the supervisor agent assigns the next tasks.

Run simulation

The analysis agent calls EC2 P5/G5-based analysis tools to run physics simulation and topology optimization.

Evaluate and iterate

If the result does not meet the requirements, the agent team revises the design and runs the simulation again.

Approve and hand off to manufacturing

Human engineers review the final design, rationale, and simulation results before moving to manufacturing and testing.

In this workflow, AI is not the final design authority. It operates as an engineering operations layer that automates expert knowledge retrieval and iterative analysis. Important decisions still belong to human engineers, but AI can sharply reduce waiting time and repetition cost across the requirements, design, analysis, and revision loop.
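The boundary between agent work and human authority can be made explicit as a small state machine. The following is a minimal sketch under assumed names: the states, the `TRANSITIONS` table, and the `"agent"` and `"human"` actor kinds are hypothetical and only illustrate that approval and release require a human actor.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    VERIFIED = "verified"  # simulation results meet the requirements
    APPROVED = "approved"  # a human engineer signed off
    RELEASED = "released"  # handed off to manufacturing


# Allowed transitions and the minimum actor kind required (hypothetical policy).
TRANSITIONS = {
    (State.DRAFT, State.VERIFIED): "agent",
    (State.VERIFIED, State.DRAFT): "agent",  # failed re-check, back to redesign
    (State.VERIFIED, State.APPROVED): "human",
    (State.APPROVED, State.RELEASED): "human",
}


def advance(current: State, target: State, actor_kind: str) -> State:
    """Return the new state, or raise if the move is not allowed for this actor."""
    required = TRANSITIONS.get((current, target))
    if required is None:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    if required == "human" and actor_kind != "human":
        raise PermissionError(f"{target.value} requires a human engineer")
    return target


state = advance(State.DRAFT, State.VERIFIED, "agent")
state = advance(state, State.APPROVED, "human")
state = advance(state, State.RELEASED, "human")
print(state)
```

Because there is no direct transition from draft or verified to released, an agent cannot skip the review step even if it calls `advance` itself.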

4. Build and Operating Cost

Blue Origin has not disclosed its internal operating cost. A comparable cost structure can be divided into the following areas.

Model and agent cost: Bedrock model calls, AgentCore memory, and Knowledge Bases/RAG synchronization all add cost. As the agentic loop gets longer, token use, memory calls, and retrieval calls grow together.
GPU simulation cost: Physics simulation and topology optimization on EC2 P5/G5 are likely to be the largest variable cost. Job queues, spot capacity, checkpointing, and early-stop criteria are critical for cost control.
Runtime and retrieval infrastructure cost: EKS clusters, OpenSearch indexes, RDS, S3 artifact storage, and Lambda automation add baseline operating cost.
Security and governance cost: Aerospace design data has strong export control, trade secret, and supply chain security requirements. Access control, audit logs, data separation, and retention of model and tool call logs are required.
Expert review cost: Human review does not disappear. The benefit is that engineers can focus on requirements judgment, design trade-offs, and approval instead of repeated document gathering and analysis execution.
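To see how loop length drives the variable cost areas above, a rough estimate can multiply iterations by per-run GPU and token usage. The sketch below is an assumption-laden illustration: the `loop_cost` function and every number in `RATES` are placeholders chosen for the example, not AWS prices or Blue Origin figures.

```python
def loop_cost(iterations: int, gpu_hours_per_run: float, gpu_rate_per_hour: float,
              tokens_per_iteration: int, token_rate_per_1k: float) -> dict:
    """Estimate the variable cost of a design loop from caller-supplied rates."""
    gpu = iterations * gpu_hours_per_run * gpu_rate_per_hour
    model = iterations * (tokens_per_iteration / 1000) * token_rate_per_1k
    return {"gpu": gpu, "model": model, "total": gpu + model}


# Placeholder inputs for illustration only; these are not real prices.
RATES = dict(gpu_hours_per_run=2.0, gpu_rate_per_hour=10.0,
             tokens_per_iteration=50_000, token_rate_per_1k=0.01)

full_budget = loop_cost(iterations=20, **RATES)  # loop runs to its iteration cap
early_stop = loop_cost(iterations=8, **RATES)    # early-stop criterion fires sooner
print(full_budget)
print(early_stop)
```

Under these placeholder rates the GPU term dominates, which is why early-stop criteria and iteration caps matter more for cost than prompt length does.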

5. Business Benefits

Shorter Hardware Development Cycles

Based on the public case study, the path from concept to printed part for TEAREx was reduced from years to days, a 90% reduction. This was not document automation. The gain came from shortening the actual hardware design and simulation iterations.

6x Faster Analysis Workflows

An analysis task that previously took four days was reduced to four hours. The agent team automated the repeated loop of interpreting requirements, running analysis tools, and evaluating results.

Scaled Expert Engineering Capacity

Two to three human engineers worked with supervisor, librarian, requirements, design, and analysis agents to perform work that would normally require a much larger team. Blue Origin describes this as a way for small teams to execute large missions.

Reusable Agent Patterns

The agents used for TEAREx are not one-off scripts. They remain in the BlueGPT marketplace and can be recombined for tubes, barrels, harnesses, assemblies, and eventually vehicle-level design. This is the difference between one-time automation and platform-based AI adoption.

Expansion into Manufacturing and Operations

BlueGPT is also used beyond engineering for manufacturing work order improvement, non-conformance resolution, and supplier communication. The public case study says manufacturing teams are resolving non-conformance issues 70% faster.

6. Implications for Manufacturing and B2B Companies

This case shows that AI can play a meaningful role even in industries with strict regulation and physical constraints, such as aerospace. The success pattern is not "deploy a general chatbot." The value appears when internal knowledge, requirements, analysis tools, manufacturing data, and approval workflows are connected inside an agentic loop.

Manufacturing and B2B companies can start with areas such as:

Requirement decomposition and analysis automation for equipment and component designs
Root-cause candidate retrieval and action plan drafting for quality issues and non-conformance
Explanation and impact analysis for supplier changes
Troubleshooting agents based on equipment manuals, test data, and historical failure records
Simulation run orchestration across material and process conditions
Automatic generation of design change approval documents and audit trails

These candidates share a pattern: the work is long-running, internal knowledge is scattered, and analysis and approval steps repeat often. If AI remains only a free-form conversation tool, its effect will be limited. Companies should first decompose requirements into machine-evaluable criteria, expose tools through APIs or job queues, and define the operating boundary where humans approve the results.
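A machine-evaluable criterion can be as simple as a metric name, a comparison, and a limit. The sketch below shows one possible shape, assuming a hypothetical `Requirement` record and made-up requirement IDs, metrics, and limits; it is not a format taken from the case study.

```python
import operator
from dataclasses import dataclass

OPS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Requirement:
    req_id: str
    metric: str   # key expected in the simulation result
    op: str       # "<=" or ">="
    limit: float
    unit: str


def evaluate(requirements: list, result: dict) -> dict:
    """Return {req_id: passed}; a missing metric counts as a failure."""
    verdicts = {}
    for r in requirements:
        value = result.get(r.metric)
        verdicts[r.req_id] = value is not None and OPS[r.op](value, r.limit)
    return verdicts


# Hypothetical requirements and a hypothetical simulation result.
reqs = [
    Requirement("REQ-1", "mass_kg", "<=", 12.0, "kg"),
    Requirement("REQ-2", "stored_energy_kwh", ">=", 5.0, "kWh"),
]
verdicts = evaluate(reqs, {"mass_kg": 11.4})
print(verdicts)
```

Treating a missing metric as a failure keeps an incomplete simulation run from being mistaken for a passing design.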

7. Adoption Checklist

Are the design, quality, or analysis tasks that AI will handle defined with measurable requirements?
Can internal documents, test data, design history, and manufacturing knowledge be retrieved through permission-aware RAG?
Are CAD, CAE, simulation, MES, and PLM tools available to agents through APIs or job queues?
Are the roles of each agent, such as supervisor, librarian, requirements, design, and analysis, clearly separated?
Does every design result keep an audit trail of requirements, evidence documents, tool calls, model versions, and simulation results?
Is every agent output blocked from automatically reaching manufacturing, delivery, or supply chain systems until a human engineer approves it?
Are there queues, spot policies, checkpoints, retry rules, and early-stop criteria to control EC2 GPU cost?
Are export control, confidential design data permissions, and external model call policies explicit?
Are successful agent patterns registered and managed in a reusable marketplace?
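For the audit trail item in the checklist, one possible approach is a hash-chained record, where each entry's hash covers its own fields and the previous entry's hash. This is a sketch of a generic tamper-evidence technique, not something the case study describes; the field names, document IDs, and model version string are hypothetical.

```python
import hashlib
import json


def make_record(fields: dict, prev_hash: str = "") -> dict:
    """Build an audit record whose hash covers its fields and the previous hash."""
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()
    return {"fields": fields, "prev_hash": prev_hash, "hash": digest}


def verify_chain(records: list) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev = ""
    for rec in records:
        expected = make_record(rec["fields"], prev)["hash"]
        if rec["prev_hash"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True


first = make_record({
    "requirements": ["REQ-1"],
    "evidence_docs": ["doc-001"],
    "tool_calls": ["run_simulation"],
    "model_version": "model-x",
    "simulation_result": {"mass_kg": 11.4},
})
second = make_record({"approval": "engineer-a"}, prev_hash=first["hash"])
chain = [first, second]
print(verify_chain(chain))
```

If anyone later edits a stored simulation result, `verify_chain` returns `False` for the chain, which gives reviewers a cheap integrity check before a design is released.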