TheThe UseUse ofof CarrierCarrier GradeGrade LinuxLinux inin SpaceSpace

David Bryan The PTR Group, Inc. August 2007 AgendaAgenda What is Carrier Grade ? xCarrier Grade xStandards xEvolution xServices Applicability to Spacecraft A usage scenario Conclusion

Small Satellite Conference 2007 WhatWhat isis CarrierCarrier Grade?Grade? From the Telecommunications Industry Carrier Grade refers to product availability xTypically 5 nines or better (99.999%) xLess than 5 minutes downtime per year

Small Satellite Conference 2007 ChallengesChallenges facedfaced inin TelecomTelecom Demanding Environment xQuality of Service xHigh Availability Custom hardware Custom / Proprietary OS Upgrades and Implementation of New Services and Features xTied to proprietary vendor xLong Lead times to implement / limited development support xVery expensive

Small Satellite Conference 2007 ChallengesChallenges forfor FlightFlight SoftwareSoftware Harsh environment xRadiation xLimited Resources Custom Hardware xLong Lead times xVery expensive xLimited development support Technology Gap xMoore’s Law

Small Satellite Conference 2007 Solution:Solution: CarrierCarrier GradeGrade StandardsStandards Hardware Standards: ATCA xEnables development of portable software applications xOne board may be swapped for another Software Standards: xMove the autonomy requirements from the application into middleware xRequires standardized middleware interfaces

Small Satellite Conference 2007 WhatWhat isis CarrierCarrier GradeGrade Linux?Linux?

Linux Flight Heritage

Photo Courtesy NRL Illustration Courtesy AirForce

LinuxLinux OSOS

Small Satellite Conference 2007 WhatWhat isis CarrierCarrier GradeGrade Linux?Linux?

Promoting and Maintaining Standards

HAHA MiddlewareMiddleware LinuxLinux OSOS

Small Satellite Conference 2007 WhatWhat isis CarrierCarrier GradeGrade LinuxLinux

Increased Mission Availability Tools, Technologies, and APIs Use of COTS processors and boards for space applications

HAHA ApplicationsApplications

HA Middleware LinuxLinux OSOS

Small Satellite Conference 2007 CGLCGL MiddlewareMiddleware ServicesServices

Service Name Feature Availability Management Fault detection, fault Framework recovery, cluster coordination Cluster Membership Infrastructure, Publish- Service Subscribe – Node joins cluster, node leaves cluster Checkpoint Assists with fault recovery – reduce downtime

Event Infrastructure, Publish- Subscribe state changes

Lock Infrastructure, Shared resource control

Messaging Infrastructure, inter-node, intra-node messaging

Information Model Standard model of cluster Management Service entities

Logging Storage of data for system administrators

Notification Failure notifications

Small Satellite Conference 2007 AA CGLCGL BusBus Design...Design... Hardware components x Processors, Redundant Interconnects, Sensors, Actuators Software responsibilities x Command and Data Sensor Node A0 Handling Y

x Sensor data manipulation Sensor Node B0 and processing X Actuator • Actuator control Sensor X’ • Inter-processor Node B1 communication Sensor Node A1 Z

High Speed Interconnect

Small Satellite Conference 2007 FailoverFailover ScenarioScenario

Primary

Sensor Node A0 X

Management Node Backup

Sensor Node A1 X’

High Speed Interconnect

Small Satellite Conference 2007 FaultFault OccursOccurs

Primary Node Fails

Primary

Sensor Node A0 X

Management Node Backup

Sensor Node A1 X’

High Speed Interconnect

Small Satellite Conference 2007 FaultFault DetectionDetection

Primary Fault Detected: Sensor Node A0 Node A0 Heartbeat X

Management Node Backup

Sensor Node A1 X’

High Speed Interconnect

Small Satellite Conference 2007 FaultFault RecoveryRecovery

Primary A0 Shutdown Sensor Node A0 X

Management Node Backup

Sensor Node A1 X’

High Speed Interconnect

Small Satellite Conference 2007 MissionMission RestoredRestored

Primary A1 Brought Online Sensor Node A0 as Primary X

Management Node Primary Sensor Node A1 X’

High Speed Interconnect

Small Satellite Conference 2007 RedundancyRedundancy RestoredRestored

Backup A0 Restored as Sensor Node A0 Backup X

Management Node Primary Sensor Node A1 X’

High Speed Interconnect

Small Satellite Conference 2007 ConclusionConclusion is an enabler for high availability space applications xLow Cost - significant functionality is available out of the box xPortable - CGL is an implementation of a standard xReduces the development lead time

Small Satellite Conference 2007 ForFor MoreMore InformationInformation…… Carrier Grade Linux xhttp://www.linux-foundation.org Service Availability Forum xhttp://www.saforum.org SCOPE Alliance xhttp://www.scope-alliance.org FlightLinux xhttp://www.openflightlinux.org

Small Satellite Conference 2007