Open System Firmware

Server Base Manageability Guide for SBSA Compliant Arm (AARCH64) Servers

Supreeth Venkatesh, Staff System Architect Arm Agenda

• Rationale • History • Introduction • Proof of Concept • Roadmap/Plan • Call to Action Rationale • Arm Ecosystem Partners value standardized common server manageability capabilities with scope for flexible customizations which add value to the end user. • Standardization is key to ensure that Arm Ecosystem does not get fragmented by point solutions that plague the industry today. • Leverage industry standard system management specifications including but not limited to Redfish, Platform Level Data Model (PLDM), Management Component Transport Protocol (MCTP) as defined by the Distributed Management Task Force (DMTF). • Leverage Hardware Management Specifications and designs as defined by Open Compute Group (OCP). History

• Server Base System Architecture (SBSA) -Hardware system architecture for servers based on 64-bit ARM processors. -Standardizes processor element features and key aspects of system architecture. -http://infocenter.arm.com/help/topic/com.arm.doc.den0029b/Server_Base_System_Architecture_v5_0_ARM_DEN_0029B.pdf

• Server Base Boot Requirements (SBBR) -Defines the Boot and Runtime Services expected by an enterprise platform Operating System or hypervisor, for an SBSA-compliant Arm AArch64 server. -Based on UEFI, SMBIOS and ACPI specifications. -http://infocenter.arm.com/help/topic/com.arm.doc.den0044c/Server_Base_Boot_Requirements_v1_1_Arm_DEN_0044C.pdf History

• ARM Enterprise ACS -Architecture Compliance Suite tests SBSA and SBBR specifications. https://github.com/ARM-software/arm-enterprise-acs

FWTS LuvOS SBBR SBSA

SBSA SBBR PAL SCT UEFI TF-A

FVP

Arm Partner OSS Introduction

• Server Base Manageability is a specification that is under development in conjunction with partners across the industry. • Together with SBSA and SBBR, the SBMG provides a standard based approach to building Arm servers, their firmware and their server management capabilities. • SBMG is developed within the Arm ServerAC community. • Engineering change request process similar to other standards bodies. • Anybody in the community can raise a request for a change. Public to community by sending a mail to [email protected] Or raising a public ticket on the mantis DB https://atg-mantis.arm.com/ Introduction – Compliance Levels

• Defines several levels of manageability compliance (e.g., M0, M1, M2)

BMC-Platform SoC-BMC Elements BMC Management BMC-IO Device Interfaces Interfaces Services Interfaces Host Interface Interface

IMPLEMENTATION IMPLEMENTATION IMPLEMENTATION IMPLEMENTATION IMPLEMENTATION LEVEL M0 DEFINED DEFINED DEFINED DEFINED DEFINED

IPMI based Host Interface and Redfish Host IMPLEMENTATION IMPLEMENTATION Interface IMPLEMENTATION LEVEL M1 DEFINED DEFINED Redfish and IPMI /MCTP Host Interface DEFINED

IPMI based Host Interface and Redfish Host IMPLEMENTATION Interface LEVEL M2 DEFINED Redfish/PLDM/MCTP Redfish and IPMI /MCTP Host Interface NC-SI Introduction – Sub teams

• To help guide the Arm server designers to provide common manageability functions to the end users. • To accelerate development of Server Base Manageability Guide (SBMG), three different teams within Arm ServerAC community with participation from several different Arm Ecosystem Partners have been formed. -Reliability, Availability, Serviceability (RAS) team. -Platform Monitoring and Control team. -Remote Debug team. Introduction – RAS

• Define error record formats for RAS errors leveraging existing common platform error record (CPER) as specified in unified extensible firmware interface specification (UEFI). • Define the SOC - BMC manageability interface requirements for RAS errors including in-band and out-of-band interfaces. • Define fault notification signal. • Define the interface and mechanism for injecting RAS hardware errors. Satellite Application BMC Management Processor (SOC) Controller

CPER/Async Event RAS Errror Event

Generate/Store CPER EventMessageSupported(Version, TID)

return(EventClass) RAS Event Supported? EventMessageBufferSize(CperSize) yes

return(MaxBufferSize) MaxBufferSize < CperSize? Yes PlatformEventMessage(RasPollEvent)

return(PLDM_BASE_CODE)

PollForPlatformEventMessage (GetFirstPart, 0x0000) Introduction return(nextDataHandle, Start=0x01, Ras PollEvent, – RAS EventDataSize, EventData )

PollForPlatformEventMessage (GetNextPart, 0xFFFF)

return(nextDataHandle, Middle=0x02, Ras PollEvent, EventDataSize, EventData )

PollForPlatformEventMessage (GetNextPart, 0xFFFF) return(nextDataHandle, End=0x04, Ras PollEvent, EventDataSize, EventData, EventDataIntegrityChecksum )

PollForPlatformEventMessage (AcknowledgementOnly, 0xFFFF) return(nextDataHandle, StartAndE nd=0x05, EventID=0x0000 ) Introduction – Monitoring & Control

• Define interfaces and protocol needed for BMC – SOC communication in the scope of Platform Monitoring leveraging Open Hardware Management Specification for Remote Machine Management defined by OCP. • List of use cases analyzed for Standardizing -BMC to Multiple SOC communication. -BMC assisted SOC power actions. -BMC to monitor critical health of SOC. -BMC to monitor SOC boot progress. -BMC watchdog use cases. Satellite BMC Management Controller

RunInitAgent( RunInitAgent( PLDMTerminusOnline) SystemHardReset)

GetPDRRepositoryInfo()

return(PDRInfo)

GetPDR(0x0000,..)

return(nextRecordHandle)

GetPDR( recordHandle,..)

return(0x0000)

Create Central Introduction PDR Repo – Monitor & Control GetSensorReading(SensorID, ...)

return(presentReading, ...)

Calculate Reading

Reading Conversion formula: Y = (m * X + B)

Where:

Y = converted reading in Units

X = reading from sensor

m = resolution from PDR in Units

B = offset from PDR in Units

Units = sensor/effecter Units, based on the Units and auxUnits fields from the PDR for the numeric sensor Introduction – Remote Debug

• Server Remote Debug is the act of gaining visibility of the hardware and software behaviors of an SoC, using a debugger which is not directly connected to the Server SoC. • Define protocols for communicating between the debugger and the BMC. • Define physical interfaces between BMC and SoC. • Define protocols for communicating between the BMC and the SoC. • Define mechanisms for ensuring only suitable debuggers can access the SoC. Introduction – Remote Debug Proof of Concept

• OpenBMC is a project. It is a highly extensible framework for BMC software and implement for data-center computer systems. • OpenBMC implementation will be used as a medium to realize Server Base Manageability Guide. • Arm and Arm SiPs are participating in the OpenBMC development so that there will be an open source implementation to the SBMG requirements. • Arm has joined OpenBMC Technical Steering Committee (Arm, IBM, , , & ). https://github.com/openbmc/docs/blob/master/README.md#technical-steering-committee

Implementation libMCTP libMCTP libPLDM RAS Driver RAS App(BMC) libPLDM (BMC) (BMC) (Satellite MC) (Satellite MC) (Satellite MC)

Daemon Driver Starts Starts mctp_init Init(Blocking/Non-Blocking)

Init(Blocking/Non-Blocking) mctp_init return (InitCode)

return (InitCode) mctp_binding_int() Success mctp_binding_int() return (BindInitCode) return (InitCode) return (BindInitCode)

encode_pldm_cmd( PlatformEventMessage, RAS PollEvent) return (pldm_msg)

mctp_message_tx( PldmRequest PldmRequest send_msg( mctp_eids , Binding Phys ical La yer msg, Notification Notification pldm_msg) Proof of len)

Generate Concept – PLDM Res ponse RAS mctp_message_tx( send_msg( mctp_eids , PldmResponse msg, Binding Phys ical La yer return (pldm_response) pldm_msg) Notification len)

decode_pldm_cmd( pldm_response, encode_pldm_cmd( ) PollforPlatformEventMessage, GetFirstPart) return (pldm_msg) mctp_message_tx( send_msg( mctp_eids , PldmRequest PldmRequest msg, Binding Phys ical La yer pldm_msg) Notification Notification len) Generate PLDM Res ponse Proof of Concept – Remote Debug • Proposed Design proposal is to integrate OpenOCD within OpenBMC stack. https://lists.ozlabs.org/pipermail/openbmc/2019-July/017122.html • OpenOCD is an open source on-chip debugging solution for JTAG connected processors. It enables source level debugging with GNU gdb. It can also integrate with and GDB aware IDE, such as eclipse. • Completed first phase of implementation which includes the demonstration of GDB connection on port 3333 to OpenOCD debug server daemon running on OpenBMC. • Next phase of implementation to utilize a wrapper which adds JTAG master core infrastructure by defining new JTAG class and provide generic JTAG interface to allow hardware specific drivers to connect this interface. https://patchwork.ozlabs.org/cover/848652/ • This will enable all JTAG drivers to use the common interface part and will have separate drivers for hardware implementation. Roadmap/Plan

. Management Console .Reference Implementation .Reference Implementation .Reference .Reference Implementation . Security Transport Protocol to demo Control and to demo Remote Firmware Implementation to demo to demo Remote Debug requirements/Root of Enablement. Monitor on board and on upgrade Capability on a BMC RAS Errors on a Capability on a chosen Trust. SOC sensors of a chosen chosen reference platform. chosen reference reference platform. reference platform. .Leverage existing UEFI platform. .JTAG Interface. .Reference PLDM capsule update and arm-tf .Leverage existing CPER enablement to send sensor update methods. records. data when requested by Host Firmware MC.

. Pick/Choose Reference .Reference Implementation .Reference Implementation .Reference .Reference Implementation . Sync up with Security Hardware Platform for to demo Control and to demo Remote Firmware Implementation to demo to demo Remote Debug team. OpenBMC Enablement. Monitor on board and on upgrade Capability on a transfer of Binary Capability from BMC. . Enable OpenBMC. SOC sensors of a chosen chosen reference platform encoded Json (BEJ) RAS .Integrate OpenOCD in BMC . Demonstrate reference platform. from BMC. errors from SOC to BMC. stack. Reference .IPMI/REDFISH Command .Leverage PLDM for Firmware .IPMI/REDFISH IPMI/REDFISH implementation to gather Update standard. implementation to Commands. and control on board and transfer RAS errors on SOC sensor. Management Controller Management

. PMCI WG Participation – . PMCI WG Participation – . Arm Specific Requirement . Arm Specific . https://www.dmtf.org/stan . PMCI Security MCTP, PLDM, Redfish MCTP, PLDM, Redfish Updates to MCTP, PLDM and Requirement Updates to dards/pmci (Target PMCI Updates. . Whitepaper/Reference Redfish. MCTP, PLDM and Redfish. documents.) . Interface Solution/ECR - Control and . Whitepaper/Reference . Whitepaper/Reference . Whitepaper/Reference Standardization. Monitor on board and on Solution/ECR - Remote Solution/ECR – BMC RAS. Solution/ECR - Remote SOC sensors on arm Firmware upgrade Debug from BMC. servers. Capability Standards Call to Action • Participate in OpenBMC to enable reference implementation and open source delivery option. • Participate in OCP to influence Hardware Management Specifications and designs. • Participate in ServerAC to help define SBMG. • Send an email to Arm ([email protected]). • Participate in Redfish, PMCI and other DMTF Workgroups.