Edge Node Platform Update Agent#

Background#

This document provides high-level design and implementation guidelines. Refer to Platform Update Agent in Edge Node Agents’ GitHub* repository for implementation details.

Target Audience#

The target audience for this document is:

  • Developers interested in contributing to the implementation of the Platform Update Agent.

  • Administrators and System Architects interested in the architecture, design and functionality of the Platform Update Agent.

Overview#

Platform Update Agent is part of the Open Edge Platform’s Edge Node Zero Touch Provisioning. It is installed, configured and automatically executed at Provisioning time.

The main responsibility of the agent is to provide system level (Operating System packages, Edge Node Agents, Kernel Command-line) update capabilities and installation of new packages during Day 2 operation of the Edge Node.

Platform Update Agent reports its status to Maintenance Manager in Edge Infrastructure Manager and requests the Update Source list and Update Schedule from it. It performs updates based on the received schedule.

Platform Update Agent leverages Intel® In-Band Manageability software to update OSes and install new packages.

Architecture Diagram#

The Platform Update Agent follows the architecture and design principles set out in High-Level Architecture.

Figure 1: High-Level Architecture of Platform Update Agent

Key Components#

  1. The Platform Update Agent is a system daemon packaged as a .deb or .rpm package (depending on target Operating System).

  2. The platform-update-agent.yaml file stores Platform Update Agent’s configuration.

  3. The platform-update-agent-metadata file stores Platform Update Agent’s metadata.

  4. The Platform Update Agent requires a designated JSON Web Token (JWT).

  5. Intel® In-Band Manageability framework is leveraged by Platform Update Agent to perform updates.

Data Flow#

The data flow of the Platform Update Agent can be broken down into the stages described in the Workflow Stages section.

Workflow Stages#

  1. Update Sequence Diagram - Ubuntu* OS detailed view:

    Sequence diagram showcasing the communication between the Maintenance Manager in Edge Orchestrator and Platform Update Agent on the Edge Node.

    Ubuntu OS update as an example with focus on the internal components of the Platform Update Agent.

            sequenceDiagram
    %%{wrap}%%
    autonumber
    participant mm as "Maintenance Manager"
    box Edge Node
    participant pua as Platform Update Agent
    participant inbc as INBC
    participant grub as GRUB (Kernel Commandline)
    participant apt as APT (Tool and filesystem)
    participant ur as Upstream APT Repo (OS)
    participant pr as Private APT Repo (Open Edge Platform)
    end
    
    pua->>pua: read configuration file
    note over pua: metadata will indicate if PUA was restarted during/due to update, it will indicate if certain steps will be skipped because they were already performed as part of updating
    pua->>pua: read/init metadata
    
    note over pua: if INBM has not been provisioned
    pua->>pua: provision INBM
    
    loop periodically
       pua->>mm: PlatformUpdateStatusRequest(guid, UP_TO_DATE)
       mm->>pua: PlatformUpdateStatusInd (update_source, update_schedule)
       pua->>pua: update/watch the schedule on EN
       pua->>pua: update metadata
    end
    
    note over  pua, mm: reach maintenance schedule start time
       pua-->>mm: PlatformUpdateStatusRequest(guid, STARTED)
    mm->>pua: PlatformUpdateStatusInd (update_source, update_schedule)
       pua->>pua: update metadata
    
    note over  pua, mm: UPDATE APT SOURCES
    pua->>inbc: update Ubuntu sources through INBM config (ConfigureOsAptRepo(osRepoURL))
    inbc->>apt: inbc source os update (--sources osRepoURL)
    inbc->>pua: success
    pua->>inbc: update Open Edge Platform sources through INBM config (ConfigureCustomAptRepos(CustomRepos))
    inbc->>apt: inbc source application add (--sources CustomRepos)
    inbc->>pua: success
    
    note over  pua, mm: SELF PUA UPDATE
    pua->>apt: SelfUpdate() - apt "NEEDRESTART_MODE=a" install --only-upgrade platform-update-agent
    apt->>pr: get latest package
    pr->>apt: return and install latest package
    apt->>pua: if package available = success, PUA restarts, if no package available = success, continue
    
    note over  pua, mm: UPDATE INBM
    pua->>apt: updateINBM() - apt install --only-upgrade inbm***
    apt->>pr: get latest packages
    pr->>apt: return and install latest packages
    apt->>pua: success
    
    note over pua, mm: UPDATE GRUB CONFIG
    pua->>pua: get new GRUB config version
    pua->>grub: Update Kernel Commandline boot parameters /etc/default/grub
    pua->>grub: update-grub
    grub->>grub: updating grub config
    
    note over  pua, mm: INSTALL NEW OS PACKAGES AND AGENTS
    pua->>inbc: inbc sota --package-list package_1 -m download-only --reboot no
    inbc->>apt: apt-get install package1 --download-only
    apt->>pr: get latest packages
    pr->>apt: return and download latest packages
    apt->>inbc: success
    inbc->>pua: success
    pua->>inbc: inbc sota --package-list package_1 -m no-download --reboot no
    inbc->>apt: apt-get install packages -n no-download -no-reboot
    apt->>inbc: success
    inbc->>pua: success
    
    note over pua, mm: UPDATE OS PACKAGES AND AGENTS
    pua->>inbc: download packages - inbc sota -m download-only -no-reboot
    inbc->>apt: apt update && apt-upgrade --download-only
    apt->>pr: get latest packages
    pr->>apt: return and download latest packages
    inbc->>pua: success
    pua->>inbc: inbc sota -m no-download --reboot yes
    pua->>apt: update OS and Agents: apt-upgrade --no-download --reboot yes
    apt->>inbc: success
    inbc->>pua: success
    
    note over pua: INBM REBOOTS THE NODE
    pua->>pua: verify OS/Agents update
    Note over mm, pua: update done/failed
    pua->>pua: change status to 'UPDATED'/'FAILED' and update metadata
    pua->> mm: PlatformUpdateStatusRequest(guid, UPDATED/FAILED)
       mm->>pua: PlatformUpdateStatusInd (update_source, update_schedule)
       pua->>pua: change status to 'UP-TO-DATE' (if update is not FAILED) and update metadata
        

Figure 2: Platform Update Agent - Ubuntu OS detailed view
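The note in the diagram above says that the metadata indicates whether certain steps can be skipped because they were already performed before a restart. A minimal sketch of that idea follows. The `Metadata` layout, the stage names (taken from the diagram notes), and the `save` callback are assumptions made for illustration; they are not the agent's actual types or file format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata is a hypothetical record persisted across restarts and reboots.
type Metadata struct {
	Status    string          `json:"status"`
	Completed map[string]bool `json:"completed"`
}

// Stage is one step of the update sequence.
type Stage struct {
	Name string
	Run  func() error
}

// runStages executes stages in order and skips those already recorded as
// completed. It calls save after every successful stage, so a restarted
// agent resumes where the previous run stopped.
func runStages(md *Metadata, stages []Stage, save func(*Metadata) error) error {
	if md.Completed == nil {
		md.Completed = map[string]bool{}
	}
	for _, s := range stages {
		if md.Completed[s.Name] {
			continue // already done before the restart
		}
		if err := s.Run(); err != nil {
			md.Status = "FAILED"
			_ = save(md) // best effort: record the failure
			return fmt.Errorf("stage %s: %w", s.Name, err)
		}
		md.Completed[s.Name] = true
		if err := save(md); err != nil {
			return err
		}
	}
	md.Status = "UPDATED"
	return save(md)
}

func main() {
	noop := func() error { return nil }
	stages := []Stage{
		{Name: "update-apt-sources", Run: noop},
		{Name: "self-update", Run: noop},
		{Name: "update-inbm", Run: noop},
		{Name: "update-grub-config", Run: noop},
		{Name: "install-new-packages", Run: noop},
		{Name: "update-os-packages", Run: noop},
	}
	// Pretend the agent restarted after the self-update stage.
	md := &Metadata{Status: "STARTED", Completed: map[string]bool{
		"update-apt-sources": true,
		"self-update":        true,
	}}
	// Stand-in for writing the metadata file: print it as JSON.
	save := func(m *Metadata) error {
		b, err := json.Marshal(m)
		if err != nil {
			return err
		}
		fmt.Println(string(b))
		return nil
	}
	if err := runStages(md, stages, save); err != nil {
		fmt.Println("update failed:", err)
	}
}
```

Saving after every stage, rather than once at the end, is what lets a restart triggered by the self-update resume without repeating earlier steps.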

  2. Platform Update Agent sequence - interaction with Edge Infrastructure Manager - Edge Microvisor Toolkit detailed view:

    Sequence diagram showcasing the communication between the Platform Update Agent and the Edge Orchestrator.

    Edge Microvisor Toolkit update as an example with focus on the communication between Edge Infrastructure Manager and Platform Update Agent.

            sequenceDiagram
    %%{wrap}%%
    autonumber
    
    actor a as Admin
    participant reg as Release Service
    participant ui as User Interface
    participant inv as Inventory
    participant hm as Host Manager
    participant nm as New OS Resource Manager
    participant mm as Maintenance Manager
    box LightCyan Edge Node
    participant pua as Platform Update Agent / INBC
    participant na as Node Agent
    end
    
    note over pua, na: EN OS is installed on partition A and all EN components are up
    
    par
       loop periodically
          na->>hm: Send EN heartbeat
          opt Host status change
             hm->>inv: Update host status
          end
       end
       loop daily
          nm->>reg: download new Curated Profile manifests
          reg-->>nm: return
          nm->>nm: parse the manifests
          nm->>inv: create new OS Resources for new Curated Profiles
          opt manualOSImageUpdate=false
             nm->>inv: update desired_os of all instances with latest OS where instance.desired_os.profile_name=manifest.profile_name
          end
       end
       loop periodically
          pua->>mm: PlatformUpdateStatusRequest(guid, UP_TO_DATE)
          mm->>inv: Set Instance UpdateStatus(UP_TO_DATE)
          mm->>pua: PlatformUpdateStatusResponse (os_type, os_image_source, update_source, update_schedule)
          pua->>pua: update metadata
       end
       opt manualOSImageUpdate=true
          a->>inv: update desired_os to a selected OS Resource in chosen Instances
       end
       ui->>inv: per instance, get the ResourceID of current_os and desired_os if the current_os's osType == immutable
       inv-->>ui: return
       ui->>ui: display 'Update available' in host details if osType == immutable and current_os.resourceId != desired_os.resourceId
    end
    note over  pua, mm: OS image update start time reached
    pua->>mm: PlatformUpdateStatusRequest(guid, STARTED)
    mm->>inv: Update Instance UpdateStatus (inst_id, UPDATE_IN_PROGRESS)
    pua->>pua: read metadata
    note over  pua, mm: UPDATE OF IMMUTABLE OS IMAGE
    pua->>pua: read metadata
    pua->>pua: compare sha and version of the installed image to the sha and version in the metadata
    alt versions are the same
       pua->>mm: UpdateStatus=UP_TO_DATE
       mm->>inv: UpdateStatus=UP_TO_DATE
    else versions are different
       pua->>reg: download image on partition B using os_image_url
       reg-->>pua: return
       alt download fail
          pua->>mm: UpdateStatus=FAILED FailureReason="DownloadFail"
          mm->>inv: UpdateStatus=FAIL
       else download success
          pua->>mm: UpdateStatus=STARTED
          pua->>pua: install OS on partition B
          pua->>pua: verify installation before reboot
          alt installation fail
             pua->>mm: UpdateStatus=FAILED StatusDetail.Status=Failed FailureReason=InstallationFail
             mm->>inv: UpdateStatus=FAIL
          else installation success
             pua->>mm: UpdateStatus=STARTED
             pua->>pua: set partition B as one-time bootable
             pua->>pua: reboot node
             alt node fails to boot up from partition B, successful boot up from partition A (rollback success)
                pua->>mm:  UpdateStatus=FAILED StatusDetail.Status=Rolledback FailureReason=BootloaderFail
                mm->>inv: UpdateStatus=FAIL
             else node fails to boot up from partition B and partition A (rollback failure)
                hm->>inv: HostStatus=CONNECTION_LOST
             else node boots up from partition B
                note over pua: PUA and INBM start
                pua->>pua: verify update completion and set partition B as bootable
                alt update fail
                   pua->>mm: UpdateStatus=FAILED StatusDetail.Status=Failed e.g. FailureReason=OSCommitFail
                   mm->>inv: UpdateStatus=FAIL
                   pua->>pua: reboot (rollback to partition A)
                   pua->>mm: UpdateStatus=FAILED StatusDetail.Status=Rolledback e.g. FailureReason=OSCommitFail
                   mm->>inv: UpdateStatus=FAIL
                else update success
                   pua->>mm: UpdateStatus=UPDATED StatusDetail.Status=SUCCESS FailureReason=NoFailure, sends installed profile_name, profile_version
                   mm->>inv: Filter OSResources by profile_name and profile_version=x, get one (A)
                   inv-->>mm: return
                   mm->>inv: Set Instance UpdateStatus=DONE, current_os=A
                   pua->>mm: UpdateStatus=UP_TO_DATE
                   mm->>inv: UpdateStatus=RUNNING
                end
             end
          end
       end
    end
        

Figure 3: Platform Update Agent sequence - interaction with Edge Infrastructure Manager - Edge Microvisor Toolkit detailed view
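The branches of the immutable OS image update in the diagram above can be summarised as a mapping from an outcome to the values reported to Maintenance Manager. The sketch below is illustrative only: the `Image`, `Outcome`, and `Report` types are hypothetical, and only the status, detail, and failure reason strings are taken from the diagram.

```go
package main

import "fmt"

// Image identifies an OS image by version and checksum.
type Image struct {
	Version string
	SHA     string
}

// needsUpdate reports whether the image named in the metadata differs
// from the installed one.
func needsUpdate(installed, desired Image) bool {
	return installed.Version != desired.Version || installed.SHA != desired.SHA
}

// Outcome is a hypothetical label for each branch of the diagram.
type Outcome int

const (
	SameImage Outcome = iota
	DownloadFailed
	InstallFailed
	BootFailedRolledBack
	CommitFailed
	Succeeded
)

// Report holds the values sent to Maintenance Manager.
type Report struct {
	UpdateStatus  string
	Detail        string
	FailureReason string
}

// reportFor maps an outcome to the values shown in the diagram.
func reportFor(o Outcome) Report {
	switch o {
	case SameImage:
		return Report{UpdateStatus: "UP_TO_DATE"}
	case DownloadFailed:
		return Report{UpdateStatus: "FAILED", FailureReason: "DownloadFail"}
	case InstallFailed:
		return Report{UpdateStatus: "FAILED", Detail: "Failed", FailureReason: "InstallationFail"}
	case BootFailedRolledBack:
		return Report{UpdateStatus: "FAILED", Detail: "Rolledback", FailureReason: "BootloaderFail"}
	case CommitFailed:
		return Report{UpdateStatus: "FAILED", Detail: "Failed", FailureReason: "OSCommitFail"}
	case Succeeded:
		return Report{UpdateStatus: "UPDATED", Detail: "SUCCESS", FailureReason: "NoFailure"}
	}
	return Report{UpdateStatus: "FAILED"}
}

func main() {
	installed := Image{Version: "1.0", SHA: "aaa"}
	desired := Image{Version: "1.1", SHA: "bbb"}
	if !needsUpdate(installed, desired) {
		fmt.Printf("%+v\n", reportFor(SameImage))
		return
	}
	// In this example every later step is assumed to succeed.
	fmt.Printf("%+v\n", reportFor(Succeeded))
}
```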

  3. Platform Update Agent integration with JWT:

    Since APT does not natively support JWT for authentication, a forward proxy is needed to act as an intermediary between APT and the Release Service file server.

    Caddy* server is a third-party proxy server used as a forward proxy on the Edge Node.

    It appends the JWT to requests from the APT client.

    Figure 4: Platform Update Agent integration with JWT

Extensibility#

The Platform Update Agent supports installation of new Ubuntu OS packages. To install new packages, follow the Edge Node update instructions.

Deployment#

The Platform Update Agent is deployed as a system daemon, either through installation of a .deb package during provisioning or as an .rpm package that is part of the Edge Microvisor Toolkit.

Technology Stack#

The sections below provide an overview of various aspects of the Platform Update Agent’s technology stack.

Implementation#

The Platform Update Agent is written in the Go* programming language. It persists a metadata file across updates and reboots to keep track of update stages.

Platform Update Agent pulls a Platform Update Schedule from the Maintenance Manager residing in the Edge Infrastructure Manager.

The schedule arrives as part of the API call response from Maintenance Manager. It contains the update_schedule (time to perform the update), update_source (the source information), and installed_packages (new packages to be installed) fields.

Internally, PUA calls INBM software to initiate the download and installation of new or updated OS-level packages.

System Diagram#

Platform Update Agent depends on Edge Node’s Maintenance Manager. It also depends on the Intel In-Band Manageability framework (<intel/intel-inb-manageability>) to perform the update flow.

Update artifacts are published on an APT server as part of a Release Service.

Figure 5: Platform Update Agent system diagram

Integrations#

Platform Update Agent does not expose an API; rather, it consumes APIs from Edge Infrastructure Manager.

Platform Update Agent polls (over gRPC) the Maintenance Manager in Edge Infrastructure Manager periodically to:

  • Obtain schedules for Edge Node updates.

  • Obtain the list/source URL of APT mirrors in which the potential updates reside.

Platform Update Agent also keeps track of the internal Edge Node status in relation to system updates and saves it in the metadata file.

Platform Update Agent statuses (communicated to Maintenance Manager):

  • STATUS_TYPE_DOWNLOADED 6 - Status when the EN completes downloading update artifacts

  • STATUS_TYPE_DOWNLOADING 5 - Status when the EN is downloading update artifacts

  • STATUS_TYPE_FAILED 4 - Status when the EN update fails; a detailed log is also sent

  • STATUS_TYPE_UPDATED 3 - Status when the EN update is completed successfully

  • STATUS_TYPE_STARTED 2 - Status when the update process of EN has started

  • STATUS_TYPE_UP_TO_DATE 1 - Status when EN is not performing any update related actions

  • STATUS_TYPE_UNSPECIFIED 0 - Default value, status not specified
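For reference, the status values listed above can be written as a Go enumeration. The numeric values and the `STATUS_TYPE_*` names come from the list; the Go identifiers and the `String` method are illustrative and may differ from the names in the generated gRPC code.

```go
package main

import "fmt"

// StatusType holds the update status reported to Maintenance Manager.
type StatusType int32

const (
	StatusTypeUnspecified StatusType = 0 // default value, status not specified
	StatusTypeUpToDate    StatusType = 1 // no update-related action in progress
	StatusTypeStarted     StatusType = 2 // update process has started
	StatusTypeUpdated     StatusType = 3 // update completed successfully
	StatusTypeFailed      StatusType = 4 // update failed
	StatusTypeDownloading StatusType = 5 // downloading update artifacts
	StatusTypeDownloaded  StatusType = 6 // download of update artifacts completed
)

// String returns the name used in the status list.
func (s StatusType) String() string {
	switch s {
	case StatusTypeUnspecified:
		return "STATUS_TYPE_UNSPECIFIED"
	case StatusTypeUpToDate:
		return "STATUS_TYPE_UP_TO_DATE"
	case StatusTypeStarted:
		return "STATUS_TYPE_STARTED"
	case StatusTypeUpdated:
		return "STATUS_TYPE_UPDATED"
	case StatusTypeFailed:
		return "STATUS_TYPE_FAILED"
	case StatusTypeDownloading:
		return "STATUS_TYPE_DOWNLOADING"
	case StatusTypeDownloaded:
		return "STATUS_TYPE_DOWNLOADED"
	}
	return fmt.Sprintf("STATUS_TYPE(%d)", int32(s))
}

func main() {
	for s := StatusTypeUnspecified; s <= StatusTypeDownloaded; s++ {
		fmt.Printf("%d %s\n", int32(s), s)
	}
}
```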

Platform Update Agent call to Maintenance Manager:

  • PlatformUpdateStatusRequest - Periodic request that sends Edge Node UUID and its update status, and receives update schedules and update source list.

            stateDiagram
       [*] --> UP_TO_DATE
    
       UP_TO_DATE --> DOWNLOADING: Download starts
       DOWNLOADING --> DOWNLOADED : Download succeeds
       DOWNLOADED --> DOWNLOADING: New version available
       DOWNLOADING --> FAILED: Download fails, maint window is over
       DOWNLOADING --> UP_TO_DATE: Download canceled
    
       DOWNLOADED --> STARTED: Update started
       STARTED --> UPDATED: Update succeeds
       STARTED --> FAILED: Update fails
    
       FAILED --> DOWNLOADING: Retry download with new maint window
    
       UPDATED --> UP_TO_DATE
        

Figure 6: Platform Update Agent integration
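The state diagram above can also be expressed as a transition table, which makes it straightforward to reject a status change that the diagram does not allow. The table below copies the arrows from the diagram. The `canTransition` helper is a hypothetical name and is not part of the agent.

```go
package main

import "fmt"

// transitions lists, for each state, the states reachable in one step
// according to the state diagram.
var transitions = map[string][]string{
	"UP_TO_DATE":  {"DOWNLOADING"},
	"DOWNLOADING": {"DOWNLOADED", "FAILED", "UP_TO_DATE"},
	"DOWNLOADED":  {"DOWNLOADING", "STARTED"},
	"STARTED":     {"UPDATED", "FAILED"},
	"FAILED":      {"DOWNLOADING"},
	"UPDATED":     {"UP_TO_DATE"},
}

// canTransition reports whether the diagram has an arrow from one state
// to the other.
func canTransition(from, to string) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("DOWNLOADED", "STARTED"))
	fmt.Println(canTransition("UP_TO_DATE", "UPDATED"))
}
```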

Security#

Security Policies#

Platform Update Agent adheres to the security design principle of the Edge Node Agents High-Level Architecture.

Auditing#

Platform Update Agent adheres to the observability design principle of the Edge Node Agents High-Level Architecture.

Upgrades#

Platform Update Agent adheres to the upgrade design principle of the Edge Node Agents High-Level Architecture.