EMF On-Prem Upgrade Guide#

Upgrade Path: EMF On-Prem v3.0 → v3.1

Document Version: 1.0

Overview#

This document provides step-by-step instructions to upgrade On-Prem Edge Manageability Framework (EMF) from version 3.0 to 3.1.

Important Notes#

Warning

DISRUPTIVE UPGRADE WARNING: This upgrade requires edge node re-onboarding due to an architecture change (RKE2 → K3s). Plan for edge node service downtime and for manual data backup/restore procedures on the edge nodes.

Prerequisites#

System Requirements#

  • Current EMF On-Prem installation version 3.0

  • Root/sudo privileges on orchestrator node

  • PostgreSQL service running and accessible

  • Sufficient disk space for backups (approximately 200 GB or more)

  • Docker Hub credentials, in case image pull rate limits are hit
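The prerequisite checks above can be scripted. The following is a hedged pre-flight sketch: the mount point and threshold are assumptions, so adjust them to where your backups will actually be written.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; adjust paths and thresholds to your site.

REQUIRED_GB=200   # matches the ~200 GB guidance above

# Pure helper: succeeds when available space meets the requirement.
enough_disk() { [ "$1" -ge "$2" ]; }   # usage: enough_disk <avail_gb> <required_gb>

avail_gb=$(df --output=avail -BG / 2>/dev/null | tail -1 | tr -dc '0-9')
if [ -n "$avail_gb" ] && enough_disk "$avail_gb" "$REQUIRED_GB"; then
  echo "Disk space OK: ${avail_gb}G available"
else
  echo "WARNING: less than ${REQUIRED_GB}G available on /" >&2
fi

# PostgreSQL must be running and accepting connections.
if command -v pg_isready >/dev/null 2>&1; then
  pg_isready || echo "WARNING: PostgreSQL is not accepting connections" >&2
fi

# Confirm the currently installed on-prem packages (expect 3.0.x versions).
dpkg -l 2>/dev/null | grep onprem- || echo "NOTE: no onprem- packages found" >&2
```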

Pre-Upgrade Checklist#

Upgrade Procedure#

Step 1: Copy Latest OnPrem Upgrade Script#

On the node where the orchestrator is deployed, copy the latest upgrade script:

# Go to the home directory
cd
# Copy the installer scripts from the repository checkout
cp edge-manageability-framework/on-prem-installers/onprem/*.sh ~/
# Make the upgrade script executable
chmod +x onprem_upgrade.sh

Step 2: Open Two Terminals#

You will need two terminals for this upgrade process:

  • Terminal 1: To run the upgrade script

  • Terminal 2: To update proxy and load balancer configurations when prompted

Step 3: Terminal 1 - Set Environment Variables#

In Terminal 1, set the required environment variables:

# get LB IP
kubectl get svc argocd-server -n argocd
kubectl get svc traefik -n orch-gateway
kubectl get svc ingress-nginx-controller -n orch-boots

# Set Environment

export RELEASE_SERVICE_URL=registry-rs.edgeorchestration.intel.com
export ORCH_INSTALLER_PROFILE=onprem
export CLUSTER_DOMAIN=cluster.onprem
export GITEA_IMAGE_REGISTRY='docker.io'
export DOCKER_USERNAME=<docker-username>
export DOCKER_PASSWORD=<docker-password>
export ARGO_IP=<ARGO_LoadBalancer_IP>
export TRAEFIK_IP=<TRAEFIK_LoadBalancer_IP>
export NGINX_IP=<NGINX_LoadBalancer_IP>

Note: If Docker pull rate limits are hit, set your Docker login credentials as the DOCKER_USERNAME and DOCKER_PASSWORD environment variables shown above.

# Unset PROCEED to allow manual confirmation
unset PROCEED

# Set deployment version (replace with your actual version tag)
export DEPLOY_VERSION=v3.1.0-rc1
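As an alternative to copying the three LoadBalancer IPs by hand from the `kubectl get svc` output, they can be derived directly from the service status. This is a hypothetical convenience using kubectl's standard jsonpath output; double-check the printed values before proceeding.

```shell
# Hypothetical helper: read a service's LoadBalancer IP via jsonpath.
lb_ip() {  # usage: lb_ip <service-name> <namespace>
  kubectl get svc "$1" -n "$2" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

export ARGO_IP="$(lb_ip argocd-server argocd)"
export TRAEFIK_IP="$(lb_ip traefik orch-gateway)"
export NGINX_IP="$(lb_ip ingress-nginx-controller orch-boots)"

# Verify the three values are non-empty and correct before continuing.
echo "ARGO_IP=$ARGO_IP TRAEFIK_IP=$TRAEFIK_IP NGINX_IP=$NGINX_IP"
```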

Step 4: Terminal 1 - Run OnPrem Upgrade Script#

In Terminal 1, execute the upgrade script:

./onprem_upgrade.sh

The script will:

  • Validate current installation

  • Check PostgreSQL status

  • Download packages and artifacts

  • Eventually prompt for confirmation:

Ready to proceed with installation? (yes/no)
  • DO NOT enter “yes” yet - proceed to Step 5 first

Step 5: Terminal 2 - Update Configuration#

Before confirming in Terminal 1, open Terminal 2 and update configurations:

  1. Update proxy settings (if applicable):

    File: repo_archives/tmp/edge-manageability-framework/orch-configs/profiles/proxy-none.yaml
    
    argo:
     proxy:
       httpProxy: ""
       httpsProxy: ""
       noProxy: ""
       enHttpProxy: ""
       enHttpsProxy: ""
       enFtpProxy: ""
       enSocksProxy: ""
       enNoProxy: ""
    

    Note: Update the proxy settings according to your network configuration.

  2. Verify load balancer IP configuration:

    # Check current LoadBalancer IPs
    kubectl get svc argocd-server -n argocd
    kubectl get svc traefik -n orch-gateway
    kubectl get svc ingress-nginx-controller -n orch-boots
    
    # Verify that the LB IP configuration is updated
    nano repo_archives/tmp/edge-manageability-framework/orch-configs/clusters/onprem.yaml
    
  3. Ensure all configurations are correct
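Before switching back to Terminal 1, the configuration check can be made less error-prone with a small script. This is a hypothetical helper, assuming the LB IP environment variables from Step 3 are set in the same shell; it only greps for the literal IPs in onprem.yaml.

```shell
# Hypothetical sanity check: confirm the LoadBalancer IPs exported in Step 3
# actually appear in the cluster config before answering "yes" in Terminal 1.
CFG=repo_archives/tmp/edge-manageability-framework/orch-configs/clusters/onprem.yaml

check_ip_in_cfg() {  # usage: check_ip_in_cfg <ip> <file>; succeeds if found
  [ -n "$1" ] && grep -q "$1" "$2" 2>/dev/null
}

for ip in "$ARGO_IP" "$TRAEFIK_IP" "$NGINX_IP"; do
  if check_ip_in_cfg "$ip" "$CFG"; then
    echo "OK: $ip present in onprem.yaml"
  else
    echo "CHECK: '$ip' not found in onprem.yaml" >&2
  fi
done
```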

Step 6: Terminal 1 - Confirm and Continue#

Once proxy and load balancer configurations are updated in Terminal 2, switch back to Terminal 1 and enter:

yes

The upgrade will then proceed automatically through all components.

Step 7: Monitor Upgrade Progress#

The upgrade process includes:

  • OS Configuration upgrade

  • Gitea upgrade

  • ArgoCD upgrade

  • Edge Orchestrator upgrade

  • Unseal Vault

Post-Upgrade Verification#

System Health Check#

# Verify package versions
dpkg -l | grep onprem-

# Check cluster status
kubectl get nodes
kubectl get pods -A

# Verify ArgoCD applications
kubectl get applications -A

Service Validation#

  • Watch ArgoCD applications until they are in ‘Healthy’ state
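Watching the applications can be scripted rather than done by eye. The following hedged sketch polls until every ArgoCD Application reports Healthy and Synced; the field paths follow the standard Argo CD status schema, and the attempt count and interval are assumptions to tune for your site.

```shell
#!/usr/bin/env bash
# Hedged sketch: poll ArgoCD Applications until all are Healthy and Synced.

wait_for_apps() {  # usage: wait_for_apps [attempts] [interval_seconds]
  local attempts=${1:-60} interval=${2:-10} total not_ready
  for _ in $(seq 1 "$attempts"); do
    total=$(kubectl get applications -A --no-headers 2>/dev/null | wc -l)
    if [ "$total" -eq 0 ]; then
      echo "No ArgoCD applications found yet..."
      sleep "$interval"; continue
    fi
    not_ready=$(kubectl get applications -A \
      -o jsonpath='{range .items[*]}{.status.health.status}{" "}{.status.sync.status}{"\n"}{end}' \
      | grep -cv '^Healthy Synced$')
    if [ "${not_ready:-1}" -eq 0 ]; then
      echo "All ArgoCD applications are Healthy and Synced"
      return 0
    fi
    echo "Waiting: ${not_ready} application(s) not yet Healthy/Synced..."
    sleep "$interval"
  done
  echo "Timed out waiting for applications to become Healthy" >&2
  return 1
}

# Only run against a reachable cluster.
if kubectl cluster-info >/dev/null 2>&1; then
  wait_for_apps
fi
```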

Web UI Access Verification#

After a successful EMF upgrade, verify that you can access the web UI with the same project, user, and credentials used before the upgrade.

ArgoCD#

  • Username: admin

  • Retrieve argocd password:

    kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
    

Gitea#

  • Retrieve Gitea username:

    kubectl get secret gitea-cred -n gitea -o jsonpath="{.data.username}" | base64 -d
    
  • Reset Gitea password

    # Get Gitea pod name
    GITEA_POD=$(kubectl get pods -n gitea -l app=gitea -o jsonpath='{.items[0].metadata.name}')
    
    # Reset password (replace 'test12345' with your desired password)
    kubectl exec -n gitea $GITEA_POD -- \
      bash -c 'export GITEAPASSWORD=test12345 && gitea admin user change-password --username gitea_admin --password $GITEAPASSWORD'
    
  • Login to Gitea web UI:

    kubectl -n gitea port-forward svc/gitea-http 3000:443 --address 0.0.0.0
    # Then open https://localhost:3000 in your browser and use the above credentials.
    

Troubleshooting#

Symptom: Sometimes the infra-managers application in ArgoCD may show as Not Healthy or Out of Sync. This can impact dependent components or cluster state.

Resolution Steps:

  1. Delete the infra-managers application from ArgoCD and resync the root application (root-app).

During the onprem_upgrade, if Vault appears sealed or becomes unavailable, manual intervention may be required.

Symptom:

  • Vault Unseal Problem

    Vault pod status shows sealed, causing issues with secret access or platform services. If you see the following Vault waiting output after running the on-prem upgrade script, a further manual Vault unseal is required:

    Deleting Vault pod: vault-0 in namespace: orch-platform
    pod "vault-0" deleted
    Waiting for pod 'vault-0' in namespace 'orch-platform' to be in Running state...
    
  • Check Vault status

    kubectl get pod -A | grep vault-0
    kubectl -n orch-platform exec -i vault-0 -- vault status
    
  • Vault Unseal Procedure

    # Run the Vault unseal script
    source ./vault_unseal.sh
    vault_unseal
    

Open Issues#

  • API Gateway does not reflect API changes from v1 to v2 automatically. Workaround: manually delete the nexus-api-gw pod to recover the API changes.

  • After upgrade, both RKE2 and K3s cluster templates are labeled as default. Workaround: manually delete all old RKE2-based cluster templates from the 3.0 release.

  • Deployment package extensions are not updated after upgrade. Workaround: manually delete the app-orch-tenant-controller pod.

Automation Script for Workarounds#

To simplify post-upgrade recovery, the following script should be executed as part of the upgrade validation steps:

Script Name: after_upgrade_restart.sh

Purpose: Automates the following workaround actions:

  • Restarts the nexus-api-gw pod to reflect API changes from v1 to v2

  • Deletes outdated RKE2-based cluster templates from the 3.0 release

  • Restarts the app-orch-tenant-controller pod to trigger deployment extension updates

Note

Run the script after the on-prem upgrade using:

./after_upgrade_restart.sh
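For orientation, the pod-restart portion of such a script might look like the sketch below. This is a hypothetical reconstruction, not the shipped script: the namespaces are assumptions, so verify them with `kubectl get pods -A` first, and the RKE2 template cleanup is left to the web UI or cluster API.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pod-restart workarounds; NOT the shipped script.
# The namespaces below are assumptions -- verify with 'kubectl get pods -A'.

NS_API_GW=${NS_API_GW:-orch-infra}   # assumed namespace of nexus-api-gw
NS_TENANT=${NS_TENANT:-orch-app}     # assumed namespace of app-orch-tenant-controller

restart_pod() {  # usage: restart_pod <namespace> <name-substring>
  local pod
  pod=$(kubectl get pods -n "$1" -o name 2>/dev/null | grep "$2" | head -1)
  if [ -n "$pod" ]; then
    kubectl delete -n "$1" "$pod"
  else
    echo "No pod matching '$2' found in namespace '$1'" >&2
  fi
}

# Workaround: restart nexus-api-gw so the gateway reflects v1 -> v2 API changes.
restart_pod "$NS_API_GW" nexus-api-gw

# Workaround: restart app-orch-tenant-controller to refresh deployment extensions.
restart_pod "$NS_TENANT" app-orch-tenant-controller
```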

Additional known issues:

  • EdgeNode local SSH connection error

  • RPS pod Postgres DB query failure

  • Host filter in the UI is not functioning correctly

  • Docker rate limit encountered despite using valid credentials

Post-Upgrade Steps: Edge Node Onboarding Process#

After a successful upgrade, follow the edge node (EN) onboarding process as outlined in the official documentation: Set Up Edge Infrastructure – Intel Open Edge Platform.