# Enabling Intel HW Key Generation & Cryptography with OpenSSL

## Overview

Intel x86 processors expose dedicated hardware instructions for random number generation and for accelerating symmetric ciphers and hashes. OpenSSL (version 3.x) uses these x86 instructions behind its standard APIs, so your application gains hardware-backed entropy and performance with no code changes beyond using the normal OpenSSL calls.

The three x86 instruction groups leveraged are:

- **RDRAND / RDSEED**: x86 instructions that read from the on-chip hardware random number generator. Their output feeds the entropy that seeds OpenSSL's DRBG, either through the operating system's random pool (the default seed source on Linux) or directly when OpenSSL is built with the `rdcpu` seed source.
- **AES-NI**: x86 instructions (`AESENC`, `AESDEC`, `AESKEYGENASSIST`, etc.) that each execute a full AES round or key-schedule step in a single instruction, accelerating encryption and decryption.
- **SHA-NI**: x86 instructions (`SHA256RNDS2`, `SHA1RNDS4`, etc.) that perform SHA compression function steps in hardware, accelerating hashing.

OpenSSL probes CPU feature flags at startup (via `CPUID`) and automatically routes AES and SHA operations through these instructions when present. No application-level configuration is required.

## Validation Status

- Successfully validated on an Intel Arrow Lake system (Ubuntu, OpenSSL 3.x)
- OpenSSL queries the CPU at startup using the `CPUID` instruction to detect which x86 hardware features (RDRAND, RDSEED, AES-NI, SHA-NI) are present, then selects the hardware-accelerated code path automatically

## Step 1: Check Platform Support

Before integrating OpenSSL, confirm that the required x86 hardware instructions are exposed by your CPU. Run the provided HW Detection script. It probes RDRAND, RDSEED, AES-NI, and SHA-NI support via `/proc/cpuinfo` flags and a compiled CPUID inline-asm program:

```bash
#!/usr/bin/env bash
# Copyright (C) 2026 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
# detect_hw_features.sh: detect RDRAND, RDSEED, AES-NI and SHA-NI support
# via two independent methods:
#   1. /proc/cpuinfo kernel flag strings
#   2. CPUID inline-asm probe (compiled on-the-fly)
#
# Usage: bash detect_hw_features.sh
# No root privileges required.

set -euo pipefail

TMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TMP_DIR"' EXIT

# ── Colour helpers ────────────────────────────────────────────────────
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
CYAN='\033[0;36m'; BOLD='\033[1m'; NC='\033[0m'

pass()   { echo -e "  ${GREEN}[YES]${NC} $*"; }
fail()   { echo -e "  ${RED}[NO] ${NC} $*"; }
warn()   { echo -e "  ${YELLOW}[WARN]${NC} $*"; }
# Print a coloured title followed by a dashed underline of the same length
header() { echo -e "\n${BOLD}${CYAN}$*${NC}"; echo "$*" | sed 's/./-/g'; }

# ── Feature list ──────────────────────────────────────────────────────
# Each entry: FLAG_NAME | /proc/cpuinfo flag | CPUID leaf | reg | bit | description
FEATURES=(
    "RDRAND|rdrand|1|ECX|30|HW DRBG output   (CPUID.1:ECX[30])"
    "RDSEED|rdseed|7|EBX|18|Raw entropy seed (CPUID.7:EBX[18])"
    "AES-NI|aes|1|ECX|25|AES acceleration (CPUID.1:ECX[25])"
    "SHA-NI|sha_ni|7|EBX|29|SHA acceleration (CPUID.7:EBX[29])"
)

# ── Print system info ─────────────────────────────────────────────────
echo ""
echo -e "${BOLD}Hardware Feature Detection${NC}"
echo -e "System : $(uname -srm)"
echo -e "CPU    : $(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2 | sed 's/^ *//')"

# ═══════════════════════════════════════════════════════════════════════
# METHOD 1: /proc/cpuinfo kernel flag strings
# ═══════════════════════════════════════════════════════════════════════
header "Method 1: /proc/cpuinfo kernel flags"
printf "  %-10s %-12s %s\n" "Feature" "Status" "Details"
printf "  %-10s %-12s %s\n" "-------" "------" "-------"

CPUINFO_RESULTS=()
for entry in "${FEATURES[@]}"; do
    IFS='|' read -r name flag leaf reg bit desc <<< "$entry"
    # -w matches the flag as a whole word, so "aes" does not match "vaes"
    if grep -qw "$flag" /proc/cpuinfo 2>/dev/null; then
        printf "  ${GREEN}%-10s [YES]${NC}        %s\n" "$name" "$desc"
        CPUINFO_RESULTS+=("$name:1")
    else
        printf "  ${RED}%-10s [NO] ${NC}        %s\n" "$name" "$desc"
        CPUINFO_RESULTS+=("$name:0")
    fi
done

# ═══════════════════════════════════════════════════════════════════════
# METHOD 2: CPUID inline-asm probe (compiled C program)
# ═══════════════════════════════════════════════════════════════════════
header "Method 2: CPUID inline-asm probe (compiled C)"
CPUID_SRC="$TMP_DIR/cpuid_probe.c"
CPUID_BIN="$TMP_DIR/cpuid_probe"

cat > "$CPUID_SRC" << 'EOF'
#include <stdio.h>
#include <cpuid.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;

    /* ── Leaf 1 ── */
    int leaf1_ok = __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    int rdrand = leaf1_ok ? ((ecx >> 30) & 1) : -1;   /* ECX[30] */
    int aesni  = leaf1_ok ? ((ecx >> 25) & 1) : -1;   /* ECX[25] */

    /* ── Leaf 7 subleaf 0 ── */
    int leaf7_ok = __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
    int rdseed = leaf7_ok ? ((ebx >> 18) & 1) : -1;   /* EBX[18] */
    int shani  = leaf7_ok ? ((ebx >> 29) & 1) : -1;   /* EBX[29] */

    printf("RDRAND=%d\n", rdrand);
    printf("RDSEED=%d\n", rdseed);
    printf("AESNI=%d\n",  aesni);
    printf("SHANI=%d\n",  shani);
    return 0;
}
EOF

if ! command -v gcc &>/dev/null; then
    warn "gcc not found; skipping CPUID inline-asm probe."
elif ! gcc -O0 "$CPUID_SRC" -o "$CPUID_BIN" 2>/dev/null; then
    warn "CPUID probe failed to compile; skipping."
else
    CPUID_OUT=$("$CPUID_BIN")
    declare -A CPUID_MAP
    while IFS='=' read -r key val; do
        CPUID_MAP["$key"]="$val"
    done <<< "$CPUID_OUT"

    # Map probe output keys to display names and CPUID references
    declare -A NAME=([RDRAND]="RDRAND" [RDSEED]="RDSEED" [AESNI]="AES-NI" [SHANI]="SHA-NI")
    declare -A REF=([RDRAND]="CPUID.1:ECX[30]" [RDSEED]="CPUID.7:EBX[18]"
                    [AESNI]="CPUID.1:ECX[25]" [SHANI]="CPUID.7:EBX[29]")

    printf "  %-10s %-12s %s\n" "Feature" "Status" "CPUID Reference"
    printf "  %-10s %-12s %s\n" "-------" "------" "---------------"
    for key in RDRAND RDSEED AESNI SHANI; do
        val="${CPUID_MAP[$key]:-?}"
        case "$val" in
            1) printf "  ${GREEN}%-10s [YES]${NC}        %s\n" "${NAME[$key]}" "${REF[$key]}" ;;
            0) printf "  ${RED}%-10s [NO] ${NC}        %s\n" "${NAME[$key]}" "${REF[$key]}" ;;
            *) printf "  ${YELLOW}%-10s [ERR]${NC}        %s (leaf unavailable)\n" "${NAME[$key]}" "${REF[$key]}" ;;
        esac
    done
fi

# ═══════════════════════════════════════════════════════════════════════
# Summary (based on the /proc/cpuinfo results from Method 1)
# ═══════════════════════════════════════════════════════════════════════
header "Summary"
ALL_OK=1
for entry in "${CPUINFO_RESULTS[@]}"; do
    feat="${entry%%:*}"
    val="${entry##*:}"
    if [ "$val" -eq 1 ]; then
        pass "$feat : hardware support confirmed"
    else
        fail "$feat : NOT supported by this CPU"
        ALL_OK=0
    fi
done

echo ""
if [ "$ALL_OK" -eq 1 ]; then
    echo -e "  ${GREEN}${BOLD}All four features present: hardware acceleration fully available.${NC}"
else
    echo -e "  ${YELLOW}${BOLD}One or more features missing: some demos will use software fallback.${NC}"
fi
echo ""
```

If the output shows **"YES"** for all four features, your CPU exposes the required x86 hardware instructions and OpenSSL will use them automatically.

## Step 2: Use OpenSSL in Your Application

The sections below show how each x86 hardware instruction group is exercised through the standard OpenSSL API.
For AES and SHA, the hardware path is selected automatically by OpenSSL at runtime; for random numbers, the hardware source is reached through the configured seed source. In every case the calling code is identical to a purely software implementation.

### Hardware Key Generation: RDRAND & RDSEED

Use `RAND_bytes` to generate cryptographically secure random bytes for key material. `RAND_bytes` returns output from OpenSSL's internal DRBG (an AES-256 CTR_DRBG by default), which is seeded and periodically reseeded from the configured entropy source. Hardware entropy reaches that DRBG in one of two ways:

- **Through the operating system (default build).** OpenSSL seeds from the OS (`getrandom()` on Linux), and the Linux kernel mixes `RDRAND`/`RDSEED` output into its pool when the CPU provides them.
- **Directly (`rdcpu` seed source).** When OpenSSL is configured with `--with-rand-seed=rdcpu` (optionally alongside `os`), it issues `RDSEED`, or `RDRAND` when `RDSEED` is unavailable, in a retry loop to collect seed material.

```c
#include <openssl/rand.h>

unsigned char key[32]; /* 256-bit key */

/* RAND_bytes returns 1 on success; any other value means no key was produced */
if (RAND_bytes(key, sizeof(key)) != 1) {
    /* handle error */
}
```

The two instructions serve distinct roles:

- **`RDRAND`**: returns a random value produced by the on-chip hardware DRBG (conditioned output)
- **`RDSEED`**: returns a raw entropy sample from the hardware source; intended for seeding software DRBGs rather than direct use

No application code changes are required in either case. The seed source is fixed when OpenSSL is built, not chosen by the calling code.

### Hardware AES Acceleration: AES-NI

Intel AES-NI introduces x86 instructions (`AESENC`, `AESENCLAST`, `AESDEC`, `AESDECLAST`, `AESKEYGENASSIST`, `AESIMC`) that each execute a full AES round or key-schedule step in a single low-latency CPU instruction, eliminating the software lookup-table approach. OpenSSL's EVP layer automatically dispatches to these instructions when AES-NI is present.

The example below performs AES-256-GCM encryption. Each `EVP_EncryptUpdate` call runs its AES rounds with `AESENC`/`AESENCLAST` instructions rather than software AES rounds:

```c
#include <openssl/evp.h>

/* Assumed to be defined by the caller:
 *   key (32 bytes), iv (12 bytes), plaintext, plaintext_len (<= 128 here).
 * Every EVP call below returns 1 on success; check the return values
 * in production code. */
unsigned char ciphertext[128];
unsigned char tag[16];
int len, ciphertext_len;

/* 1. Allocate and initialise context */
EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

/* 2. Initialise encryption operation: AES-256-GCM */
EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, NULL, NULL);

/* 3. Set IV length (default 12 bytes for GCM) */
EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_IVLEN, 12, NULL);

/* 4. Provide key and IV */
EVP_EncryptInit_ex(ctx, NULL, NULL, key, iv);

/* 5. Encrypt plaintext; may be called multiple times */
EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, plaintext_len);
ciphertext_len = len;

/* 6. Finalise encryption (flushes any remaining output) */
EVP_EncryptFinal_ex(ctx, ciphertext + len, &len);
ciphertext_len += len;

/* 7. Retrieve the GCM authentication tag */
EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag);

/* 8. Free the context */
EVP_CIPHER_CTX_free(ctx);
```

At startup, OpenSSL checks the `CPUID.1:ECX[25]` flag to confirm AES-NI availability, then routes all AES encrypt/decrypt operations, including key schedule expansion, through the x86 AES-NI instruction set. No special configuration is required.

### Hardware SHA Acceleration: SHA-NI

Intel SHA-NI introduces x86 instructions that execute SHA compression function steps in hardware, replacing the multi-instruction software implementations. SHA-NI covers only the 32-bit word SHA family; 64-bit word variants (SHA-384, SHA-512) fall back to software:

| Algorithm | SHA-NI Accelerated | Notes |
|-----------|--------------------|-------|
| SHA-1 | Yes | Accelerated via `SHA1RNDS4`, `SHA1NEXTE`, `SHA1MSG1/2` instructions |
| SHA-224 | Yes | Uses the SHA-256 hardware pipeline internally |
| SHA-256 | Yes | Accelerated via `SHA256RNDS2`, `SHA256MSG1/2` instructions |
| SHA-384 | No | Software fallback (64-bit word size not covered by SHA-NI) |
| SHA-512 | No | Software fallback (64-bit word size not covered by SHA-NI) |

OpenSSL checks `CPUID.7:EBX[29]` at startup to detect SHA-NI and, when present, replaces its SHA-1 and SHA-256 compression rounds with the corresponding x86 instructions.
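The CPUID check and the coverage table above can be combined into a small helper that predicts whether a given digest is expected to take the SHA-NI path on the current CPU. This is only a sketch under assumptions: the function names and the digest name strings (`"SHA1"`, `"SHA224"`, `"SHA256"`) are hypothetical and chosen for illustration, and the helper reads CPUID itself rather than asking OpenSSL. OpenSSL's actual choice can differ, for example when capabilities are masked through the `OPENSSL_ia32cap` environment variable.

```c
#include <cpuid.h>   /* GCC/Clang helper header for the CPUID instruction */
#include <string.h>

/* Returns 1 if CPUID.(EAX=7,ECX=0):EBX[29] (SHA-NI) is set,
 * 0 if the bit is clear or leaf 7 is not available. */
int cpu_has_sha_ni(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 29) & 1u);
}

/* Mirrors the table above: only the 32-bit-word SHA family
 * (SHA-1, SHA-224, SHA-256) is covered by SHA-NI.
 * The name strings are illustrative, not an OpenSSL naming scheme. */
int digest_in_sha_ni_family(const char *name)
{
    return strcmp(name, "SHA1") == 0 ||
           strcmp(name, "SHA224") == 0 ||
           strcmp(name, "SHA256") == 0;
}

/* Prediction only: 1 when the CPU reports SHA-NI and the digest
 * belongs to the covered family, otherwise 0. */
int expect_sha_ni_path(const char *name)
{
    return cpu_has_sha_ni() && digest_in_sha_ni_family(name);
}
```

A diagnostic log line at application startup could call `expect_sha_ni_path("SHA256")` to record what acceleration is anticipated, while the hashing code itself stays unchanged.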
The example below computes a SHA-256 digest. Whenever `EVP_DigestUpdate` has a full 64-byte block to process, the compression function runs on `SHA256RNDS2` and `SHA256MSG1/2` instructions rather than software rounds:

```c
#include <openssl/evp.h>

/* Assumed to be defined by the caller: data, data_len.
 * Every EVP call below returns 1 on success; check the return values
 * in production code. */
unsigned char digest[EVP_MAX_MD_SIZE];
unsigned int digest_len;

/* 1. Allocate and initialise context */
EVP_MD_CTX *ctx = EVP_MD_CTX_new();

/* 2. Initialise digest operation: SHA-256 (SHA-NI accelerated) */
EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

/* 3. Feed data; may be called multiple times for streaming input */
EVP_DigestUpdate(ctx, data, data_len);

/* 4. Finalise and retrieve the digest */
EVP_DigestFinal_ex(ctx, digest, &digest_len);

/* 5. Free the context */
EVP_MD_CTX_free(ctx);
```

To use SHA-1 or SHA-224 instead, replace `EVP_sha256()` with `EVP_sha1()` or `EVP_sha224()` respectively. Both are hardware-accelerated when SHA-NI is present: SHA-1 via the `SHA1RNDS4` / `SHA1NEXTE` / `SHA1MSG1/2` x86 instructions, and SHA-224 via the SHA-256 instructions (`SHA256RNDS2`, `SHA256MSG1/2`).

OpenSSL selects the hardware or software SHA path once at startup based on CPUID results. No application-level changes are needed: the same EVP API calls automatically use x86 SHA-NI instructions on supported hardware.