Enabling Intel HW Key Generation & Cryptography with OpenSSL#
Overview#
Intel x86 processors expose dedicated hardware instructions for random number generation and for accelerating symmetric ciphers and hashes. OpenSSL (version 3.x) uses these x86 instructions behind its standard APIs, so your application gains hardware-backed security and performance simply by making the normal OpenSSL calls.
The three x86 instruction groups leveraged are:
RDRAND / RDSEED: x86 instructions that read from the on-chip hardware random number generator; OpenSSL can use them as an entropy source for the generator behind its key-generation calls
AES-NI: x86 instructions (AESENC, AESDEC, AESKEYGENASSIST, etc.) that each perform an AES round or key-schedule step in a single instruction, accelerating encryption and decryption
SHA-NI: x86 instructions (SHA256RNDS2, SHA1RNDS4, etc.) that perform SHA compression function steps in hardware, accelerating hashing
OpenSSL probes CPU feature flags at startup (via CPUID) and automatically routes all relevant operations through these instructions when present. No application-level configuration is required.
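The probe-then-dispatch idea can be pictured with a small sketch. This is not OpenSSL's code: `cpu_has_aesni`, `select_aes_backend`, and the two stub back ends are hypothetical names used only for illustration. The one real detail is that CPUID leaf 1 reports AES-NI in ECX bit 25.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* A "back end" here is just a function that names itself (illustrative). */
typedef const char *(*backend_fn)(void);

static const char *aes_software(void) { return "software AES"; }
static const char *aes_hardware(void) { return "AES-NI"; }

/* Returns 1 when CPUID.1:ECX[25] is set, 0 otherwise or on non-x86 builds. */
static int cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (ecx >> 25) & 1;
#endif
    return 0;
}

/* Pick a back end once; callers keep using the same pointer afterwards. */
static backend_fn select_aes_backend(void)
{
    return cpu_has_aesni() ? aes_hardware : aes_software;
}
```

A caller would run `select_aes_backend()` once at startup and cache the result, which is why the application code above the dispatch layer does not need to change between machines.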
Validation Status#
Successfully validated on an Intel Arrow Lake system (Ubuntu, OpenSSL 3.x)
OpenSSL queries the CPU at startup using the CPUID instruction to detect which x86 hardware features (RDRAND, RDSEED, AES-NI, SHA-NI) are present, then selects the hardware-accelerated code path automatically
Step 1: Check Platform Support#
Before integrating OpenSSL, confirm that the required x86 hardware instructions are exposed by your CPU. Run the provided HW Detection script — it probes RDRAND, RDSEED, AES-NI, and SHA-NI support via /proc/cpuinfo flags and a compiled CPUID inline-asm program:
#!/usr/bin/env bash
# Copyright (C) 2026 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#
# detect_hw_features.sh: detect RDRAND, RDSEED, AES-NI and SHA-NI support
# via two independent methods:
# 1. /proc/cpuinfo kernel flag strings
# 2. CPUID probe using <cpuid.h> (compiled on-the-fly)
#
# Usage: bash detect_hw_features.sh
# No root privileges required.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TMP_DIR"' EXIT
# ── Colour helpers ────────────────────────────────────────────────────
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'
CYAN='\033[0;36m'; BOLD='\033[1m'; NC='\033[0m'
pass() { echo -e " ${GREEN}[YES]${NC} $*"; }
fail() { echo -e " ${RED}[NO] ${NC} $*"; }
warn() { echo -e " ${YELLOW}[WARN]${NC} $*"; }
header() { echo -e "\n${BOLD}${CYAN}$*${NC}"; echo "$(echo "$*" | sed 's/./-/g')"; }
# ── Feature list ──────────────────────────────────────────────────────
# Each entry: FLAG_NAME | /proc/cpuinfo substring | CPUID leaf | reg | bit | description
FEATURES=(
"RDRAND|rdrand|1|ECX|30|HW DRBG output (CPUID.1:ECX[30])"
"RDSEED|rdseed|7|EBX|18|Raw entropy seed (CPUID.7:EBX[18])"
"AES-NI|aes |1|ECX|25|AES acceleration (CPUID.1:ECX[25])"
"SHA-NI|sha_ni |7|EBX|29|SHA-NI accel. (CPUID.7:EBX[29])"
)
# ── Print system info ─────────────────────────────────────────────────
echo ""
echo -e "${BOLD}Hardware Feature Detection${NC}"
echo -e "System : $(uname -srm)"
echo -e "CPU : $(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2 | sed 's/^ *//')"
# ═══════════════════════════════════════════════════════════════════════
# METHOD 1 — /proc/cpuinfo kernel flag strings
# ═══════════════════════════════════════════════════════════════════════
header "Method 1 — /proc/cpuinfo kernel flags"
printf " %-10s %-12s %s\n" "Feature" "Status" "Details"
printf " %-10s %-12s %s\n" "-------" "------" "-------"
CPUINFO_RESULTS=()
for entry in "${FEATURES[@]}"; do
    IFS='|' read -r name flag leaf reg bit desc <<< "$entry"
    flag_trim="${flag// /}"
    if grep -qw "$flag_trim" /proc/cpuinfo 2>/dev/null; then
        printf " ${GREEN}%-10s [YES]${NC} %s\n" "$name" "$desc"
        CPUINFO_RESULTS+=("$name:1")
    else
        printf " ${RED}%-10s [NO] ${NC} %s\n" "$name" "$desc"
        CPUINFO_RESULTS+=("$name:0")
    fi
done
# ═══════════════════════════════════════════════════════════════════════
# METHOD 2 — CPUID inline-asm probe (compiled C program)
# ═══════════════════════════════════════════════════════════════════════
header "Method 2 — CPUID inline-asm probe (compiled C)"
CPUID_SRC="$TMP_DIR/cpuid_probe.c"
CPUID_BIN="$TMP_DIR/cpuid_probe"
cat > "$CPUID_SRC" << 'EOF'
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* ── Leaf 1 ── */
    int leaf1_ok = __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    int rdrand = leaf1_ok ? ((ecx >> 30) & 1) : -1; /* ECX[30] */
    int aesni = leaf1_ok ? ((ecx >> 25) & 1) : -1; /* ECX[25] */

    /* ── Leaf 7 subleaf 0 ── */
    int leaf7_ok = __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
    int rdseed = leaf7_ok ? ((ebx >> 18) & 1) : -1; /* EBX[18] */
    int shani = leaf7_ok ? ((ebx >> 29) & 1) : -1; /* EBX[29] */

    printf("RDRAND=%d\n", rdrand);
    printf("RDSEED=%d\n", rdseed);
    printf("AESNI=%d\n", aesni);
    printf("SHANI=%d\n", shani);
    return 0;
}
EOF
if ! command -v gcc &>/dev/null; then
    warn "gcc not found; skipping CPUID inline-asm probe."
elif ! gcc -O0 "$CPUID_SRC" -o "$CPUID_BIN" 2>/dev/null; then
    warn "CPUID probe failed to compile; skipping."
else
    CPUID_OUT=$("$CPUID_BIN")
    declare -A CPUID_MAP
    while IFS='=' read -r key val; do
        CPUID_MAP["$key"]="$val"
    done <<< "$CPUID_OUT"
    # Map output keys to display names and CPUID references
    declare -A DISP=([RDRAND]="RDRAND CPUID.1:ECX[30]"
                     [RDSEED]="RDSEED CPUID.7:EBX[18]"
                     [AESNI]="AES-NI CPUID.1:ECX[25]"
                     [SHANI]="SHA-NI CPUID.7:EBX[29]")
    printf " %-10s %-12s %s\n" "Feature" "Status" "CPUID Reference"
    printf " %-10s %-12s %s\n" "-------" "------" "---------------"
    for key in RDRAND RDSEED AESNI SHANI; do
        val="${CPUID_MAP[$key]:-?}"
        ref="${DISP[$key]}"
        # 1 = bit set, 0 = bit clear, anything else = CPUID leaf unavailable
        case "$val" in
            1) printf " ${GREEN}%-10s [YES]${NC} %s\n" "$key" "$ref" ;;
            0) printf " ${RED}%-10s [NO] ${NC} %s\n" "$key" "$ref" ;;
            *) printf " ${YELLOW}%-10s [ERR]${NC} %s (leaf unavailable)\n" "$key" "$ref" ;;
        esac
    done
fi
# ═══════════════════════════════════════════════════════════════════════
# Summary
# ═══════════════════════════════════════════════════════════════════════
header "Summary"
ALL_OK=1
for entry in "${CPUINFO_RESULTS[@]}"; do
    feat="${entry%%:*}"
    val="${entry##*:}"
    if [ "$val" -eq 1 ]; then
        pass "$feat : hardware support confirmed"
    else
        fail "$feat : NOT supported by this CPU"
        ALL_OK=0
    fi
done
echo ""
if [ "$ALL_OK" -eq 1 ]; then
    echo -e " ${GREEN}${BOLD}All four features present: hardware acceleration fully available.${NC}"
else
    echo -e " ${YELLOW}${BOLD}One or more features missing: some demos will use software fallback.${NC}"
fi
echo ""
If the output shows “YES” for all four features, your CPU exposes the required x86 hardware instructions and OpenSSL will use them automatically.
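To confirm that OpenSSL really takes the hardware path, it can help to hide a feature from it and compare timings. OpenSSL reads the OPENSSL_ia32cap environment variable at startup and applies it as a mask over its capability vector. The sketch below assumes the vector layout described in the OPENSSL_ia32cap manual page (the first 64-bit word holds CPUID.1 EDX in bits 0-31 and ECX in bits 32-63; the second word holds CPUID.7 EBX in its low 32 bits) and converts the CPUID bit positions from the feature list into mask values. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Word 0 of the vector: CPUID.1 EDX in bits 0-31, ECX in bits 32-63. */
static uint64_t leaf1_ecx_mask(unsigned bit)
{
    assert(bit < 32);
    return (uint64_t)1 << (32 + bit);
}

/* Word 1 of the vector: CPUID.7 EBX in bits 0-31. */
static uint64_t leaf7_ebx_mask(unsigned bit)
{
    assert(bit < 32);
    return (uint64_t)1 << bit;
}

/* Print environment settings that hide each feature from OpenSSL. */
static void print_disable_settings(void)
{
    printf("RDRAND: OPENSSL_ia32cap=~0x%llx\n",
           (unsigned long long)leaf1_ecx_mask(30));
    printf("AES-NI: OPENSSL_ia32cap=~0x%llx\n",
           (unsigned long long)leaf1_ecx_mask(25));
    /* A leading ':' addresses the second 64-bit word. */
    printf("RDSEED: OPENSSL_ia32cap=:~0x%llx\n",
           (unsigned long long)leaf7_ebx_mask(18));
    printf("SHA-NI: OPENSSL_ia32cap=:~0x%llx\n",
           (unsigned long long)leaf7_ebx_mask(29));
}
```

For example, running `openssl speed -evp aes-256-gcm` once normally and once with the AES-NI setting exported should show whether the accelerated path was in use; the size of the difference depends on the CPU and the OpenSSL build.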
Step 2: Use OpenSSL in Your Application#
The sections below show how each x86 hardware instruction group is exercised through the standard OpenSSL API. In every case the hardware path is selected automatically by OpenSSL — the calling code is identical to a purely software implementation.
Hardware Key Generation — RDRAND & RDSEED#
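Use RAND_bytes to generate cryptographically secure random bytes for key material. RAND_bytes returns output from OpenSSL's internal DRBG, a software generator that is seeded from the entropy sources selected when OpenSSL was built. On builds that enable the CPU seed source (for example, configured with --with-rand-seed=rdcpu), that seeding draws on the x86 RDSEED and RDRAND instructions; otherwise the DRBG is seeded from the operating system's entropy source. Either way the calling code is the same: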
#include <openssl/rand.h>
unsigned char key[32]; /* 256-bit key */
if (RAND_bytes(key, sizeof(key)) != 1) {
    /* handle error */
}
When the CPU seed source is enabled, OpenSSL's entropy-gathering code issues RDSEED or RDRAND with a bounded number of retries, because either instruction can transiently report that no data is available, and feeds the result into its DRBG as seed material. The two instructions serve distinct roles:
RDRAND: returns a random value produced by the on-chip hardware DRBG (conditioned output)
RDSEED: returns an entropy sample from the hardware source; intended for seeding software DRBGs rather than direct use
The application needs no special setup or tuning: which seed sources are used is determined by the OpenSSL build and configuration, not by the calling code.
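Outside OpenSSL, the retry pattern for RDRAND can be sketched directly with the compiler intrinsic `_rdrand64_step`, which returns 1 when the instruction delivered valid data. This is an illustration, not OpenSSL's implementation: `hw_random_u64` and its helpers are hypothetical names, the code assumes GCC or Clang on x86-64, and on any other target (or a CPU without RDRAND) it simply reports failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>

/* CPUID.1:ECX[30] reports RDRAND support. */
static int cpu_has_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 30) & 1;
}

/* The target attribute lets this one function use RDRAND without -mrdrnd. */
__attribute__((target("rdrnd")))
static int rdrand_step_retry(uint64_t *out, int max_retries)
{
    unsigned long long value = 0;

    for (int i = 0; i < max_retries; i++) {
        if (_rdrand64_step(&value)) { /* 1 means the value is valid */
            *out = (uint64_t)value;
            return 1;
        }
    }
    return 0; /* hardware stayed busy for every attempt */
}
#endif

/* Returns 1 and stores a value on success, 0 if RDRAND is unavailable. */
static int hw_random_u64(uint64_t *out, int max_retries)
{
    assert(out != NULL);
#if defined(__x86_64__)
    if (cpu_has_rdrand())
        return rdrand_step_retry(out, max_retries);
#endif
    (void)max_retries;
    return 0;
}
```

Application code should still prefer RAND_bytes; the sketch only shows why a retry loop and a fallback are part of any direct use of the instruction.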
Hardware AES Acceleration — AES-NI#
Intel AES-NI introduces x86 instructions (AESENC, AESENCLAST, AESDEC, AESDECLAST, AESKEYGENASSIST, AESIMC) that execute a full AES round in a single low-latency CPU instruction, eliminating the software lookup-table approach. OpenSSL’s EVP layer automatically dispatches to these instructions when AES-NI is present. The example below performs AES-256-GCM encryption — every EVP_EncryptUpdate call internally issues AESENC/AESENCLAST instructions rather than software AES rounds:
#include <openssl/evp.h>
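/* Assumed to be supplied by the caller: key (32 bytes), iv (12 bytes),
 * plaintext and plaintext_len (at most sizeof(ciphertext) bytes). */
unsigned char ciphertext[128];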
unsigned char tag[16];
int len, ciphertext_len;
/* 1. Allocate and initialise context */
EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
/* 2. Initialise encryption operation: AES-256-GCM */
EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, NULL, NULL);
/* 3. Set IV length (default 12 bytes for GCM) */
EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_IVLEN, 12, NULL);
/* 4. Provide key and IV */
EVP_EncryptInit_ex(ctx, NULL, NULL, key, iv);
/* 5. Encrypt plaintext — may be called multiple times */
EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, plaintext_len);
ciphertext_len = len;
/* 6. Finalise encryption (flushes any remaining output) */
EVP_EncryptFinal_ex(ctx, ciphertext + len, &len);
ciphertext_len += len;
/* 7. Retrieve the GCM authentication tag */
EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag);
/* 8. Free the context */
EVP_CIPHER_CTX_free(ctx);
At startup, OpenSSL checks the CPUID.1:ECX[25] flag to confirm AES-NI availability, then permanently redirects all AES encrypt/decrypt operations — including key schedule expansion — through the x86 AES-NI instruction set. No special configuration is required.
Hardware SHA Acceleration — SHA-NI#
Intel SHA-NI introduces x86 instructions that execute SHA compression function steps in hardware, replacing the multi-instruction software implementations. SHA-NI covers only the 32-bit word SHA family; 64-bit word variants (SHA-384, SHA-512) fall back to software:
| Algorithm | SHA-NI Accelerated | Notes |
|---|---|---|
| SHA-1 | Yes | Accelerated via SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2 |
| SHA-224 | Yes | Uses the SHA-256 hardware pipeline internally |
| SHA-256 | Yes | Accelerated via SHA256RNDS2, SHA256MSG1, SHA256MSG2 |
| SHA-384 | No | Software fallback (64-bit word size not covered by SHA-NI) |
| SHA-512 | No | Software fallback (64-bit word size not covered by SHA-NI) |
OpenSSL checks CPUID.7:EBX[29] at startup to detect SHA-NI and, when present, replaces its SHA-1 and SHA-256 compression rounds with the corresponding x86 instructions. The example below computes a SHA-256 digest — every EVP_DigestUpdate call internally issues SHA256RNDS2 and SHA256MSG1/2 instructions rather than software rounds:
#include <openssl/evp.h>
/* Assumed to be supplied by the caller: data and data_len. */
unsigned char digest[EVP_MAX_MD_SIZE];
unsigned int digest_len;
/* 1. Allocate and initialise context */
EVP_MD_CTX *ctx = EVP_MD_CTX_new();
/* 2. Initialise digest operation: SHA-256 (SHA-NI accelerated) */
EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);
/* 3. Feed data — may be called multiple times for streaming input */
EVP_DigestUpdate(ctx, data, data_len);
/* 4. Finalise and retrieve the digest */
EVP_DigestFinal_ex(ctx, digest, &digest_len);
/* 5. Free the context */
EVP_MD_CTX_free(ctx);
To use SHA-1 or SHA-224 instead, replace EVP_sha256() with EVP_sha1() or EVP_sha224() respectively. Both are hardware-accelerated when SHA-NI is present: SHA-1 via the SHA1RNDS4 / SHA1NEXTE / SHA1MSG1/2 x86 instructions, and SHA-224 via the same SHA256RNDS2 / SHA256MSG1/2 instructions used for SHA-256.
OpenSSL selects the hardware or software SHA path once at startup based on CPUID results. No application-level changes are needed — the same EVP API calls automatically use x86 SHA-NI instructions on supported hardware.