# Critical Importance of Core Pinning on Intel Edge Platforms

Today's Intel edge processors are designed around a fundamental principle: **power is a shared, finite resource**. The processor's total power budget (package power) is dynamically distributed between **CPU cores** (P-cores, E-cores, and LPE-cores) and **uncore components**, including the **GPU**, the **NPU** (Neural Processing Unit), and the **memory controllers and I/O**.

**With proper core pinning**, you can control precisely which cores are active. This prevents the operating system's default scheduler from spreading your application across all available cores, as happens, for example, when a single-threaded application wakes up multiple cores unnecessarily. Core pinning mitigates problems such as:

- **Increased power consumption** from activating unnecessary cores.
- **Reduced turbo frequencies** as the processor throttles to stay within thermal limits.
- **GPU and NPU power starvation** when CPU cores consume the bulk of the power budget.
- **Cache pollution and memory bandwidth contention** from thread migration.

On modern hybrid processors such as Intel's Arrow Lake, Lunar Lake, and Panther Lake platforms, the effect of core pinning is even more pronounced, for these reasons:

- **Heterogeneous core types**: P-cores (Performance), E-cores (Efficient), and LPE-cores (Low Power Efficient) have drastically different power and performance characteristics.
- **Integrated accelerators**: GPU and NPU share the same package power budget with CPU cores.
- **AI workloads**: Vision inference, video analytics, and ML pipelines often combine CPU, GPU, and NPU, so competition for power becomes critical.

## Tools for Detection and Monitoring

### 1. Detecting Core Types: obtain_cores.sh

Intel provides a script to detect and enumerate core types on hybrid Intel platforms. The script is available in the [edge-workloads-and-benchmarks](https://github.com/open-edge-platform/edge-workloads-and-benchmarks) repository.

**Location:** [`utils/obtain_cores.sh`](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/obtain_cores.sh)

**Usage:**

```bash
cd utils/
./obtain_cores.sh
```

**Example output:**

```
pcore:0,1,2,3,4,5,6,7
ecore:8,9,10,11,12,13,14,15,16,17,18,19,20,21
lpecore:22,23,24,25
```

This script uses multiple detection methods with fallbacks:

1. **Multi-socket Xeon detection**: assigns all cores as P-cores on server platforms.
2. **CPUID-based detection**: uses the `cpuid` instruction to identify Intel Core vs Intel Atom cores.
3. **sysfs validation**: reads `/sys/devices/cpu_core/`, `/sys/devices/cpu_atom/`, `/sys/devices/cpu_lowpower/`.
4. **L1d cache drop detection**: identifies E-cores by detecting cache size transitions.
5. **SMT pair detection**: classifies remaining cores based on hyperthreading topology.

The script outputs comma-separated core IDs for each type, which can be passed directly to the `taskset` command to pin workloads.

**Pinning examples:**

```bash
# Pin to P-cores only (for latency-sensitive workloads)
taskset -c 0,1,2,3,4,5,6,7 ./your_application

# Pin to E-cores only (for throughput workloads)
taskset -c 8,9,10,11,12,13,14,15,16,17,18,19,20,21 ./your_application

# Pin to LPE-cores (for background tasks)
taskset -c 22,23,24,25 ./background_service
```

For full documentation and usage examples, see the [utils README](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/README.md).

### 2. Power Monitoring Tools

To verify that core pinning actually improves power efficiency and performance, you need to monitor power consumption across all compute resources.

#### get_package_power.sh: Package Power Monitoring via RAPL and hwmon

Intel provides a dedicated script to sample platform package power consumption using RAPL (Running Average Power Limit) sysfs interfaces and hardware monitoring sensors.
The script is available in the [edge-workloads-and-benchmarks](https://github.com/open-edge-platform/edge-workloads-and-benchmarks) repository.

**Location:** [`utils/get_package_power.sh`](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/get_package_power.sh)

**Usage:**

```bash
sudo ./get_package_power.sh -i <duration> -s <interval> -d <delay>
```

**Options:**

- `-s <interval>`: Sampling interval in seconds (default: 1)
- `-i <duration>`: Total duration in seconds (default: 60)
- `-d <delay>`: Start delay in seconds (default: 0)

**Example:**

```bash
# Measure package power for 60 seconds at 1-second intervals
sudo ./get_package_power.sh -i 60 -s 1

# Measure for 30 seconds with a 5-second delay before starting
sudo ./get_package_power.sh -i 30 -s 1 -d 5
```

**Example output:**

```
[ Info ] Monitoring for 60s after a 0s delay
[rapl] card0 (xe @ 0000:00:02.0): 15.23 W
[rapl] card0 (xe @ 0000:00:02.0): 14.87 W
[rapl] card0 (xe @ 0000:00:02.0): 15.45 W
...
[ Info ] Monitoring complete
```

**Output format:**

```
[source] card# (driver @ pci): power W
```

- **source**: Either `rapl` (RAPL energy counters) or `hwmon` (hardware monitoring sensors)
- **card#**: DRM card identifier (e.g., `card0`)
- **driver**: Graphics driver (`i915` or `xe`)
- **pci**: PCI device address
- **power**: Average power over the sampling interval, in watts

**How it works:**

1. **Discovers Intel graphics devices** via `/sys/class/drm/card*/` (i915 or xe drivers)
2. **Detects power sensors** using either:
   - Hardware monitoring sensors (`hwmon`) with package/card energy or power labels
   - RAPL energy counters (`/sys/class/powercap/intel-rapl:*/energy_uj`)
3. **Samples power** by reading energy counters at the start and end of each interval, computing power as:

   ```
   power (W) = (end_energy - start_energy) / interval_duration
   ```

**Use cases:**

- **Compare before/after core pinning**: Run the script during your workload with and without core pinning to quantify power savings
- **Monitor GPU power availability**: Check whether freeing up CPU cores allows the GPU to consume more power (higher frequency)
- **Long-term profiling**: Use longer durations to understand power patterns over time

**Example workflow:**

```bash
# Baseline: workload without core pinning
sudo ./get_package_power.sh -i 60 -s 1 &
./my_workload

# Optimized: workload with E-core pinning
sudo ./get_package_power.sh -i 60 -s 1 &
taskset -c 8-21 ./my_workload
```

For full documentation, see the [utils README](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/README.md).

#### turbostat: Detailed Package, Core, and Graphics Power

The `turbostat` utility, which ships with the Linux kernel tools, provides more granular power breakdowns:

```bash
sudo turbostat --interval 1
```

**Key metrics to watch:**

- **PkgWatt**: Total package power (CPU + GPU + uncore)
- **CorWatt**: Power consumed by CPU cores only
- **GFXWatt**: Graphics (GPU) power consumption
- **RAMWatt**: DRAM power

Example output snippet:

```
Core  CPU  Avg_MHz  Busy%  Bzy_MHz  PkgWatt  CorWatt  GFXWatt
-     -    2100     50.0   4200     25.0     15.0     5.0
0     0    4200     100.0  4200
1     1    4200     100.0  4200
```

By comparing power metrics **before and after core pinning**, you can quantify the impact on package power and GPU power availability. Use `get_package_power.sh` for simple package-level measurements, and `turbostat` when you need detailed per-core and component-level breakdowns.

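
The energy-delta formula described for `get_package_power.sh` can be illustrated with a minimal sketch. This is not the script's actual code: the function name `energy_delta_to_watts` is hypothetical, and the zone path `/sys/class/powercap/intel-rapl:0` is an assumption (it is commonly the package zone, but the numbering can differ per platform). RAPL counters are reported in microjoules and wrap around at `max_energy_range_uj`, which the sketch accounts for.

```bash
# Average power in watts from two energy readings in microjoules.
#   $1 = start energy (uJ), $2 = end energy (uJ), $3 = interval (s)
#   $4 = optional counter range (uJ), used if the counter wrapped
energy_delta_to_watts() {
    awk -v s="$1" -v e="$2" -v t="$3" -v m="${4:-0}" 'BEGIN {
        d = e - s
        if (d < 0) d += m            # counter wrapped between readings
        printf "%.2f\n", d / 1e6 / t # uJ -> J, then J / s = W
    }'
}

# Assumed package zone; adjust to the zone found on your platform.
zone=/sys/class/powercap/intel-rapl:0
if [ -r "$zone/energy_uj" ]; then   # usually requires root
    start=$(cat "$zone/energy_uj")
    sleep 1
    end=$(cat "$zone/energy_uj")
    energy_delta_to_watts "$start" "$end" 1 "$(cat "$zone/max_energy_range_uj")"
fi
```

Because the result is an energy difference divided by time, each value is an average over the sampling interval. Shorter intervals give finer resolution at the cost of noisier readings.
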
#### npu-monitor-tool.py: NPU Power and Utilization

For workloads using the Intel NPU (Neural Processing Unit), monitor NPU-specific metrics using the NPU monitoring tool from the [edge-ai-libraries](https://github.com/open-edge-platform/edge-ai-libraries) repository.

**Location:** [`tools/npu-monitor-tool/npu-monitor-tool.py`](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/tools/npu-monitor-tool/npu-monitor-tool.py)

**Usage:**

```bash
sudo python3 npu-monitor-tool.py -i 1000
```

**Example output:**

```
+-----------------------------------------------------------------------------------------------+
| INTEL NPU Device: 0x7d1d            | version: 1.0.0                                          |
| Firmware version: IVPU_MTL_20240112_v2024.01                                                  |
+===============================================================================================+
| Power Usage      | DPU Freq         | NPU DDR Average Bandwidth  | Tile Conf                  |
| 2.5 [W]          | 1400 [Hz]        | 123.45 [MB/s]              | 4                          |
+===============================================================================================+
| NPU Temperature  | NPU Utilization  | Memory Usage                                            |
| 45 [°C]          | 25%              | 512.00 [MB]                                             |
+-----------------------------------------------------------------------------------------------+
```

**CSV export** is available for long-term analysis:

```bash
sudo python3 npu-monitor-tool.py --csv -i 1000
```

This generates timestamped CSV files in `npu_output/` with the following columns: `timestamp`, `power`, `frequency`, `bandwidth`, `tile_config`, `temperature`, `utilization`, `memory_usage`.

For complete documentation, see the [npu-monitor-tool README](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/tools/npu-monitor-tool/README.md).

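
The exported CSV can be summarized with standard tools. The sketch below averages one column selected by its header name. It assumes the file starts with a header row using exactly the column names listed above and that values are plain numbers without unit suffixes; neither detail is confirmed here, so check a real export first. The function name `csv_column_mean` is hypothetical.

```bash
# Print the mean of a named column in a CSV file with a header row.
#   $1 = CSV file, $2 = column name (for example: power)
csv_column_mean() {
    awk -F',' -v col="$2" '
        NR == 1 {                       # locate the column in the header
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            next
        }
        idx && $idx != "" { sum += $idx; n++ }
        END { if (n) printf "%.2f\n", sum / n; else exit 1 }
    ' "$1"
}

# Usage (the file name is a placeholder for one of the generated files):
#   csv_column_mean npu_output/your_capture.csv power
```

Running this on captures taken with and without core pinning gives a single number per run for comparing NPU power.
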
## Core Pinning with DL Streamer Pipeline Server

For AI video analytics workloads using the [DL Streamer Pipeline Server](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/microservices/dlstreamer-pipeline-server), Intel provides built-in support for core pinning via the **`CORE_PINNING` environment variable**. This eliminates the need to manually wrap the server with `taskset` and provides a declarative way to specify core affinity in Docker Compose or Kubernetes deployments.

### Using the CORE_PINNING Environment Variable

The `CORE_PINNING` environment variable accepts two types of values:

1. **Explicit core list or range** (taskset-compatible syntax):
   - Comma-delimited list: `10,12,14`
   - Range: `10-14`
   - Range with step: `10-14/2` (cores 10, 12, 14)
2. **Core type specification** (automatic detection):
   - `p-cores`: Pin to Performance cores
   - `e-cores`: Pin to Efficient cores
   - `lp-cores`: Pin to Low Power Efficient cores

The server automatically detects the appropriate cores using the same detection logic as `obtain_cores.sh` and applies `taskset` internally.

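
To check which cores an explicit value selects before deploying, the list syntax can be expanded by hand. The sketch below is a hypothetical helper (`expand_cpu_list` is not part of the Pipeline Server) that handles comma lists, ranges, and ranges with a step. It accepts the `/` step separator used in this guide and also `:`, which is the stride separator that util-linux `taskset` documents for its own CPU lists; which one the server accepts should be confirmed against the official guide.

```bash
# Expand a CPU list such as "10-14/2,20" into explicit core IDs.
# Runs in a subshell, so the IFS change does not leak to the caller.
expand_cpu_list() (
    out=""
    IFS=','
    for item in $1; do
        step=1
        range=$item
        case $item in
            */* | *:*)                  # range with a step
                step=${item##*[/:]}
                range=${item%%[/:]*}
                ;;
        esac
        start=${range%%-*}              # a single ID yields start == end
        end=${range##*-}
        [ "$step" -gt 0 ] || exit 1     # guard against an endless loop
        cpu=$start
        while [ "$cpu" -le "$end" ]; do
            out="${out:+$out,}$cpu"
            cpu=$((cpu + step))
        done
    done
    printf '%s\n' "$out"
)

expand_cpu_list "10-14/2"   # prints 10,12,14
```

Comparing the expanded list with the `obtain_cores.sh` output shows whether a range stays within one core type.
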
### Docker Compose Example

::::{tab-set}
:::{tab-item} **Pin to P-cores for low-latency inference**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: p-cores
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

:::
:::{tab-item} **Pin to E-cores for high-throughput batch processing**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: e-cores
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

:::
:::{tab-item} **Pin to specific cores (manual control)**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: "8-15"  # E-cores 8 through 15
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

:::
::::

### Kubernetes Example

For Kubernetes deployments, set the environment variable in the pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dlstreamer-pipeline-server
spec:
  containers:
    - name: dlstreamer
      image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
      env:
        - name: CORE_PINNING
          value: "p-cores"
      resources:
        limits:
          gpu.intel.com/i915: 1
```

### CORE_PINNING vs Manual taskset

**Recommendation:** Use `CORE_PINNING` for DL Streamer Pipeline Server deployments to simplify configuration and enable portable deployments across different platforms.

| Approach | Pros | Cons |
|----------------------------|------|------|
| **`CORE_PINNING` env var** | Declarative, container-native, works in Docker Compose/K8s, automatic core detection | Specific to DL Streamer Pipeline Server |
| **Manual `taskset`** | Universal (works with any application), explicit control | Requires shell wrapper, harder to manage in orchestration, manual core discovery |

### Combining Core Pinning with GPU/NPU Offload

A common optimization pattern for AI pipelines:

1. **Pin the Pipeline Server to E-cores**: reduces CPU power consumption
2. **Offload inference to GPU or NPU**: leaves more power budget for accelerators
3. **Monitor power distribution**: verify GPU/NPU frequencies increase

**Example Docker Compose with GPU + E-core pinning:**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: e-cores
      DEVICE: GPU  # Offload inference to GPU
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

**Verify the optimization:**

```bash
# Terminal 1: Monitor package power
sudo ./get_package_power.sh -i 120 -s 1

# Terminal 2: Start the pipeline server
docker-compose up

# Terminal 3: Run a pipeline
curl -X POST http://localhost:8080/pipelines/object_detection/1
```

Check that package power decreases while GPU power (visible in `turbostat` GFXWatt) increases or remains stable.

For complete documentation on DL Streamer Pipeline Server core pinning, see the [official guide](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/dlstreamer-pipeline-server/docs/user-guide/advanced-guide/detailed_usage/how-to-advanced/performance/core-pinning.md).

## Recommendations: Which Cores to Pin?

The optimal core pinning strategy depends on your workload characteristics:

- E-cores for throughput workloads.
- P-cores for latency-constrained workloads.
- LPE-cores for background tasks.

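
To avoid hard-coding core IDs when applying these recommendations, the `type:list` output of `obtain_cores.sh` shown earlier can be parsed into a `taskset` argument. The following is a minimal sketch that assumes the output format matches the example in this guide (`pcore:`, `ecore:`, and `lpecore:` prefixes); the helper name `cores_for` is hypothetical.

```bash
# Read "type:list" lines on stdin and print the list for one core type.
#   $1 = pcore, ecore, or lpecore
# Returns non-zero if the requested type is not present.
cores_for() {
    awk -F':' -v want="$1" '
        $1 == want { print $2; found = 1 }
        END { exit !found }
    '
}

# Sample copied from the example output earlier in this guide.
sample='pcore:0,1,2,3,4,5,6,7
ecore:8,9,10,11,12,13,14,15,16,17,18,19,20,21
lpecore:22,23,24,25'

printf '%s\n' "$sample" | cores_for lpecore   # prints 22,23,24,25

# On a real system, pipe the script output instead:
#   taskset -c "$(./obtain_cores.sh | cores_for ecore)" ./your_application
```

The non-zero exit status makes it possible to fall back to another core type on platforms that lack, for example, LPE-cores.
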

::::{tab-set}
:::{tab-item} **E-cores**

**Use E-cores when:**

- Your workload is parallelizable and scales with core count
- Throughput (tasks/second) matters more than individual task latency
- You want to leave more power budget for GPU/NPU
- Examples: video encoding, batch inference, data processing pipelines

**Why E-cores?**

- More E-cores are available (typically 2-3× the number of P-cores)
- Lower per-core power consumption allows more cores to run simultaneously
- Leaves power headroom for GPU and NPU to maintain high frequencies
- Better aggregate throughput per watt

**Example:**

```bash
# Video transcoding pipeline on E-cores (VAAPI decode, scale, and encode)
taskset -c 8-21 ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i input.mp4 \
  -vf 'scale_vaapi=1920:1080' -c:v h264_vaapi output.mp4

# Batch inference on NPU with E-cores handling preprocessing
taskset -c 8-21 python batch_inference.py --device NPU

# DL Streamer Pipeline Server for high-throughput video analytics
# (using CORE_PINNING environment variable)
CORE_PINNING=e-cores docker-compose up
```

:::
:::{tab-item} **P-cores**

**Use P-cores when:**

- Low latency is critical (interactive applications, real-time control)
- Workloads are single-threaded or lightly threaded
- You need maximum per-thread performance
- Examples: UI rendering, game engines, real-time analytics, control loops

**Why P-cores?**

- Higher per-core clock speeds than E-cores
- Larger caches (L2 and shared L3)
- Better single-threaded performance for latency-critical paths
- Ideal for "main thread" logic that orchestrates parallel work

**Example:**

```bash
# Real-time object detection with DL Streamer
taskset -c 0-7 gst-launch-1.0 filesrc location=video.mp4 ! \
  qtdemux ! h264parse ! vah264dec ! gvadetect model=yolov5.xml device=GPU ! \
  gvafpscounter ! fakesink

# Industrial control loop on P-cores
taskset -c 0-3 ./motion_control_app --realtime

# DL Streamer Pipeline Server for low-latency inference
# (using CORE_PINNING environment variable)
CORE_PINNING=p-cores docker-compose up
```

:::
:::{tab-item} **LPE-cores**

**Use LPE-cores when:**

- Tasks are low priority or non-latency-sensitive
- You want to minimize interference with foreground workloads
- Power efficiency is paramount
- Examples: telemetry collection, logging, health checks

**Example:**

```bash
# Background telemetry agent on LPE-cores
taskset -c 22-25 ./telemetry_agent --interval 5s

# DL Streamer Pipeline Server for monitoring/logging pipelines
# (using CORE_PINNING environment variable)
CORE_PINNING=lp-cores docker-compose up
```

:::
::::

## Summary and Best Practices

The tools provided in Intel's open-edge-platform repositories (`obtain_cores.sh` for core detection, `get_package_power.sh` for package power monitoring, and `npu-monitor-tool.py` for NPU monitoring), combined with `turbostat` for detailed power tracking, give you what you need to implement effective core pinning strategies. For containerized AI workloads, the DL Streamer Pipeline Server's `CORE_PINNING` environment variable provides a declarative, orchestration-friendly way to apply core affinity.

Here are some recommendations on how to proceed:

1. **Profile first, optimize second**: Use `get_package_power.sh`, `turbostat`, and `npu-monitor-tool.py` to establish baselines before pinning.
2. **Match workload to core type**:
   - Latency-sensitive → P-cores
   - Throughput-oriented → E-cores
   - Background tasks → LPE-cores
3. **Leave cores idle when possible**: Don't spread workloads across all cores. Idle cores consume minimal power and leave more budget for accelerators.
4. **Combine CPU pinning with GPU/NPU offload**: For AI pipelines, pin CPU preprocessing to E-cores and run inference on GPU/NPU.
5. **Use `CORE_PINNING` for containerized workloads**: When using DL Streamer Pipeline Server, prefer the `CORE_PINNING` environment variable over manual `taskset` wrappers.
6. **Monitor power distribution**: Verify that your pinning strategy increases GPU/NPU power availability:

   ```bash
   # Before pinning
   sudo ./get_package_power.sh -i 60 -s 1 > baseline.log &
   ./workload_no_pinning

   # After pinning
   sudo ./get_package_power.sh -i 60 -s 1 > optimized.log &
   taskset -c 8-15 ./workload_pinned

   # Compare results
   diff baseline.log optimized.log
   ```

7. **Use CSV export for long-term analysis**: Collect metrics over hours or days to understand power trends and workload characteristics.

---

## Additional Resources

- [Edge Workloads and Benchmarks Repository](https://github.com/open-edge-platform/edge-workloads-and-benchmarks)
- [Edge AI Libraries Repository](https://github.com/open-edge-platform/edge-ai-libraries)
- [DL Streamer Pipeline Server](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/microservices/dlstreamer-pipeline-server)
- [DL Streamer Pipeline Server Core Pinning Guide](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/dlstreamer-pipeline-server/docs/user-guide/advanced-guide/detailed_usage/how-to-advanced/performance/core-pinning.md)
- [Intel Platform Monitoring Technology Specification](https://www.intel.com/content/www/us/en/content-details/710389/intel-platform-monitoring-technology-intel-pmt-technical-specification.html)