# Critical Importance of Core Pinning on Intel Edge Platforms

Today's Intel edge processors are designed around a fundamental principle:
**power is a shared, finite resource**.
The processor's total power budget (package power) is dynamically distributed among the
**CPU cores** (P-cores, E-cores, and LPE-cores) and the **uncore** components, which include the
**GPU**, the **NPU** (Neural Processing Unit), and the **memory controllers and I/O**.

**With proper core pinning**, you control exactly which cores your application runs on. Without
it, the operating system's default scheduler may spread the application across all available
cores; for example, a single-threaded application can migrate between cores and wake up several
of them unnecessarily.

Core pinning mitigates such problems as:

- **Increased power consumption** from activating unnecessary cores.
- **Reduced turbo frequencies** as the processor throttles to stay within its power and thermal limits.
- **GPU and NPU power starvation** when CPU cores consume the bulk of the power budget.
- **Cache pollution and memory bandwidth contention** from thread migration.


On modern hybrid processors like Intel's Arrow Lake, Lunar Lake, and Panther Lake platforms,
the effect of core pinning is even more pronounced.

- **Heterogeneous core types**:
  P-cores (Performance), E-cores (Efficient), and LPE-cores (Low Power Efficient) have
  drastically different power and performance characteristics.
- **Integrated accelerators**:
  GPU and NPU share the same package power budget with CPU cores.
- **AI workloads**:
  Vision inference, video analytics, and ML pipelines often use the CPU, GPU, and NPU together,
  so competition for the shared power budget becomes critical.


## Tools for Detection and Monitoring

### 1. Detecting Core Types: obtain_cores.sh

Intel provides a comprehensive script to detect and enumerate core types on hybrid Intel platforms.
The script is available in the
[edge-workloads-and-benchmarks](https://github.com/open-edge-platform/edge-workloads-and-benchmarks)
repository.

**Location:**
[`utils/obtain_cores.sh`](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/obtain_cores.sh)

**Usage:**

```bash
cd utils/
./obtain_cores.sh
```

**Example output:**

```
pcore:0,1,2,3,4,5,6,7
ecore:8,9,10,11,12,13,14,15,16,17,18,19,20,21
lpecore:22,23,24,25
```

This script uses multiple detection methods with fallbacks:

1. **Multi-socket Xeon detection** — assigns all cores as P-cores on server platforms.
2. **CPUID-based detection** — uses the `cpuid` instruction to identify Intel Core vs Intel Atom cores.
3. **sysfs validation** — reads `/sys/devices/cpu_core/`, `/sys/devices/cpu_atom/`, `/sys/devices/cpu_lowpower/`.
4. **L1d cache drop detection** — identifies E-cores by detecting cache size transitions.
5. **SMT pair detection** — classifies remaining cores based on hyperthreading topology.
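
The sysfs check in step 3 can also be reproduced by hand. The following is a minimal sketch, assuming a kernel that exposes the hybrid PMU directories listed above, each with a `cpus` file that lists its CPUs in range syntax (for example `0-7`); on non-hybrid systems or older kernels the directories are absent and the function prints nothing. `sysfs_core_types` is a hypothetical helper name, not part of `obtain_cores.sh`.

```bash
# sysfs_core_types: print "<pmu>:<cpu list>" for each hybrid PMU device the
# running kernel exposes under /sys/devices. Prints nothing on non-hybrid
# systems. (Hypothetical helper, not part of obtain_cores.sh.)
sysfs_core_types() {
    local dev
    for dev in cpu_core cpu_atom cpu_lowpower; do
        # Skip PMU devices this kernel/platform does not provide
        [ -r "/sys/devices/$dev/cpus" ] || continue
        printf '%s:%s\n' "$dev" "$(cat "/sys/devices/$dev/cpus")"
    done
    return 0
}

# Example usage:
# sysfs_core_types
```

Because the `cpus` files already use range syntax, each list can be passed to `taskset -c` without conversion.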

The script outputs comma-separated core IDs for each type, which can be directly used with
the `taskset` command to pin workloads.

**Pinning examples:**

```bash
# Pin to P-cores only (for latency-sensitive workloads)
taskset -c 0,1,2,3,4,5,6,7 ./your_application

# Pin to E-cores only (for throughput workloads)
taskset -c 8,9,10,11,12,13,14,15,16,17,18,19,20,21 ./your_application

# Pin to LPE-cores (for background tasks)
taskset -c 22,23,24,25 ./background_service
```
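
Rather than copying the lists by hand, the script's output can be parsed and passed to `taskset`. A minimal sketch, assuming `obtain_cores.sh` prints one `type:list` line per core type exactly as in the example output above; `core_list` is a hypothetical helper name.

```bash
# core_list TYPE: read obtain_cores.sh-style output ("pcore:0,1,...") on stdin
# and print the comma-separated CPU list for TYPE (pcore, ecore, or lpecore).
# Returns non-zero if TYPE is absent, e.g. on a platform without LPE-cores.
core_list() {
    awk -F: -v want="$1" '
        $1 == want && $2 != "" { print $2; found = 1 }
        END { exit !found }'
}

# Example usage:
# ECORES=$(./obtain_cores.sh | core_list ecore) || { echo "no E-cores found" >&2; exit 1; }
# taskset -c "$ECORES" ./your_application
```

Failing loudly when a core type is missing avoids silently running `taskset -c ""`, which `taskset` rejects anyway, on platforms that lack that core type.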

For full documentation and usage examples, see the
[utils README](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/README.md).

### 2. Power Monitoring Tools

To verify that core pinning is actually improving your power efficiency and performance, you
need to monitor power consumption across all compute resources.

#### get_package_power.sh — Package Power Monitoring via RAPL and hwmon

Intel provides a dedicated script to sample platform package power consumption using RAPL
(Running Average Power Limit) sysfs interfaces and hardware monitoring sensors. The script is
available in the
[edge-workloads-and-benchmarks](https://github.com/open-edge-platform/edge-workloads-and-benchmarks) repository.

**Location:** [`utils/get_package_power.sh`](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/get_package_power.sh)

**Usage:**

```bash
sudo ./get_package_power.sh -i <duration (seconds)> -s <sampling interval (seconds)> -d <delay (seconds)>
```

**Options:**

- `-s <seconds>` — Sampling interval in seconds (default: 1)
- `-i <seconds>` — Total duration in seconds (default: 60)
- `-d <seconds>` — Start delay in seconds (default: 0)

**Example:**

```bash
# Measure package power for 60 seconds at 1-second intervals
sudo ./get_package_power.sh -i 60 -s 1

# Measure for 30 seconds with a 5-second delay before starting
sudo ./get_package_power.sh -i 30 -s 1 -d 5
```

**Example output:**

```
[ Info ] Monitoring for 60s after a 0s delay

[rapl] card0 (xe @ 0000:00:02.0): 15.23 W
[rapl] card0 (xe @ 0000:00:02.0): 14.87 W
[rapl] card0 (xe @ 0000:00:02.0): 15.45 W
...

[ Info ] Monitoring complete
```

**Output format:**

```
[source] card# (driver @ pci): power W
```

- **source**: Either `rapl` (RAPL energy counters) or `hwmon` (hardware monitoring sensors)
- **card#**: DRM card identifier (e.g., `card0`)
- **driver**: Graphics driver (`i915` or `xe`)
- **pci**: PCI device address
- **power**: Average power over the sampling interval, in watts

**How it works:**

1. **Discovers Intel graphics devices** via `/sys/class/drm/card*/` (i915 or xe drivers)
2. **Detects power sensors** using either:
   - Hardware monitoring sensors (`hwmon`) with package/card energy or power labels
   - RAPL energy counters (`/sys/class/powercap/intel-rapl:*/energy_uj`)
3. **Samples power** by reading the energy counters (reported in microjoules) at the start and end
   of each interval and computing the average power as:
   ```
   power (W) = (end_energy_uJ - start_energy_uJ) / (interval_duration_s * 1000000)
   ```
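
The same calculation can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the script's own code: it assumes two readings of a RAPL `energy_uj` file taken `INTERVAL_S` seconds apart, and handles a counter wrap by adding back the value of the domain's `max_energy_range_uj` file. `rapl_power_w` is a hypothetical helper name.

```bash
# rapl_power_w START_UJ END_UJ INTERVAL_S [MAX_RANGE_UJ]
# Average power in watts between two energy readings in microjoules, such as
# two samples of /sys/class/powercap/intel-rapl:0/energy_uj. If the counter
# wrapped (END < START), MAX_RANGE_UJ (from max_energy_range_uj) is added back.
rapl_power_w() {
    local start=$1 end=$2 interval=$3 max_range=${4:-0}
    local delta=$(( end - start ))
    if (( delta < 0 && max_range > 0 )); then
        delta=$(( delta + max_range ))   # counter wrapped around once
    fi
    # 1 W = 1 J/s = 1,000,000 uJ/s
    awk -v d="$delta" -v t="$interval" 'BEGIN { printf "%.2f\n", d / (t * 1000000) }'
}

# Example usage on a live system (energy_uj is typically readable by root only):
# dom=/sys/class/powercap/intel-rapl:0
# e0=$(sudo cat "$dom/energy_uj"); sleep 1; e1=$(sudo cat "$dom/energy_uj")
# rapl_power_w "$e0" "$e1" 1 "$(sudo cat "$dom/max_energy_range_uj")"
```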

**Use cases:**

- **Compare before/after core pinning**: Run the script during your workload with and without core pinning to quantify power savings
- **Monitor GPU power availability**: Check if freeing up CPU cores allows the GPU to consume more power (higher frequency)
- **Long-term profiling**: Use with longer durations to understand power patterns over time

**Example workflow:**

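```bash
# Cache sudo credentials first so the backgrounded monitor cannot stall on a password prompt
sudo -v

# Baseline: workload without core pinning
sudo ./get_package_power.sh -i 60 -s 1 > baseline.log &
./my_workload
wait    # let the 60-second measurement finish before the next run starts

# Optimized: workload with E-core pinning
sudo ./get_package_power.sh -i 60 -s 1 > pinned.log &
taskset -c 8-21 ./my_workload
wait
```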
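
To reduce the two runs to a single number, the per-sample readings can be averaged and compared. A minimal sketch, assuming each run's output was saved to a file (for example with `> baseline.log`) and that every sample line ends in `<value> W` as described in the output format above; `avg_power` and `compare_power` are hypothetical helper names.

```bash
# avg_power LOG: mean of all "<value> W" samples in a get_package_power.sh log.
# The value is the second-to-last whitespace-separated field of each sample line.
avg_power() {
    awk '$NF == "W" { sum += $(NF - 1); n++ }
         END { if (n == 0) exit 1; printf "%.2f\n", sum / n }' "$1"
}

# compare_power BASELINE_LOG PINNED_LOG: print both averages and the relative change.
compare_power() {
    local base pinned
    base=$(avg_power "$1") || { echo "no samples in $1" >&2; return 1; }
    pinned=$(avg_power "$2") || { echo "no samples in $2" >&2; return 1; }
    awk -v b="$base" -v p="$pinned" 'BEGIN {
        printf "baseline %.2f W, pinned %.2f W, change %+.1f%%\n", b, p, (p - b) / b * 100
    }'
}

# Example usage:
# compare_power baseline.log pinned.log
```

If a log contains several cards or both `rapl` and `hwmon` lines, filter it to a single source first (for example with `grep '^\[rapl\] card0'`) so the average is not mixed across sensors.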

For full documentation, see the [utils README](https://github.com/open-edge-platform/edge-workloads-and-benchmarks/blob/main/utils/README.md).

#### turbostat — Detailed Package, Core, and Graphics Power

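The Linux `turbostat` utility (maintained in the kernel source tree under
`tools/power/x86/turbostat` and usually installed from the distribution's kernel tools package)
provides more granular power breakdowns: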

```bash
sudo turbostat --interval 1
```

**Key metrics to watch:**

- **PkgWatt**: Total package power (CPU + GPU + uncore)
- **CorWatt**: Power consumed by CPU cores only
- **GFXWatt**: Graphics (GPU) power consumption
- **RAMWatt**: DRAM power

Example output snippet:

```
Core CPU  Avg_MHz Busy%  Bzy_MHz  PkgWatt  CorWatt  GFXWatt
-    -    2100    50.0   4200     25.0     15.0     5.0
0    0    4200    100.0  4200
1    1    4200    100.0  4200
```

By comparing power metrics **before and after core pinning**, you can quantify the impact on
package power and GPU power availability. Use `get_package_power.sh` for simple package-level
measurements, and `turbostat` when you need detailed per-core and component-level breakdowns.
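
For before/after comparisons, it helps to reduce a saved turbostat log to just the summary columns you care about. A minimal sketch, assuming the default layout shown above, in which each summary row starts with `-` and a header line names the columns; `turbostat_summary` is a hypothetical helper. The `--quiet`, `--num_iterations`, and `--out` options in the usage lines are standard turbostat options, but check `turbostat --help` on your system.

```bash
# turbostat_summary COLUMN...: read turbostat output on stdin and print the
# requested columns (e.g. PkgWatt GFXWatt) from every summary row, i.e. the
# rows whose first field is "-". Header lines may repeat; the latest one wins.
# Columns missing from the header are printed as "NA".
turbostat_summary() {
    awk -v cols="$*" '
        BEGIN { n = split(cols, want, " ") }
        {
            # A header line contains the first requested column name as a field
            for (i = 1; i <= NF; i++)
                if ($i == want[1]) {
                    for (j = 1; j <= NF; j++) idx[$j] = j
                    next
                }
        }
        $1 == "-" {
            line = ""
            for (k = 1; k <= n; k++) {
                sep = (k > 1) ? " " : ""
                val = (want[k] in idx) ? $(idx[want[k]]) : "NA"
                line = line sep val
            }
            print line
        }'
}

# Example usage: ten 1-second samples written to a file, then summarized.
# sudo turbostat --quiet --interval 1 --num_iterations 10 --out turbostat.log
# turbostat_summary PkgWatt CorWatt GFXWatt < turbostat.log
```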

#### npu-monitor-tool.py — NPU Power and Utilization

For workloads using the Intel NPU (Neural Processing Unit), monitor NPU-specific metrics using
the NPU monitoring tool from the
[edge-ai-libraries](https://github.com/open-edge-platform/edge-ai-libraries) repository.

**Location:** [`tools/npu-monitor-tool/npu-monitor-tool.py`](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/tools/npu-monitor-tool/npu-monitor-tool.py)

**Usage:**

```bash
sudo python3 npu-monitor-tool.py -i 1000
```

**Example output:**

```
+-----------------------------------------------------------------------------------------------+
| INTEL NPU Device: 0x7d1d   | version: 1.0.0                                                   |
| Firmware version: IVPU_MTL_20240112_v2024.01                                                  |
+===============================================================================================+
|       Power Usage        |      DPU Freq        | NPU DDR Average Bandwidth   |    Tile Conf  |
|                2.5 [W]   |        1400 [Hz]     |               123.45 [MB/s] |             4 |
+===============================================================================================+
|       NPU Temperature    |      NPU Utilization       |      Memory Usage                     |
|              45 [°C]     |                      25%   |                         512.00 [MB]   |
+-----------------------------------------------------------------------------------------------+
```

**CSV export** is available for long-term analysis:

```bash
sudo python3 npu-monitor-tool.py --csv -i 1000
```

This generates timestamped CSV files in `npu_output/` with the following columns:

`timestamp`, `power`, `frequency`, `bandwidth`, `tile_config`, `temperature`, `utilization`, `memory_usage`
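
For a quick summary of a capture, a column can be averaged by name. A minimal sketch, assuming the CSV starts with a header row containing the column names listed above and that values are plain numbers; `csv_mean` is a hypothetical helper, and the file name in the usage lines is a placeholder because the tool generates timestamped names.

```bash
# csv_mean FILE COLUMN: average of the named column in a comma-separated file
# whose first row holds the column names. Rows with an empty value are skipped.
csv_mean() {
    awk -F, -v col="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                h = $i
                gsub(/^[ \t"]+|[ \t"\r]+$/, "", h)   # trim spaces, quotes, CR
                if (h == col) c = i
            }
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        $c != "" { sum += $c; n++ }
        END {
            if (!c || n == 0) exit 1
            printf "%.2f\n", sum / n
        }' "$1"
}

# Example usage (replace <capture> with a file name from npu_output/):
# csv_mean npu_output/<capture>.csv power
# csv_mean npu_output/<capture>.csv utilization
```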

For complete documentation, see the
[npu-monitor-tool README](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/tools/npu-monitor-tool/README.md).


## Core Pinning with DL Streamer Pipeline Server

For AI video analytics workloads using the
[DL Streamer Pipeline Server](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/microservices/dlstreamer-pipeline-server),
Intel provides built-in support for core pinning via the **`CORE_PINNING` environment variable**.
This eliminates the need to manually wrap the server with `taskset` and provides a declarative
way to specify core affinity in Docker Compose or Kubernetes deployments.

### Using the CORE_PINNING Environment Variable

The `CORE_PINNING` environment variable accepts two types of values:

1. **Explicit core list or range** (taskset-compatible syntax):
   - Comma-delimited list: `10,12,14`
   - Range: `10-14`
   - Range with step: `10-14/2` (cores 10, 12, 14)

2. **Core type specification** (automatic detection):
   - `p-cores` — Pin to Performance cores
   - `e-cores` — Pin to Efficient cores
   - `lp-cores` — Pin to Low Power Efficient cores

The server automatically detects the appropriate cores using the same detection logic as
`obtain_cores.sh` and applies `taskset` internally.
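
Conceptually, resolving a `CORE_PINNING` value means mapping a core-type keyword to a CPU list and passing explicit lists through unchanged. The following is a hypothetical entrypoint sketch under that assumption, not the Pipeline Server's actual implementation; it assumes `obtain_cores.sh` output in the format shown earlier, and `resolve_core_pinning` is an illustrative name.

```bash
# resolve_core_pinning VALUE: print a CPU list for taskset -c.
# Core-type keywords are looked up in obtain_cores.sh-style output read from
# stdin; anything else is treated as an explicit list (e.g. "8-15") and
# printed unchanged. Hypothetical sketch, not the Pipeline Server's own code.
resolve_core_pinning() {
    local value=$1 key list
    case "$value" in
        p-cores)  key=pcore ;;
        e-cores)  key=ecore ;;
        lp-cores) key=lpecore ;;
        *)        printf '%s\n' "$value"; return 0 ;;
    esac
    list=$(awk -F: -v want="$key" '$1 == want { print $2 }')
    if [ -z "$list" ]; then
        echo "no $value detected on this platform" >&2
        return 1
    fi
    printf '%s\n' "$list"
}

# Hypothetical container entrypoint:
# if [ -n "$CORE_PINNING" ]; then
#     cpus=$(./obtain_cores.sh | resolve_core_pinning "$CORE_PINNING") || exit 1
#     exec taskset -c "$cpus" "$@"
# fi
# exec "$@"
```

Using `exec` replaces the wrapper shell with the pinned process, so the server itself (and every thread it later creates) inherits the affinity.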

### Docker Compose Example

::::{tab-set}
:::{tab-item} **Pin to P-cores for low-latency inference**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: p-cores
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

:::
:::{tab-item} **Pin to E-cores for high-throughput batch processing**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: e-cores
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

:::
:::{tab-item} **Pin to specific cores (manual control)**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: "8-15"  # E-cores 8 through 15
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

:::
::::

### Kubernetes Example

For Kubernetes deployments, set the environment variable in the pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dlstreamer-pipeline-server
spec:
  containers:
  - name: dlstreamer
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    env:
    - name: CORE_PINNING
      value: "p-cores"
    resources:
      limits:
        gpu.intel.com/i915: 1
```


### CORE_PINNING vs Manual taskset

| Approach                   | Pros | Cons |
|----------------------------|------|------|
| **`CORE_PINNING` env var** | Declarative, container-native, works in Docker Compose/K8s, automatic core detection | Specific to DL Streamer Pipeline Server |
| **Manual `taskset`**       | Universal (works with any application), explicit control | Requires shell wrapper, harder to manage in orchestration, manual core discovery |

**Recommendation:** Use `CORE_PINNING` for DL Streamer Pipeline Server deployments to simplify
configuration and enable portable deployments across different platforms.


### Combining Core Pinning with GPU/NPU Offload

A common optimization pattern for AI pipelines:

1. **Pin the Pipeline Server to E-cores** — reduces CPU power consumption
2. **Offload inference to GPU or NPU** — leaves more power budget for accelerators
3. **Monitor power distribution** — verify GPU/NPU frequencies increase

**Example Docker Compose with GPU + E-core pinning:**

```yaml
version: '3.8'
services:
  dlstreamer-pipeline-server:
    image: intel/dlstreamer-pipeline-server:2025.2.0-ubuntu22
    environment:
      CORE_PINNING: e-cores
      DEVICE: GPU  # Offload inference to GPU
    devices:
      - /dev/dri:/dev/dri
    ports:
      - "8080:8080"
    volumes:
      - ./pipelines:/home/pipeline-server/pipelines
```

**Verify the optimization:**

```bash
# Terminal 1: Monitor package power
sudo ./get_package_power.sh -i 120 -s 1

# Terminal 2: Start the pipeline server
docker-compose up

# Terminal 3: Run a pipeline
curl -X POST http://localhost:8080/pipelines/object_detection/1
```

Check that package power decreases while GPU power (visible in `turbostat` GFXWatt) increases
or remains stable.
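
Power numbers tell only part of the story, so it is also worth confirming that the affinity reached every thread of the server. A minimal sketch, assuming you run it on the Docker host, where container processes are visible under `/proc`; `thread_affinity` is a hypothetical helper, and the service name in the usage lines is the one from the Compose examples above.

```bash
# thread_affinity PID: print "TID CPU-LIST" for every thread of PID, using the
# Cpus_allowed_list field of /proc/<pid>/task/<tid>/status.
thread_affinity() {
    local status tid
    for status in /proc/"$1"/task/*/status; do
        [ -r "$status" ] || continue       # thread exited while iterating
        tid=${status%/status}
        tid=${tid##*/}
        printf '%s %s\n' "$tid" \
            "$(awk -F':[ \t]*' '$1 == "Cpus_allowed_list" { print $2 }' "$status")"
    done
}

# Example usage: find the container's main PID on the host, then list threads
# whose allowed CPUs differ from the expected E-core range (8-21 above).
# cid=$(docker-compose ps -q dlstreamer-pipeline-server)
# pid=$(docker inspect -f '{{.State.Pid}}' "$cid")
# thread_affinity "$pid" | awk '$2 != "8-21"'
```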

For complete documentation on DL Streamer Pipeline Server core pinning, see the
[official guide](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/dlstreamer-pipeline-server/docs/user-guide/advanced-guide/detailed_usage/how-to-advanced/performance/core-pinning.md).

## Recommendations: Which Cores to Pin?

The optimal core pinning strategy depends on your workload characteristics:

- E-cores for throughput workloads.
- P-cores for latency-constrained workloads.
- LPE-cores for background tasks.

::::{tab-set}
:::{tab-item} **E-cores**

**Use E-cores when:**

- Your workload is parallelizable and scales with core count
- Throughput (tasks/second) matters more than individual task latency
- You want to leave more power budget for GPU/NPU
- Examples: video encoding, batch inference, data processing pipelines

**Why E-cores?**

- E-cores often outnumber P-cores (the exact ratio varies by platform and SKU)
- Lower per-core power consumption allows more cores to run simultaneously
- Leaves power headroom for GPU and NPU to maintain high frequencies
- Better aggregate throughput per watt

**Example:**

```bash
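# Video transcoding pipeline on E-cores (decode, scaling, and encode run on the GPU via VA-API)
taskset -c 8-21 ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i input.mp4 \
    -vf 'scale_vaapi=w=1920:h=1080' -c:v h264_vaapi output.mp4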

# Batch inference on NPU with E-cores handling preprocessing
taskset -c 8-21 python batch_inference.py --device NPU

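# DL Streamer Pipeline Server for high-throughput video analytics
# (using the CORE_PINNING environment variable; docker-compose.yml must forward it,
# e.g. `CORE_PINNING: ${CORE_PINNING}` under `environment:`)
CORE_PINNING=e-cores docker-compose up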
```

:::
:::{tab-item} **P-cores**

**Use P-cores when:**

- Low latency is critical (interactive applications, real-time control)
- Single-threaded or lightly-threaded workloads
- You need maximum per-thread performance
- Examples: UI rendering, game engines, real-time analytics, control loops

**Why P-cores?**

- Higher maximum clock frequencies and higher instructions per clock (IPC) than E-cores
- Larger per-core L2 cache
- Better single-threaded performance for latency-critical paths
- Ideal for "main thread" logic that orchestrates parallel work

**Example:**

```bash
# Real-time object detection with DL Streamer
taskset -c 0-7 gst-launch-1.0 filesrc location=video.mp4 ! \
    qtdemux ! h264parse ! vah264dec ! gvadetect model=yolov5.xml device=GPU ! \
    gvafpscounter ! fakesink

# Industrial control loop on P-cores
taskset -c 0-3 ./motion_control_app --realtime

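# DL Streamer Pipeline Server for low-latency inference
# (using the CORE_PINNING environment variable; docker-compose.yml must forward it,
# e.g. `CORE_PINNING: ${CORE_PINNING}` under `environment:`)
CORE_PINNING=p-cores docker-compose up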
```
:::
:::{tab-item} **LPE-cores**

**Use LPE-cores when:**

- Tasks are low priority or non-latency-sensitive
- You want to minimize interference with foreground workloads
- Power efficiency is paramount
- Examples: telemetry collection, logging, health checks

**Example:**

```bash
# Background telemetry agent on LPE-cores
taskset -c 22-25 ./telemetry_agent --interval 5s

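# DL Streamer Pipeline Server for monitoring/logging pipelines
# (using the CORE_PINNING environment variable; docker-compose.yml must forward it,
# e.g. `CORE_PINNING: ${CORE_PINNING}` under `environment:`)
CORE_PINNING=lp-cores docker-compose up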
```
:::
::::


## Summary and Best Practices

The tools provided in Intel's open-edge-platform repositories — `obtain_cores.sh` for core
detection, `get_package_power.sh` for package power monitoring, and `npu-monitor-tool.py` for
NPU monitoring, combined with `turbostat` for detailed power tracking — give you everything you
need to implement effective core pinning strategies. For containerized AI workloads, the
DL Streamer Pipeline Server's `CORE_PINNING` environment variable provides a declarative,
orchestration-friendly way to apply core affinity. Here are some recommendations on how to
proceed:


1. **Profile first, optimize second**:
   Use `get_package_power.sh`, `turbostat`, and `npu-monitor-tool.py` to establish baselines before pinning.

2. **Match workload to core type**:
   - Latency-sensitive → P-cores
   - Throughput-oriented → E-cores
   - Background tasks → LPE-cores

3. **Leave cores idle when possible**:
   Don't spread workloads across all cores. Idle cores consume minimal power and leave more
   budget for accelerators.

4. **Combine CPU pinning with GPU/NPU offload**:
   For AI pipelines, pin CPU preprocessing to E-cores and run inference on GPU/NPU.

5. **Use `CORE_PINNING` for containerized workloads**:
   When using DL Streamer Pipeline Server, prefer the `CORE_PINNING` environment variable
   over manual `taskset` wrappers.

6. **Monitor power distribution**:
   Verify that your pinning strategy increases GPU/NPU power availability:

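   ```bash
   # Cache sudo credentials so the backgrounded monitor cannot stall on a password prompt
   sudo -v

   # Before pinning
   sudo ./get_package_power.sh -i 60 -s 1 > baseline.log &
   ./my_workload
   wait

   # After pinning (same workload, restricted to CPUs 8-15)
   sudo ./get_package_power.sh -i 60 -s 1 > optimized.log &
   taskset -c 8-15 ./my_workload
   wait

   # Compare average power (second-to-last field of each "<value> W" sample line)
   for log in baseline.log optimized.log; do
       awk -v f="$log" '$NF == "W" { s += $(NF-1); n++ } END { if (n) printf "%s: %.2f W (%d samples)\n", f, s / n, n }' "$log"
   done
   ```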

7. **Use CSV export for long-term analysis**:
   Collect metrics over hours or days to understand power trends and workload characteristics.


---

## Additional Resources

- [Edge Workloads and Benchmarks Repository](https://github.com/open-edge-platform/edge-workloads-and-benchmarks)
- [Edge AI Libraries Repository](https://github.com/open-edge-platform/edge-ai-libraries)
- [DL Streamer Pipeline Server](https://github.com/open-edge-platform/edge-ai-libraries/tree/main/microservices/dlstreamer-pipeline-server)
- [DL Streamer Pipeline Server Core Pinning Guide](https://github.com/open-edge-platform/edge-ai-libraries/blob/main/microservices/dlstreamer-pipeline-server/docs/user-guide/advanced-guide/detailed_usage/how-to-advanced/performance/core-pinning.md)
- [Intel Platform Monitoring Technology Specification](https://www.intel.com/content/www/us/en/content-details/710389/intel-platform-monitoring-technology-intel-pmt-technical-specification.html)