# Understanding Caching in OS Image Composer

The OS Image Composer caches build artifacts to significantly improve build performance and reduce resource usage. This document explains how the caches work and how to manage them effectively.

## Overview of Caching Mechanisms

The OS Image Composer uses two complementary caching mechanisms to dramatically improve build performance:

1. Package Cache - Stores downloaded OS packages (.rpm or .deb files) for reuse across builds

2. Chroot Environment Reuse - Preserves the chroot environment and its tarball to avoid rebuilding the base system

| Cache Type | Purpose | Location | Performance Benefit |
|---|---|---|---|
| Package Cache | Downloaded packages | `cache/pkgCache/{provider-id}/` | Eliminates re-downloading packages |
| Chroot Environment | Base OS environment | `workspace/{provider-id}/chrootenv/` | Avoids recreating chroot for each build |
| Chroot Tarball | Chroot snapshot | `workspace/{provider-id}/chrootbuild/` | Quick chroot restoration |

Together, these mechanisms provide substantial performance improvements:

  • First build: Creates all artifacts from scratch (10-15 minutes typical)

  • Subsequent builds with cache: Reuses packages and chroot (2-4 minutes typical)

  • Performance improvement: 60-80% reduction in build time
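
You can sanity-check these figures on your own hardware by timing a cold build and then an identical warm rebuild (template name is illustrative):

```bash
# Cold build: populates the package cache and chroot environment
time sudo -E os-image-composer build template.yml

# Warm rebuild: should reuse the cached packages and chroot created above
time sudo -E os-image-composer build template.yml
```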

## Package Caching

The package cache stores downloaded OS packages locally. Once packages are downloaded and verified, they are stored in the cache and reused for subsequent builds.

### How Package Caching Works

During the packages stage of a build, the tool follows this process:

```mermaid
flowchart TD
    Start([Need Package]) --> CheckCache{In Package Cache?}

    CheckCache -->|Yes| VerifyIntegrity{Verify Integrity}
    VerifyIntegrity -->|Valid| UseCache[Use Cached Package]
    VerifyIntegrity -->|Invalid| DeleteInvalid[Delete Invalid Entry]
    DeleteInvalid --> Download

    CheckCache -->|No| Download[Download from Repository]
    Download --> VerifyDownload[Verify GPG & Checksum]
    VerifyDownload --> StoreCache[Store in Package Cache]
    StoreCache --> UsePackage[Use Package for Installation]

    UseCache --> UsePackage
    UsePackage --> Done([Ready for Installation])

    %% Styling
    classDef decision fill:#fffacd,stroke:#d4b106,stroke-width:2px;
    classDef cache fill:#d1f0da,stroke:#0e7735,stroke-width:2px;
    classDef process fill:#f9f9f9,stroke:#333,stroke-width:1px;
    classDef endpoint fill:#b5e2fa,stroke:#0077b6,stroke-width:2px;

    class CheckCache,VerifyIntegrity decision;
    class UseCache,StoreCache cache;
    class Download,VerifyDownload,DeleteInvalid,UsePackage process;
    class Start,Done endpoint;
```

Detailed Steps:

  1. Cache Lookup: Check if package exists in cache/pkgCache/{provider-id}/

  2. Integrity Verification: Verify cached package hasn’t been corrupted

    • Check file size matches expected size

    • Verify SHA256 checksum

    • Validate GPG signature

  3. Cache Hit: If valid, collect package for installation

  4. Cache Miss/Invalid: If not in cache or invalid, download from repository

  5. Download and Verify: Download package and verify integrity

  6. Store in Cache: Save verified package for future use

  7. Generate Dependency Graph: Update chrootpkgs.dot with dependency information
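
If you suspect cache corruption, the checks from step 2 can be approximated manually. A minimal sketch for an RPM-based provider, with illustrative paths and package names:

```bash
cd /var/cache/os-image-composer/pkgCache/azure-linux-azl3-x86_64/

# Compare the SHA256 checksum against the value published in the repository metadata
sha256sum package1-1.0-1.rpm

# Verify the GPG signature and digests (requires the vendor key in the rpm keyring)
rpm -K package1-1.0-1.rpm
```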

### Package Cache Organization

The package cache is organized by provider to ensure each unique package is stored only once:

```
cache/
└── pkgCache/
    └── {provider-id}/                    # e.g., azure-linux-azl3-x86_64
        ├── package1-1.0-1.rpm
        ├── package2-2.5-3.rpm
        ├── chrootpkgs.dot                # Dependency graph
        └── [hundreds more packages...]
```

Provider ID Format: `{os}-{dist}-{arch}` (e.g., `azure-linux-azl3-x86_64`)

Each cached package:

  • keeps its full upstream filename, including version and architecture

  • is stored in a flat directory structure

  • has had its GPG signature and checksum verified

The chrootpkgs.dot file contains a visual representation of package dependencies in Graphviz DOT format, useful for troubleshooting and understanding package relationships.
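
For example, the graph can be rendered with the standard Graphviz dot tool (provider ID is illustrative):

```bash
# Render the package dependency graph to SVG (requires Graphviz)
dot -Tsvg cache/pkgCache/azure-linux-azl3-x86_64/chrootpkgs.dot -o chrootpkgs.svg
```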

### Package Cache Benefits

1. Dramatically Reduced Build Times

| Build Scenario | Without Cache | With Cache | Improvement |
|---|---|---|---|
| First build (200 packages) | 8-12 minutes | 8-12 minutes | Baseline |
| Identical rebuild | 8-12 minutes | 2-3 minutes | 70-75% faster |
| Similar build (150 shared packages) | 8-12 minutes | 4-5 minutes | 50-60% faster |
| Minimal changes (5 new packages) | 8-12 minutes | 2-3 minutes | 70-75% faster |

2. Reduced Network Bandwidth

  • First build: Downloads all required packages (~500MB-2GB typical)

  • Subsequent builds: Downloads only new or updated packages (~0-100MB typical)

  • Shared packages across different configurations are downloaded once

3. Offline Build Capability

If all required packages are cached, you can build images without internet access:

  • Useful for air-gapped environments

  • Enables builds on systems with restricted network access

  • Reduces dependency on repository availability
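
One way to smoke-test offline capability is to run a build inside a network-less namespace; if every required package is already cached, the build should succeed. A sketch, assuming a Linux host with util-linux unshare:

```bash
# Run the build with no network interfaces except the loopback device
sudo unshare --net -- os-image-composer build template.yml
```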

4. Consistent Build Performance

  • Build times become predictable after initial cache population

  • Less affected by repository server performance

  • Reduces impact of network congestion

5. Development Workflow Efficiency

  • Rapid iteration during development

  • Quick testing of configuration changes

  • Fast CI/CD pipeline execution

## Chroot Environment Reuse

The chroot environment reuse mechanism preserves the base OS environment between builds, avoiding the cost of recreating it for each build.

### How Chroot Reuse Works

The tool manages chroot environments at the provider level (OS-distribution-architecture combination):

First Build for a Provider:

  1. Create base chroot environment in workspace/{provider-id}/chrootenv/

  2. Install essential packages (filesystem, systemd, kernel, etc.)

  3. Configure base system

  4. Create tarball snapshot in workspace/{provider-id}/chrootbuild/chrootenv.tar.gz

Subsequent Builds for Same Provider:

  1. Check if chroot environment exists in workspace/{provider-id}/chrootenv/

  2. If exists and valid, reuse the existing chroot

  3. Mount pseudo-filesystems (proc, sys, dev)

  4. Install additional packages specific to the image template

  5. Configure image-specific settings

Image Build Directory:

  • Each build creates a clean directory in workspace/{provider-id}/imagebuild/{systemConfigName}/

  • This directory contains the final image output

  • Rebuilt for each build to ensure clean output
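
The tarball-based restoration amounts to roughly the following sketch (illustrative; the tool performs this automatically):

```bash
# Restore a missing chroot environment from its snapshot
PROVIDER=azure-linux-azl3-x86_64
WORK=/var/tmp/os-image-composer/$PROVIDER

if [ ! -d "$WORK/chrootenv" ] && [ -f "$WORK/chrootbuild/chrootenv.tar.gz" ]; then
    sudo mkdir -p "$WORK/chrootenv"
    sudo tar -xzf "$WORK/chrootbuild/chrootenv.tar.gz" -C "$WORK/chrootenv"
fi
```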

### Chroot Directory Structure

```
workspace/
└── {provider-id}/                        # e.g., azure-linux-azl3-x86_64
    ├── chrootenv/                        # REUSED chroot environment
    │   ├── bin/
    │   ├── boot/
    │   ├── etc/
    │   ├── usr/
    │   ├── var/
    │   └── workspace/
    ├── chrootbuild/                      # REUSED chroot tarball
    │   ├── chroot/
    │   └── chrootenv.tar.gz              # Snapshot for quick restoration
    └── imagebuild/                       # REBUILT each time
        └── {systemConfigName}/           # e.g., production, minimal, edge
            ├── {image-name}.raw
            └── [build artifacts]
```

Persistence:

  • chrootenv/: Persists across builds, contains full chroot filesystem

  • chrootbuild/: Persists across builds, contains tarball for restoration

  • imagebuild/: Cleaned and rebuilt for each image build

### Chroot Reuse Benefits

1. Significant Time Savings

| Operation | Without Reuse | With Reuse | Improvement |
|---|---|---|---|
| Chroot creation | 2-3 minutes | ~30 seconds | 75-85% faster |
| Total build time | 10-15 minutes | 3-5 minutes | 60-70% faster |
2. Reduced Disk I/O

  • Avoids writing hundreds of MB of system files

  • Reduces wear on SSDs

  • Improves performance on slower storage

3. Consistent Base Environment

  • Same base environment used across multiple image builds

  • Ensures consistency in base system configuration

  • Reduces variables when troubleshooting

4. Resource Efficiency

  • Single chroot environment per provider, not per image

  • Efficient use of disk space (shared base, unique images)

## Cache Integration with Build Process

Both caching mechanisms integrate seamlessly with the build pipeline:

```mermaid
flowchart TD
    Start([Start Build]) --> LoadTemplate[Load Template]
    LoadTemplate --> InitProvider[Initialize Provider]
    InitProvider --> Validate[Validate Stage]

    Validate --> PackageStage[Packages Stage]

    PackageStage --> FetchMeta[Fetch Repository Metadata]
    FetchMeta --> ResolveDeps[Resolve Dependencies]
    ResolveDeps --> PackageLoop[For Each Required Package]

    PackageLoop --> CheckPkgCache{In Package Cache?}
    CheckPkgCache -->|Yes| UsePackage[Use Cached Package]
    CheckPkgCache -->|No| DownloadPkg[Download Package]
    DownloadPkg --> StorePkg[Store in Package Cache]
    StorePkg --> UsePackage

    UsePackage --> MorePackages{More Packages?}
    MorePackages -->|Yes| PackageLoop
    MorePackages -->|No| Compose[Compose Stage]

    Compose --> CheckChroot{Chroot Exists?}
    CheckChroot -->|Yes| ReuseChroot[Reuse Chroot Environment]
    CheckChroot -->|No| CreateChroot[Create New Chroot]
    CreateChroot --> SaveTarball[Save Chroot Tarball]
    SaveTarball --> InstallPackages
    ReuseChroot --> InstallPackages[Install Image Packages]

    InstallPackages --> Configure[Configure System]
    Configure --> CreateImage[Create Image File]
    CreateImage --> Finalize[Finalize Stage]
    Finalize --> Done([Build Complete])

    %% Styling
    classDef stage fill:#f8edeb,stroke:#333,stroke-width:2px;
    classDef decision fill:#fffacd,stroke:#d4b106,stroke-width:2px;
    classDef cache fill:#d1f0da,stroke:#0e7735,stroke-width:2px;
    classDef process fill:#f9f9f9,stroke:#333,stroke-width:1px;
    classDef endpoint fill:#b5e2fa,stroke:#0077b6,stroke-width:2px;

    class PackageStage,Compose,Finalize stage;
    class CheckPkgCache,CheckChroot,MorePackages decision;
    class UsePackage,StorePkg,ReuseChroot,SaveTarball cache;
    class LoadTemplate,InitProvider,Validate,FetchMeta,ResolveDeps,PackageLoop,DownloadPkg,CreateChroot,InstallPackages,Configure,CreateImage process;
    class Start,Done endpoint;
```

Integration Points:

  1. Package Stage: Package cache checked for each required package

  2. Compose Stage: Chroot environment reused if available

  3. Throughout Build: Cached artifacts used transparently

  4. Across Builds: Caches persist and accumulate over time

## Configuration Options

### Global Configuration

Configure cache and workspace locations in `/etc/os-image-composer/config.yml`:

```yaml
# Package cache configuration
cache_dir: /var/cache/os-image-composer  # Root cache directory
                                          # Contains pkgCache/

# Working directory configuration
work_dir: /var/tmp/os-image-composer     # Root workspace directory
                                          # Contains {provider-id}/ subdirs

# Worker configuration (affects download speed)
workers: 16                               # Number of concurrent downloads

# Temporary files
temp_dir: /tmp                            # Temporary files like SBOM
```

Directory Purposes:

  • cache_dir: Contains pkgCache/ subdirectory with downloaded packages

  • work_dir: Contains {provider-id}/ subdirectories with chroot environments and image builds

  • temp_dir: Temporary files including SBOM manifest

### Command-Line Overrides

Override configuration for specific builds:

```bash
# Use custom cache directory
sudo -E os-image-composer build --cache-dir /mnt/fast-ssd/cache template.yml

# Use custom work directory
sudo -E os-image-composer build --work-dir /mnt/nvme/workspace template.yml

# Increase workers for faster initial download
sudo -E os-image-composer build --workers 32 template.yml

# Combine multiple overrides
sudo -E os-image-composer build \
  --cache-dir /mnt/cache \
  --work-dir /mnt/workspace \
  --workers 24 \
  template.yml
```

## Cache Management

### Cache Locations

Package Cache:

```
cache/pkgCache/{provider-id}/
```

Default: `/var/cache/os-image-composer/pkgCache/`

Chroot Environment:

```
workspace/{provider-id}/chrootenv/
workspace/{provider-id}/chrootbuild/
```

Default: `/var/tmp/os-image-composer/{provider-id}/`

Image Build Output:

```
workspace/{provider-id}/imagebuild/{systemConfigName}/
```

Rebuilt for each build.

### Cache Size Management

Typical Sizes:

| Component | Approximate Size |
|---|---|
| Package cache (single provider) | 2-5 GB |
| Package cache (multiple providers) | 5-15 GB |
| Chroot environment (per provider) | 1-3 GB |
| Chroot tarball (per provider) | 300-800 MB |
| Image build directory | Varies by image size |

Total Disk Space Recommendations:

  • Minimum: 20 GB for single provider

  • Recommended: 50 GB for multiple providers

  • Optimal: 100 GB for long-term use with multiple providers
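
Before a large build, confirm the backing volumes have headroom:

```bash
# Check free space on the volumes backing the cache and workspace
df -h /var/cache/os-image-composer /var/tmp/os-image-composer
```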

Monitor Cache Size:

```bash
# Check package cache size
du -sh cache/pkgCache/

# Check size by provider
du -sh cache/pkgCache/*/

# Check workspace size
du -sh workspace/

# Check chroot environments
du -sh workspace/*/chrootenv/

# Check chroot tarballs
du -sh workspace/*/chrootbuild/

# Count cached packages
find cache/pkgCache/ -name "*.rpm" | wc -l
find cache/pkgCache/ -name "*.deb" | wc -l
```

### Clearing Caches

Clear Package Cache:

```bash
# Clear all package caches
sudo rm -rf cache/pkgCache/

# Clear cache for specific provider
sudo rm -rf cache/pkgCache/azure-linux-azl3-x86_64/

# Next build will re-download packages
```

Clear Chroot Environment:

```bash
# Remove chroot for specific provider
sudo rm -rf workspace/azure-linux-azl3-x86_64/chrootenv/
sudo rm -rf workspace/azure-linux-azl3-x86_64/chrootbuild/

# Next build will recreate chroot environment
```

Clear Image Build Artifacts:

```bash
# Clear all image build directories
sudo rm -rf workspace/*/imagebuild/

# Clear specific image builds
sudo rm -rf workspace/azure-linux-azl3-x86_64/imagebuild/
```

Clear Everything:

```bash
# Clear all caches and workspaces
sudo rm -rf cache/
sudo rm -rf workspace/

# Next build starts from scratch
```

When to Clear Caches:

  • Package cache: When running low on disk space, or after major distribution upgrades

  • Chroot environment: When chroot is corrupted, or after OS vendor updates base packages

  • Image build artifacts: Regularly, as these are rebuilt for each image anyway

## Best Practices

1. Use Appropriate Storage

Match cache and workspace placement to the environment:

```yaml
# Development: Fast local SSD
cache_dir: /mnt/nvme/cache
work_dir: /mnt/nvme/workspace

# CI/CD: Network storage shared across agents
cache_dir: /mnt/nfs/os-image-composer-cache
work_dir: /var/tmp/os-image-composer

# Production: Reliable storage with backup
cache_dir: /var/cache/os-image-composer
work_dir: /var/tmp/os-image-composer
```

2. Size Storage Appropriately

Allocate sufficient space:

  • cache_dir: 10-30 GB (grows with package updates)

  • work_dir: 10-50 GB (grows with number of providers)

3. Monitor Cache Health

Create a monitoring script:

```bash
#!/bin/bash
# /usr/local/bin/check-oic-cache.sh

CACHE_DIR="/var/cache/os-image-composer"
WORK_DIR="/var/tmp/os-image-composer"
WARN_SIZE_GB=40
CRIT_SIZE_GB=80

check_size() {
    local dir=$1
    local name=$2
    if [ -d "$dir" ]; then
        local size_gb=$(du -s "$dir" 2>/dev/null | awk '{print int($1/1024/1024)}')
        echo "$name: ${size_gb}GB"

        if [ $size_gb -gt $CRIT_SIZE_GB ]; then
            echo "CRITICAL: $name exceeds ${CRIT_SIZE_GB}GB"
            return 2
        elif [ $size_gb -gt $WARN_SIZE_GB ]; then
            echo "WARNING: $name exceeds ${WARN_SIZE_GB}GB"
            return 1
        fi
    fi
    return 0
}

check_size "$CACHE_DIR" "Cache"
cache_status=$?

check_size "$WORK_DIR" "Workspace"
work_status=$?

if [ $cache_status -eq 2 ] || [ $work_status -eq 2 ]; then
    exit 2
elif [ $cache_status -eq 1 ] || [ $work_status -eq 1 ]; then
    exit 1
fi

exit 0
```

4. Implement Cleanup Policies

Automate cleanup of old artifacts:

```bash
#!/bin/bash
# Remove image build directories older than 5 days
find workspace/*/imagebuild/ -mindepth 1 -maxdepth 1 -type d -mtime +5 -exec rm -rf {} +

# Remove cached packages not accessed in 180 days
# (requires atime updates enabled on the filesystem, e.g. relatime)
find cache/pkgCache/ -name "*.rpm" -atime +180 -delete
find cache/pkgCache/ -name "*.deb" -atime +180 -delete
```
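
Such a script can be scheduled with cron (script path is illustrative):

```bash
# Append a weekly cleanup job to the current user's crontab (Sundays, 03:00)
(crontab -l 2>/dev/null; echo "0 3 * * 0 /usr/local/bin/oic-cleanup.sh") | crontab -
```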

5. Protect Caches from Corruption

  • Use reliable filesystems (ext4, xfs)

  • Consider RAID for important build infrastructure

  • Avoid manual modification of cache contents

  • Implement backup for critical environments

6. Plan for Growth

  • Allocate ample space initially

  • Monitor growth rate over time

  • Implement cleanup policies before reaching capacity

  • Consider separate volumes for cache and workspace
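
For example, dedicated volumes keep cache growth from starving the workspace (device labels are hypothetical):

```bash
# Mount separate volumes for the package cache and the workspace
sudo mount /dev/disk/by-label/oic-cache /var/cache/os-image-composer
sudo mount /dev/disk/by-label/oic-work /var/tmp/os-image-composer
```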

## Advanced Topics

### Cache on Network Storage

You can place caches on network storage for sharing across build hosts:

```yaml
# Global configuration
cache_dir: /mnt/nfs/os-image-composer-cache
work_dir: /mnt/nfs/os-image-composer-workspace
```

NFS Example:

```bash
# Mount NFS shares
sudo mount -t nfs nfs-server:/export/oic-cache /mnt/nfs/os-image-composer-cache
sudo mount -t nfs nfs-server:/export/oic-workspace /mnt/nfs/os-image-composer-workspace
```

Then point `/etc/os-image-composer/config.yml` at the mounts:

```yaml
cache_dir: /mnt/nfs/os-image-composer-cache
work_dir: /mnt/nfs/os-image-composer-workspace
```

Considerations:

  • Network performance directly affects build speed

  • Multiple hosts can read from the cache simultaneously

  • Ensure proper file locking for concurrent writes (see the sketch below)

  • A network outage will stall or fail in-progress builds
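
One option for serializing concurrent writers is an advisory lock around each build. A sketch; the lock file path is illustrative, and locking over NFS requires a working lock manager (NFSv4 or lockd):

```bash
# Hold an exclusive lock on the shared cache for the duration of the build
sudo -E flock /mnt/nfs/os-image-composer-cache/.build.lock \
  os-image-composer build template.yml
```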

### Sharing Caches Between Hosts

Multiple build hosts can share caches for efficiency:

Benefits:

  • Central cache reduces total storage needs

  • All hosts benefit from any host’s downloads

  • Consistent chroot environments across infrastructure
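
On the NFS server, the export setup might look like this (paths and network range are illustrative):

```bash
# Export the shared cache directory and reload the export table
echo '/export/oic-cache 10.0.0.0/24(rw,sync,no_root_squash)' | sudo tee -a /etc/exports
sudo exportfs -ra
```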

### Cache Performance Tuning

For Fast Initial Cache Population:

```bash
# Use many workers for parallel downloads
--workers 32
```

For SSD/NVMe Storage:

```bash
# Place both cache and workspace on fastest storage
--cache-dir /mnt/nvme/cache --work-dir /mnt/nvme/workspace
```

For Network Storage:

```bash
# Reduce workers to avoid overwhelming network
--workers 8
```

For Large Package Sets:

```bash
# Pre-populate cache with minimal build first
sudo -E os-image-composer build minimal-template.yml

# Then build full image (uses cached packages and chroot)
sudo -E os-image-composer build full-template.yml
```