Step-by-Step Process: Fixing StyleGAN2 Eye Manipulation on CPU

This document outlines the step-by-step process that was followed to fix the StyleGAN2 eye manipulation implementation on CPU, which was initially producing pink/magenta noise instead of proper face images.

Step 1: Problem Analysis

Issue Identification

The original implementation was producing random pink/magenta noise instead of proper face images
Error messages pointed to tensor dimension mismatches
CUDA-specific operations were failing on CPU
Pretrained weights weren't loading correctly

Initial Diagnosis

Verified that the issue was related to tensor operations rather than general code structure
Examined tensor dimensions throughout the generation process
Identified missing or incorrectly implemented CPU-specific operations
Checked weight loading and model initialization

Step 2: Implementing CPU-Compatible Operations

2.1 Creating CPU equivalents of CUDA operations

Created cpu_ops.py with simplified replacements for CUDA functions:

def cpu_upfirdn2d(input, kernel, up=1, down=1, pad=(0, 0)):
    out = input
    if up > 1:
        out = F.interpolate(out, scale_factor=up, mode='nearest')
    if len(pad) > 0:
        pad_sizes = [pad[0], pad[1], pad[0], pad[1]]
        out = F.pad(out, pad_sizes, mode='reflect')
    if down > 1:
        out = F.avg_pool2d(out, down, stride=down)
    return out

2.2 Patching model operations

Replaced StyleGAN2's CUDA operations with CPU equivalents:

import model
model.upfirdn2d = cpu_upfirdn2d
model.conv2d_gradfix.conv2d = F.conv2d
model.conv2d_gradfix.conv_transpose2d = F.conv_transpose2d
model.NoiseInjection = SafeNoiseInjection
model.ToRGB = SafeToRGB
model.FusedLeakyReLU = CPULeakyReLU
model.fused_leaky_relu = cpu_leaky_relu

Step 3: Fixing Tensor Dimension Issues

3.1 Diagnosing dimension problems

Added dimension debugging code:

print(f"Latent vector shape: {latent_w.shape}")
print(f"Mean latent shape: {mean_latent.shape}")

3.2 Fixing the `generate_face` function

Updated the function to handle all possible tensor shapes:

def generate_face(latent_w=None, noise=None, truncation=0.7, seed=None):
    with torch.no_grad():
        # Get latent vector if not provided
        if latent_w is None:
            latent_w = get_latent_vector(seed=seed, truncation=truncation)
        
        # Handle different latent shape scenarios
        if latent_w.ndim == 3 and latent_w.shape[1] == g_ema.n_latent:
            # Already in correct format [batch_size, n_latent, latent_dim]
            pass
        elif latent_w.ndim == 2:
            # Format is [batch_size, latent_dim]
            latent_w = latent_w.unsqueeze(1).repeat(1, g_ema.n_latent, 1)
        elif latent_w.ndim == 1:
            # Format is [latent_dim]
            latent_w = latent_w.unsqueeze(0).unsqueeze(0).repeat(1, g_ema.n_latent, 1)

3.3 Similar fixes for `manipulate_eyes` function

Updated the eye manipulation function in the same way, with proper dimension handling

Step 4: Improving Weight Loading and Model Initialization

4.1 Creating a robust weight downloading utility

Implemented download_models.py to fetch official weights:

def download_model(model_name, output_dir="models"):
    url = AVAILABLE_MODELS[model_name]
    output_path = os.path.join(output_dir, f"{model_name}.pkl")
    
    # Download with proper error handling
    response = requests.get(url, stream=True)
    if response.status_code == 200:
        with open(output_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)

4.2 Better weight loading approach

Added multiple fallbacks for weight loading:

try:
    with open('stylegan2-ffhq-config-f.pkl', 'rb') as f:
        data = pickle.load(f)
        
        if isinstance(data, dict) and 'g_ema' in data:
            # PyTorch format
            g_ema.load_state_dict(data['g_ema'])
        elif isinstance(data, dict) and 'g' in data:
            # TensorFlow format
            # Conversion code for TensorFlow weights
        else:
            # Handle unknown format

Step 5: Creating Pre-computed Eye Feature Directions

5.1 Creating eye direction vectors

Defined pre-computed direction vectors for different eye features:

# Direction for eye size
EYE_SIZE_DIRECTION = np.array([
    0.0, 0.02, 0.1, 0.04, 0.08, 0.15, 0.2, 0.1, 0.02, 0.01,  # 0-9
    # ... more values ...
])

# Direction for eye openness
EYE_OPENNESS_DIRECTION = np.array([
    # ... values ...
])

5.2 Creating utility functions to use these directions

Implemented get_eye_direction and get_combined_eye_direction in eye_directions.py

Step 6: Creating a Simplified Implementation

6.1 Creating a streamlined notebook

Developed eye_manipulation_simple.ipynb with:
- Simplified model initialization
- CPU-compatible operations
- Pre-computed eye directions
- Better error handling

6.2 Adding a direct image processing alternative

Created simple_eye_manipulation.py as a fallback solution:

def manipulate_eyes_simple(img_array, operation="enlarge", strength=0.5):
    # Simplified eye manipulation using image processing
    # ... image processing code ...

Step 7: Documentation and Integration

7.1 Creating comprehensive documentation

Created detailed README files
Added TROUBLESHOOTING.md for common issues
Added TECHNICAL_DETAILS.md for implementation details

7.2 Creating batch files for easy execution

Added run_eye_manipulation_simple.bat for Windows
Added run_simple.bat for the direct approach

7.3 Creating diagnostic utilities

Added detailed error messages
Created functions to inspect tensor shapes
Added checks for common issues

Step 8: Testing and Verification

8.1 Testing on CPU-only environment

Verified face generation worked correctly
Tested eye manipulation with different strengths
Tested with different random seeds

8.2 Handling edge cases

Tested with different tensor dimensions
Added robust error handling
Verified operation with different model configurations

Step 9: Final Optimization

9.1 Memory usage improvements

Reduced model size to 128x128 for CPU
Reduced channel multiplier
Added optimized noise generation

9.2 Code quality improvements

Added detailed comments
Improved error handling
Made functions more robust to different inputs

Summary of Results

The complete process resulted in:

Fixed Implementation:
- Properly generates realistic face images
- Successfully manipulates eye features
- Runs on CPU-only environments
Multiple Solutions:
- StyleGAN2-based approach with pre-computed directions
- Direct image processing alternative
- Robust documentation and troubleshooting guides
Key Improvements:
- Replaced CUDA operations with CPU equivalents
- Fixed tensor dimension handling
- Improved weight loading process
- Added pre-computed eye feature directions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Step-by-Step Process: Fixing StyleGAN2 Eye Manipulation on CPU

Step 1: Problem Analysis

Issue Identification

Initial Diagnosis

Step 2: Implementing CPU-Compatible Operations

2.1 Creating CPU equivalents of CUDA operations

2.2 Patching model operations

Step 3: Fixing Tensor Dimension Issues

3.1 Diagnosing dimension problems

3.2 Fixing the `generate_face` function

3.3 Similar fixes for `manipulate_eyes` function

Step 4: Improving Weight Loading and Model Initialization

4.1 Creating a robust weight downloading utility

4.2 Better weight loading approach

Step 5: Creating Pre-computed Eye Feature Directions

5.1 Creating eye direction vectors

5.2 Creating utility functions to use these directions

Step 6: Creating a Simplified Implementation

6.1 Creating a streamlined notebook

6.2 Adding a direct image processing alternative

Step 7: Documentation and Integration

7.1 Creating comprehensive documentation

7.2 Creating batch files for easy execution

7.3 Creating diagnostic utilities

Step 8: Testing and Verification

8.1 Testing on CPU-only environment

8.2 Handling edge cases

Step 9: Final Optimization

9.1 Memory usage improvements

9.2 Code quality improvements

Summary of Results

FilesExpand file tree

PROCESS_DOCUMENTATION.md

Latest commit

History

PROCESS_DOCUMENTATION.md

File metadata and controls

Step-by-Step Process: Fixing StyleGAN2 Eye Manipulation on CPU

Step 1: Problem Analysis

Issue Identification

Initial Diagnosis

Step 2: Implementing CPU-Compatible Operations

2.1 Creating CPU equivalents of CUDA operations

2.2 Patching model operations

Step 3: Fixing Tensor Dimension Issues

3.1 Diagnosing dimension problems

3.2 Fixing the generate_face function

3.3 Similar fixes for manipulate_eyes function

Step 4: Improving Weight Loading and Model Initialization

4.1 Creating a robust weight downloading utility

4.2 Better weight loading approach

Step 5: Creating Pre-computed Eye Feature Directions

5.1 Creating eye direction vectors

5.2 Creating utility functions to use these directions

Step 6: Creating a Simplified Implementation

6.1 Creating a streamlined notebook

6.2 Adding a direct image processing alternative

Step 7: Documentation and Integration

7.1 Creating comprehensive documentation

7.2 Creating batch files for easy execution

7.3 Creating diagnostic utilities

Step 8: Testing and Verification

8.1 Testing on CPU-only environment

8.2 Handling edge cases

Step 9: Final Optimization

9.1 Memory usage improvements

9.2 Code quality improvements

Summary of Results

3.2 Fixing the `generate_face` function

3.3 Similar fixes for `manipulate_eyes` function