This document outlines the step-by-step process that was followed to fix the StyleGAN2 eye manipulation implementation on CPU, which was initially producing pink/magenta noise instead of proper face images.
- The original implementation was producing random pink/magenta noise instead of proper face images
- Error messages pointed to tensor dimension mismatches
- CUDA-specific operations were failing on CPU
- Pretrained weights weren't loading correctly
- Verified that the issue was related to tensor operations rather than general code structure
- Examined tensor dimensions throughout the generation process
- Identified missing or incorrectly implemented CPU-specific operations
- Checked weight loading and model initialization
- Created
cpu_ops.pywith simplified replacements for CUDA functions:def cpu_upfirdn2d(input, kernel, up=1, down=1, pad=(0, 0)): out = input if up > 1: out = F.interpolate(out, scale_factor=up, mode='nearest') if len(pad) > 0: pad_sizes = [pad[0], pad[1], pad[0], pad[1]] out = F.pad(out, pad_sizes, mode='reflect') if down > 1: out = F.avg_pool2d(out, down, stride=down) return out
- Replaced StyleGAN2's CUDA operations with CPU equivalents:
import model model.upfirdn2d = cpu_upfirdn2d model.conv2d_gradfix.conv2d = F.conv2d model.conv2d_gradfix.conv_transpose2d = F.conv_transpose2d model.NoiseInjection = SafeNoiseInjection model.ToRGB = SafeToRGB model.FusedLeakyReLU = CPULeakyReLU model.fused_leaky_relu = cpu_leaky_relu
- Added dimension debugging code:
print(f"Latent vector shape: {latent_w.shape}") print(f"Mean latent shape: {mean_latent.shape}")
- Updated the function to handle all possible tensor shapes:
def generate_face(latent_w=None, noise=None, truncation=0.7, seed=None): with torch.no_grad(): # Get latent vector if not provided if latent_w is None: latent_w = get_latent_vector(seed=seed, truncation=truncation) # Handle different latent shape scenarios if latent_w.ndim == 3 and latent_w.shape[1] == g_ema.n_latent: # Already in correct format [batch_size, n_latent, latent_dim] pass elif latent_w.ndim == 2: # Format is [batch_size, latent_dim] latent_w = latent_w.unsqueeze(1).repeat(1, g_ema.n_latent, 1) elif latent_w.ndim == 1: # Format is [latent_dim] latent_w = latent_w.unsqueeze(0).unsqueeze(0).repeat(1, g_ema.n_latent, 1)
- Updated the eye manipulation function in the same way, with proper dimension handling
- Implemented
download_models.pyto fetch official weights:def download_model(model_name, output_dir="models"): url = AVAILABLE_MODELS[model_name] output_path = os.path.join(output_dir, f"{model_name}.pkl") # Download with proper error handling response = requests.get(url, stream=True) if response.status_code == 200: with open(output_path, 'wb') as f: for chunk in response.iter_content(chunk_size=8192): if chunk: f.write(chunk)
- Added multiple fallbacks for weight loading:
try: with open('stylegan2-ffhq-config-f.pkl', 'rb') as f: data = pickle.load(f) if isinstance(data, dict) and 'g_ema' in data: # PyTorch format g_ema.load_state_dict(data['g_ema']) elif isinstance(data, dict) and 'g' in data: # TensorFlow format # Conversion code for TensorFlow weights else: # Handle unknown format
- Defined pre-computed direction vectors for different eye features:
# Direction for eye size EYE_SIZE_DIRECTION = np.array([ 0.0, 0.02, 0.1, 0.04, 0.08, 0.15, 0.2, 0.1, 0.02, 0.01, # 0-9 # ... more values ... ]) # Direction for eye openness EYE_OPENNESS_DIRECTION = np.array([ # ... values ... ])
- Implemented
get_eye_directionandget_combined_eye_directionineye_directions.py
- Developed
eye_manipulation_simple.ipynbwith:- Simplified model initialization
- CPU-compatible operations
- Pre-computed eye directions
- Better error handling
- Created
simple_eye_manipulation.pyas a fallback solution:def manipulate_eyes_simple(img_array, operation="enlarge", strength=0.5): # Simplified eye manipulation using image processing # ... image processing code ...
- Created detailed README files
- Added TROUBLESHOOTING.md for common issues
- Added TECHNICAL_DETAILS.md for implementation details
- Added
run_eye_manipulation_simple.batfor Windows - Added
run_simple.batfor the direct approach
- Added detailed error messages
- Created functions to inspect tensor shapes
- Added checks for common issues
- Verified face generation worked correctly
- Tested eye manipulation with different strengths
- Tested with different random seeds
- Tested with different tensor dimensions
- Added robust error handling
- Verified operation with different model configurations
- Reduced model size to 128x128 for CPU
- Reduced channel multiplier
- Added optimized noise generation
- Added detailed comments
- Improved error handling
- Made functions more robust to different inputs
The complete process resulted in:
-
Fixed Implementation:
- Properly generates realistic face images
- Successfully manipulates eye features
- Runs on CPU-only environments
-
Multiple Solutions:
- StyleGAN2-based approach with pre-computed directions
- Direct image processing alternative
- Robust documentation and troubleshooting guides
-
Key Improvements:
- Replaced CUDA operations with CPU equivalents
- Fixed tensor dimension handling
- Improved weight loading process
- Added pre-computed eye feature directions