Use fixed size for sub-quadratic chunking on MPS

Even if this causes chunks to be much smaller, performance isn't significantly impacted. This will usually reduce memory usage but should also help with poor performance when free memory is low.

Use fixed size for sub-quadratic chunking on MPS
Even if this causes chunks to be much smaller, performance isn't significantly impacted. This will usually reduce memory usage but should also help with poor performance when free memory is low.
abfa4ad8 · brkirch · 3163d126 · abfa4ad8
Commit abfa4ad8 authored May 08, 2023 by brkirch
Hide whitespace changes
Inline Side-by-side

Showing with 5 additions and 1 deletion

modules/sd_hijack_optimizations.py modules/sd_hijack_optimizations.py +5 -1

No files found.
--- a/modules/sd_hijack_optimizations.py
+++ b/modules/sd_hijack_optimizations.py
 from __future__ import annotations
 import math
 import psutil
+import platform

 import torch
 from torch import einsum
@@ -427,7 +428,10 @@ def sub_quad_attention(q, k, v, q_chunk_size=1024, kv_chunk_size=None, kv_chunk_
    qk_matmul_size_bytes = batch_x_heads * bytes_per_token * q_tokens * k_tokens

    if chunk_threshold is None:
-        chunk_threshold_bytes = int(get_available_vram() * 0.9) if q.device.type == 'mps' else int(get_available_vram() * 0.7)
+        if q.device.type == 'mps':
+            chunk_threshold_bytes = 268435456 * (2 if platform.processor() == 'i386' else bytes_per_token)
+        else:
+            chunk_threshold_bytes = int(get_available_vram() * 0.7)
    elif chunk_threshold == 0:
        chunk_threshold_bytes = None
    else: