# care-transformer
Mobile-optimized linear visual transformer with a decoupled dual interaction mechanism.
```bash
pip install care-transformer
```
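```python
import torch
from care_transformer import CareTransformer

model = CareTransformer(
    img_size=224,
    patch_size=16,
    in_chans=3,
    num_classes=1000,
    embed_dim=384,
    depth=12,
    num_heads=6,
    linear_attn=True,
)
model.eval()  # switch to inference mode (disables dropout, if any)

x = torch.randn(1, 3, 224, 224)  # (B, C, H, W)
with torch.no_grad():  # skip autograd bookkeeping for inference
    out = model(x)  # logits of shape (1, 1000)
```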
Linear-complexity attention for mobile deployment. Spatial and channel interactions are decoupled to reduce FLOPs while keeping accuracy competitive (see the benchmarks below).
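
A minimal sketch of why reordering the matrix products makes attention linear in the token count, assuming a kernelized linear attention of the form φ(Q)(φ(K)ᵀV). The feature map φ (ReLU plus a small ε) and all helper names here are hypothetical; this illustrates the general technique, not CareTransformer's internal implementation. Both functions return the same normalized output, but the second never builds the N×N similarity matrix.

```python
import random


def matmul(a, b):
    """Multiply an (n x p) matrix by a (p x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def phi(x, eps=1e-6):
    # Hypothetical positive feature map (ReLU + eps) so every normalizer is > 0.
    return [[max(val, 0.0) + eps for val in row] for row in x]


def attention_quadratic(q, k, v):
    # Forms the full N x N similarity matrix first: O(N^2 * d) work.
    s = matmul(phi(q), transpose(phi(k)))
    out = matmul(s, v)
    return [[o / sum(s_row) for o in out_row] for s_row, out_row in zip(s, out)]


def attention_linear(q, k, v):
    # Forms the d x d summary phi(K)^T V first: O(N * d^2) work.
    fq, fk = phi(q), phi(k)
    kv = matmul(transpose(fk), v)              # d x d
    k_sum = [sum(col) for col in zip(*fk)]     # sum of phi(k_j) over all tokens
    out = matmul(fq, kv)                       # N x d
    norm = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[o / z for o in out_row] for out_row, z in zip(out, norm)]


def attention_macs(n_tokens, head_dim, num_heads):
    # Multiply-accumulates of the two attention products only (projections ignored).
    quadratic = num_heads * 2 * n_tokens ** 2 * head_dim
    linear = num_heads * 2 * n_tokens * head_dim ** 2
    return quadratic, linear


random.seed(0)
N, D = 8, 4
q, k, v = ([[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)] for _ in range(3))
a = attention_quadratic(q, k, v)
b = attention_linear(q, k, v)
max_err = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(f"max difference between the two orderings: {max_err:.2e}")

n_tokens = (224 // 16) ** 2   # 196 patches for img_size=224, patch_size=16
head_dim = 384 // 6           # 64 for embed_dim=384, num_heads=6
print(attention_macs(n_tokens, head_dim, num_heads=6))  # (29503488, 9633792)
```

At N = 196 and a head dimension of 64 the saving on these two products is only N/d ≈ 3×, but it grows with resolution: doubling the image side quadruples N, which multiplies the quadratic cost by 16 and the linear cost by 4.
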

Key differences from a vanilla ViT:
- O(N) attention in the number of tokens N, instead of O(N²)
- Dual pathway with either ordering: spatial-then-channel or channel-then-spatial
- Optional grouped convolutions for token mixing
- Quantization-friendly architecture
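
The ordering of the dual pathway only matters because of the nonlinearities between stages: a purely linear token mix A·X and channel mix X·W commute, since A(XW) = (AX)W. The sketch below assumes each stage is a residual branch with a ReLU; `spatial_mix`, `channel_mix`, and `dual_interaction` are hypothetical helpers, and `spatial_first` merely mirrors the parameter name used by `DualInteractionBlock`. The last helper gives the standard weight count of a grouped convolution, which shrinks by the number of groups.

```python
def spatial_mix(x):
    """Token mixing: average each token with its neighbors along N (window of 3, edges clamped)."""
    n, c = len(x), len(x[0])
    out = []
    for i in range(n):
        window = (max(i - 1, 0), i, min(i + 1, n - 1))
        out.append([sum(x[j][ch] for j in window) / 3 for ch in range(c)])
    return out


def channel_mix(x, w):
    """Channel mixing: multiply every token (a length-C row) by a C x C matrix w."""
    return [[sum(row[a] * w[a][b] for a in range(len(row))) for b in range(len(w[0]))]
            for row in x]


def residual_relu(x, y):
    # x + ReLU(y), element by element.
    return [[xi + max(yi, 0.0) for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]


def dual_interaction(x, w, spatial_first=True):
    # Hypothetical block: two residual stages whose order is set by spatial_first.
    stages = [lambda t: residual_relu(t, spatial_mix(t)),
              lambda t: residual_relu(t, channel_mix(t, w))]
    if not spatial_first:
        stages.reverse()
    for stage in stages:
        x = stage(x)
    return x


def grouped_conv_params(c_in, c_out, kernel, groups):
    # Weights of a kernel x kernel convolution split into `groups` groups (bias ignored).
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kernel * kernel


x = [[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]]   # N=3 tokens, C=2 channels
w = [[0.0, 1.0], [-1.0, 0.0]]                 # toy channel-mixing matrix
out_spatial_first = dual_interaction(x, w, spatial_first=True)
out_channel_first = dual_interaction(x, w, spatial_first=False)
print(out_spatial_first != out_channel_first)        # True: order matters with ReLU in between
print(grouped_conv_params(384, 384, 3, groups=1))    # 1327104
print(grouped_conv_params(384, 384, 3, groups=4))    # 331776
```

PyTorch's `nn.Conv2d` stores the same count: its weight has shape `(out_channels, in_channels // groups, k, k)`, so `groups=4` cuts a 3×3, 384-channel convolution from 1,327,104 to 331,776 weights.
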

Benchmarks on ImageNet-1K:

| Model | Params | FLOPs | Top-1 | Latency (mobile) |
|---|---|---|---|---|
| care-tiny | 5M | 1.2G | 76.4% | 12ms |
| care-small | 12M | 2.8G | 81.2% | 24ms |
| care-base | 28M | 6.1G | 83.7% | 48ms |

Latency measured on Snapdragon 888, INT8 quantized.
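
A generic sketch of symmetric per-tensor INT8 quantization, the kind of arithmetic behind INT8 deployment. Whether `quantize_model` uses this exact scheme (per-tensor vs per-channel scales, symmetric vs asymmetric zero points) is an assumption; `quantize_int8` and `dequantize_int8` are hypothetical helpers.

```python
def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] using one symmetric scale."""
    max_abs = max(abs(val) for val in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(val / scale))) for val in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats; each value is off by at most scale / 2."""
    return [c * scale for c in codes]


weights = [0.42, -1.30, 0.07, 0.95, -0.61]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_err = max(abs(wt - r) for wt, r in zip(weights, restored))
print(codes, scale, max_err)
```

Storing 8-bit codes plus one float scale cuts weight memory roughly 4× relative to float32, at the cost of a rounding error bounded by half the scale.
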

```python
import torch
from care_transformer import CareTransformer
from care_transformer.utils import export_onnx, quantize_model

model = CareTransformer.from_pretrained('care-small')

# Export for mobile
export_onnx(model, 'care_small.onnx', opset=13)

# Post-training quantization.
# train_loader is your own DataLoader of calibration images (not defined here).
qmodel = quantize_model(
    model,
    calibration_loader=train_loader,
    backend='qnnpack',
)

# Custom dual interaction block
from care_transformer.blocks import DualInteractionBlock

block = DualInteractionBlock(
    dim=384,
    num_heads=6,
    spatial_first=True,
    use_grouped_conv=True,
    conv_groups=4,
)
feat = torch.randn(8, 196, 384)  # (B, N, C)
out = block(feat)
```

License: MIT