Optimize Int8 Woq for CPU by yanbing-j · Pull Request #161 · meta-pytorch/gpt-fast

yanbing-j · 2024-04-23T06:10:48Z

This PR optimizes Int8 weight-only quantization (WOQ) on CPU in both gpt-fast and mixtral-moe.

At the current stage, we use torch.ops.aten._weight_int8pack_mm as a workaround. This workaround will be removed once pytorch/pytorch#120985 lands in a PyTorch stable release. The PR also updates the int8 weight dimension to match torch.ops.aten._weight_int8pack_mm from pytorch/pytorch#118056, and adds CPU profiling.
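For readers unfamiliar with the op, here is a minimal sketch of what an int8 weight-only linear layer computes. It is not the code from this PR. The helper names (quantize_int8_per_channel, woq_linear_reference, woq_linear) are hypothetical, and the argument order assumed for the private op (activation [M, K], int8 weight [N, K], per-channel scales [N]) is my understanding of it, not something the PR text confirms. Because the op is private, its availability and dtype support may differ between PyTorch versions, so the sketch falls back to plain PyTorch when the fused call is not usable.

```python
import torch


def quantize_int8_per_channel(weight: torch.Tensor):
    """Symmetric per-output-channel int8 quantization of a [N, K] weight."""
    # One scale per output row, chosen so the largest magnitude maps to 127.
    max_abs = weight.abs().amax(dim=1).clamp(min=1e-8)
    scales = max_abs / 127.0
    w_int8 = torch.round(weight / scales[:, None]).clamp(-128, 127).to(torch.int8)
    return w_int8, scales


def woq_linear_reference(x, w_int8, scales):
    """Reference path: cast the int8 weight up, matmul, then rescale per channel."""
    return torch.nn.functional.linear(x, w_int8.to(x.dtype)) * scales.to(x.dtype)


def woq_linear(x, w_int8, scales):
    """Try the fused CPU op (assumed signature), else use the reference path."""
    if x.device.type == "cpu" and hasattr(torch.ops.aten, "_weight_int8pack_mm"):
        try:
            x2d = x.reshape(-1, x.shape[-1])  # the op is assumed to take a 2-D activation
            out = torch.ops.aten._weight_int8pack_mm(x2d, w_int8, scales.to(x.dtype))
            return out.reshape(*x.shape[:-1], w_int8.shape[0])
        except RuntimeError:
            pass  # e.g. unsupported dtype in this PyTorch build
    return woq_linear_reference(x, w_int8, scales)
```

The weight stays in int8 in memory and only the activation keeps its floating-point dtype, which is where the bandwidth saving on CPU comes from.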

update int4 weight dim Add CPU profiling

yanbing-j · 2024-04-23T06:13:59Z

@HDCharles could you please take a look? Thanks!

yanbing-j · 2024-05-07T05:26:22Z

Hi @yanboliang, could you please take a look? Thanks!

Add int8 Woq for CPU

1e51c00


facebook-github-bot added the "CLA Signed" label Apr 23, 2024

yanbing-j wants to merge 1 commit into meta-pytorch:main from yanbing-j:yanbing/int8_woq_profile_cpu
