AMD predicts future AI PCs will execute 30B parameter models at 100 tokens per second

Analysis AMD expects to have notebook chips within a few years that can run 30 billion parameter large language models locally at a rate of 100 tokens per second. Achieving this goal, which also calls for a first-token latency of 100ms, isn't as simple as it sounds. It will require optimizations on both the software …
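
A rough back-of-the-envelope sketch helps show why the target is demanding. It assumes a dense model whose weights are all read from memory once per generated token (single user, no batching), and it ignores KV cache and activation traffic, so the result is a lower bound. The function name and the weight precisions (16, 8 and 4 bits) are illustrative assumptions, not figures from AMD.

```python
def required_bandwidth_gb_s(params_billion, bits_per_weight, tokens_per_second):
    """Lower bound on memory bandwidth in GB/s.

    Assumption: every weight is read once per generated token
    (dense model, batch size 1, KV cache and activations ignored).
    """
    # params_billion * 1e9 weights * (bits / 8) bytes, expressed in GB
    model_size_gb = params_billion * bits_per_weight / 8
    return model_size_gb * tokens_per_second


if __name__ == "__main__":
    # Illustrative precisions only; the source does not state a weight format.
    for bits in (16, 8, 4):
        bandwidth = required_bandwidth_gb_s(30, bits, 100)
        print(f"{bits}-bit weights: at least {bandwidth:.0f} GB/s")
```

Under these assumptions, even 4-bit weights imply roughly 1.5 TB/s of sustained memory bandwidth for 100 tokens per second.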