![Tim Dettmers on Twitter: "I am a huge fan of einsum notation. Here is a multi-layer transformer in a couple lines of code (without norms though). I think it's simple to read,](https://pbs.twimg.com/media/E0pYEI_UUAAOtUx.png)
Tim Dettmers on Twitter: "I am a huge fan of einsum notation. Here is a multi-layer transformer in a couple lines of code (without norms though). I think it's simple to read,
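The tweet's code isn't reproduced here, but the idea — an entire transformer layer expressed as a handful of `torch.einsum` calls — can be sketched roughly like this (hypothetical weight names and shapes, no norms, in the spirit of the tweet rather than Dettmers' exact code):

```python
import torch

def transformer_layer(x, wq, wk, wv, wo, w1, w2):
    """One transformer layer written almost entirely with einsum.
    x: (batch, seq, dim); all w* are hypothetical dense weight matrices."""
    # token projections: (batch, seq, d) @ (d, d) -> (batch, seq, d)
    q = torch.einsum('bsd,de->bse', x, wq)
    k = torch.einsum('bsd,de->bse', x, wk)
    v = torch.einsum('bsd,de->bse', x, wv)
    # attention scores over key positions: (batch, seq_q, seq_k)
    scores = torch.einsum('bqd,bkd->bqk', q, k) / q.shape[-1] ** 0.5
    attn = scores.softmax(dim=-1)
    # weighted sum of values, then output projection with a residual
    out = x + torch.einsum('bqk,bkd,de->bqe', attn, v, wo)
    # two-layer feed-forward, also as einsums, with a second residual
    h = torch.einsum('bsd,df->bsf', out, w1).relu()
    return out + torch.einsum('bsf,fd->bsd', h, w2)
```

Note how the subscript strings double as shape documentation — `'bqd,bkd->bqk'` says at a glance that queries and keys are contracted over the feature axis, which is much of einsum's appeal here.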
![Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch | AI Summer](https://theaisummer.com/static/8f1a78345a3ac7398acce56f159b268f/58fee/einsum.png)
Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch | AI Summer
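The core multi-head trick the article builds up to — keeping an explicit head axis `h` in the einsum subscripts so all heads are contracted in one call — looks roughly like this (a sketch with hypothetical weight shapes, not the article's verbatim code, which also uses einops):

```python
import torch

def multi_head_attention(x, w_qkv, w_out, heads):
    """x: (batch, tokens, dim); w_qkv: (dim, 3 * heads * head_dim);
    w_out: (heads * head_dim, dim). All names/shapes are illustrative."""
    b, t, d = x.shape
    head_dim = w_qkv.shape[1] // (3 * heads)
    # one fused projection, then split into q, k, v
    qkv = torch.einsum('btd,de->bte', x, w_qkv)
    q, k, v = qkv.chunk(3, dim=-1)
    # split heads out: (batch, heads, tokens, head_dim)
    q, k, v = (z.reshape(b, t, heads, head_dim).transpose(1, 2)
               for z in (q, k, v))
    # scores per head: the 'h' axis rides along, only 'd' is contracted
    scores = torch.einsum('bhid,bhjd->bhij', q, k) / head_dim ** 0.5
    out = torch.einsum('bhij,bhjd->bhid', scores.softmax(dim=-1), v)
    # merge heads back and project to model dim
    out = out.transpose(1, 2).reshape(b, t, heads * head_dim)
    return torch.einsum('bte,ed->btd', out, w_out)
```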
![Birchlabs on Twitter: "made #stablediffusion 19% faster on Mac by replacing einsum(…, q, k)*scale with baddbmm(…), and einsum(…, attn, v) with bmm(…). baddbmm is 99% faster than the einsum+multiply. bmm is 15%](https://pbs.twimg.com/media/Fgw0aurXgAE9Qim.jpg:large)
Birchlabs on Twitter: "made #stablediffusion 19% faster on Mac by replacing einsum(…, q, k)*scale with baddbmm(…), and einsum(…, attn, v) with bmm(…). baddbmm is 99% faster than the einsum+multiply. bmm is 15%
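The substitution described in the tweet can be checked numerically: `baddbmm` with `beta=0` folds the `* scale` multiply into the batched matmul via its `alpha` argument, and plain `bmm` replaces the second einsum. A minimal sketch of the equivalence (made-up tensor shapes; the speedups quoted are from the tweet's Mac/MPS measurements, not reproduced here):

```python
import torch

b, tokens, d = 8, 16, 64          # illustrative batch-of-heads, seq, head dim
q = torch.randn(b, tokens, d)
k = torch.randn(b, tokens, d)
v = torch.randn(b, tokens, d)
scale = d ** -0.5

# slower pattern: einsum producing scores, then a separate scale multiply
attn = torch.einsum('bid,bjd->bij', q, k) * scale
out_einsum = torch.einsum('bij,bjd->bid', attn.softmax(dim=-1), v)

# replacement: baddbmm fuses the scaling (alpha); beta=0 ignores the bias input
attn = torch.baddbmm(torch.zeros(1, 1, 1), q, k.transpose(1, 2),
                     beta=0, alpha=scale)
out_bmm = torch.bmm(attn.softmax(dim=-1), v)

assert torch.allclose(out_einsum, out_bmm, atol=1e-5)
```

The two paths compute the same attention output; the gain on MPS comes from dispatching straight to the batched-GEMM kernel instead of going through einsum's contraction planner plus a separate elementwise multiply.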
![\[MPS\] einsum returns incorrect matmul result on first invocation on nightly builds · Issue #85224 · pytorch/pytorch · GitHub](https://user-images.githubusercontent.com/6141784/190876454-297dff29-6261-4ace-b97a-5e3c2d55589a.png)
[MPS] einsum returns incorrect matmul result on first invocation on nightly builds · Issue #85224 · pytorch/pytorch · GitHub
torch.einsum 400x slower than numpy.einsum on a simple contraction · Issue #10661 · pytorch/pytorch · GitHub
![Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch | AI Summer](https://theaisummer.com/static/4cc18938d1acf254e759f2e2870e9964/ee604/einsum-attention.png)
Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch | AI Summer
![Tim Rocktäschel on Twitter: "In case you need convincing arguments for setting aside time to learn about einsum (https://t.co/2lA3Bsh53D) and Alex Rogozhnikov's einops (https://t.co/SY4yJAktEh). Screenshot taken from https://t.co/RsCX5P5NLv. https://t ...](https://pbs.twimg.com/media/ERS_Us0WsAEBf_u.jpg:large)
Tim Rocktäschel on Twitter: "In case you need convincing arguments for setting aside time to learn about einsum (https://t.co/2lA3Bsh53D) and Alex Rogozhnikov's einops (https://t.co/SY4yJAktEh). Screenshot taken from https://t.co/RsCX5P5NLv.