llama.cpp is a project on GitHub that implements LLaMA model inference in pure C/C++. The performance is pretty amazing given the limited hardware it can run on (even a Raspberry Pi, if you have patience), and the author gives an explanation of how that’s even possible (hint: inference is bound by memory bandwidth, not compute).
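For a rough sense of why bandwidth is the limit: generating each token requires streaming every weight through the CPU once, so tokens per second is roughly memory bandwidth divided by model size. Here's a back-of-envelope sketch in C; the model size and bandwidth figures are illustrative assumptions on my part, not numbers from the project:

```c
#include <stdio.h>

/* Rough estimate of token generation speed for a memory-bandwidth-bound
 * LLM. Assumption: every weight is read from RAM once per generated
 * token, so tokens/sec ~= bandwidth / model size in bytes. */
int main(void) {
    double model_bytes       = 3.9e9; /* assumed: ~7B params at 4-bit quantization */
    double desktop_bandwidth = 50e9;  /* assumed: dual-channel DDR4 desktop, bytes/s */
    double pi_bandwidth      = 4e9;   /* assumed: Raspberry Pi 4 class, bytes/s */

    printf("desktop: ~%.1f tokens/s\n", desktop_bandwidth / model_bytes);
    printf("pi:      ~%.1f tokens/s\n", pi_bandwidth / model_bytes);
    return 0;
}
```

Under those assumptions you get roughly 13 tokens/s on the desktop and about 1 token/s on the Pi, which matches the "it works, if you have patience" experience.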