Summary: Edge AI Just Got Faster
June 14, 2023
The only thing left to do at that point was to change the file format, so that mmap() generalized to all the models we were using.
We modified llama.cpp to load weights using mmap() instead of C++ standard I/O.
Thanks to him, we were able to delete all of the old standard I/O loader code at the end of the project, because every platform in our support vector could be supported by mmap().
That’s because our conversion tools now turn multi-part weights into a single file.
However, we’re still using the old C++ standard I/O code for the larger models.
Source Article
Edge AI Just Got Faster
Using mmap() to load LLaMA faster in parallel with less memory.
Read the complete article at: justine.lol