Summary: Edge AI Just Got Faster

The only thing left to do at that point was to change the file format so that mmap() generalized to all the models we were using.

We modified llama.cpp to load weights using mmap() instead of C++ standard I/O.

Thanks to that contribution, we were able to delete all of the old standard I/O loader code at the end of the project, because every platform we support could be served by mmap().

That’s because our conversion tools now turn multi-part weights into a single file.

However, we’re still using the old C++ standard I/O code for the larger models.

Source Article

Edge AI Just Got Faster

Using mmap() to load LLaMA faster in parallel with less memory.

Read the complete article at: justine.lol
