Are you familiar with the difference between discrete learning and predictive learning? This distinction is exactly why LLMs are not designed to execute function calls: they are not the right shape for it. LLMs are prediction machines, while function calling requires discrete learning machines. Fortunately, you can couple an LLM with a discrete learning algorithm. It is easy to do once you know the math. Want to dive deeper into this subject? Check out this video.
Here is how we can calculate the size of any LLM:
Each parameter in an LLM is typically stored as a floating-point number. The size of each parameter in bytes depends on the precision.
32-bit precision (FP32): each parameter takes 4 bytes.
16-bit precision (FP16): each parameter takes 2 bytes.
To calculate the total memory usage of the model:
Memory usage (in bytes) = number of parameters × size of each parameter (in bytes)
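The formula above can be sketched as a small Python function. This is only an illustration: the name `model_memory_bytes` and its parameters are hypothetical, and it assumes the precision is a whole number of bytes and counts the stored weights only.

```python
def model_memory_bytes(num_params: int, bits_per_param: int) -> int:
    """Return the bytes needed to store the parameters alone.

    Assumes every parameter uses the same precision and that the
    precision is a whole number of bytes (e.g. 16 or 32 bits).
    """
    if bits_per_param % 8 != 0:
        raise ValueError("this sketch only handles whole-byte precisions")
    bytes_per_param = bits_per_param // 8
    return num_params * bytes_per_param


# 1 billion parameters at 32-bit and 16-bit precision
print(model_memory_bytes(1_000_000_000, 32))  # 4000000000
print(model_memory_bytes(1_000_000_000, 16))  # 2000000000
```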
For example, take a model with 1 billion parameters.
32-bit precision (FP32): each parameter takes 4 bytes.
Memory usage in bytes = 1,000,000,000 × 4 = 4,000,000,000 bytes
In gigabytes: 4,000,000,000 / 1024^3 ≈ 3.73 GB (binary gigabytes, strictly GiB)
16-bit precision (FP16): each parameter takes 2 bytes.
Memory usage in bytes = 1,000,000,000 × 2 = 2,000,000,000 bytes
In gigabytes: 2,000,000,000 / 1024^3 ≈ 1.86 GB (binary gigabytes, strictly GiB)
Depending on whether you use 32-bit or 16-bit precision, a model with 1 billion parameters needs approximately 3.73 GB or 1.86 GB of memory, respectively, for its weights alone.
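To double-check these figures, here is a minimal Python sketch that converts the byte counts into gigabytes. The helper `weights_size` is a hypothetical name. The script prints both the binary unit (1 GiB = 1024^3 bytes), which gives the 3.73 and 1.86 figures, and the decimal unit (1 GB = 1000^3 bytes), which gives 4.00 and 2.00.

```python
BYTES_PER_GIB = 1024 ** 3  # binary gigabyte (GiB)
BYTES_PER_GB = 1000 ** 3   # decimal gigabyte (GB)


def weights_size(num_params: int, bytes_per_param: int) -> tuple[float, float]:
    """Return the weight storage as (GiB, decimal GB)."""
    total_bytes = num_params * bytes_per_param
    return total_bytes / BYTES_PER_GIB, total_bytes / BYTES_PER_GB


for name, bytes_per_param in (("FP32", 4), ("FP16", 2)):
    gib, gb = weights_size(1_000_000_000, bytes_per_param)
    print(f"{name}: {gib:.2f} GiB ({gb:.2f} GB decimal)")

# FP32: 3.73 GiB (4.00 GB decimal)
# FP16: 1.86 GiB (2.00 GB decimal)
```

Which unit you quote matters when comparing against hardware specs, since tools and vendors do not all use the same convention.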