How to reduce GPU memory?
#2 by ulrika-cyl - opened
Me too. Why does the int4 model use up to 70 GB of GPU memory?
LMDeploy pre-allocates the KV cache, so the reported usage is mostly reserved cache rather than model weights. You can lower the cache-max-entry-count parameter to reduce the maximum GPU memory usage. See https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html#memory-usage-testing
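To see why an int4 model can still show around 70 GB, here is a rough back-of-the-envelope sketch. It assumes that recent LMDeploy versions treat cache-max-entry-count as the fraction of GPU memory left free after loading the weights, with a default of 0.8. The 80 GB card and the 20 GB weight size are illustrative numbers, not measurements from this thread, and the helper name is hypothetical.

```python
def estimate_gpu_usage_gb(total_gb, weights_gb, cache_max_entry_count=0.8):
    """Rough peak usage: model weights plus the pre-allocated KV cache.

    Assumption: the KV cache takes `cache_max_entry_count` of the memory
    that is still free once the weights are loaded. Activations and
    framework overhead are ignored.
    """
    if not 0.0 < cache_max_entry_count <= 1.0:
        raise ValueError("cache_max_entry_count must be in (0, 1]")
    free_after_weights = max(total_gb - weights_gb, 0.0)
    kv_cache_gb = cache_max_entry_count * free_after_weights
    return weights_gb + kv_cache_gb


if __name__ == "__main__":
    # Illustrative values only: an 80 GB GPU and ~20 GB of int4 weights.
    for ratio in (0.8, 0.5, 0.2):
        usage = estimate_gpu_usage_gb(80, 20, ratio)
        print(f"cache_max_entry_count={ratio}: ~{usage:.0f} GB")
```

In LMDeploy itself the value is passed as --cache-max-entry-count on the command line, or as cache_max_entry_count in TurbomindEngineConfig when using the Python pipeline. A smaller value frees memory but also reduces how many tokens can be cached at once, so throughput and maximum context may drop.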
czczup changed discussion status to closed