torch.cuda.OutOfMemoryError
#14
by
Forceee
- opened
Hello guys, when I try to run the model and enter something in the chat, I don't get any reply and see this error in cmd:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 7.06 GiB already allocated; 0 bytes free; 7.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
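The error message itself suggests one possible mitigation: setting `max_split_size_mb` via the `PYTORCH_CUDA_ALLOC_CONF` environment variable to reduce allocator fragmentation. A minimal sketch of how that could be set from Python (the value 128 is just an illustrative choice, and the variable must be set before `import torch` initializes CUDA):

```python
import os

# Must be set BEFORE torch initializes CUDA (i.e., before `import torch`).
# 128 MiB is an illustrative value; smaller values can reduce fragmentation
# at some allocator-performance cost.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Equivalently, it can be set in cmd before launching the web UI, e.g. `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`. Note this only helps with fragmentation; with 7.06 GiB of 8 GiB already allocated, the model may simply be too large for the card without further quantization or offloading.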
Output generated in 3.35 seconds (0.00 tokens/s, 0 tokens, context 376)
How can I fix that?
Note: I am using an RTX 3070.
See my response in this post: https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g/discussions/15