flan-t5-large-grammar-synthesis - GGUF

GGUF files for flan-t5-large-grammar-synthesis for use with Ollama, llama.cpp, or any other framework that supports t5 models in GGUF format.

This repo mostly contains 'higher precision'/larger quants: the model's purpose is grammar and spelling correction, and at low precision it becomes unreliable, introducing incorrect fixes.

Refer to the original repo for more details.
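
The GGUFs can also be run through Ollama, assuming your Ollama version supports T5-architecture GGUFs. A minimal sketch: create a Modelfile pointing at the downloaded quant,

FROM ./grammar-synthesis-Q6_K.gguf
PARAMETER temperature 0

then register and run it (grammar-synthesis is an arbitrary local name chosen for illustration):

ollama create grammar-synthesis -f Modelfile
ollama run grammar-synthesis "There car broke down so their hitching a ride to they're class."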

Usage

You can use the GGUFs with llamafile (or llama-cli) like this:

llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."

and it will output the corrected text:

system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 0


 The car broke down so they had to take a ride to school. [end of text]


llama_print_timings:        load time =     782.21 ms
llama_print_timings:      sample time =       0.23 ms /    16 runs   (    0.01 ms per token, 68376.07 tokens per second)
llama_print_timings: prompt eval time =      85.08 ms /    19 tokens (    4.48 ms per token,   223.33 tokens per second)
llama_print_timings:        eval time =     341.74 ms /    15 runs   (   22.78 ms per token,    43.89 tokens per second)
llama_print_timings:       total time =     456.56 ms /    34 tokens
Log end
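
To correct many sentences in one go, a simple shell loop works. This is a rough sketch where sentences.txt is a hypothetical input file with one sentence per line; the diagnostic log shown above goes to stderr, so it is suppressed here:

# correct each line of sentences.txt, printing only the model output
while IFS= read -r line; do
  llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "$line" 2>/dev/null
done < sentences.txt

Note that this reloads the model for every sentence; for large batches, a persistent server or an API binding will be much faster.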

If you have a GPU, be sure to add -ngl 9999 to your command to offload all of the model's layers to the GPU for faster inference.
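
For example:

llamafile.exe -m grammar-synthesis-Q6_K.gguf -ngl 9999 --temp 0 -p "There car broke down so their hitching a ride to they're class."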

Quantizations

783M params, t5 architecture. Quants are provided in 4-bit, 5-bit, 6-bit, 8-bit, and 16-bit variants.
