---
license: mit
pipeline_tag: text-generation
tags:
- ONNX
- DML
- ONNXRuntime
- phi3
- nlp
- conversational
- custom_code
inference: false
language:
- en
---

# EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-cpu-int4-rtn-block-32

## Performance Metrics

### CPU-INT4-RTN-BLOCK-32

We measured the performance of CPU-INT4-RTN-BLOCK-32 on AMD Ryzen 9 7940HS w/ Radeon 78

| Prompt Length | Generation Length | Average Throughput (tps) |
|---------------|-------------------|--------------------------|
| 128           | 128               | -                        |
| 128           | 256               | -                        |
| 128           | 512               | -                        |
| 128           | 1024              | -                        |
| 256           | 128               | -                        |
| 256           | 256               | -                        |
| 256           | 512               | -                        |
| 256           | 1024              | -                        |
| 512           | 128               | -                        |
| 512           | 256               | -                        |
| 512           | 512               | -                        |
| 512           | 1024              | -                        |
| 1024          | 128               | -                        |
| 1024          | 256               | -                        |
| 1024          | 512               | -                        |
| 1024          | 1024              | -                        |
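
The throughput column is empty, and this card does not describe how the figures would be measured. As a rough sketch only, the grid of prompt and generation lengths could be timed as shown here. It assumes throughput means generated tokens divided by the wall-clock time of the whole call, which may differ from the method the authors intend. The `generate` callable and the names `average_throughput` and `benchmark_grid` are hypothetical placeholders, not part of ONNX Runtime or of this repository.

```python
import time
from typing import Callable, Dict, Tuple

# Grid taken from the table above.
PROMPT_LENGTHS = [128, 256, 512, 1024]
GENERATION_LENGTHS = [128, 256, 512, 1024]


def average_throughput(
    generate: Callable[[int, int], int],
    prompt_len: int,
    gen_len: int,
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Mean tokens per second over `runs` timed calls.

    `generate(prompt_len, gen_len)` is a hypothetical callable that runs the
    model once and returns the number of tokens it actually generated.
    """
    rates = []
    for _ in range(runs):
        start = clock()
        produced = generate(prompt_len, gen_len)
        elapsed = clock() - start
        if elapsed <= 0:
            raise ValueError("elapsed time must be positive")
        rates.append(produced / elapsed)
    return sum(rates) / len(rates)


def benchmark_grid(
    generate: Callable[[int, int], int],
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[Tuple[int, int], float]:
    """Measure every (prompt length, generation length) pair in the table."""
    return {
        (p, g): average_throughput(generate, p, g, runs, clock)
        for p in PROMPT_LENGTHS
        for g in GENERATION_LENGTHS
    }


if __name__ == "__main__":
    def stand_in_generate(prompt_len: int, gen_len: int) -> int:
        # Placeholder for a real model call; sleeps briefly instead of decoding.
        time.sleep(0.01)
        return gen_len

    for (p, g), tps in benchmark_grid(stand_in_generate, runs=1).items():
        print(f"| {p} | {g} | {tps:.2f} |")
```

To fill in the table, `stand_in_generate` would be replaced by a function that feeds a prompt of `prompt_len` tokens to the model and returns the count of tokens produced. The numbers printed by the placeholder are meaningless as benchmark results.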