BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks
imgsys.org -- arena for text-guided image generation
VLMEvalKit Evaluation Results Collection
Track, rank and evaluate open LLMs' CoT quality
View how beam search decoding works, in detail!
Jailbreak the LLM and privacy guardrails