Lin Tan

lin-tan

https://www.cs.purdue.edu/homes/lintan/

AI & ML interests

AI-Software Synergy. LLM4Code (binary and source code). Mary J. Elmore New Frontiers Professor, Purdue University

Recent Activity

liked a dataset about 15 hours ago

lt-asset/REPOCOD_Lite

Reacted to their post with 🔥 3 days ago

Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.

- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: https://huggingface.co/datasets/lt-asset/REPOCOD

@jiang719 @shanchao @Yiran-Hu1007

Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues.
- Have 2.6X more tests per task (313.5 compared to SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with:
- Whole function generation
- Repository-level context
- Validation with test cases
- Real-world complex tasks: the longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod, with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have <10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: https://huggingface.co/datasets/lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
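The pass@1 figures quoted in the post can be read as the fraction of tasks for which a single generated solution passes all of the task's tests. The sketch below is a minimal illustration of the widely used unbiased pass@k estimator, not the RepoCod evaluation harness itself; the function names and the sample results are hypothetical and made up for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task.

    n: number of samples generated for the task
    c: number of those samples that pass every test
    k: number of samples the user is allowed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, any draw of k must include a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failing ones.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is (n, c) for one task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: four tasks, one sample each, only the first one passes.
example_results = [(1, 1), (1, 0), (1, 0), (1, 0)]
example_score = mean_pass_at_k(example_results, k=1)
```

With one sample per task (n = 1, k = 1), the estimator reduces to the plain share of tasks solved, which is how a "<30% pass@1" claim is usually interpreted.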

Reacted to the same post with 🤗 3 days ago

lin-tan's activity

liked a dataset about 15 hours ago

lt-asset/REPOCOD_Lite

Viewer • Updated 7 days ago • 200 • 19 • 1

liked a model 24 days ago

lt-asset/Waffle_VLM_WebSight

Updated about 1 month ago • 229 • 12

liked a dataset 25 days ago

lt-asset/REPOCOD

Viewer • Updated 20 days ago • 980 • 184 • 5

liked a dataset 26 days ago

lt-asset/collu-bench

Viewer • Updated Oct 13 • 13.2k • 95 • 4

liked 2 models about 2 months ago

lt-asset/nova-6.7b

Feature Extraction • Updated Oct 8 • 26 • 3

lt-asset/nova-6.7b-bcr

Updated Oct 8 • 80 • 3

liked 2 models 2 months ago

lt-asset/nova-1.3b-bcr

Text Generation • Updated Oct 8 • 124 • 4

lt-asset/nova-1.3b

Text Generation • Updated Oct 8 • 86 • 4