Lin Tan
lin-tan
7 followers · 7 following
https://www.cs.purdue.edu/homes/lintan/
Lin0Tan
lin-tan
lintan
AI & ML interests
AI-Software Synergy. LLM4Code (binary and source code). Mary J. Elmore New Frontiers Professor, Purdue University.
Recent Activity
liked a dataset about 15 hours ago: lt-asset/REPOCOD_Lite
Reacted to their post with 🔥 3 days ago
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: https://huggingface.co/datasets/lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007

Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, while SWE-Bench tasks resolve pull requests from GitHub issues.
- Have about 2.6X as many tests per task (313.5 compared to SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with:
- Whole function generation
- Repository-level context
- Validation with test cases
- Real-world complex tasks: the longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod, with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have <10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: https://huggingface.co/datasets/lt-asset/REPOCOD_Lite

#LLM4code #LLM #CodeGeneration #Security
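The post reports accuracy as pass@1 but does not say how the figure is computed. As a rough illustration only, the sketch below uses the unbiased pass@k estimator commonly used for code-generation benchmarks, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a task and c is the number that pass all tests. Whether RepoCod uses this exact estimator is an assumption, and the function names and per-task numbers here are hypothetical, not RepoCod results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which pass all tests.

    Uses the common unbiased estimator 1 - C(n - c, k) / C(n, k).
    (Assumption: the post does not state which estimator RepoCod uses.)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over tasks; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes for four tasks, one sample each (made-up data).
hypothetical_results = [(1, 1), (1, 0), (1, 0), (1, 0)]
score = mean_pass_at_k(hypothetical_results, k=1)
print(f"pass@1 = {score:.2f}")
```

With a single sample per task, pass@1 reduces to the fraction of tasks whose generated function passes every test, which is why the post can use "accuracy" and "pass@1" interchangeably.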
Reacted to the same post with 🤗 3 days ago
lin-tan's activity
liked a dataset about 15 hours ago
lt-asset/REPOCOD_Lite • Viewer • Updated 7 days ago • 200 • 19 • 1

liked a model 24 days ago
lt-asset/Waffle_VLM_WebSight • Updated about 1 month ago • 229 • 12

liked a dataset 25 days ago
lt-asset/REPOCOD • Viewer • Updated 20 days ago • 980 • 184 • 5

liked a dataset 26 days ago
lt-asset/collu-bench • Viewer • Updated Oct 13 • 13.2k • 95 • 4

liked 2 models about 2 months ago
lt-asset/nova-6.7b • Feature Extraction • Updated Oct 8 • 26 • 3
lt-asset/nova-6.7b-bcr • Updated Oct 8 • 80 • 3

liked 2 models 2 months ago
lt-asset/nova-1.3b-bcr • Text Generation • Updated Oct 8 • 124 • 4
lt-asset/nova-1.3b • Text Generation • Updated Oct 8 • 86 • 4