Caleb Fahlgren PRO

cfahlgren1

AI & ML interests

None yet

Recent Activity

New activity about 16 hours ago
open-acc/README
liked a dataset about 23 hours ago
duckdb-nsql-hub/duckdb-docs
updated a dataset about 23 hours ago
duckdb-nsql-hub/duckdb-docs

Articles

Organizations

cfahlgren1's activity

replied to their post 1 day ago
posted an update 1 day ago
view post
Post
605
You can create charts, leaderboards, and filters on top of any Hugging Face dataset in less than a minute

โ€ข ASCII Bar Charts ๐Ÿ“Š
โ€ข Powered by DuckDB WASM โšก
โ€ข Download results to Parquet ๐Ÿ’ฝ
โ€ข Embed and Share results with friends ๐Ÿ“ฌ

Do you have any interesting queries?
reacted to davanstrien's post with โค๏ธ 1 day ago
replied to their post 1 day ago
view reply

Heavy is the head that wears the crown

reacted to fracapuano's post with โค๏ธ 1 day ago
view post
Post
927
Sharing what we have built over the course of the weekend at the @llamameta hackathon, by Cerebral Valley in London ๐Ÿ‡ฌ๐Ÿ‡ง ๐Ÿ‘‡

@gabrycina @calebgcc and I competed with 200+ participants and 50+ teams for a 24-hrs sprint centered around hacking for impact! We focused on applications of robotics to those in need of assisted living, moving our focus to enable greater autonomy and accessibility of robotics in everyday life.

complete list of assets ๐Ÿ‘‡
๐Ÿค— trained robotics policies
v1:
- fracapuano/moss-pills
- fracapuano/moss-cup
v2:
- fracapuano/meta-grasp

๐Ÿค— datasets
v1:
- fracapuano/pills
- fracapuano/cup
v2:
- fracapuano/cupim


You can find a live demo of our submission at: https://x.com/_fracapuano/status/1858102728691458554

If you want to know more about how we collected 100GB+ of data, trained multiple RL-policies using @lerobot and used Llama-3.2 models to handle user interactions and switch between tasks, go ahead and have a look! Also, don't be a stranger, and reach out ๐Ÿฆพ

Our project is fully open-source, for the community (and ourselves, ๐Ÿ‘จโ€๐Ÿณ) to build! A huge thank you to @cadene for the help (and the robot ๐Ÿคญ) - truly feeling these hugs-vibes ๐Ÿค— , and to @thomwolf and @clem for sharing our work across

Little extra:
โžก๏ธ Our ๐Ÿง EEG waves๐Ÿง -based control of the ๐Ÿฆพrobotic arm๐Ÿฆพ
reacted to LukeNeumann's post with ๐Ÿคฏ 1 day ago
view post
Post
1005
Nine years ago, I uploaded the first 8K resolution video to YouTube and I've been stockpiling 8K footage ever since: https://www.youtube.com/watch?v=sLprVF6d7Ug&t

Should @Overlaiapp release the first open-source 8K video dataset?

Could anyone even fine tune a model with this?๐Ÿ˜…
ยท
replied to LukeNeumann's post 1 day ago
view reply

Would be massive! Let us know if you need any help ๐Ÿค—

reacted to dvilasuero's post with ๐Ÿš€๐Ÿค— 1 day ago
posted an update 1 day ago
reacted to nyuuzyou's post with ๐Ÿ”ฅ 3 days ago
view post
Post
880
๐Ÿ–ผ๏ธ Introducing Public Domain Pictures Dataset - nyuuzyou/publicdomainpictures

Dataset highlights:
- 644,412 public domain images with comprehensive metadata from publicdomainpictures.net
- English language metadata including titles, descriptions, and keywords
- Each entry contains rich metadata including:
- Unique image ID and full-size image URLs
- Detailed titles and descriptions
- Keyword/tag collections
- Creator attribution
- Released to the public domain under Creative Commons Zero (CC0) license
  • 2 replies
ยท
posted an update 3 days ago
view post
Post
2030
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together / add split column
- Converting string messages into list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
  • 1 reply
ยท
replied to victor's post 3 days ago
view reply

There is no harm, but if I turn the notification on or off for an unapproved gated repo, an error message will appear.

Hey @John6666 , this should be fixed very soon.

reacted to erikkaum's post with ๐Ÿ”ฅ 3 days ago
view post
Post
1611
A while ago I started experimenting with compiling the Python interpreter to WASM.

To build a secure, fast, and lightweight sandbox for code execution โ€” ideal for running LLM-generated Python code.

- Send code simply as a POST request
- 1-2ms startup times

Hack away:
https://github.com/ErikKaum/runner
posted an update 6 days ago
view post
Post
2190
Why use Google Drive when you can have:

โ€ข Free storage with generous limits๐Ÿ†“
โ€ข Dataset Viewer (Sorting, Filtering, FTS) ๐Ÿ”
โ€ข Third Party Library Support
โ€ข SQL Console ๐ŸŸง
โ€ข Security ๐Ÿ”’
โ€ข Community, Reach, and Visibility ๐Ÿ“ˆ

It's a no brainer!

Check out our post on what you get instantly out of the box when you create a dataset.
https://huggingface.co/blog/researcher-dataset-sharing
  • 1 reply
ยท
replied to maxiw's post 7 days ago
view reply

Yeah for sure! Would be cool to see links to the leaderboards of these to see more than top 5 and see where most of the community is ๐Ÿ‘€ @maxiw

Maybe like top 100 or top 500 with sql console saved link

replied to m-ric's post 7 days ago
replied to m-ric's post 7 days ago
reacted to m-ric's post with ๐Ÿ‘€๐Ÿ”ฅ 7 days ago
view post
Post
3624
๐—ง๐—ต๐—ฒ ๐—ป๐—ฒ๐˜…๐˜ ๐—ฏ๐—ถ๐—ด ๐˜€๐—ผ๐—ฐ๐—ถ๐—ฎ๐—น ๐—ป๐—ฒ๐˜๐˜„๐—ผ๐—ฟ๐—ธ ๐—ถ๐˜€ ๐—ป๐—ผ๐˜ ๐Ÿฆ‹, ๐—ถ๐˜'๐˜€ ๐—›๐˜‚๐—ฏ ๐—ฃ๐—ผ๐˜€๐˜๐˜€! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

โš™๏ธ Computed with the great dataset maxiw/hf-posts
โš™๏ธ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve who's far in front of me
ยท