Max

reciprocate

maxreciprocate

AI & ML interests

Reward models

reciprocate's activity

liked a model 7 days ago

Qwen/QwQ-32B-Preview

Text Generation • Updated 5 days ago • 33.6k • 1k

liked a model about 1 month ago

stabilityai/stable-diffusion-3.5-large

Text-to-Image • Updated Oct 22 • 192k • 1.46k

updated 2 datasets 3 months ago

reciprocate/tails-llama3.1-8b

Viewer • Updated Sep 9 • 1.12M • 46

reciprocate/ultra-annotated-200k

Viewer • Updated Sep 1 • 208k • 39

liked a Space 4 months ago

Gpt-4o-mini Battles

liked 2 models 6 months ago

stabilityai/stable-diffusion-3-medium

Text-to-Image • Updated Aug 12 • 35.4k • 4.6k

stabilityai/stable-audio-open-1.0

Text-to-Audio • Updated Jul 31 • 28.5k • 981

updated 2 datasets 7 months ago

reciprocate/dpo-objective-v0.2

Viewer • Updated May 14 • 384 • 37

reciprocate/dpo-objective

Viewer • Updated May 14 • 512 • 41

New activity in allenai/reward-bench 7 months ago

fix(readme): rename `map` -> `filter` in code for selecting subset

#3 opened 7 months ago by

reciprocate

updated a dataset 7 months ago

reciprocate/tinygsm_interpreter_1M

Viewer • Updated May 6 • 1M • 54

liked 2 models 8 months ago

stabilityai/stablelm-2-1_6b-chat

Text Generation • Updated Jun 3 • 3.81k • 32

stabilityai/stablelm-2-12b-chat

Text Generation • Updated May 20 • 3.38k • 86

updated 3 datasets 8 months ago

reacted to euclaise's post with ❤️ 8 months ago

Post

Memphis: Advancing language model reasoning without relying on proprietary model outputs

Memphis is a series of models trained on human data that offer good performance without relying on proprietary model outputs (e.g. GPT-generated datasets). I've developed a new iterative finetuning procedure that improves the reasoning ability of these models beyond what is possible using only SFT on the same data.

Currently, I've released two models: Memphis-CoT-3B and Memphis-scribe-3B.

To create these models, I've created new datasets:
- euclaise/reddit-instruct: A dataset of instruction/QA-like data scraped from Reddit. A curated version, filtered using Lilac and neural embedding models, is available at euclaise/reddit-instruct-curated
- euclaise/TinyCoT: TinyCoT is a meta-dataset that aggregates a variety of human-sourced reasoning data. It is a curated version of my previous MegaCoT dataset (euclaise/MegaCoT), whose 629k responses are cut down to 28k for TinyCoT. There's also an intermediate version, euclaise/MiniCoT, which has 129k responses.

Memphis-CoT is trained on reddit-instruct, a filtered version of oasst2 (sablo/oasst2_curated), and TinyCoT. Multiple iterations were performed on TinyCoT, while reddit-instruct and oasst2 were only used for the initial model.

Memphis-scribe further finetunes Memphis-CoT on more creative tasks, using 18 different datasets, including euclaise/WritingPrompts_curated and lemonilia/LimaRP.

To prevent catastrophic forgetting, I used weight averaging between iterations.
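The post does not say how the averaging was done, so the following is only a minimal sketch of one common form: linear interpolation between the checkpoint from the previous iteration and the newly finetuned one. The function name `average_weights`, the `alpha` parameter, and the toy parameter names are all hypothetical, and plain Python lists stand in for real model tensors.

```python
from typing import Dict, List

# A checkpoint is modelled as parameter name -> flat list of values.
# This is an illustrative stand-in for a real model state dict.
Weights = Dict[str, List[float]]


def average_weights(previous: Weights, current: Weights, alpha: float = 0.5) -> Weights:
    """Linearly interpolate two checkpoints that share parameter names.

    alpha = 0.0 returns the previous checkpoint, alpha = 1.0 the current one.
    The value 0.5 (a uniform average) is an assumption, not taken from the post.
    """
    if previous.keys() != current.keys():
        raise ValueError("checkpoints must have the same parameter names")

    merged: Weights = {}
    for name, prev_values in previous.items():
        cur_values = current[name]
        if len(prev_values) != len(cur_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * p + alpha * c
            for p, c in zip(prev_values, cur_values)
        ]
    return merged


# Toy checkpoints before and after one finetuning iteration.
before = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
after = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}

merged = average_weights(before, after)
print(merged)
```

Pulling the new weights back toward the previous checkpoint limits how far a single iteration can drift, which is the intuition behind using averaging against catastrophic forgetting.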

- euclaise/Memphis-CoT-3B
- euclaise/Memphis-scribe-3B

2 replies

updated a model 8 months ago

reciprocate/mistral-7b-gsm8k-code-rm

Text Classification • Updated Mar 24 • 42 • 3

updated a dataset 8 months ago

reciprocate/tinygsm_mixtral_12M

Viewer • Updated Mar 24 • 12M • 96 • 1

updated a dataset 9 months ago

reciprocate/dpo_ultra-capybara-code_filtered-best

Viewer • Updated Mar 19 • 35.2k • 36 • 1