PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking
Paper • arXiv:2410.12375
Note: Paper on arXiv.
Note: The model produces thinking tokens before answering.
Note: The model produces both thinking and reflection tokens before answering.
Note: Llama tokenizer with added special tokens (thinking, reflection, scratchpad, response).
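
Because the models emit thinking (and optionally reflection) tokens before the answer, downstream code usually needs to separate those segments from the final response. The following is a minimal sketch of one way to do that. The delimiter strings `<|thinking|>`, `<|/thinking|>`, `<|reflect|>`, and `<|/reflect|>` are assumptions made for illustration, since the notes above name the token types but not their exact spelling, and the helper names `extract_span` and `split_output` are hypothetical. Check the tokenizer's actual special tokens before relying on it.

```python
import re

# Assumed delimiter strings (not confirmed by the notes above); replace them
# with the special tokens actually defined in the tokenizer.
THINK_OPEN, THINK_CLOSE = "<|thinking|>", "<|/thinking|>"
REFLECT_OPEN, REFLECT_CLOSE = "<|reflect|>", "<|/reflect|>"


def extract_span(text, open_tok, close_tok):
    """Return the text between the first open/close pair, or None if absent."""
    pattern = re.escape(open_tok) + r"(.*?)" + re.escape(close_tok)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def split_output(text):
    """Split a raw generation into thinking, reflection, and final answer."""
    thinking = extract_span(text, THINK_OPEN, THINK_CLOSE)
    reflection = extract_span(text, REFLECT_OPEN, REFLECT_CLOSE)

    # Treat everything after the last closing delimiter as the answer.
    end = 0
    for close_tok in (THINK_CLOSE, REFLECT_CLOSE):
        idx = text.rfind(close_tok)
        if idx != -1:
            end = max(end, idx + len(close_tok))

    return {
        "thinking": thinking,
        "reflection": reflection,
        "answer": text[end:].strip(),
    }


if __name__ == "__main__":
    raw = (
        "<|thinking|>Silk is strong.<|/thinking|>"
        "<|reflect|>Check units.<|/reflect|>"
        "Final answer."
    )
    print(split_output(raw))
```

A generation that contains no delimiters is returned unchanged as the answer, with `thinking` and `reflection` set to `None`, so the same helper can be applied to both the thinking-only and the thinking-plus-reflection models.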