---
library_name: peft
base_model: HuggingFaceM4/idefics-9b-instruct
---

# Model Card for Model ID

This is an IDEFICS 9B model trained with PPO on the FrozenLake environment.

## Model Details

### Trainer Hyperparameters

- `suppress_warnings: True`
- `debug: True`
- `seed: 9812`
- `reseed_env: True`
- `torch_deterministic: True`
- `track: True`
- `wandb_project_name: "frozenlake_idefics"`
- `wandb_entity: null #'rl-team-unito'`
- `wandb_log_dir: "${now:%Y-%m-%d_%H-%M-%S}"`
- `save_video: True`
- `save_video_every: 20`
- `save_stats: True`
- `save_episode: False`
- `env_size: 244`
- `env_area: 8`
- `num_prompt_images: 1`
- `use_text_description: True`

Algorithm-specific arguments:

- `model: "HuggingFaceM4/idefics-9b-instruct"`
- `model_ckpt: null`
- `lora_adapter_path: null`
- `is_slippery: False`
- `fixed_orientation: True`
- `no_step_description: False`
- `first_person: True`
- `fov: 1`
- `total_timesteps: 400000`
- `disable_training: False`
- `from_accelerate_savestate_to_checkpoint: False`
- `learning_rate: 1e-5`
- `critic_learning_rate: 1e-5`
- `local_num_envs: 4`
- `num_steps: 128`
- `anneal_lr: False`
- `gamma: 0.99`
- `gae_lambda: 0.95`
- `num_minibatches: 128`
- `update_epochs: 1`
- `norm_adv: True`
- `clip_coef: 0.1`
- `clip_vloss: True`
- `ent_coef: 0.01 #0.01`
- `vf_coef: 0.5`
- `max_grad_norm: 0.5`
- `target_kl: null`
- `save_every: 50`
- `gradient_accumulation: 4`
- `adam_epsilon: 1e-8`
- `gradient_ckpt: False`
- `lora: True`
- `temperature: 'max_logit'`
- `disable_adapters_for_generation: True`
- `normalization_by_words: False`
- `action_logits_from_whole_seq: True`
- `advanced_action_matching: False`
- `env_id: "FrozenLakeText-v0" # MiniGrid-LavaGapS7-v0`
- `generate_actions: False`
- `value_prompt_template: "I am the agent in this minigrid world. {} Avoid the traps!\nWhat's the next best action?"`
- `action_template: " Based on the information provided, the next best action would be to {}"`
- `possible_actions_list: "forward pickup toggle opt_left opt_right opt_back"`

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.10.0
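
As a reading aid for the trainer hyperparameters listed under Model Details, the sketch below works out the rollout and minibatch sizes they imply and fills in the two prompt templates. It assumes CleanRL-style bookkeeping on a single process (batch size = `local_num_envs * num_steps`), which this card does not confirm. The `description` string is hypothetical, since the real scene text comes from the environment, and the card does not say how `gradient_accumulation` interacts with these sizes or how the trainer scores the candidate actions.

```python
# Values copied from the Trainer Hyperparameters section of this card.
local_num_envs = 4
num_steps = 128
num_minibatches = 128
total_timesteps = 400_000

# Assumption: CleanRL-style bookkeeping on a single process.
batch_size = local_num_envs * num_steps          # transitions per rollout
minibatch_size = batch_size // num_minibatches   # transitions per minibatch
num_updates = total_timesteps // batch_size      # PPO iterations (floor)

# Templates and action list as given in the hyperparameters.
value_prompt_template = (
    "I am the agent in this minigrid world. {} Avoid the traps!\n"
    "What's the next best action?"
)
action_template = (
    " Based on the information provided, the next best action would be to {}"
)
actions = "forward pickup toggle opt_left opt_right opt_back".split()

# Hypothetical scene description; the real text comes from the environment.
description = "There is a hole directly in front of me."
prompt = value_prompt_template.format(description)

# One candidate completion per action name.
completions = [action_template.format(a) for a in actions]

print(batch_size, minibatch_size, num_updates)
print(prompt + completions[0])
```

Under these assumptions each rollout collects 512 transitions, split into minibatches of 4, and training runs for 781 PPO iterations with one update epoch each.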