m-a-p/MERT-v0

@@ -7,9 +7,35 @@ tags:
 # Introduction to our series work
-Our MAP pre-trained music model family:
-- a pre-trained MIR model [music2vec](https://huggingface.co/m-a-p/music2vec-v1) before, which shares similar model structure but has weaker performance.
-- a model trained with open-source music dataset [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public)
 # Introduction to this model

 # Introduction to our series work
+The development log of our Music Audio Pre-training (m-a-p) model family:
+- 17/03/2023: we released two advanced music understanding models, [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) and [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M), trained with a new paradigm and dataset. They outperform the previous models and generalize better to more tasks.
+- 14/03/2023: we retrained the MERT-v0 model on an open-source-only music dataset and released it as [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public).
+- 29/12/2022: a music understanding model, [MERT-v0](https://huggingface.co/m-a-p/MERT-v0), trained with the **MLM** paradigm, which performs better on downstream tasks than music2vec.
+- 29/10/2022: a pre-trained MIR model, [music2vec](https://huggingface.co/m-a-p/music2vec-v1), trained with the **BYOL** paradigm.
+The table below gives a quick comparison for choosing a model:
+| Name                                                         | Pre-train Paradigm | Training Data (hours) | Pre-train Context (seconds) | Model Size | Transformer Layer-Dimension | Feature Rate | Sample Rate | Release Date |
+| ------------------------------------------------------------ | ------------------ | -------------------- | ---------------------------- | ---------- | --------------------------- | ------------ | ----------- | ------------ |
+| [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) | MLM                | 160K                 | 5                            | 330M       | 24-1024                     | 75 Hz        | 24K Hz      | 17/03/2023   |
+| [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M)      | MLM                | 20K                  | 5                            | 95M        | 12-768                      | 75 Hz        | 24K Hz      | 17/03/2023   |
+| [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public) | MLM                | 900                  | 5                            | 95M        | 12-768                      | 50 Hz        | 16K Hz      | 14/03/2023   |
+| [MERT-v0](https://huggingface.co/m-a-p/MERT-v0)              | MLM                | 1000                 | 5                            | 95M        | 12-768                      | 50 Hz        | 16K Hz      | 29/12/2022   |
+| [music2vec-v1](https://huggingface.co/m-a-p/music2vec-v1)    | BYOL               | 1000                 | 30                           | 95M        | 12-768                      | 50 Hz        | 16K Hz      | 30/10/2022   |
+## Explanation
+The m-a-p models share a similar architecture; the most distinguishing difference is the paradigm used in pre-training. Beyond that, there are several technical configuration details to know before using a model:
+- **Model Size**: the number of parameters that will be loaded into memory. Please select a size that fits your hardware.
+- **Transformer Layer-Dimension**: the number of transformer layers and the dimension of the features each layer outputs. This is highlighted because features extracted from **different layers can perform differently depending on the task**.
+- **Feature Rate**: the number of feature vectors the model outputs for 1 second of audio input.
+- **Sample Rate**: the sampling rate of the audio the model was trained on.
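
As a rough illustration of how these configuration values interact, the sketch below estimates, for a clip of a given length, how many input samples a model expects, how many feature frames each layer would output, and roughly how much memory the weights alone occupy. The numbers are copied from the table above. The helper names (`expected_shapes`, `approx_weight_memory_mb`) are hypothetical and not part of any m-a-p API, and the 4-bytes-per-parameter figure assumes float32 weights. The frame count is an approximation: the real model may return a frame more or fewer because of how its convolutional front end handles clip edges.

```python
# Values copied from the model table; the structure and helper names are illustrative only.
MODELS = {
    "MERT-v1-330M":   {"params": 330e6, "layers": 24, "dim": 1024, "feature_rate_hz": 75, "sample_rate_hz": 24_000},
    "MERT-v1-95M":    {"params": 95e6,  "layers": 12, "dim": 768,  "feature_rate_hz": 75, "sample_rate_hz": 24_000},
    "MERT-v0-public": {"params": 95e6,  "layers": 12, "dim": 768,  "feature_rate_hz": 50, "sample_rate_hz": 16_000},
    "MERT-v0":        {"params": 95e6,  "layers": 12, "dim": 768,  "feature_rate_hz": 50, "sample_rate_hz": 16_000},
    "music2vec-v1":   {"params": 95e6,  "layers": 12, "dim": 768,  "feature_rate_hz": 50, "sample_rate_hz": 16_000},
}


def expected_shapes(name, seconds):
    """Return (input samples, approximate per-layer feature shape) for a clip."""
    cfg = MODELS[name]
    # Audio must be at the model's sample rate, so the input length is rate * duration.
    n_samples = int(seconds * cfg["sample_rate_hz"])
    # Approximate frame count: feature rate * duration (edge effects ignored).
    n_frames = int(seconds * cfg["feature_rate_hz"])
    return n_samples, (n_frames, cfg["dim"])


def approx_weight_memory_mb(name, bytes_per_param=4):
    """Weights-only memory estimate; assumes float32 and excludes activations."""
    return MODELS[name]["params"] * bytes_per_param / 1e6


if __name__ == "__main__":
    for model_name in MODELS:
        samples, feat_shape = expected_shapes(model_name, seconds=5)
        print(model_name, samples, feat_shape, approx_weight_memory_mb(model_name))
```

Under these assumptions, a 5-second clip (the pre-training context length of the MERT models) corresponds to 80,000 samples and about 250 frames per layer for the 16 kHz models, and 120,000 samples and about 375 frames for the 24 kHz models.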
 # Introduction to this model