Authorship metadata support added to the converter script; you may want to add the ability to supply metadata overrides

#104
by mofosyne - opened

This PR was recently merged. It allows users to add their own metadata via the --metadata argument, which accepts a file like this metadata override file, to fill in the new fields described in the GGUF spec. If you want to see the exact steps I used to generate a GGUF file in practice, you can look at this bash script.
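To make the idea of an override file concrete, here is a minimal sketch that produces one. It assumes the override file is a flat JSON object keyed by GGUF `general.*` metadata keys (such as `general.author` and `general.organization`); the example values and the helper name `render_metadata_override` are hypothetical, so check the GGUF spec and the Metadata-Override wiki page for the authoritative list of keys.

```python
import json

# Hypothetical example values. The key names assume a flat JSON object keyed by
# GGUF "general.*" metadata keys; verify them against the GGUF spec.
metadata_override = {
    "general.name": "Example Model",
    "general.author": "Jane Doe",
    "general.organization": "Example Org",
    "general.version": "v1.0",
    "general.license": "apache-2.0",
    "general.doi": "10.0000/example",  # placeholder, not a real DOI
}


def render_metadata_override(metadata: dict) -> str:
    """Serialize the override mapping as pretty-printed JSON text."""
    return json.dumps(metadata, indent=4, ensure_ascii=False) + "\n"
```

The rendered text can then be saved, e.g. with `Path("metadata.json").write_text(render_metadata_override(metadata_override), encoding="utf-8")`, and passed to the converter with `--metadata metadata.json`.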

It will also now generate a filename that follows the GGUF naming convention (the documentation for this still needs some updating).
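As a rough illustration of how such a filename is put together, the sketch below assumes the convention has the shape `<BaseName>-<SizeLabel>-<FineTune>-<Version>-<Encoding>-<Type>-<Shard>.gguf`, with missing components skipped. Since the documentation is still being updated, treat this as an approximation; the function `gguf_filename` is hypothetical and is not the code used by the converter.

```python
from typing import Optional, Tuple


def gguf_filename(
    base_name: str,
    size_label: Optional[str] = None,
    fine_tune: Optional[str] = None,
    version: Optional[str] = None,
    encoding: Optional[str] = None,
    file_type: Optional[str] = None,
    shard: Optional[Tuple[int, int]] = None,
) -> str:
    """Join the non-empty naming components with '-' and append '.gguf'.

    Assumed layout: BaseName-SizeLabel-FineTune-Version-Encoding-Type-Shard.
    """
    parts = [base_name, size_label, fine_tune, version, encoding, file_type]
    if shard is not None:
        index, total = shard
        # Shards are written as zero-padded "NNNNN-of-NNNNN".
        parts.append(f"{index:05d}-of-{total:05d}")
    # Drop components that were not supplied (None or empty string).
    return "-".join(p for p in parts if p) + ".gguf"
```

For example, `gguf_filename("Mixtral", "8x7B", version="v0.1", encoding="KQ2")` yields `Mixtral-8x7B-v0.1-KQ2.gguf`. Skipping absent components keeps names well formed for models that have no fine-tune or type label.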

Would you like to add extra fields to your interface, so that users can add extra authorship details (or fix certain authorship errors) in the resulting GGUF file?

Let me know if the interface makes sense to you, or if you would like some adjustments to make it more convenient to preserve authorship data. By the way, we tried to extract as much authorship data as possible from your model card, but there doesn't seem to be any consistent way to do so (e.g. author name, organization, DOI, etc.).

Oh, and if you have some extra time, it would be good to hear your opinion about autogenerating UUIDs in this PR: https://github.com/ggerganov/llama.cpp/pull/8565#issuecomment-2238657884

ggml.ai org

hey hey @mofosyne, deffo supportive of this. From my understanding, we'd need to provide the metadata.json, right? (Sorry if I did not understand this correctly.)

Yeah, we recommend filling in the metadata override file and passing it via the --metadata argument, for example as shown below:

./llama.cpp/convert_hf_to_gguf.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16 --verbose

This is because, at this stage, our heuristics can't extract all the correct context and metadata from your Hugging Face model card, so we would prefer that you supply the correct details directly. I don't intend for you to place the metadata.json in the actual repository, since it is meant to be baked into the GGUF file itself. The metadata override is optional: we cannot expect everyone to be diligent enough to fill it out completely, and we don't want to block people from contributing. Instead, if it is missing, we will try to guess as much as we can from your repo.
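The precedence described here (values you supply win, heuristic guesses fill in the gaps, and the override file itself is optional) can be pictured as a simple dictionary merge. This is a hypothetical sketch for illustration only, not the actual logic in `convert_hf_to_gguf.py`; the function name `resolve_metadata` is invented.

```python
from typing import Any, Dict, Optional


def resolve_metadata(
    guessed: Dict[str, Any], override: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Start from heuristic guesses and let any non-null override value win.

    `override` may be None, mirroring the fact that the override file is optional.
    """
    resolved = dict(guessed)
    if override:
        for key, value in override.items():
            if value is not None:  # a null in the override does not erase a guess
                resolved[key] = value
    return resolved
```

With this shape, a partially filled override file is still useful: any field the user leaves out simply falls back to whatever could be guessed from the repo.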

I've documented what you may want to fill in at https://github.com/ggerganov/llama.cpp/wiki/Metadata-Override. I think the general public can edit it, so if you find this feature useful, please improve that page so others can make use of the feature more easily.

But if you can, definitely push for model card metadata to be more standardized, so that we can extract more useful information from it for the GGUF metadata.
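To show why standardization matters, the sketch below reads the YAML front matter block (between `---` lines) at the top of a Hugging Face model card. Only keys that appear there in a predictable form can be picked up reliably. The sketch is deliberately minimal: it uses only the standard library, keeps top-level `key: value` scalars, and ignores lists, nested mappings and most YAML quoting rules. The function `read_front_matter` is hypothetical and is not how `gguf-py` implements it. A real implementation should use a proper YAML parser.

```python
from typing import Dict


def read_front_matter(model_card: str) -> Dict[str, str]:
    """Extract top-level `key: value` scalars from a model card's front matter."""
    lines = model_card.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the card
    result: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        # Skip list items, indented (nested) keys, comments and lines without a colon.
        if line.startswith((" ", "\t", "-", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:  # keys whose value is a nested list/mapping are skipped
            result[key.strip()] = value.strip("\"'")
    return result
```

Fields such as `license` are easy to pick up this way, while authorship details (author name, organization, DOI) only become extractable if model cards agree on where and how to record them.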


If you are comfortable with Python, you can see how we approach this in gguf-py/gguf/metadata.py in llama.cpp.
