Why does this model have an Apache 2.0 license?

#8
by Stealcase - opened

This model suffers from all the same legal issues described here, which should preclude it from being licensed under Apache 2.0:
https://huggingface.co/norallm/normistral-7b-scratch/discussions/3

It also has the added issue of being based on Mistral, which is itself released under an Apache 2.0 license without any justification for that licensing scheme. Unfortunately, it is very likely that Mistral was trained on data that has not been licensed for this purpose, and that its authors have simply avoided disclosing the training dataset and opted for commercial viability instead.

Echoing Mistral's license, instead of licensing for research use only, is a misstep for the Norallm project when it builds on a model with dubious sources.

Norwegian Large Language Models org

Thanks for your comment.

We are in ongoing communication with legal experts on this topic.
However, our current understanding is that large language models do not necessarily inherit the licensing limitations of their training data directly. We do not redistribute any data in any way.

Based on your communication, I am very concerned that my critique is being dismissed.

  • Norway does not have Fair Use.
  • You are using the research exceptions to copyright as the legal basis for accessing copyrighted material to train LLMs in the first place.
  • You are then abusing those exceptions by releasing the research artifact (which could only be created because of the exceptions) as technology that can be used commercially.

I elaborate in more detail here:
https://huggingface.co/norallm/normistral-7b-scratch/discussions/3#668ae1c97d3f73951e0f8710
