Legal Issue: Clarifying the Legal Basis for Model Release
The legal situation about this model is unclear to me. The model card says this was trained on LAION-2B-en. Is the organization (which manages huggingface.com/laion) implying that this model is a derivative work of that dataset? Either way, could the dataset card be updated with an explicit statement about this?
This will help a lot to determine how I can use & deploy this model in practice. Thank you!
(EDIT: Changed LAION-5B to LAION-2B-en to be more specific.)
The model card is very clear in saying this model was trained on laion2B-en.
You may ask your lawyer for further interpretation of whether the term "derivative work" could apply.
Hi @rom1504 ,
Thank you for your quick reply!
LAION is a non-profit registered in Germany, and thus the R&D you're doing falls under both European and German regulations. This means the organization has well-defined responsibilities when it comes to the handling of copyrighted data and of personally identifying information.
The specifics of your reply are very concerning, because they indicate that either 1) LAION doesn't know and didn't handle the release with due care then, or 2) LAION does know and is simply refusing to handle an important and honest question with due care now. I am both a rightsholder and a data subject in a European jurisdiction (whose data is involved here), so you have legal responsibilities towards me, and this is not a great start.
Could I ask you to escalate the issue to someone who has better legal knowledge of the situation, for instance the managing director of LAION?
Thank you!
P.S. I appreciate your work and the R&D done by LAION very much; that's not a question here!
Here is a legal paper on this question, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4022665, if you are interested in looking into it. It is an open legal question that applies to almost all machine learning models, including almost all of those stored on Hugging Face.
@alexjc LAION has a form to address those responsibilities w.r.t. the dataset at https://laion.ai/dataset-requests/
As for the model weights and whether 'derivative work' applies: nobody is trying to be dismissive. That truly is a question for the lawyers, and the scope of such a question covers most ML models.
Thank you both. I'd like to use these models and I'd like to use these models legally!
As such, I'm doing my due diligence and trying to assess the legal basis through which LAION released these CLIP models, which are trained on datasets that include personally identifying information of Europeans and data from rightsholders in Europe (including mine). Since you say it's an open legal question whether it's a derivative work or not (fair enough), then there must be another legal basis for LAION being able to release. I'd like to know what that legal basis is.
@rwightman I posted about this publicly because it's HuggingFace's default when you want to raise a legal issue. I also acknowledge it's in the DNA of organizations like LAION and HuggingFace to operate publicly, and I have tremendous respect for that, so I made this request public.
Doing research in the open and publicly is amazing and commendable, but also comes with a significantly higher bar for questions like this, as companies operating in a closed manner do not have to deal with the implications of transparency.
I will contact Christoph @ LAION via the website if that's the preferred approach, but ultimately the answer to the question should be made public here.
(Email sent to [email protected] just now.)
Got one reply from Christoph, still waiting for the follow-up where the question of the legal basis for release is addressed.
UPDATE: I'm still waiting for a reply from Christoph @ LAION. His first email didn't answer the question; he seemed quite dismissive of the idea that a legal basis for releasing models was even necessary.
Now I've been waiting for weeks without even an acknowledgement of my reply. Is Christoph always this negligent? Is he the right person for LAION to have as the one legally responsible?