From kde-kimageshop Sat Mar 30 11:52:09 2024
From: Cornelius Schumacher
Date: Sat, 30 Mar 2024 11:52:09 +0000
To: kde-kimageshop
Subject: Re: Licensing for models and datasets
Message-Id: <5c735ebc-cbe1-41b9-939a-95be22051992@kde.org>
X-MARC-Message: https://marc.info/?l=kde-kimageshop&m=171180841217964

On 26.03.24 17:33, Volker Krause wrote:
> On Monday, 25 March 2024 15:17:48 CET Halla Rempt wrote:
>> We're looking into adding an experimental AI-based feature to Krita:
>> automated inking. That gives us three components, and we're not sure about
>> the license we should use for two of them: the model and the dataset. Would
>> CC be best here?
>
> Looking at https://community.kde.org/Policies/Licensing_Policy the closest
> thing would either be "media" files (generalized to "data files") and thus
> CC-BY-SA (and presumably CC-BY/CC0) or "source code" (xGPL, BSD/MIT).

I don't think we can directly use the current licensing policy for ML models
and datasets. But I suppose we should discuss extending it to cover these use
cases as well.

CC-BY or CC-BY-SA are not the best choice for data, as their attribution
requirements can make it impractical to work with data under these licenses.
There are some good arguments that data should rather not be licensed at all
(https://plus.pli.edu/Details/Details?fq=id:(352066-ATL2)). This suggests
using CC0 as the closest practical form of that.

For models, attribution requirements seem to be less of an issue. But as
Volker described, the copyright situation is quite complicated, and it's not
yet clear what consequences this will have in the future. From this point of
view a permissive license could be a good choice, as it is less likely to
create problems down the line. As the MIT license is already mentioned in the
licensing policy, maybe it is the best choice here?

In addition to the licensing itself, it could also be good to consider how to
convey more information about the openness of the system. Even if it wouldn't
make a difference in terms of copyright for the user of a model, it still
might be preferable to use models which are trained on free and open data.
Some kind of labeling that makes this transparent to end users could be a
solution to that.

In the context of the Sustainable Software goal we have a bit of discussion
around such labeling. There are some ongoing efforts, such as OSI's attempt to
define what open AI actually should mean (https://opensource.org/deepdive), or
Nextcloud's Ethical AI labeling system
(https://nextcloud.com/blog/nextcloud-ethical-ai-rating/). Maybe it would be
worth thinking about adopting something like that in KDE as well.

Who would be interested in discussing this? We have it on the agenda for the
upcoming Goals sprint at the end of April, but it might be worth extending the
discussion if there is broader interest.

-- 
Cornelius Schumacher