List:       kde-kimageshop
Subject:    Re: Sphinx Application Documentation - Image duplication
From:       Ben Cooksley <bcooksley () kde ! org>
Date:       2023-01-22 19:02:09
Message-ID: CA+XidOH=EJGYqK-_=4JjOSUO2CjB=_O_6srp7Zbj+F+kSYZ0qQ () mail ! gmail ! com

On Mon, Jan 23, 2023 at 7:51 AM Julius Künzel <jk.kdedev@smartlab.uber.space>
wrote:

> Hi Ben, hi all,
>

Hi Julius,


>
> I did a little research about this recently and unfortunately it seems to
> me as if there is not really a solution on the Sphinx side. One needs to
> have separate build dirs for every language and it copies all static files
> (css, js, images, ...) to every build dir. That's just how it works :-/
> (Correct me in case anyone knows I am wrong).
> However we can of course try to solve this on our end and make our deploy
> tools smart in a way that they keep only one version of each image file and
> replace the others with symlinks.
> It should be more or less easy to detect images that are translated since
> they follow the pattern filename.de.png where "de" is the language code,
> so this image would be special for German, while for all other languages
> filename.png is used.
>
>
I had a very strong feeling that would be the case (it very much seems that
Sphinx doesn't actually have proper i18n/l10n support and that it has been
hacked in / bolted on later).

My initial thinking on a quick and (somewhat) dirty solution to this had
been to merge all of the image files into a single folder at top level and
then symlink that from each language.
Knowing that translated images actually have a separate filename convention
indicates that this might just be crazy enough to work.
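
To make the idea concrete, here is a rough Python sketch of the merge-and-symlink
step. It is only an illustration under assumptions, not existing deploy tooling:
the function name merge_images, the shared folder name _shared_images, and the
<lang>/_images layout (taken from the sha256sum paths in my earlier mail) are
all hypothetical choices. It compares SHA-256 digests so that a language is
skipped, rather than overwritten, if a same-named file turns out to differ.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def merge_images(site_root: Path, shared_name: str = "_shared_images") -> list[str]:
    """Merge every <lang>/_images into one top-level folder and symlink it back.

    Returns the languages that were left untouched because merging was unsafe.
    The folder name and layout are assumptions made for this sketch.
    """
    shared = site_root / shared_name
    shared.mkdir(exist_ok=True)
    skipped = []
    for lang_dir in sorted(p for p in site_root.iterdir() if p.is_dir()):
        images = lang_dir / "_images"
        if images.is_symlink() or not images.is_dir():
            continue  # already merged, or no image folder here
        entries = list(images.iterdir())
        # Only handle flat folders of regular files; anything else is left alone.
        unsafe = any(e.is_symlink() or not e.is_file() for e in entries)
        # A same-named file with different content must not be replaced.
        clash = not unsafe and any(
            (shared / e.name).exists() and sha256_of(shared / e.name) != sha256_of(e)
            for e in entries
        )
        if unsafe or clash:
            skipped.append(lang_dir.name)
            continue
        for e in entries:
            target = shared / e.name
            if not target.exists():
                shutil.move(str(e), str(target))
        shutil.rmtree(images)  # remaining files are byte-identical duplicates
        # Relative link, so the site tree can be moved as a whole.
        os.symlink(os.path.join("..", shared_name), images, target_is_directory=True)
    return skipped
```

Calling merge_images(Path("/srv/www/generated/docs.krita.org")) would return the
list of languages it had to skip. This assumes the web server is configured to
follow symlinks, and it should be tried on a copy of a site first.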

Thoughts?


> I hope that helps so far. I might be able to look into this, but probably
> not very soon so if anybody else can work on this I am more than happy.
>
> Cheers,
> Julius
>

Regards,
Ben


>
> On 15 January 2023 at 07:45, "Ben Cooksley" <bcooksley@kde.org> wrote:
>
> Hi all,
>
> For some time now it has been known to me that the system for generating
> application documentation websites using Sphinx with l10n support has had
> issues with duplicating data - particularly images.
>
> That leads to the following outcome, where aside from sites that we expect
> to be quite large (like www.kde.org and api.kde.org) all of the
> application documentation sites are quite big as well:
>
> root@nicoda /srv/www # du -h --max-depth=1 ./generated/ | grep G
> 2.3G    ./generated/cutehmi.kde.org
> 3.7G    ./generated/docs.digikam.org
> 2.4G    ./generated/api.kde.org
> 2.3G    ./generated/docs.krita.org
> 1.4G    ./generated/www.kde.org
> 7.9G    ./generated/docs.kdenlive.org
> 29G     ./generated/
>
> This stands in comparison to the Docbook documentation site for all other
> KDE applications:
>
> root@nicoda /srv/www # du -h --max-depth=1 . | grep G
> 29G     ./generated
> 16G     ./api.kde.org-legacy
> 6.0G    ./docs.kde.org
> 51G     .
>
> It would be nice if we could please look into some fixes for this, as it
> looks like Sphinx is duplicating the images - once for every language -
> when that isn't necessary.
> I could understand if the screenshots were updated as part of the
> translation, but it looks like they're not in the majority of cases - below
> being just a sample:
>
> root@nicoda /srv/www/generated/docs.krita.org # sha256sum zh_CN/_images/Krita_cpb_mixing.gif
> 12eb4cbad29a5a6486d3438dabb888a0aa0b9579e55b3be2f3c1d6e1d76fc1d7  zh_CN/_images/Krita_cpb_mixing.gif
> root@nicoda /srv/www/generated/docs.krita.org # sha256sum en/_images/Krita_cpb_mixing.gif
> 12eb4cbad29a5a6486d3438dabb888a0aa0b9579e55b3be2f3c1d6e1d76fc1d7  en/_images/Krita_cpb_mixing.gif
>
> While this isn't a massive issue right now, it is a future scalability
> issue as for Krita at least each language costs 178MB or so, while for
> Digikam that sits at 415MB per language and Kdenlive is 392MB.
>
> Many thanks,
> Ben
>
>
>
> Julius Künzel
> Volunteer KDE Developer, mainly hacking Kdenlive
> KDE GitLab: https://my.kde.org/user/jlskuz/
> Matrix: @jlskuz:kde.org
>
>
