List:       kde-kimageshop
Subject:    Re: Sphinx Application Documentation - Image duplication
From:       Ben Cooksley <bcooksley () kde ! org>
Date:       2023-01-22 20:42:07
Message-ID: CA+XidOH+zLjGi3uZSqrv2o452KbJVOO01RmDYp+VHbFtgs718Q () mail ! gmail ! com

On Mon, Jan 23, 2023 at 8:59 AM L. E. Segovia <amy@amyspark.me> wrote:

> Hi all,
>

Hi Amy,


>
> If I understand correctly, by doing what you said you would instead have
> a copy of each image per supported language-- only that squashed into a
> massive monolithic folder. Did you instead mean to symlink the
> "localized" image files to the source copies?
>

From what Julius has found, all translated images have the language code
injected into the filename - with untranslated images being left unchanged.

Therefore, as pseudo-code:
- mv $root/en/_images/* $root/_images/
- mv $root/it/_images/* $root/_images/
- mv $root/fr/_images/* $root/_images/

That should de-duplicate the images: every language's copy of an
untranslated English screenshot has the same filename, so the copies
overwrite each other, leaving a single copy of each English screenshot
plus all the translated ones.
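
As a rough sketch of that merge in Python, including the per-language
symlink mentioned earlier in the thread: the function names are purely
illustrative, and it assumes each _images folder is flat and that symlinks
are acceptable on the web server. It compares checksums so that a
same-named file with different content is reported instead of being
silently overwritten.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def merge_images(root: Path, languages: list[str]) -> list[Path]:
    """Move <lang>/_images/* into root/_images and symlink each language to it.

    Returns the files left in place because a same-named file with
    different content already exists in the shared folder.
    """
    shared = root / "_images"
    shared.mkdir(exist_ok=True)
    conflicts = []
    for lang in languages:
        lang_images = root / lang / "_images"
        if lang_images.is_symlink() or not lang_images.is_dir():
            continue  # already merged, or this language has no images
        for image in sorted(lang_images.iterdir()):
            if not image.is_file():
                continue  # assumption: _images is flat, so skip anything else
            target = shared / image.name
            if not target.exists():
                shutil.move(str(image), str(target))
            elif sha256(image) == sha256(target):
                image.unlink()  # identical duplicate, drop it
            else:
                conflicts.append(image)  # same name, different bytes
        if not any(lang_images.iterdir()):
            lang_images.rmdir()
            # Relative link, so the tree can be moved or synced as a unit.
            lang_images.symlink_to(Path("..") / "_images", target_is_directory=True)
    return conflicts
```

If the returned list is non-empty, the affected language folder is left
as a real directory, so nothing is lost and the conflict can be looked at
by hand.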


>
> Another idea I have is to preserve the localization step as is, but
> ignore the generated image folder, and in a postbuild step replace the
> <img src="localized image path"> with the path to the source folder. (I
> do something like this with a HTMLPipeline filter for my blog's emojis.)
>
> Cheers,
>
> amyspark
>
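
For comparison, the post-build rewrite described above could look roughly
like this. It is only a regex-based sketch: the function name, the
double-quoted src attributes, the flat _images folder, and the shared copy
being served at /_images/ are all assumptions rather than anything taken
from the actual setup.

```python
import re

# Matches the src attribute of an <img> tag that points into an _images folder.
# Group 1: everything up to and including src=", group 2: the bare filename.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")[^"]*?_images/([^"/]+)(")')


def rewrite_img_src(html: str, shared_prefix: str = "/_images/") -> str:
    """Point every <img src=".../_images/NAME"> at shared_prefix + NAME."""
    return IMG_SRC.sub(
        lambda m: m.group(1) + shared_prefix + m.group(2) + m.group(3),
        html,
    )


print(rewrite_img_src('<img alt="mix" src="../_images/Krita_cpb_mixing.gif">'))
```

This only touches img tags; any links that wrap an image and point into
_images would need the same treatment.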

Cheers,
Ben


>
>
> PS: I've trimmed the CC as I wasn't sure if I should mail four lists at
> once. Feel free to forward the email if necessary.
>
> On 22/01/2023 16:02, Ben Cooksley wrote:
> > On Mon, Jan 23, 2023 at 7:51 AM Julius Künzel
> > <jk.kdedev@smartlab.uber.space> wrote:
> >
> >     Hi Ben, hi all,
> >
> >
> > Hi Julius,
> >
> >
> >
> >     I did a little research about this recently and unfortunately it
> >     seems to me as if there is not really a solution on the Sphinx side.
> >     One needs to have separate build dirs for every language and it
> >     copies all static files (css, js, images, ...) to every build dir.
> >     That's just how it works :-/ (Correct me in case anyone knows I am
> >     wrong).
> >     However, we can of course try to solve this on our end and make our
> >     deploy tools smart in a way that they keep only one version of each
> >     image file and replace the others with symlinks.
> >     It should be more or less easy to detect images that are translated
> >     since they follow the pattern filename.de.png where "de" is the
> >     language code, so this image would be special for German, while for
> >     all other languages filename.png is used.
> >
> >
> > I had a very strong feeling that would be the case (it very much seems
> > that Sphinx doesn't actually have proper i18n/l10n support and it's been
> > hacked in / bolted on later).
> >
> > My initial thinking on a quick and (somewhat) dirty solution to this had
> > been to merge all of the image files into a single folder at top level
> > and then symlink that from each language.
> > Knowing that translated images actually have a separate filename
> > convention indicates that this might just be crazy enough to work.
> >
> > Thoughts?
> >
> >
> >     I hope that helps so far. I might be able to look into this, but
> >     probably not very soon so if anybody else can work on this I am more
> >     than happy.
> >
> >     Cheers,
> >     Julius
> >
> >
> > Regards,
> > Ben
> >
> >
> >     On 15 January 2023 at 07:45, "Ben Cooksley" <bcooksley@kde.org> wrote:
> >
> >         Hi all,
> >
> >         For some time now it has been known to me that the system for
> >         generating application documentation websites using Sphinx with
> >         l10n support has had issues with duplicating data - particularly
> >         images.
> >
> >         That leads to the following outcome, where aside from sites that
> >         we expect to be quite large (like www.kde.org and api.kde.org) all
> >         of the application documentation sites are quite big as well:
> >
> >         root@nicoda /srv/www # du -h --max-depth=1 ./generated/ | grep G
> >         2.3G    ./generated/cutehmi.kde.org
> >         3.7G    ./generated/docs.digikam.org
> >         2.4G    ./generated/api.kde.org
> >         2.3G    ./generated/docs.krita.org
> >         1.4G    ./generated/www.kde.org
> >         7.9G    ./generated/docs.kdenlive.org
> >         29G     ./generated/
> >
> >         This stands in comparison to the Docbook documentation site for
> >         all other KDE applications:
> >
> >         root@nicoda /srv/www # du -h --max-depth=1 . | grep G
> >         29G     ./generated
> >         16G     ./api.kde.org-legacy
> >         6.0G    ./docs.kde.org
> >         51G     .
> >
> >         It would be nice if we could please look into some fixes for
> >         this, as it looks like Sphinx is duplicating the images - once
> >         for every language - when that isn't necessary.
> >         I could understand if the screenshots were updated as part of
> >         the translation, but it looks like they're not in the majority
> >         of cases - below being just a sample:
> >
> >         root@nicoda /srv/www/generated/docs.krita.org # sha256sum zh_CN/_images/Krita_cpb_mixing.gif
> >         12eb4cbad29a5a6486d3438dabb888a0aa0b9579e55b3be2f3c1d6e1d76fc1d7  zh_CN/_images/Krita_cpb_mixing.gif
> >         root@nicoda /srv/www/generated/docs.krita.org # sha256sum en/_images/Krita_cpb_mixing.gif
> >         12eb4cbad29a5a6486d3438dabb888a0aa0b9579e55b3be2f3c1d6e1d76fc1d7  en/_images/Krita_cpb_mixing.gif
> >
> >         While this isn't a massive issue right now, it is a future
> >         scalability issue as for Krita at least each language costs
> >         178MB or so, while for Digikam that sits at 415MB per language
> >         and Kdenlive is 392MB.
> >
> >         Many thanks,
> >         Ben
> >
> >
> >
> >     Julius Künzel
> >     Volunteer KDE Developer, mainly hacking Kdenlive
> >     KDE GitLab: https://my.kde.org/user/jlskuz/
> >     Matrix: @jlskuz:kde.org
> >
>
> --
> amyspark 🌸 https://www.amyspark.me
>
