List: kde-kimageshop
Subject: Re: Sphinx Application Documentation - Image duplication
From: Ben Cooksley <bcooksley () kde ! org>
Date: 2023-01-22 20:42:07
Message-ID: CA+XidOH+zLjGi3uZSqrv2o452KbJVOO01RmDYp+VHbFtgs718Q () mail ! gmail ! com
On Mon, Jan 23, 2023 at 8:59 AM L. E. Segovia <amy@amyspark.me> wrote:
> Hi all,
>
Hi Amy,
>
> If I understand correctly, by doing what you said you would instead have
> a copy of each image per supported language-- only that squashed into a
> massive monolithic folder. Did you instead mean to symlink the
> "localized" image files to the source copies?
>
From what Julius has found, all translated images have the language code
injected into the filename - with untranslated images being left unchanged.
Therefore, as pseudo-code:
- mv $root/en/_images/* $root/_images/
- mv $root/it/_images/* $root/_images/
- mv $root/fr/_images/* $root/_images/
Should achieve the objective of de-duplicating all of the images, as the
untranslated English screenshots should all overwrite each other - leaving
just a single copy of the English screenshots and all the translated ones
behind.
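
A minimal Python sketch of the merge above, assuming each language's `_images` folder is flat and that translated files carry the language code in their name. The function name `merge_image_dirs` and its conflict list are illustrative, not part of any existing deploy tool. Unlike a plain `mv`, it refuses to overwrite a same-named file whose content differs and reports it instead.

```python
import filecmp
import shutil
from pathlib import Path
from typing import List


def merge_image_dirs(root: Path, languages: List[str]) -> List[Path]:
    """Move every <root>/<lang>/_images/* file into <root>/_images/.

    Returns the files left in place because a file with the same name
    but different content already exists in the shared folder.
    """
    shared = root / "_images"
    shared.mkdir(exist_ok=True)
    conflicts = []
    for lang in languages:
        lang_dir = root / lang / "_images"
        if not lang_dir.is_dir():
            continue
        for src in sorted(lang_dir.iterdir()):
            if not src.is_file():
                continue
            dest = shared / src.name
            if not dest.exists():
                # First copy seen: it becomes the shared copy.
                shutil.move(str(src), str(dest))
            elif filecmp.cmp(src, dest, shallow=False):
                # Byte-identical duplicate: drop it.
                src.unlink()
            else:
                # Same name, different bytes: leave it for a human to check.
                conflicts.append(src)
    return conflicts
```

An empty return value means the filename convention held for every image; anything else is worth inspecting before the per-language folders are removed.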
>
> Another idea I have is to preserve the localization step as is, but
> ignore the generated image folder, and in a postbuild step replace the
> <img src="localized image path"> with the path to the source folder. (I
> do something like this with a HTMLPipeline filter for my blog's emojis.)
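
A rough Python sketch of such a post-build rewrite, assuming the generated pages reference images through a relative `_images/` path and that a shared folder is served at `/_images/`. Both the regular expression and the `shared_prefix` default are assumptions for illustration; a real filter would more likely use an HTML parser.

```python
import re

# Matches the src attribute of an <img> tag that points into a (possibly
# parent-relative) "_images/" folder, e.g. src="../_images/foo.png".
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(?:\.\./)*_images/([^"]+)"')


def rewrite_img_src(html: str, shared_prefix: str = "/_images/") -> str:
    """Point every local _images/ reference at the shared image folder."""
    return IMG_SRC.sub(
        lambda m: f'{m.group(1)}{shared_prefix}{m.group(2)}"', html
    )
```

Absolute URLs and images outside `_images/` are left untouched, so the filter can be run over every generated page.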
>
> Cheers,
>
> amyspark
>
Cheers,
Ben
>
>
> PS: I've trimmed the CC as I wasn't sure if I should mail four lists at
> once. Feel free to forward the email if necessary.
>
> On 22/01/2023 16:02, Ben Cooksley wrote:
> > On Mon, Jan 23, 2023 at 7:51 AM Julius Künzel
> > <jk.kdedev@smartlab.uber.space> wrote:
> >
> > Hi Ben, hi all,
> >
> >
> > Hi Julius,
> >
> >
> >
> > I did a little research about this recently and unfortunately it
> > seems to me as if there is not really a solution on the Sphinx side.
> > One needs to have separate build dirs for every language, and Sphinx
> > copies all static files (CSS, JS, images, ...) to every build dir.
> > That's just how it works :-/ (Correct me in case anyone knows I am
> > wrong).
> > However, we can of course try to solve this on our end and make our
> > deploy tools smart in a way that they keep only one version of each
> > image file and replace the others with symlinks.
> > It should be more or less easy to detect images that are translated,
> > since they follow the pattern filename.de.png, where "de" is the
> > language code, so this image would be special for German, while for
> > all other languages filename.png is used.
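
The filename check described above could look roughly like this in Python. The helper name `localized_language` is hypothetical, and the set of language codes is passed in explicitly so that names such as `archive.tar.gz` are not mistaken for translations.

```python
from pathlib import PurePath
from typing import Optional, Set


def localized_language(filename: str, languages: Set[str]) -> Optional[str]:
    """Return the language code embedded in e.g. 'filename.de.png', or None."""
    suffixes = PurePath(filename).suffixes  # ['.de', '.png'] for the example
    if len(suffixes) >= 2:
        candidate = suffixes[-2][1:]  # strip the leading dot
        if candidate in languages:
            return candidate
    return None
```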
> >
> >
> > I had a very strong feeling that would be the case (very much seems that
> > Sphinx actually doesn't have proper i18n/l10n support and it's been
> > hacked in / bolted on later).
> >
> > My initial thinking on a quick and (somewhat) dirty solution to this had
> > been to merge all of the image files into a single folder at top level
> > and then symlink that from each language.
> > Knowing that translated images actually have a separate filename
> > convention indicates that this might just be crazy enough to work.
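
A small Python sketch of the symlink half of that idea, assuming the images have already been merged into `<root>/_images` and each language's own `_images` folder is empty by then. The function name `link_language_images` is made up for illustration. It uses `rmdir()`, which fails on a non-empty folder, so nothing that was not merged is deleted by accident.

```python
from pathlib import Path
from typing import List


def link_language_images(root: Path, languages: List[str]) -> None:
    """Replace <root>/<lang>/_images with a relative symlink to <root>/_images."""
    for lang in languages:
        lang_dir = root / lang
        if not lang_dir.is_dir():
            continue
        link = lang_dir / "_images"
        if link.is_symlink():
            continue  # already linked on a previous run
        if link.is_dir():
            link.rmdir()  # raises OSError if files are still inside
        # Relative target so the tree can be moved or rsynced as a whole.
        link.symlink_to(Path("..") / "_images", target_is_directory=True)
```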
> >
> > Thoughts?
> >
> >
> > I hope that helps so far. I might be able to look into this, but
> > probably not very soon so if anybody else can work on this I am more
> > than happy.
> >
> > Cheers,
> > Julius
> >
> >
> > Regards,
> > Ben
> >
> >
> >
> > On 15 January 2023 at 07:45, "Ben Cooksley" <bcooksley@kde.org> wrote:
> >
> > Hi all,
> >
> > For some time now it has been known to me that the system for
> > generating application documentation websites using Sphinx with
> > l10n support has had issues with duplicating data - particularly
> > images.
> >
> > That leads to the following outcome, where aside from sites that
> > we expect to be quite large (like www.kde.org and api.kde.org) all
> > of the application documentation sites are quite big as well:
> >
> > root@nicoda /srv/www # du -h --max-depth=1 ./generated/ | grep G
> > 2.3G ./generated/cutehmi.kde.org
> > 3.7G ./generated/docs.digikam.org
> > 2.4G ./generated/api.kde.org
> > 2.3G ./generated/docs.krita.org
> > 1.4G ./generated/www.kde.org
> > 7.9G ./generated/docs.kdenlive.org
> > 29G ./generated/
> >
> > This stands in comparison to the Docbook documentation site for
> > all other KDE applications:
> >
> > root@nicoda /srv/www # du -h --max-depth=1 . | grep G
> > 29G ./generated
> > 16G ./api.kde.org-legacy
> > 6.0G ./docs.kde.org
> > 51G .
> >
> > It would be nice if we could please look into some fixes for
> > this, as it looks like Sphinx is duplicating the images - once
> > for every language - when that isn't necessary.
> > I could understand if the screenshots were updated as part of
> > the translation, but it looks like they're not in the majority
> > of cases - below being just a sample:
> >
> > root@nicoda /srv/www/generated/docs.krita.org # sha256sum zh_CN/_images/Krita_cpb_mixing.gif
> > 12eb4cbad29a5a6486d3438dabb888a0aa0b9579e55b3be2f3c1d6e1d76fc1d7  zh_CN/_images/Krita_cpb_mixing.gif
> > root@nicoda /srv/www/generated/docs.krita.org # sha256sum en/_images/Krita_cpb_mixing.gif
> > 12eb4cbad29a5a6486d3438dabb888a0aa0b9579e55b3be2f3c1d6e1d76fc1d7  en/_images/Krita_cpb_mixing.gif
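
The same comparison can be run over a whole site rather than one sample file. The following Python sketch groups every `<lang>/_images/*` file by its SHA-256 digest; the function names and the directory layout it globs for are assumptions based on the paths shown above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large GIFs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_images(root: Path) -> Dict[str, List[Path]]:
    """Map each digest that occurs more than once to the files sharing it."""
    by_digest = defaultdict(list)
    for path in sorted(root.glob("*/_images/*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Summing the sizes of all but one file in each group would give an estimate of how much space de-duplication could reclaim.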
> >
> > While this isn't a massive issue right now, it is a future
> > scalability issue as for Krita at least each language costs
> > 178MB or so, while for Digikam that sits at 415MB per language
> > and Kdenlive is 392MB.
> >
> > Many thanks,
> > Ben
> >
> >
> >
> > Julius Künzel
> > Volunteer KDE Developer, mainly hacking Kdenlive
> > KDE GitLab: https://my.kde.org/user/jlskuz/
> > Matrix: @jlskuz:kde.org
> >
>
> --
> amyspark 🌸 https://www.amyspark.me
>