[moved to core-devel]
[warning: lots of reading ahead :)]

On Wed, Nov 21, 2007 at 11:42:10AM +0100, David Faure wrote:
> On Wednesday 21 November 2007, Andreas Pakulat wrote:
> > I don't understand why creating groups with slashes got broken either. I
> > suspect that "somewhere" in kconfigs code KConfigGroup::name() is used
> > where instead the fullname is needed.
>
probably. will you have a look? or thomas?
(that's independent of the rest here, i'd think)

> So now I'm confused. If this KConfig behavior change wasn't expected,
> why the kicontheme.cpp change? The bug should be fixed in KConfig,
> not in the KIconTheme user code...
>
prolly. but it isn't *that* simple: :)

there are two cases where slashes appear:
- some hierarchy, like in the icons case
- just as part of the name, like in an url (ok, an url *is* hierarchic,
  but in this case one should consider the structure opaque)

in the first case, the slashes should appear verbatim in the group header.
the proper way to construct such a hierarchy is by using nested
kconfiggroups - see andreas' fix.
in the second case, the slashes should be escaped somehow. the way to
construct such a thing would be simply passing the complete name to the
kconfiggroup ctor.
that this distinction makes sense becomes obvious when you consider mixing
the two. and when you consider alternative backends, it becomes obvious why
the separator between hierarchy levels needs to be opaque to the api.

so far the theory. now the practical consequences:
- api. functions must treat group names as atomic, not as hierarchical:
  - they did so far and it's too late to change now
  - it would be just wrong anyway, as the separator needs to be opaque
- on-disk format of ini files. in the current format, groups are encoded
  just like keys: \-escape leading and trailing whitespace, control chars,
  [, ] and =. taking
      " weird \ []] group / name=blah"
  as an example yields
      [\sweird \\ \x5b\x5d\x5d group / name\x3dblah].
  manually inserted [ and ] would break parsing. an = would be taken
  literally. (*)

now we need some way to delimit hierarchy levels. proposals range from the
not-so-fortunate / (current), the probably better | (apaku) or the even
better ^ (me :). this all doesn't really *solve* the problem, though - it
merely lessens the probability of it surfacing. same game as with list
separators in values before ...

so - again - two choices with various variations each. example of the hour:
"g1/1" with subgroup "g2^2[\":
- add an additional layer of encoding. using / for the level delimiter:
  - using the same escape char as the underlying layer:
      [g1\\/1/g2^2\x5b\\\\]
  - using a different escape char:
      [g1^/1/g2^^2\x5b\\]
- escape separator at the lowest layer already:
  - pick an arbitrary char ("by pure chance, it's the slash"):
      [g1\x2f1/g2^2\x5b\\]
  - use a char that currently really cannot appear, as it is already
    encoded:
      [g1/1=g2^2\x5b\\]
    this simply redefines "needlessly encoded in the case of groups" into
    "reserved for later use". :-D
- actually, there is a third approach i haven't thought of before: make the
  separator itself an escape at the lowest layer:
      [g1/1\/g2^2\x5b\\]

evaluating the proposals:
- additional layer:
  - both variants look somewhat ugly, the first being particularly
    unreadable (what coincidence that this is the encoding used for list
    values :). using | for the delimiter would reduce the number of
    to-be-escaped cases. otoh, who cares? :}
  - coding-wise, this is the simpler solution: one could store the
    list-encoded key in the entry map, thus needing changes only to the
    code for constructing groups and returning their name.
  - to have forward compatibility, groups with this encoding would need to
    gain a [$h] (for "hierarchical") marker. limited backward compat would
    be achieved by encoding only nested groups that way.
- lowest layer:
  - arbitrary char:
    - a slash or bar looks sort of most readable to me
    - that needs the [$h] marker, too
  - equal sign:
    - looks just "unnatural"
    - but needs no [$h] marker. provided we define the end of may 2007 as
      the beginning of times, but that's reasonable within the
      constraints (*).
  - code-wise, one could re-encode this into the list-encoded format and
    handle it like the other case. in any case it's more code. the proper
    solution (for either case) would be a really hierarchical entry map,
    but that's out of scope for 4.0.
- lowest layer escape:
  - looks somewhat ... backwards
  - needs no [$h] marker
  - code-wise it's about the same as the previous variant

(*) the current group name encoding is neither backward nor forward
compatible with kde3: group names were written literally, with the
exception of [ and ] being doubled, the example thus yielding
    [ weird \ [[]]]] group / name=blah].
line breaks would break parsing, other control chars would make it
unreadable for humans.
i haven't seen complaints about the broken compatibility yet, but here are
some thoughts nonetheless:
- i don't think this is relevant for shared kde3/kde4 configs, as they
  don't have weird group names, afaik
- a one-time upgrade for actually affected apps would need some special
  format rewrite stuff in kconf_upgrade: the regular kconfig would
  obviously break down
- consider this forward compatible solution: [ is effectively an escape
  char. so something as perverted as "[new\nline ]\x13here" would become
      [[[new[nline ]][x13here].
  for more readability the backslash could be preserved, making it
      [[[new[\nline ]][\x13here].

assume we implement the forward compatible variant.
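
for reference, a minimal python sketch of that variant next to the two existing encodings from the footnote. this is not kconfig code: the function names are made up for illustration, and turning every control char into an xHH escape (other than the newline in the [-variant) is an assumption. only the cases that appear in the examples of this mail are taken from it.

```python
def encode_current(name: str) -> str:
    """current kde4 format: backslash-escape leading/trailing spaces,
    control chars, '[', ']' and '='."""
    out = []
    last = len(name) - 1
    for i, ch in enumerate(name):
        if ch == "\\":
            # the escape char itself is doubled, as in the mail's example
            out.append("\\\\")
        elif ch == " " and i in (0, last):
            out.append("\\s")
        elif ch in "[]=" or ord(ch) < 0x20:
            # assumption: always \xHH; the real code may use shorter
            # escapes for some control chars
            out.append("\\x%02x" % ord(ch))
        else:
            out.append(ch)
    return "[" + "".join(out) + "]"


def encode_kde3(name: str) -> str:
    """kde3 format: written literally, except that [ and ] are doubled."""
    return "[" + name.replace("[", "[[").replace("]", "]]") + "]"


def encode_forward_compatible(name: str, keep_backslash: bool = False) -> str:
    """proposed variant: '[' acts as the escape char, optionally followed
    by a backslash for readability."""
    esc = "[\\" if keep_backslash else "["
    out = []
    for ch in name:
        if ch == "[":
            out.append("[[")
        elif ch == "]":
            out.append("]]")
        elif ch == "\n":
            out.append(esc + "n")
        elif ord(ch) < 0x20:
            out.append(esc + "x%02x" % ord(ch))
        else:
            out.append(ch)
    return "[" + "".join(out) + "]"


if __name__ == "__main__":
    example = " weird \\ []] group / name=blah"
    print(encode_current(example))
    print(encode_kde3(example))
    print(encode_forward_compatible("[new\nline ]\x13here"))
    print(encode_forward_compatible("[new\nline ]\x13here", keep_backslash=True))
```

run as a script, this prints the four encoded headers quoted in this mail, in the same order: current format, kde3 format, and the [-escape variant without and with the preserved backslash.
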
again, using the example "g1/1" with subgroup "g2^2[\" and / for the
delimiter, the encoding options are:
- add an additional layer of encoding:
  - using the same escape char as the underlying layer:
      [g1[[/1/g2^2[[[[\]
  - using a different escape char:
      [g1^/1/g2^^2[[\]
- escape separator at the lowest layer already:
      [g1[/1/g2^2[[\]
- make the separator itself an escape at the lowest layer:
      [g1/1[/g2^2[[\]
evaluation: same as above.

and just in the moment i wanted to conclude the mail, i've got yet another
idea ... how about encoding it like this:
    [g1/1][g2^2]
a problem here is that the immutability marker is as such a valid group
name and would thus create ambiguity (btw, this is also a problem for the
stand-alone file immutability marker). one could remedy this by encoding a
leading $ in group names as [$.

concluding questions (yeah, finally ... ;):
- should we restore backwards compat?
- which separator encoding to choose?
  - i tend to favor my last-minute idea even if i spent only five minutes
    developing it, as opposed to five hours on the rest. ;)
  - second option would be the "regular" lowest-layer encoding (without
    the = hack in the non-backward-compatible variant). i guess i'd favor
    / over |, but i'm undecided.

whew, things take much less time when you don't try to consider
everything. ;)

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
--
Chaos, panic, and disorder - my work here is done.
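
p.s. a rough python sketch of the last-minute idea (one bracketed component per hierarchy level), just to show why the leading $ needs the [$ escape. the function names are hypothetical, and it assumes the [-escape variant for the components, with control chars other than newline written as [xHH. the marker is written as [$i], the way kconfig marks immutable groups.

```python
def encode_component(name: str) -> str:
    """encode one hierarchy level with '[' as the escape char.
    a leading '$' becomes '[$' so it cannot be mistaken for a marker."""
    out = []
    for i, ch in enumerate(name):
        if i == 0 and ch == "$":
            out.append("[$")
        elif ch == "[":
            out.append("[[")
        elif ch == "]":
            out.append("]]")
        elif ch == "\n":
            out.append("[n")
        elif ord(ch) < 0x20:
            # assumption: remaining control chars become [xHH
            out.append("[x%02x" % ord(ch))
        else:
            out.append(ch)
    return "[" + "".join(out) + "]"


def encode_group_path(components) -> str:
    """concatenate the encoded levels: no separator char is needed."""
    return "".join(encode_component(c) for c in components)


if __name__ == "__main__":
    # the example of the hour, with the full subgroup name
    print(encode_group_path(["g1/1", "g2^2[\\"]))
    # a subgroup that is literally named "$i" ...
    print(encode_group_path(["g1", "$i"]))
    # ... versus group "g1" carrying the immutability marker
    print(encode_group_path(["g1"]) + "[$i]")
```

the last two headers come out as [g1][[$i] and [g1][$i], so a parser can tell the group named "$i" from the marker. slashes, bars and carets need no escaping at all in this scheme, as the hierarchy is carried by the brackets alone.
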