List: kde-look
Subject: Re: Moving away from app-centric mimetypes (e.g. kword)
From: "Steven D'Aprano" <dippy () mikka ! net ! au>
Date: 2002-05-19 4:53:36
On Sat, 18 May 2002 22:17, David Golden wrote:
> On Saturday 18 May 2002 04:25, Steven D'Aprano wrote:
> > Speaking as a user, I may or may not want to apply metadata to a
> > directory, but otherwise the concept of a directory as a place you
> > store related files is too useful to get rid of. Why do you want to
> > do this?
>
> ??? I thoroughly don't.
Okay, I've misunderstood you.
> Files are a place where people store
> related information. Directories are a place where people store
> related files. Files are Information. Why can't I CD into the
> permissions of a file, or CD into its mime type ?
Why would you want to?
> Directories ~ Files.
I don't understand the ~ symbol. Is this supposed to be
"approximately equal"?
> for metadata, one can invisage a directory interface to "classic"
> unix permissions, for example.
> (I'm not saying this _particular_ scheme is any good, just an
> illustrative possibility..)
Okay, I see how it works. But why?
What else can you do? cd into an mp3 file or jpeg or text file? I don't
understand the point.
[snip]
> Most complex file formats effectively incorporate half-assed
> directories or full database structures already. XML _is_ a tree
> structure. (XML is just lisp sexps reimplemented badly, too, but
> that's another rant) The windoze registry.... argh. argh.. argh..
>
> One can easily imagine:
> cd <ELF-FILE>
> ls
> ...segment 1
> ....segment 2
An observation here. Long ago, when I was young and foolish, I tried
programming some mid-level file manager stuff on the Apple Mac. Now the
thing with the Mac file manager is that it has (or had) three tiers:
high level routines for opening files, low level nightmares for doing
weird things that nobody sensible would want to do, and a mid-level.
The mid-level routines basically had a single command (say, hfopen)
which took a complex data structure, and that data structure varied
according to whether you wanted to open a file, or a directory, or a
device driver, or a volume. It was a mess to use, and it must have been
a nightmare to maintain.
If I have understood you correctly, you are suggesting something
similar. You are suggesting that cd should work on files, directories,
presumably pipes and devices and anything else that can show up in a
directory tree. Presumably you want to cd into a tar file, and a gzip
file, and an xml file, and any other file.
So who is responsible for updating the cd command for every new file
format that comes along? When I try to cd into a Metacard file (.mc),
and I don't get the result I expect, is that a bug in the cd command?
> > > kword and staroffice already do this -
> >
> > They do? Explain please.
>
> Staroffice's file formats are zipped directorys of xml files,
> according to another post on this ML, so too are kword. Java jars
> are zipped directory trees. See a pattern?
Yes, but I fail to see the relevance.
I think you are making an error of terminology, and that is giving you
the wrong idea.
A Staroffice file isn't a zipped *directory* of xml files, it's a zipped
*set* of xml files. The reason being, you can store anything you like
in a directory. If I add and remove xml files at random from
/home/steve/myfiles/, I'm not going to break anything. But if I add and
remove xml files at random from a staroffice file, I will surely break
it.
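To make the distinction concrete, here is a minimal Python sketch. The member names in REQUIRED are hypothetical (I am not quoting the real Staroffice layout), and the validity check is my own assumption about how an application might reject a damaged file. The point is only that an archive used as a document format has required members, which an ordinary directory does not.

```python
import io
import zipfile

# Hypothetical member names, for illustration only.
REQUIRED = {"content.xml", "styles.xml", "meta.xml"}

def make_document(members):
    """Build an in-memory zip archive containing the given member names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(members):
            zf.writestr(name, "<doc/>")
    return buf.getvalue()

def is_valid_document(data):
    """Treat the archive as valid only if every required member is present."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return REQUIRED.issubset(zf.namelist())

intact = make_document(REQUIRED)
damaged = make_document(REQUIRED - {"styles.xml"})

print(is_valid_document(intact))
print(is_valid_document(damaged))
```

Removing one member is enough to make the check fail, whereas deleting a file from a plain directory breaks nothing by itself.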
(Whether it breaks gracefully or explodes in a shower of flames is
beside the point -- it's still broken.)
>
> > Arghhh. False hits on a large system are BAD!
>
> Of course they are. no hits are worse.
No. If you have no hits, then you know the data isn't there, and you
can give up or go elsewhere.
> You can progressively narrow
> down too many hits, as you yourself illustrated, since the earlier
> searches can provide context for narrowing your search.
In general you can't, not without understanding some sort of structure.
The problem with just adding extra search terms is that (while
sometimes it works) you run the risk of being too specific. If the
search engine uses an implied AND between terms (like google does) then
adding a term will narrow the search, sometimes too much. If it is an
implied OR, then you widen the search, sometimes too much.
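A small Python sketch of the effect, using a made-up three-document collection and a hypothetical search() helper (neither comes from any real search engine). With an implied AND each extra term can only shrink the result set; with an implied OR it can only grow it.

```python
# Toy document collection, invented for illustration.
DOCS = {
    1: "darth vader chokes an imperial officer",
    2: "imperial officer on the death star",
    3: "darth vader on the death star",
}

def search(terms, mode="and"):
    """Return sorted ids of documents matching all terms ("and") or any term ("or")."""
    combine = all if mode == "and" else any
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if combine(term in text.split() for term in terms)
    )

print(search(["vader"]))                             # one term
print(search(["vader", "officer"], mode="and"))      # narrower
print(search(["vader", "officer"], mode="or"))       # wider
print(search(["vader", "officer", "tarkin"], "and")) # too specific
```

The last query shows the "too specific" failure: one term that appears nowhere empties the whole result under AND.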
Sometimes what you need is both AND and OR searches. I think Reiser's
fear of asking users to use structure is foolish. That doesn't mean
asking them to write complex queries or build regex strings. It means
giving them a simple but effective interface to generate complex
queries and regex strings, so they don't have to learn the syntax of
the query but can put it together like Lego blocks.
Users who can't deal with Lego blocks will make do with simple searches
and deal with the hundreds of false hits. Those who can deal with it
will scratch their head, make a coffee, use the interface to put
together a complex query, and be rewarded by a handful of valid hits.
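As a rough illustration of the Lego-block idea, here is a Python sketch in which a query is assembled from small pieces instead of being typed in a query language. The names Term, And, Or, all_of and any_of are my own inventions for this example; a graphical interface would build the same structure from buttons and drop-downs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Matches when the word appears in the text (case-insensitive)."""
    word: str

    def matches(self, text):
        return self.word in text.lower().split()

@dataclass(frozen=True)
class And:
    """Matches when every sub-block matches."""
    parts: tuple

    def matches(self, text):
        return all(p.matches(text) for p in self.parts)

@dataclass(frozen=True)
class Or:
    """Matches when at least one sub-block matches."""
    parts: tuple

    def matches(self, text):
        return any(p.matches(text) for p in self.parts)

def all_of(*parts):
    return And(parts)

def any_of(*parts):
    return Or(parts)

# "vader" AND ("officer" OR "admiral"), snapped together from blocks.
query = all_of(Term("vader"), any_of(Term("officer"), Term("admiral")))

print(query.matches("Vader chokes an officer"))
print(query.matches("An officer salutes"))
```

The user never sees AND/OR syntax or a regex; they only nest blocks, and the nesting is the query.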
> Starting from
> no hits and trying to work your way up is something most people find
> harder, since you don't have context...
Everybody always starts with no hits, since you start with a search
engine pointing at nothing.
> i.e. at least you _can_ get a result back. People in general don't
> know exactly what they want. If they did, we probably wouldn't need
> DNS, let alone search engines.
Domain Name Servers? :-)
No, people do know exactly what they want. What they don't know is how
to phrase it in some unnatural query language.
I can ask "What is the name of the Imperial Officer choked by Darth
Vader, using just the Force, on the Death Star, in the original Star
Wars movie?" That's *easy*, and a five year old can do it. But if I had
to turn that into SQL or a regex, it would be quicker for me to flick
through the novel until I find it. Or ask one of my geeky Star Wars fan
friends, who probably can tell me as soon as I say "What is the name of
the Imperial Officer choked by Darth Vader", without all the extra
search terms.
> With a traditional filesystem or relational db,
> you either have to know exactly what you want, or, use, from a
> compsci perspective, inefficient post-facto searches like the "find"
> command, or periodic indexing, like the "slocate" command (which
> itself seems to be left out of the default install of some new
> distros for some benighted reason...)
Inefficient from whose perspective?
It just seems overkill to me to invent an entire new file system, just
to avoid building a flexible search engine.
And even then, you will notice that Reiser's proposal isn't what he
says it is. He claims that you should not expect the user to learn
structure, but then he expects users to use a specific syntax:
ls [subject/[illegal strike] to/elves from/santa ultimatum]
Am I being unfair to say that Reiser would object to the exact same
query if it were written:
find subject="illegal strike", to="elves", from="santa", AND "ultimatum"
I don't see any difference, except in syntax. And personally, speaking
as a user, using "=" to indicate equality seems much more sensible than
using "/" (which means division to users, not directory separators).
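To back up the claim that the two differ only in syntax, here is a Python sketch that parses both spellings into the same structure. The two parsers and their regular expressions are my own guesses at a grammar that covers just these two example queries; neither is taken from any real implementation.

```python
import re

# key/[bracketed value], key/bare-value, or a bare keyword.
BRACKET_TOKEN = re.compile(r'(\w+)/(?:\[([^\]]*)\]|(\S+))|(\S+)')
# key="value" or a bare "keyword"; words like AND are simply skipped.
FIND_TOKEN = re.compile(r'(\w+)="([^"]*)"|"([^"]*)"')

def parse_bracket(query):
    """Parse the slash-and-bracket syntax into (fields, keywords)."""
    query = query.strip()
    if query.startswith("[") and query.endswith("]"):
        query = query[1:-1]  # drop the outer brackets
    fields, keywords = {}, []
    for key, bracketed, bare, keyword in BRACKET_TOKEN.findall(query):
        if key:
            fields[key] = bracketed or bare
        else:
            keywords.append(keyword)
    return fields, keywords

def parse_find(query):
    """Parse the key="value" syntax into (fields, keywords)."""
    fields, keywords = {}, []
    for key, value, keyword in FIND_TOKEN.findall(query):
        if key:
            fields[key] = value
        else:
            keywords.append(keyword)
    return fields, keywords

a = parse_bracket('[subject/[illegal strike] to/elves from/santa ultimatum]')
b = parse_find('subject="illegal strike", to="elves", from="santa", AND "ultimatum"')
print(a == b)
```

Both calls produce three field/value pairs plus one free keyword, so whatever structure the user has to learn for one syntax, they have to learn for the other.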
--
Steven D'Aprano