'Re: Moving away from app-centric mimetypes (e.g. kword)'

List:       kde-look
Subject:    Re: Moving away from app-centric mimetypes (e.g. kword)
From:       "Steven D'Aprano" <dippy () mikka ! net ! au>
Date:       2002-05-19 4:53:36

On Sat, 18 May 2002 22:17, David Golden wrote:
> On Saturday 18 May 2002 04:25, Steven D'Aprano wrote:
> > Speaking as a user, I may or may not want to apply metadata to a
> > directory, but otherwise the concept of a directory as a place you
> > store related files is too useful to get rid of. Why do you want to
> > do this?
>
> ??? I thoroughly don't.

Okay, I've misunderstood you.

> Files are a place where people store
> related information. Directories are a place where people store
> related files. Files are Information.   Why can't I CD into the
> permissions of a file, or CD into its mime type ?

Why would you want to?

> Directories ~ Files.

I don't understand the ~ symbol. Is this supposed to be 
"approximately equal"?

> for metadata, one can envisage a directory interface to "classic"
> unix permissions, for example.
> (I'm not saying this _particular_ scheme is any good, just an
> illustrative possibility..)

Okay, I see how it works. But why?

What else can you do? cd into a mp3 file or jpeg or text file? I don't 
understand the point.

[snip]
> Most complex file formats effectively incorporate half-assed
> directories or full database  structures already.  XML _is_ a tree
> structure.   (XML is just lisp sexps reimplemented badly, too, but
> that's another rant) The windoze registry.... argh. argh.. argh..
>
> One can easily imagine:
> cd <ELF-FILE>
> ls
> ...segment 1
> ....segment 2

An observation here. Long ago, when I was young and foolish, I tried 
programming some mid-level file manager stuff on the Apple Mac. Now the 
thing with the Mac file manager is that it has (or had) three tiers: 
high level routines for opening files, low level nightmares for doing 
weird things that nobody sensible would want to do, and a mid-level.

The mid-level routines basically had a single command (say, hfopen) 
which took a complex data structure, and that data structure varied 
according to whether you wanted to open a file, or a directory, or a 
device driver, or a volume. It was a mess to use, and it must have been 
a nightmare to maintain.

If I have understood you correctly, you are suggesting something 
similar. You are suggesting that cd should work on files, directories, 
presumably pipes and devices and anything else that can show up in a 
directory tree. Presumably you want to cd into a tar file, and a gzip 
file, and a xml file, and any other file.

So who is responsible for updating the cd command for every new file 
format that comes along? When I try to cd into a Metacard file (.mc), 
and I don't get the result I expect, is that a bug in the cd command?

> > > kword and staroffice already do this -
> >
> > They do? Explain please.
>
> Staroffice's file formats are zipped directories of xml files,
> according to another post on this ML, so too are kword.   Java jars
> are zipped directory trees. See a pattern?

Yes, but I fail to see the relevance.

I think you are making an error of terminology, and that is giving you 
the wrong idea.

A Staroffice file isn't a zipped *directory* of xml files, it's a zipped 
*set* of xml files. The reason being, you can store anything you like 
in a directory. If I add and remove xml files at random from 
/home/steve/myfiles/, I'm not going to break anything. But if I add and 
remove xml files at random from a staroffice file, I will surely break 
it.

(Whether it breaks gracefully or explodes in a shower of flames is 
beside the point -- it's still broken.)
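
The directory-versus-set distinction can be sketched in a few lines. This 
is only an illustration: the member names in REQUIRED are hypothetical and 
are not taken from the real Staroffice or kword formats. The point is that 
a document archive has members the application depends on, while a plain 
directory has no such contract.

```python
import io
import zipfile

# Hypothetical members that an imaginary document format depends on.
REQUIRED = {"content.xml", "styles.xml"}

def make_archive(names):
    """Build an in-memory zip holding one small XML member per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "<doc/>")
    return buf.getvalue()

def is_valid_document(data):
    """A document is usable only if every required member is present."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return REQUIRED <= set(zf.namelist())

intact = make_archive(["content.xml", "styles.xml", "extra.xml"])
damaged = make_archive(["styles.xml", "extra.xml"])  # content.xml removed
```

Adding extra.xml does no harm in this sketch, but removing content.xml 
makes the archive unusable as a document, even though it is still a 
perfectly good zip file.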


>
> > Arghhh. False hits on a large system are BAD!
>
> Of course they are.  no hits are worse.

No. If you have no hits, then you know the data isn't there, and you 
can give up or go elsewhere.

> You can progressively narrow
> down too many hits, as you yourself illustrated, since the earlier
> searches can provide context for narrowing your search.

In general you can't, not without understanding some sort of structure. 
The problem with just adding extra search terms is that (while 
sometimes it works) you run the risk of being too specific. If the 
search engine uses an implied AND between terms (like Google does) then 
adding a term will narrow the search, sometimes too far. If it is an 
implied OR, then you widen the search, sometimes too wide.
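
A toy example makes the AND/OR behaviour concrete. The corpus and the 
search function below are invented for illustration and do not describe 
how any real search engine works.

```python
# Toy corpus: document name -> set of words (hypothetical data).
DOCS = {
    "memo1": {"santa", "elves", "strike"},
    "memo2": {"santa", "reindeer"},
    "memo3": {"elves", "ultimatum", "strike"},
}

def search(terms, mode="and"):
    """Return the sorted names of documents matching the terms.

    mode="and": every term must appear (each extra term narrows).
    mode="or":  any term may appear (each extra term widens).
    """
    terms = set(terms)
    if mode == "and":
        return sorted(n for n, words in DOCS.items() if terms <= words)
    return sorted(n for n, words in DOCS.items() if terms & words)

one_term = search(["santa"])
narrowed = search(["santa", "strike"], mode="and")
too_narrow = search(["santa", "strike", "ultimatum"], mode="and")
widened = search(["santa", "strike"], mode="or")
```

With an implied AND, the third term leaves nothing at all, even though 
memo1 and memo3 are both plausibly relevant. With an implied OR, two terms 
already match every document in the corpus.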

Sometimes what you need is both AND and OR searches. I think Reisner's 
fear of asking users to use structure is foolish. That doesn't mean 
asking them to write complex queries or build regex strings. It means 
giving them a simple but effective interface to generate complex 
queries and regex strings, so they don't have to learn the syntax of 
the query but can put it together like Lego blocks.

Users who can't deal with Lego blocks will make do with simple searches 
and deal with the hundreds of false hits. Those who can deal with it 
will scratch their heads, make a coffee, use the interface to put 
together a complex query, and be rewarded by a handful of valid hits.
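
As a rough sketch of the Lego-block idea, here is one way the pieces 
might snap together. The class names (Query, Term, And, Or) are made up 
for this example. In a real interface each block would be something the 
user drags or clicks, and Python's & and | operators merely stand in 
for that.

```python
class Query:
    """Base block: any two blocks can be combined with & or |."""
    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)

class Term(Query):
    """Leaf block: matches if a single word is present."""
    def __init__(self, word):
        self.word = word

    def matches(self, words):
        return self.word in words

class And(Query):
    """Matches only if both sub-blocks match."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def matches(self, words):
        return self.left.matches(words) and self.right.matches(words)

class Or(Query):
    """Matches if either sub-block matches."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def matches(self, words):
        return self.left.matches(words) or self.right.matches(words)

# "santa AND (strike OR ultimatum)" built from blocks, no query syntax.
query = Term("santa") & (Term("strike") | Term("ultimatum"))
```

The user never types a query string here. The structure comes from how 
the blocks are nested, which is exactly the mix of AND and OR that a flat 
list of search terms cannot express.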

> Starting from
> no hits and trying to work your way up is something most people find
> harder, since you don't have context...

Everybody always starts with no hits, since you start with a search 
engine pointing at nothing.

> i.e. at least you _can_ get a result back.   People in general don't
> know exactly what they want.  If they did, we probably wouldn't need
> DNS, let alone search engines.

Domain Name Servers? :-)

No, people do know exactly what they want. What they don't know is how 
to phrase it in some unnatural query language.

I can ask "What is the name of the Imperial Officer choked by Darth 
Vader, using just the Force, on the Death Star, in the original Star 
Wars movie?" That's *easy*, and a five year old can do it. But if I had 
to turn that into SQL or a regex, it would be quicker for me to flick 
through the novel until I find it. Or ask one of my geeky Star Wars fan 
friends, who probably can tell me as soon as I say "What is the name of 
the Imperial Officer choked by Darth Vader", without all the extra 
search terms.

>  With a traditional filesystem or relational db,
> you either have to know exactly what you want, or, use, from a
> compsci perspective, inefficient post-facto searches like the "find"
> command, or periodic indexing, like the "slocate" command (which
> itself seems to be left out of the default install of some new
> distros for some benighted reason...)

Inefficient from whose perspective?

It just seems overkill to me to invent an entire new file system, just 
to avoid building a flexible search engine.

And even then, you will notice that Reisner's proposal isn't what he 
says it is. He claims that you should not expect the user to learn 
structure, but then he expects users to use a specific syntax:

ls [subject/[illegal strike] to/elves from/santa ultimatum]


Am I being unfair to say that Reisner would object to the exact same 
query if it were written:

find subject="illegal strike", to="elves", from="santa", AND "ultimatum"


I don't see any difference, except in syntax. And personally, speaking 
as a user, using "=" to indicate equality seems much more sensible than 
using "/" (which means division to users, not directory separators).
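
To illustrate that the two forms differ only in syntax, here is a small 
sketch that parses both into the same structure. The reading of the 
bracket syntax (key/value pairs plus a bare keyword) is an assumption 
based on the single example above, and both parser functions are 
hypothetical; neither is anyone's actual implementation.

```python
import re

def parse_slash(query):
    """Parse the bracket form, e.g. [subject/[a b] to/x keyword]."""
    inner = query.strip()[1:-1]  # drop the outer brackets
    fields, keywords = {}, []
    pattern = r'(\w+)/\[([^\]]*)\]|(\w+)/(\w+)|(\w+)'
    for m in re.finditer(pattern, inner):
        if m.group(1):            # key/[several words]
            fields[m.group(1)] = m.group(2)
        elif m.group(3):          # key/word
            fields[m.group(3)] = m.group(4)
        else:                     # bare keyword
            keywords.append(m.group(5))
    return fields, keywords

def parse_equals(query):
    """Parse the key="value" form, with AND "word" for bare keywords."""
    fields, keywords = {}, []
    pattern = r'(\w+)="([^"]*)"|AND\s+"([^"]*)"'
    for m in re.finditer(pattern, query):
        if m.group(1):            # key="value"
            fields[m.group(1)] = m.group(2)
        else:                     # AND "keyword"
            keywords.append(m.group(3))
    return fields, keywords

slash = parse_slash("[subject/[illegal strike] to/elves from/santa ultimatum]")
equals = parse_equals(
    'find subject="illegal strike", to="elves", from="santa", AND "ultimatum"'
)
```

Both calls produce the same three fields and the same single keyword, so 
under this reading a user of either form has to learn the same structure.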


-- 
Steven D'Aprano
