List:       lyx-devel
Subject:    (fwd) string discussion from gtk-- mailing list
From:       Allan Rae <rae () elec ! uq ! edu ! au>
Date:       1999-03-29 5:27:21

(was Re: [gtkmm] new Gtk-- widgets coming (fwd))

Sections of this discussion may be of interest to the argument about
what a better chunk might be made from.

This posting refers to a thread that would have to be a couple of weeks
old.  I've been a fairly quiet listener on that list for a while now.

Allan. (ARRae)

---------- Forwarded message ----------
Date: Sat, 27 Mar 1999 13:51:11 +0100
From: Christof Petig <mildner@uni-wuppertal.de>
Reply-To: gtkmm@modeemi.cs.tut.fi
To: gtkmm@modeemi.cs.tut.fi
Subject: Re: [gtkmm] new Gtk-- widgets coming

Paul Barton-Davis wrote:

Sorry for answering this late but I have been too busy upgrading our systems
to glibc-2 and gtk+-1.2.
And also sorry for sending this mail from my wife's account - I don't feel
like reconfiguring her computer - my email address is
      christof.petig@wtal.de

> >Paul Barton-Davis <pbd@Op.Net> writes:
> >
> >> Do I use "string" in my parser/lexer ? What happens if my
> >> parser/lexer has conveniently mmapped its input, and so we could
> >> potentially use the text from the file without any data copying ?
> >
> >Then you probably want an strstream.
>
> Nope. strstream doesn't have the key property that an mmapped area
> does: no need to copy the data.  You'd need to derive a class from
> strstream, and set a flag saying "it's necessary to make copies of this
> input stream, because it's going away", or alternatively "don't, under
> any circumstances, make any copies of this data".

'Make a copy' rings a bell for me - this is trivial with strings, they are
designed to handle copies.
'Don't ever make a copy' is not possible with strings (AFAIK, please
enlighten me if I'm wrong).
But IIRC you _have_ to pass the length to your C interface, unless the char*
is NUL terminated. The fact that data() is private (data() returns a char*
out of a string without adding a NUL) prevents the usage you intended. But
string was never designed to handle this case.

I recommend writing your own class, 'mmap_string' or so, which NEVER alters
the data and avoids copying wherever possible. This would give you the full
power, and if you provide an operator char*() for mmap_string it would be
transparent. It would even carry no performance penalty compared to char*.

This separation into two classes, string and mmap_string, makes clear what
you intend to do with the string. And putting a class around it (or even a
typedef) clearly states your intention (necessary for people not that
familiar with your program's internals).
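
A minimal sketch of what such a class could look like. The name mmap_string
comes from the suggestion above; every member name (data, size, substr, find)
is an assumption made for illustration and is not taken from any existing
library. It only stores a pointer and a length, so it never copies and never
writes into the mapped area.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical non-owning view of bytes that live elsewhere (for example in
// an mmapped file). It never copies, never writes, and never assumes a
// terminating NUL; pointer and length always travel together.
class mmap_string {
public:
    mmap_string(const char* data, std::size_t size) : data_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }

    const char* data() const { return data_; }   // NOT NUL terminated
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

    char operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    // Sub-range that shares the same bytes; no copy is made.
    mmap_string substr(std::size_t pos, std::size_t len) const {
        assert(pos <= size_);
        if (len > size_ - pos) len = size_ - pos;
        return mmap_string(data_ + pos, len);
    }

    // Offset of the first occurrence of c, or size() if c is absent.
    std::size_t find(char c) const {
        if (size_ == 0) return 0;
        const void* hit = std::memchr(data_, c, size_);
        return hit ? static_cast<std::size_t>(static_cast<const char*>(hit) - data_)
                   : size_;
    }

private:
    const char* data_;   // owned by whoever created the mapping
    std::size_t size_;
};

inline bool operator==(const mmap_string& a, const mmap_string& b) {
    return a.size() == b.size() &&
           (a.size() == 0 || std::memcmp(a.data(), b.data(), a.size()) == 0);
}
```

The sketch exposes data() and size() rather than a conversion operator,
because the mapped bytes are not NUL terminated and a bare char* would lose
the length again. The mapping must outlive every mmap_string that points
into it.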

If you want me to write this class for you (costs me about ten minutes),
please tell me.

> Further more, what happens when I decide to use string::c_str() to
> pass a string to a C library function that requires a char *
> representation ? If the mmapped area was mapped PROT_READ,
> we just segfaulted, and if it was mapped MAP_PRIVATE with write
> protection, we just caused a data copy because of the insertion of
> a null. If I use const char *, there is no question about the cost of
> trying to use part of the mapped area as a "string" - it can't be done
> without a data copy.

How does the C function know about your char*'s length? I assume you pass
it; putting it into the mmap_string class would tie the two together. Passing
around char*s and ints is OK for a C program, but for a C++ program, where
classes and structs have _no_ performance penalty, it is a menace.

> My parsers (written with Bison+Flex) all will lex istreams by default
> (thus parsing strstream and fstream sources the same way), and also
> handle const char * which is assumed to point to an mmapped area.

Huh, sorry if I don't get all your ideas. Confused ...

>
>
> >> What happens to my generalized Printer class, that has a
> >> printf()/vform()-like operator(), which will not allow the passing
> >> of "string", and doesn't know that "%s" means "string" not "char
> >> *".
> >
> >Isn't there any way it can be changed to accept strings ?
>
> If you want to rewrite __doprnt(), be my guest. __doprnt(), accessed
> via any of "v?[fs]?printf()" or its iostream cousin, vform(), only
> understands %s to refer to const char *. I don't know of any low level
> varargs formatting function that has a %X code for a string. It would
> be quite useful, and I suppose that ostream::vform would be the place.

There isn't any (to my knowledge), since operator<< is the preferred way
of formatting output. If you don't like ostream you might abstract it
(another class); feel free to ask me about this.
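
To illustrate the operator<< route, here is a small sketch. The function name
describe and its arguments are made up for this example; only
std::ostringstream and operator<< are standard. A string argument needs no
"%s" conversion and no c_str() call, and a wrong argument type is a compile
error instead of a runtime surprise.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds a message with operator<<, so the string and
// the int are each formatted by their own overload.
std::string describe(const std::string& name, int count) {
    assert(count >= 0);
    std::ostringstream out;
    out << name << " has " << count << " widgets";
    return out.str();
}
```

For example, describe("window", 3) yields "window has 3 widgets".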

> >> it tends to discourage data copying,
> >
> >No, it forces you to ask yourself whether you can or not (or should or
> >not), copy or alter it, and most of the time you don't know.
>
> That's why g_string makes a lot of sense. But g_string isn't a
> different object to char *, it's just a syntactic convention.
> "const char *" and "const char * const" work similarly for my taste.

That's why I prefer C++ classes, they are typesafe! g_string is just a weak
C equivalent to string (correct me if I'm wrong)

> >On the other hand, strings don't encourage or discourage anything in
> >that regard, they remove the problem.
>
> I don't think so.  "char *" is a constant reminder to me that I'm
> working with pointers to data. Note that the equivalent to "const char
> *" is not "const string", but "const string *". Since I grew up on C,
> and tend to read "const char *" as "a string", I get quite confused
> when I see "const string *", which I read as "ptr to string". In the

Perhaps you mean const string & (avoiding copying). Still being used to
C conventions explains your unease with string. Believe me, it has great
benefits, though it is not always the right way to go.

const string & means passing a string without copying it (which resembles
const char*).

> first case, I don't need to do any additional indirection on my
> original data structure in order to pass by reference. With a string
> class, I have to explicitly make this choice, which is good on some
> levels, bad on others. It's bad when I don't ever want the bytes of my
> string to be copied anywhere. Passing "const char *" makes *me* this much
> more likely to stick to this.

Please drop excessive [const] char* within C++. I would even prefer a
typedef over char*, since char* means nothing except 'a bunch of characters
you might access now'. It doesn't provide a concept of lifetime and has no
concept of length - unless NUL terminated.

> As a result, I'm much more likely to pass around "string &", and
> pretty soon I find myself never sure if I am working with a copy of the
> data, or the original bytes.

As a rule of thumb: pass const string &; you can promote it to string any
time you like. If you want to alter the contents deep inside the called
functions, you should pass string &. But I regard this as (in most cases) bad
design.

Initializing a string by a const string & makes a copy. Initializing a const
string & by a string doesn't.
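
The two initialization rules can be checked directly. This is only a sketch;
the helper names address_of, with_bang and copy_semantics_demo are invented
for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: binds a reference, so no copy is made and the
// address seen inside is the address of the caller's object.
const std::string* address_of(const std::string& s) { return &s; }

// Hypothetical helper: initializes a new string from the reference,
// which copies the characters; the caller's string is left untouched.
std::string with_bang(const std::string& s) {
    std::string copy(s);
    copy += '!';
    return copy;
}

void copy_semantics_demo() {
    std::string original("a string");

    const std::string& alias = original;   // const string & from string: no copy
    assert(&alias == &original);
    assert(address_of(original) == &original);

    std::string copy(alias);               // string from const string &: a copy
    copy[0] = 'A';
    assert(original == "a string");        // the original did not change
    assert(copy == "A string");
    assert(with_bang(original) == "a string!");
}
```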

> This is a personal failing, but in
> general, I dislike references for this reason, although I like the
> semantic sugar that they offer.
>
> >Just out of curiosity, would you care to give me your definition of
> >"just as powerful" ?? It's mostly the "just as" part that I'm
> >interested in.
>
> they both contain a sequence of bytes that can be searched, copied,
> compared and assigned. the cost of searching, copying, comparison and
> assignment are either the same, or only slightly different, with char
> * winning sometimes, and string other times.
>
> if you care about appending & substrings, then certainly string is the
> way to go. but i very, very rarely carry out these operations, and
> when I do, it's mostly because I'm using "string" :)

"string"? why don't you use  string (the real thing ;-) )?

> >> Unless you want to do lots of things with substrings, I see few
> >> reasons to use a string class, and I'm actually disappointed that
> >> Gtk-- does.
> >
> >String has an operator=(), a copy ctor, and has no silly size
> >problems. These three features alone, and the huge slew of bugs and
> >tiring, boring, over-hashed programming problems they avoid, more than
> >fully justify the use of this class in favor of a primitive, feature
> >deprived array of bytes.
>
> So, consider the following:
>
>      string foo = "a string";

Oops, string foo("a string") avoids initializing foo to "".

>      string bar = foo.substr (0, foo.find ('t'));

two times copied ... ;-(

>     printf (bar, "%s\n", bar.c_str());

you certainly didn't mean your first bar?

>
> Did I make a copy of foo, part of foo, or no copy at all ? Contrast
> with:
>
>      const char *foo = "a string";
>      printf ("%.*s", (int) (strchr (foo, 't') - foo), foo);
>
> I know exactly what happened here: no data copying.

I would recommend a class dedicated to this purpose, which hides the details
and explicitly states your intention. I also use C functions to manipulate
and search through strings - for simplicity and fast implementation - though
this clearly is bad coding. I really should give STL a try!
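
As a rough idea of how the STL could do the same job without a copy: the
sketch below searches a (pointer, length) range with std::find and writes the
prefix straight to an ostream. The names write_prefix and prefix_demo are
assumptions for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the bytes of [text, text + len) that precede
// the first occurrence of stop (all of them if stop is absent) and returns
// how many bytes were written. The input is only read, never copied into a
// temporary string.
std::size_t write_prefix(std::ostream& out, const char* text,
                         std::size_t len, char stop) {
    assert(text != nullptr || len == 0);
    const char* end = std::find(text, text + len, stop);
    const std::size_t n = static_cast<std::size_t>(end - text);
    out.write(text, static_cast<std::streamsize>(n));
    return n;
}

// Usage: collects the part of "a string" before the first 't'. The
// ostringstream (and the copy made by str()) exists only so the result can
// be inspected; writing to std::cout would not need it.
std::string prefix_demo() {
    const char* foo = "a string";
    std::ostringstream out;
    write_prefix(out, foo, std::strlen(foo), 't');
    return out.str();
}
```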

> Perhaps you will argue for using iostream to avoid the problem, and I
> agree that this is desirable. But I find myself often interfacing with
> a C library that forces me to use c_str(), and thus leaves me unsure
> whether I made a copy of the substr or not. In fact, the only way to
> check whether this happened is to look at the string class source
> code, because last time I looked, it wasn't part of any of the pending
> string class standards.

AFAIK the standard enforces a copy once a string object is created.

> >Strings gain us time and spares us bugs (the idiotic, hard-to-find
> >kind). We have better things to do than worry about char arrays (and
> >I'm not trying to be funny this time).
>
> On one level, I completely agree with you. But I suspect you code
> almost entirely in C++, use iostream instead of stdio or
> read(2)/write(2), and don't mind the pain of interfacing with C
> libraries. Under those circumstances, I cannot agree that strings save
> time, or spare us bugs.  They encourage an approach which avoids some
> problems (such as undersized arrays) and encourages others (data
> copying). At least for me.
>
> But really I was just venting over how hard it was to use string
> throughout a fairly large program when one is used to "char *". I
> don't seriously think that Gtk-- should drop its use of string.

Sigh. Good.
I really recommend that C++ beginners use string and forget about char*,
because it saves you much trouble - and is a much cleaner design.
In your case (avoiding copying) a new class would be the best solution,
one which complements string (adds missing features, drops unwanted ones).
Then you can clearly decide, and state, which was intended.

If you would like to give it a try, I offer to write that class - this class
has benefits of its own (even to me!). Perhaps even to gtk-- (please stress
perhaps).

Also I really like some ideas from the cow book (Ruminations on C++, Koenig
and Moo), which abstract the iostreams and give you a single API for
iostream and FILE* and int (write,open,read),
without any performance penalties (thanks to inline substituted templates).
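
The following is not the book's code, only a sketch of the general idea under
my own assumptions: the names put_bytes, put_line and put_line_demo are
invented, and the file descriptor (int) case is left out to keep the example
portable. Generic code is written once against an overloaded function, and
the compiler selects (and can inline) the right overload for each
destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sink adapters: one overload per kind of destination.
inline bool put_bytes(std::ostream& out, const char* p, std::size_t n) {
    out.write(p, static_cast<std::streamsize>(n));
    return out.good();
}

inline bool put_bytes(std::FILE* out, const char* p, std::size_t n) {
    return std::fwrite(p, 1, n, out) == n;
}

// Generic code written once; overload resolution happens at compile time,
// so there is no virtual call and no runtime dispatch.
template <class Sink>
bool put_line(Sink&& sink, const char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    return put_bytes(sink, p, n) && put_bytes(sink, "\n", 1);
}

// Usage with an iostream destination; put_line(stdout, "hello", 5) would
// go through the FILE* overload instead.
std::string put_line_demo() {
    std::ostringstream out;
    put_line(out, "hello", 5);
    return out.str();
}
```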

Concerning the new Gtk-- widgets, I really don't find time now to
incorporate them into gtk--addons, since I would like to add a decent
autoconf/automake then. Would you please send a copy of your most recent
widgets to me (this saves me the cost of browsing the web to get them)? Or
would you like to check them in yourself (I don't mind)? Or would you like
to discuss some matters ...

Feel free to contact me personally  (christof.petig@wtal.de)

 Christof  (also maintainer of glade--, but busy with other projects)

PS: Anybody interested in a GPLed LAT (DEC VAX network protocol) port to
Linux? I have to do that quickly. I guess not ...



