List: sqlite-users
Subject: Re: [sqlite] transparent compression implementations for SQLite?
From: Simon Slavin <slavins () bigfraud ! org>
Date: 2011-12-31 21:03:19
Message-ID: 68AB554E-BB88-4FC4-ABE9-E9680A6F26B4 () bigfraud ! org
On 31 Dec 2011, at 8:37pm, Ivan Shmakov wrote:
>>>>>> Simon Slavin <slavins@bigfraud.org> writes:
>>>>>> On 31 Dec 2011, at 4:56pm, Ivan Shmakov wrote:
>
>>> The integers could be up to 32 bits long, but I deem them likely
>>> to "cluster", like, e. g.: 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 101, 101,
>>> 102, 102, 102, 103, 103, 768, 768, etc. My guess is that such
>>> sequences should be quite compressible, but the point is that
>>> there'll only be a few such numbers per row, and millions of such
>>> rows in the table.
>
>> Thing is, an integer like 103 takes two bytes to store,
>
> Those were given as examples only. The real values are going to
> use up to 28 bits (roughly, 4 octets) or so.
>
>> and one of those bytes is the byte that indicates what kind of value
>> is being stored. So it's really only taking one byte per value.
>
> Is it? I mean, if I know for sure that I'd only ever need, say,
> eight 32-bit integers per row, and pack them into a single BLOB
> instead of having them stored as proper INTEGER's, won't I be
> able to save a few bytes per row? (For a performance penalty
> when accessing these fields. Also, I'd have to forget about
> indexes on these, other than a UNIQUE one on the whole
> BLOB-coded tuple, I guess.)
>
> (With a few functions to unpack these values back out of BLOB's,
> I'd even be able to CREATE a suitable VIEW. Or may be a VIRTUAL
> TABLE.)
So yes, it depends on whether you need to do searches or sorting on these columns. If you don't care about the individual fields then all you need is a function which turns the integers into the shortest BLOB it can.
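The pack-into-a-BLOB idea discussed above can be sketched in Python's built-in sqlite3 module. This is only an illustration under the thread's stated assumption of eight 32-bit integers per row; the table name, the pack/unpack helpers, and the per-field SQL functions are all hypothetical, not anything from SQLite itself:

```python
import sqlite3
import struct

# Hypothetical helpers: eight unsigned 32-bit big-endian integers packed
# into one 32-byte BLOB instead of eight INTEGER columns.
def pack_row(values):
    return struct.pack(">8I", *values)

def unpack_row(blob):
    return struct.unpack(">8I", blob)

con = sqlite3.connect(":memory:")
# A UNIQUE index on the whole BLOB-coded tuple, as the thread suggests.
con.execute("CREATE TABLE samples (data BLOB UNIQUE)")
row = (1, 1, 2, 3, 101, 102, 103, 768)
con.execute("INSERT INTO samples (data) VALUES (?)", (pack_row(row),))

# Registering small unpack functions lets a VIEW expose individual
# fields again, along the lines Ivan proposes.
for i in range(8):
    con.create_function(f"field{i}", 1,
                        lambda b, i=i: struct.unpack(">8I", b)[i])
con.execute("CREATE VIEW samples_v AS "
            "SELECT field0(data) AS f0, field7(data) AS f7 FROM samples")
print(con.execute("SELECT f0, f7 FROM samples_v").fetchone())  # (1, 768)
```

Note the trade-off mentioned above: the view works for reads, but SQLite cannot index an expression like `field3(data)` through an ordinary index on the BLOB column.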
>> And you can't improve on this by compressing individual values, only
>> by compressing at the table level. And if you're compressing at
>> table level, making any changes is rather slow.
>
> For now, I'm interested mostly in INSERT's. (And SELECT's, of
> course.)
Well, you obviously realise your trade-offs here. You can compress at the field level with minimal trade-off. You can compress at the row level, which slows down changes and queries, but not much. You can compress at the column or table levels, which slow down changes a great deal. And because of the way SQLite works internally you can compress at the page level.
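Field-level compression, the cheapest of the options listed above, can be done entirely outside SQLite by compressing a value before INSERT and decompressing on read. A minimal sketch using zlib; the table and column names are made up for the example:

```python
import sqlite3
import zlib

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

# Highly repetitive data, like the clustered integer sequences in the
# thread, compresses well.
text = "1 1 1 1 2 2 2 3 3 3 101 101 102 102 102 103 103 768 768 " * 50

# Compress at the field level on the way in...
con.execute("INSERT INTO docs (body) VALUES (?)",
            (zlib.compress(text.encode()),))

# ...and decompress on the way out. SQLite never sees the plain text,
# so you cannot search or index the uncompressed content.
(blob,) = con.execute("SELECT body FROM docs").fetchone()
restored = zlib.decompress(blob).decode()
print(len(text.encode()), "->", len(blob))
```

The downside is exactly the one named above: the compressed field is opaque to SQL, so WHERE clauses and indexes on its contents stop working.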
So you're really trying to figure out which trade-off is best for you. It all depends on how big your databases are and whether they live on just one machine or need to be sent places frequently. But disk space is so cheap these days it's cheaper for me to go out and buy a Terabyte disk than to spend a week writing clever compression code.
>>> [snip] is there a way to determine the filesystem space occupied by a
>>> particular table, or index, in SQLite? It now seems to me that the
>>> problem is different from what I initially presumed.
>
>> Not inside SQLite's API functions. You can do it by reading the file
>> format and multiplying the page size by how many pages are used.
>
> ACK, thanks. Do I understand it correctly that the
> sqlite-analyzer (suggested elsewhere in this thread) does this
> very thing?
Yes. It prints out lots of information about your database including what you asked for. But it's a stand-alone app, not part of the SQLite API.
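For the whole database file, the page-size-times-page-count arithmetic mentioned above is available without reading the file format by hand, via the PRAGMA interface. This gives only the total file size, not per-table figures; for those you need sqlite3_analyzer (or, in builds that include it, the DBSTAT virtual table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10000)])

# PRAGMA page_size and page_count together give the database size:
# pages used multiplied by bytes per page.
page_size = con.execute("PRAGMA page_size").fetchone()[0]
page_count = con.execute("PRAGMA page_count").fetchone()[0]
print("approx. database size:", page_size * page_count, "bytes")
```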
Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users