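You get the same effect if you try to ZIP (compress) PC files that are already compressed, e.g. JPGs or files that have already been ZIPped. They usually come out the same size or a little bigger. Part of the reason is that the compressed file is already tightly packed, so there's no repetition left to squeeze out, and the second pass still adds its own headers and bookkeeping. It's like putting on a jacket over a windbreaker over a sweater. It takes longer and makes you 'bigger.'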
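Here's a quick way to see this for yourself. It's only a sketch using Python's standard zlib module (the same family of compression ZIP uses); the word list and sizes are made up for illustration, and the random bytes just stand in for "data with no pattern."

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up "business text": a small vocabulary repeated in random order.
words = [b"order", b"shipped", b"2019-03-14", b"customer",
         b"invoice", b"Susan", b"pending", b"total"]
text = b" ".join(rng.choice(words) for _ in range(5000))

once = zlib.compress(text, 9)    # first pass: plenty of repetition to remove
twice = zlib.compress(once, 9)   # second pass: input is already tightly packed

# Random bytes of the same length stand in for data with no pattern.
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))
noise_packed = zlib.compress(noise, 9)

print("text: ", len(text), "->", len(once), "->", len(twice))
print("noise:", len(noise), "->", len(noise_packed))

# Nothing is lost either way: undoing both passes gives the original back.
assert zlib.decompress(zlib.decompress(twice)) == text
```

The first pass should shrink the text a lot. The second pass and the random bytes typically come out slightly larger than what went in, because there's nothing left to remove and the format still adds a few bytes of overhead.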

It also depends on what kind of data you're compressing. Text scrunches down -really- small! Numbers that have no pattern, not so much. But files with a lot of dates (especially a lot of the same dates) do pretty well.

One of the early algorithms I read about compressed by breaking the file into string snippets and assigning a 'token' to each snippet. It then replaced that string, wherever it was found, with the token. Then it made a 2nd pass, and if it found a sequence of tokens that got used a lot, -that- would get another token.

Of course, it takes time to analyze the data, and you have to optimize both the length of the snippets and the number of passes through the data. Take "Susan" and "Susanna": if the snippets are 5 characters long, both can share a token for "Susan". If the snippets are 6 characters long, "Susan" followed by a space and "Susann" are different strings, so you don't have a match. Algorithms that take longer make a more determined effort to find matching snippets, maybe trying several possibilities to get the best match (like your tax agent computing your taxes both as "married" and as "single" to see which gives you the lowest tax bill). And, of course, the token has to be shorter than the data it replaces!
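To make the snippet-and-token idea concrete, here's a toy sketch. It is not the algorithm from whatever I read back then, just my own minimal illustration of one pass. The names (SNIPPET_LEN, FIRST_TOKEN, build_table) are made up, and it assumes the input is plain ASCII so that the byte values 0x80 and up are free to use as tokens. It doesn't do the 2nd pass over the tokens.

```python
from collections import Counter

SNIPPET_LEN = 5      # hypothetical snippet length; try 6 to see the match rate change
FIRST_TOKEN = 0x80   # tokens are the bytes 0x80..0xFF, assuming pure ASCII input


def build_table(data: bytes, max_tokens: int = 128) -> list[bytes]:
    """Pick the most frequent fixed-length snippets worth replacing."""
    counts = Counter(data[i:i + SNIPPET_LEN]
                     for i in range(len(data) - SNIPPET_LEN + 1))
    limit = min(max_tokens, 256 - FIRST_TOKEN)  # only 128 token values exist
    # A snippet seen only once saves nothing, so skip it.
    return [s for s, n in counts.most_common(limit) if n > 1]


def compress(data: bytes, table: list[bytes]) -> bytes:
    if any(b >= FIRST_TOKEN for b in data):
        raise ValueError("this sketch only handles ASCII input")
    lookup = {snippet: FIRST_TOKEN + i for i, snippet in enumerate(table)}
    out = bytearray()
    i = 0
    while i < len(data):
        token = lookup.get(data[i:i + SNIPPET_LEN])
        if token is not None:
            out.append(token)     # one byte stands in for SNIPPET_LEN bytes
            i += SNIPPET_LEN
        else:
            out.append(data[i])   # no match: copy the byte as-is
            i += 1
    return bytes(out)


def decompress(packed: bytes, table: list[bytes]) -> bytes:
    out = bytearray()
    for b in packed:
        if b >= FIRST_TOKEN:
            out += table[b - FIRST_TOKEN]   # token: expand back to its snippet
        else:
            out.append(b)                   # ordinary byte: copy through
    return bytes(out)


text = b"Susan met Susanna. Susan and Susanna left. " * 20
table = build_table(text)
packed = compress(text, table)
assert decompress(packed, table) == text
print(len(text), "->", len(packed), "bytes, plus the table itself")
```

Note that the table has to travel with the compressed data, so on a small file the table can eat up everything you saved.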

The algorithm 'knows' how to tell a token from uncompressed data...
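One common way to do that, and I'm not claiming it's what any particular product does, is to reserve a marker byte. Below is a small sketch under that assumption: ESC is a made-up marker value, ESC followed by a number means "token number so-and-so", and ESC followed by zero means "a real ESC byte that was in the original data." The table is a hypothetical list of snippets, at most 255 of them.

```python
ESC = 0x1B  # hypothetical marker byte; any value works if both sides agree


def encode(data: bytes, table: list[bytes]) -> bytes:
    if len(table) > 255 or any(len(s) == 0 for s in table):
        raise ValueError("table needs 1..255 non-empty snippets")
    out = bytearray()
    i = 0
    while i < len(data):
        for n, snippet in enumerate(table):
            if data.startswith(snippet, i):
                out += bytes([ESC, n + 1])   # marker + token number
                i += len(snippet)
                break
        else:
            if data[i] == ESC:
                out += bytes([ESC, 0])       # a genuine ESC byte in the data
            else:
                out.append(data[i])          # plain uncompressed byte
            i += 1
    return bytes(out)


def decode(packed: bytes, table: list[bytes]) -> bytes:
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == ESC:
            n = packed[i + 1]                # the byte after the marker says which
            out += bytes([ESC]) if n == 0 else table[n - 1]
            i += 2
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)


table = [b"Susan", b"2019-03-14"]
data = b"Susan\x1bSusanna 2019-03-14"
assert decode(encode(data, table), table) == data
```

Each token costs two bytes here, so a snippet has to be at least three bytes long before replacing it saves anything.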

Paul E Musselman


This mailing list archive is Copyright 1997-2019 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].