Tuesday, January 30, 2018

The deflate algorithm

The deflate algorithm


The deflate algorithm, the compression approach used in the original Pkzip 2.0 version, is a fascinating read defined in the RFC1951 standard.  In order to understand the spec, its helpful to work through the "hello" examples discussed in this article.

Section 3.2.2. in the RFC defines the basic approach to creating Huffman trees, but this article provides a solid explanation about how it works and shows how its implemented.   The key insight is that Huffman codes are generated by counting the lengths of the variable-length codes (For more background/context, see Stanford lectures on algebraic error control codes and/or the textbook used in the class).

Gzip is essentially using the deflate algorithm but adds a header/trailer to the end.  RFC1952 provides a breakdown of how the header is generated, but you can also test for yourself:

$echo "hello" > hello
$gzip hello
$python
>>> data = open("hello.gz", "r").read()
>>> data[0:2]
x1fx8b
>>> data[3:4]
x08
>>> [hex(ord(a)) for a in data[4:8]]
[0x9e, 0x2e, 0xfa, 0x50]
>>> 0x50fa2e9e
1358573214
>>> datetime.datetime.fromtimestamp(1358573214)
datetime.datetime(2013, 1, 18, 21, 26, 54)
>>> data[10:16]
hellox00


More related articles too:

http://www.zlib.net/manual.html
http://www.adayinthelifeof.nl/2010/06/02/deflating-the-universe/
http://www.w3.org/Graphics/PNG/RFC-1951#codes
http://www.commandlinefanatic.com/cgi-bin/showarticle.cgi?article=art007


visit link download