Decompression of deflated string while staying compatible with the majority of servers.
Description
Certain Servers will return deflated data with headers which PHP’s gzinflate() function cannot handle out of the box. The following function has been created from various snippets on the gzinflate() PHP documentation.
Warning: Magic numbers within. Due to the potential different formats that the compressed data may be returned in, some "magic offsets" are needed to ensure proper decompression takes place. For a simple progmatic way to determine the magic offset in use, see: https://core.trac.wordpress.org/ticket/18273
Parameters
$gz_data
stringrequired- String to decompress.
Source
public static function compatible_gzinflate($gz_data) {
if (is_string($gz_data) === false) {
throw InvalidArgument::create(1, '$gz_data', 'string', gettype($gz_data));
}
if (trim($gz_data) === '') {
return false;
}
// Compressed data might contain a full zlib header, if so strip it for
// gzinflate()
if (substr($gz_data, 0, 3) === "\x1f\x8b\x08") {
$i = 10;
$flg = ord(substr($gz_data, 3, 1));
if ($flg > 0) {
if ($flg & 4) {
list($xlen) = unpack('v', substr($gz_data, $i, 2));
$i += 2 + $xlen;
}
if ($flg & 8) {
$i = strpos($gz_data, "\0", $i) + 1;
}
if ($flg & 16) {
$i = strpos($gz_data, "\0", $i) + 1;
}
if ($flg & 2) {
$i += 2;
}
}
$decompressed = self::compatible_gzinflate(substr($gz_data, $i));
if ($decompressed !== false) {
return $decompressed;
}
}
// If the data is Huffman Encoded, we must first strip the leading 2
// byte Huffman marker for gzinflate()
// The response is Huffman coded by many compressors such as
// java.util.zip.Deflater, Ruby's Zlib::Deflate, and .NET's
// System.IO.Compression.DeflateStream.
//
// See https://decompres.blogspot.com/ for a quick explanation of this
// data type
$huffman_encoded = false;
// low nibble of first byte should be 0x08
list(, $first_nibble) = unpack('h', $gz_data);
// First 2 bytes should be divisible by 0x1F
list(, $first_two_bytes) = unpack('n', $gz_data);
if ($first_nibble === 0x08 && ($first_two_bytes % 0x1F) === 0) {
$huffman_encoded = true;
}
if ($huffman_encoded) {
$decompressed = @gzinflate(substr($gz_data, 2));
if ($decompressed !== false) {
return $decompressed;
}
}
if (substr($gz_data, 0, 4) === "\x50\x4b\x03\x04") {
// ZIP file format header
// Offset 6: 2 bytes, General-purpose field
// Offset 26: 2 bytes, filename length
// Offset 28: 2 bytes, optional field length
// Offset 30: Filename field, followed by optional field, followed
// immediately by data
list(, $general_purpose_flag) = unpack('v', substr($gz_data, 6, 2));
// If the file has been compressed on the fly, 0x08 bit is set of
// the general purpose field. We can use this to differentiate
// between a compressed document, and a ZIP file
$zip_compressed_on_the_fly = ((0x08 & $general_purpose_flag) === 0x08);
if (!$zip_compressed_on_the_fly) {
// Don't attempt to decode a compressed zip file
return $gz_data;
}
// Determine the first byte of data, based on the above ZIP header
// offsets:
$first_file_start = array_sum(unpack('v2', substr($gz_data, 26, 4)));
$decompressed = @gzinflate(substr($gz_data, 30 + $first_file_start));
if ($decompressed !== false) {
return $decompressed;
}
return false;
}
// Finally fall back to straight gzinflate
$decompressed = @gzinflate($gz_data);
if ($decompressed !== false) {
return $decompressed;
}
// Fallback for all above failing, not expected, but included for
// debugging and preventing regressions and to track stats
$decompressed = @gzinflate(substr($gz_data, 2));
if ($decompressed !== false) {
return $decompressed;
}
return false;
}
Changelog
Version | Description |
---|---|
1.6.0 | Introduced. |
User Contributed Notes
You must log in before being able to contribute a note or feedback.