ZIP is a popular archive file format used to store and transmit one or more files, usually compressed, in a single container. The name is not an acronym. Here’s an overview of the ZIP file format:

History

The ZIP file format was created by Phil Katz, a computer programmer, in 1989. Katz developed the ZIP format as a replacement for the earlier ARC archive format. The first version of the format was released in February 1989, alongside his PKZIP utility.

Structure

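A ZIP file consists of several parts. They are listed here in the order a reader typically processes them, starting from the end of the file and working backward:

  1. End of Central Directory Record (EOCDR): This record sits at the very end of the file and marks the end of the central directory. It contains information like the number of entries in the archive and the size and offset of the central directory.
  2. Central Directory: This section precedes the EOCDR and contains metadata about each file in the ZIP archive, such as:
    • File name
    • Compressed and uncompressed file size
    • Compression method used
    • CRC-32 (Cyclic Redundancy Check) checksum
    • Offset of the file's local file header
  3. Local File Headers: Each file's data is preceded by a local file header that repeats most of the metadata stored for that file in the central directory.
  4. Compressed Data: The data for each file, compressed or stored as-is, follows its local file header directly.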
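As a rough illustration of the per-file metadata described above, the following sketch uses Python's standard zipfile module to build a small archive in memory and read back the name, sizes, compression method, and CRC-32 of each entry. The helper names build_sample_archive and list_entries, and the sample file names, are made up for this example.

```python
import io
import zipfile


def build_sample_archive():
    """Create a small two-file ZIP archive in memory and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("hello.txt", b"hello world\n" * 100)
        zf.writestr("notes/readme.txt", b"ZIP overview")
    return buf.getvalue()


def list_entries(zip_bytes):
    """Return (name, uncompressed size, compressed size, method, CRC-32) per entry.

    zipfile fills these values in from the archive's central directory.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [
            (info.filename, info.file_size, info.compress_size,
             info.compress_type, info.CRC)
            for info in zf.infolist()
        ]


if __name__ == "__main__":
    for entry in list_entries(build_sample_archive()):
        print(entry)
```

The library hides the on-disk layout here; the values come from the central directory, which is why listing an archive does not require decompressing any file data.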

File Format

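The ZIP file format uses a combination of bit flags, fixed-size little-endian integer fields, and variable-length strings (file names, extra fields, and comments) to store metadata and compressed data. Every record starts with a 4-byte signature beginning with the letters “PK”. The file format is as follows: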

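  • End of Central Directory Record (EOCDR, 22 bytes plus comment):
    • 4 bytes: Signature (“PK” followed by bytes 0x05 0x06)
    • 2 bytes: Number of this disk
    • 2 bytes: Disk on which the central directory starts
    • 2 bytes: Number of central directory entries on this disk
    • 2 bytes: Total number of central directory entries
    • 4 bytes: Size of the central directory
    • 4 bytes: Offset of the central directory from the beginning of the file
    • 2 bytes: Comment length
    • Archive comment (variable length)
  • Central Directory:
    • File header (46 bytes):
      • 4 bytes: Signature (“PK” followed by bytes 0x01 0x02)
      • 2 bytes: Version made by
      • 2 bytes: Minimum version required to extract the file
      • 2 bytes: Flags
      • 2 bytes: Compression method used for this file
      • 2 bytes: Last modification time
      • 2 bytes: Last modification date
      • 4 bytes: CRC-32 checksum of the uncompressed file data
      • 4 bytes: Compressed size of the file data
      • 4 bytes: Uncompressed size of the file data
      • 2 bytes: File name length
      • 2 bytes: Extra field length
      • 2 bytes: File comment length
      • 2 bytes: Disk number on which the file starts
      • 2 bytes: Internal file attributes
      • 4 bytes: External file attributes
      • 4 bytes: Offset of the local file header
    • File name, extra field, and file comment (variable length)
  • Local File Headers:
    • 30 bytes: File header with signature (“PK” followed by bytes 0x03 0x04) and metadata similar to the central directory: version required, flags, compression method, modification time and date, CRC-32, sizes, and name and extra field lengths
    • File name and extra field (variable length)
    • Variable-length compressed data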
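To make the record layout more concrete, here is a minimal sketch that locates the end of central directory record and walks the central directory using Python's struct module. It assumes the standard layout from the published ZIP specification: a 22-byte EOCDR with signature PK\x05\x06 and 46-byte central directory headers with signature PK\x01\x02, all little-endian. It also assumes a single-disk archive with no ZIP64 extensions. The function names are hypothetical, and finding the EOCDR with rfind is a simplification that can be fooled by a comment containing the signature bytes.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_FMT = "<4s4H2IH"        # 22 bytes: sig, 4 shorts, CD size, CD offset, comment len
CDH_SIG = b"PK\x01\x02"
CDH_FMT = "<4s6H3I5H2I"      # 46 bytes: fixed part of a central directory header


def find_eocd(data):
    """Return (total entries, central directory size, central directory offset)."""
    pos = data.rfind(EOCD_SIG)  # simplification: take the last occurrence
    if pos < 0:
        raise ValueError("end of central directory record not found")
    fields = struct.unpack_from(EOCD_FMT, data, pos)
    _sig, _disk, _cd_disk, _n_disk, n_total, cd_size, cd_offset, _clen = fields
    return n_total, cd_size, cd_offset


def read_central_directory(data):
    """Parse every central directory header into a dict of selected fields."""
    n_total, _cd_size, pos = find_eocd(data)
    entries = []
    for _ in range(n_total):
        f = struct.unpack_from(CDH_FMT, data, pos)
        if f[0] != CDH_SIG:
            raise ValueError("bad central directory header signature")
        flags, method = f[3], f[4]
        crc, csize, usize = f[7], f[8], f[9]
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        local_offset = f[16]
        name_start = pos + struct.calcsize(CDH_FMT)
        # Flag bit 11 marks UTF-8 names; otherwise the legacy encoding is CP437.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = data[name_start:name_start + name_len].decode(encoding)
        entries.append({"name": name, "method": method, "crc": crc,
                        "compressed": csize, "uncompressed": usize,
                        "local_offset": local_offset})
        # Skip the variable-length name, extra field, and comment.
        pos = name_start + name_len + extra_len + comment_len
    return entries


def build_sample():
    """Build a tiny uncompressed archive in memory to parse."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", b"abc")
        zf.writestr("b.txt", b"hello")
    return buf.getvalue()


if __name__ == "__main__":
    for entry in read_central_directory(build_sample()):
        print(entry)
```

Each entry's local_offset points at the matching local file header, so a reader can jump straight to one file's data without scanning the rest of the archive.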

Compression

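ZIP files support several compression methods, each identified by a numeric method ID in the file headers. Files can also be stored without compression (method 0). Common methods include:

  1. DEFLATE (method 8, the most widely used and most widely supported)
  2. LZMA (method 14, optional)
  3. BZip2 (method 12, optional)

The compression method is chosen per file by the tool that creates the ZIP archive and the options it is given. A reader must support a file's method in order to extract it.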
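As an illustration of choosing a compression method at creation time, the sketch below writes the same payload with each method that Python's standard zipfile module exposes and records the method ID and compressed size. The helper name compressed_sizes and the payload are made up for this example, and it assumes the Python build includes the zlib, bz2, and lzma modules.

```python
import io
import zipfile

# Labels are illustrative; the constants are the method IDs zipfile writes.
METHODS = {
    "stored": zipfile.ZIP_STORED,
    "deflate": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}


def compressed_sizes(payload):
    """Return {label: (method ID, compressed size)} for one payload."""
    sizes = {}
    for label, method in METHODS.items():
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", compression=method) as zf:
            zf.writestr("payload.bin", payload)
        # Reopen the finished archive and read the sizes back from it.
        with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
            info = zf.getinfo("payload.bin")
            sizes[label] = (info.compress_type, info.compress_size)
    return sizes


if __name__ == "__main__":
    for label, (method_id, size) in compressed_sizes(b"abc" * 1000).items():
        print(label, method_id, size)
```

Which method gives the smallest output depends on the data, and archives that use BZip2 or LZMA may not open in older or simpler tools.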

Advantages

The ZIP file format offers several advantages, such as:

  • Compression: ZIP files can store multiple files in a compressed format, reducing storage space.
  • Portability: ZIP files are widely supported across different operating systems and platforms.
  • Flexibility: The ZIP format allows for various compression algorithms and encryption methods.

Common uses

ZIP files are commonly used for:

  • Archiving multiple files
  • Compressing large files or datasets
  • Sharing files over email or online platforms
  • Packaging software distributions

In summary, the ZIP file format is a widely used compressed archive format that stores multiple files in a single container. Its structure consists of metadata and compressed data sections, and it supports various compression algorithms and encryption methods.