

This article provides a general-audience introduction to data compression techniques and uses.

File compression is a technique for "squeezing" data files so that they take up less storage space, whether on a hard drive or other media. Many different kinds of software, including backup programs, operating systems, media apps, and file management utilities, use this technique. While the type of source file and the type of compression algorithm determine how well compression works, a compressed set of an average mix of files typically takes about 50 percent less space than the originals. This technology has applications ranging from archives and backups to media and software distribution.

Most compression techniques work by reducing the space that redundant information in a file takes up. The more redundancy the compression algorithm detects, the smaller the compressed file becomes. Text files, for example, may have many repeated words or letter combinations that can produce significant compression, as much as 80% in some cases.

Databases and spreadsheets often also make good candidates for file compression because they, too, typically have repeated content. Conversely, files that have already been compressed, such as MP3s and JPEGs, have low redundancy. Compressing them further yields results only a few percent smaller than the originals. In some cases, they may become slightly larger when compressed, since the compression can add a small amount of management data to the file.

Compression comes in two basic types, lossless and lossy. A lossless compressed file retains all information so that decompressing it restores the original file in its entirety. Most lossless compression algorithms build upon the work that Abraham Lempel and Jacob Ziv pioneered in the late 1970s in creating the algorithms that came to be called LZ. Many subsequent compression algorithms build upon this work, so their names begin with the same pattern: LZO, LZW, LZWL, LZX, LZJB, and so on. An LZ-style algorithm uses an adaptive technique that analyzes the source file for strings of characters that repeat. The longer the string it can find, and the more often that string recurs in the file, the more it can compress the output file. Documents, spreadsheets, and similar files are often compressed with lossless techniques like these LZ-based algorithms.

Lossy compression can often produce more compact results by discarding data that may not noticeably affect the final quality of the file. Files relying upon human perception often utilize lossy compression, since the source material may have more resolution than we can realistically perceive. For example, a photo in its raw form may take 5 MB, and using it as-is on a web page would cause the page to load more slowly. Using an image editor and lossy compression, you might create a compressed version of that photo that is 200 KB. It may lose some of the clarity of the original but is still perfectly usable and is far quicker to download.

It is frequently convenient to package many files and/or folders into a single compressed file, such as for emailing a collection of files or distributing a complex software application. This packaged collection of files is called an archive. Some compression programs also let you combine multiple files together, providing the dual benefit of smaller space and archival packaging. Other programs, particularly in the Linux/Unix domain, only handle compression of one file at a time. With those programs, archiving usually requires a separate program.

FILE COMPRESSION ZIP
PKZIP, a commercially available utility program first introduced in the late 1980s, has become a de facto compression standard for the Microsoft Windows environment. PKZIP compresses, decompresses, and allows the creation of complex archives, saving them with the .zip file extension. In recent years, Microsoft has bundled PKZIP technology into Windows, allowing the operating system to automatically recognize and open most zip files.

FILE COMPRESSION SOFTWARE
Open-source compression utilities are also available, such as PeaZip, 7-Zip, and gzip. Windows has its own built-in software that lets you designate files, folders, and entire drives as compressed, extending the capacity of storage media.

Linux has several useful utilities for file compression, such as bzip2, gzip, and xz. These utilities are single-purpose and compress single files only; they do not by themselves create archives. The tar package (from "Tape ARchive") often does the archiving in conjunction with these utilities.
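
The difference this article draws between compressing a single file and packaging several files into an archive can be sketched in a few lines of Python. This is only an illustration using the standard library's gzip and tarfile modules, not a description of how any particular utility is implemented; the function names and the file names in the usage note are hypothetical.

```python
import gzip
import io
import tarfile
from typing import Dict, List


def gzip_single(data: bytes) -> bytes:
    """Compress one byte stream, the way a single-file tool such as gzip does."""
    return gzip.compress(data)


def make_tar_gz(files: Dict[str, bytes]) -> bytes:
    """Package several named files into one tar archive and gzip the result."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)  # tar needs the member size up front
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_tar_gz(blob: bytes) -> List[str]:
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        return archive.getnames()


def read_member(blob: bytes, name: str) -> bytes:
    """Extract one member's bytes from a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        member = archive.extractfile(name)
        if member is None:
            raise KeyError(name)
        return member.read()
```

Calling make_tar_gz({"notes.txt": b"...", "data.csv": b"..."}) yields one compressed blob that still remembers both file names, whereas gzip_single only ever sees a single stream of bytes. That is the same division of labor as tar working alongside gzip, bzip2, or xz.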

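
As a rough check of the points made in this article about redundancy and lossless compression, the following Python sketch compares a highly repetitive input with random bytes. It assumes Python's standard zlib module, whose DEFLATE format builds on LZ77; the sample data and the helper name compression_ratio are invented for illustration, and random bytes merely stand in for an already-compressed file such as an MP3 or JPEG.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant input: the same line repeated many times.
repetitive = b"the quick brown fox jumps over the lazy dog\n" * 500

# Low-redundancy input of the same length (stand-in for compressed media).
random_like = os.urandom(len(repetitive))

# Lossless means the round trip restores every byte of the original.
assert zlib.decompress(zlib.compress(repetitive)) == repetitive
assert zlib.decompress(zlib.compress(random_like)) == random_like

print(f"repetitive text ratio: {compression_ratio(repetitive):.3f}")
print(f"random bytes ratio:    {compression_ratio(random_like):.3f}")
```

The repetitive input should shrink to a small fraction of its size, while the random input typically stays at about its original size or grows slightly because of the format's bookkeeping bytes. Exact figures vary by input and compression level.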