Handling Compressed Files

From LMU BioDB 2017

Downloadable software is usually provided in a compressed format—that is, the files in the software have been processed so that they take up less space, resulting in a faster download. Compression also typically “packs” any software that consists of multiple files into a single file, again making the download process simpler.

Decompression

Compressed files can’t be used as-is; they need to be decompressed by an appropriate application. The key word here is “appropriate”: there are different kinds of compression, and each kind may require a different application to undo it. Each kind of compression can be thought of as a different file format, the way Microsoft Word files are either .doc or .docx files, or how images can be in .jpg, .gif, .png, or other formats.

The Week 4 starter files have been uploaded in the .zip compression format. As supplementary information, this wiki page discusses this format as well as another common format, .tar.gz (called a “tarball” in techie circles), that you might also encounter in the future. The two formats serve the same purpose; they differ only in how the files are packed and compressed internally, a detail that most of us don’t need to know about.

.zip

The ZIP format is typically more familiar to Windows users, but is widely supported on other operating systems as well.

Typically, you can double-click on the file’s icon to unzip. On Windows, the ability to unzip depends on the version of Windows that you’re using. Some versions of Windows have the capability built-in; other versions require a third-party application such as 7-Zip, WinRAR, or WinZip. The Seaver 120 lab computers have 7-Zip.

There is a gotcha that you should be aware of when using the built-in Windows unzip functionality, if it is available to you: Windows defaults to doing “live” decompression; that is, as you double-click a .zip file and navigate through its contents, the .zip file may act like a folder, but in reality it’s still a single .zip file, which Windows opens as-you-go. This works fine when you’re just looking at files, but may cause confusion when you actually want to edit or run them. To be completely sure, right-click on the .zip and make sure to Extract the files so that they become actual files on the disk.

Command line tip: in the bash command line environment, there is a command for unzipping a .zip file:

unzip filename.zip

Substitute filename.zip with the actual name of the .zip file that you would like to decompress.
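If you would like to try this out before touching the real starter files, here is a small practice sketch. The names starter, readme.txt, and extracted are made up for illustration, and the script assumes that the zip and unzip commands are installed (they are not on every system, so the script checks first). It builds a tiny .zip in a throwaway folder, lists what is inside without extracting, and then extracts into a folder of its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway folder so nothing in your current folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# zip and unzip are separate programs and are not always installed.
have_zip=yes
if ! command -v zip >/dev/null || ! command -v unzip >/dev/null; then
  have_zip=no
  echo "zip/unzip not found; install them to run this practice example"
fi

if [ "$have_zip" = yes ]; then
  # Build a small sample archive (hypothetical names) to practice on.
  mkdir starter
  echo "hello" > starter/readme.txt
  zip -qr starter.zip starter
  rm -r starter

  # -l lists the contents of the archive without extracting anything.
  unzip -l starter.zip

  # -d extracts into the named folder instead of the current one.
  unzip -q starter.zip -d extracted

  cat extracted/starter/readme.txt
fi
```

Listing first with unzip -l is a handy habit: it shows whether the archive will unpack into a single folder or scatter files across wherever you happen to be.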

.tar.gz

The .tar.gz format is actually two formats: the first one, .tar, is responsible for grouping multiple files and folders into a single file. That single file is then compressed by the second one, .gz. This format is most common on Unix-flavored operating systems like Linux or macOS. In the bash command line environment, you would “extract” the files in a tarball using this command:

tar xzf filename.tar.gz

The same operation can also be done from the graphical user interface by double-clicking on the file’s icon. On Windows, the open-source 7-Zip and shareware WinRAR applications can handle .tar.gz and other formats.
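For the command-line route, the following practice sketch spells out what the letters after tar mean. The file and folder names (project, data.txt, unpacked) are invented for the example. It creates a small tarball in a throwaway folder, lists its contents, and extracts it into a separate folder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway folder so nothing in your current folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample tarball (hypothetical names) to practice on.
mkdir project
echo "sample data" > project/data.txt
tar czf project.tar.gz project
rm -r project

# t = list the contents, z = the archive is gzip-compressed,
# f = the next word is the archive's file name.
tar tzf project.tar.gz

# x = extract. -C puts the extracted files into an existing folder
# instead of the current one.
mkdir unpacked
tar xzf project.tar.gz -C unpacked

cat unpacked/project/data.txt
```

As with .zip files, listing the contents first (tar tzf) shows what will be created before anything is written to disk.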

If you happen upon a file that ends in just .gz and not .tar.gz, then it is a single compressed file with no grouping step involved. Command-line decompression then requires a different command:

gunzip filename.gz

Graphical user interface approaches can figure this out on their own and don’t require that you do anything different.
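One thing that can surprise people on the command line: gunzip replaces the .gz file with the decompressed file, so the compressed copy is gone afterward. The practice sketch below (notes.txt and notes-copy.txt are made-up names) shows that behavior, along with the -c option, which writes the decompressed data to the screen or to another file and leaves the .gz alone.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway folder so nothing in your current folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Make a small sample file and compress it.
# gzip replaces notes.txt with notes.txt.gz.
printf 'line one\nline two\n' > notes.txt
gzip notes.txt

# -c sends the decompressed data to standard output and keeps the .gz file,
# so redirecting it makes a decompressed copy under a new name.
gunzip -c notes.txt.gz > notes-copy.txt

# Plain gunzip replaces notes.txt.gz with notes.txt.
gunzip notes.txt.gz

ls
```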

Other Formats

Other compression formats abound, including bzip2 (.bz2 files), compress (.Z files), .rar, and .7z, to name a few. It’s generally useful to understand how to deal with files such as these, since downloads are frequently compressed in some way.
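As a rough cheat sheet, the sketch below maps a file name’s ending to a command that typically extracts it. The function name suggest_extract_command is made up for this page, and it only prints a suggestion; it does not run anything. The commands for .rar and .7z (unrar and 7z) come from separately installed tools and may not be present on your system, so treat those two lines as assumptions about what is installed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints a command that typically extracts the given
# file, based only on the file name's ending. It does not run the command.
suggest_extract_command() {
  local file="$1"
  case "$file" in
    # The more specific endings must come before the plain .gz and .bz2 ones.
    *.tar.gz)  echo "tar xzf $file" ;;
    *.tar.bz2) echo "tar xjf $file" ;;
    *.zip)     echo "unzip $file" ;;
    *.gz)      echo "gunzip $file" ;;
    *.bz2)     echo "bunzip2 $file" ;;
    *.Z)       echo "uncompress $file" ;;
    *.rar)     echo "unrar x $file" ;;   # assumes unrar is installed
    *.7z)      echo "7z x $file" ;;      # assumes 7z (p7zip) is installed
    *)         echo "unknown format: $file" ;;
  esac
}

suggest_extract_command starter.zip
suggest_extract_command data.tar.gz
suggest_extract_command reads.gz
```

The order of the patterns matters: a name like data.tar.gz also ends in .gz, so the .tar.gz line has to be checked first.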

Compression

Of course, if it is possible to decompress these files, then someone must have compressed them first. Many of the utilities listed above go both ways: they can create and extract compressed files. In most operating systems, right-clicking a folder reveals a menu that includes a Compress command, which uses the .zip format most of the time. On the command line, there are “mirror” commands for compressing files: zip instead of unzip for .zip files, tar czf instead of tar xzf for .tar.gz files, and gzip instead of gunzip for plain .gz files.
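Here is a small practice sketch of those “mirror” commands. The names results, table.csv, and notes.txt are invented for the example, and the zip step is skipped when the zip command is not installed. Note the difference in behavior: tar and zip leave the original folder in place, while gzip replaces the original file with the compressed one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway folder so nothing in your current folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Sample content (hypothetical names) to compress.
mkdir results
echo "gene,count" > results/table.csv
echo "some notes" > notes.txt

# c = create, z = compress with gzip, f = archive file name follows.
# The results folder is left in place.
tar czf results.tar.gz results

# gzip compresses a single file and replaces it with notes.txt.gz.
gzip notes.txt

# -r tells zip to include everything inside the folder.
# zip is not installed everywhere, so check before using it.
if command -v zip >/dev/null; then
  zip -qr results.zip results
fi

ls
```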