The Web from the Command Line

Due to the “need-to-know” approach that we take in this course, our study of the command line necessarily takes a leap to a fairly powerful and advanced command. This command brings the web to the command line, and learning it helps one understand what is truly happening behind the scenes when we visit websites with our web browsers. That command is curl. Strictly speaking, it is spelled cURL because its name is intended to mean “see URL.” “See” and “c”: get it?

Basic Usage

Put simply, curl performs a single web request and displays the response provided by the contacted web server, with no further processing or layout. In its simplest form, one can just give it a URL:

curl http://www.lmu.edu

For most URLs, invoking this command will produce a flood of text, which makes it a perfect use case for more/less, or for output redirection if you want to save the content to a file:

curl http://www.lmu.edu > lmu.html
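
If you just want to skim the response rather than save it, you can pipe curl straight into a pager such as less (the URL is simply the same example as above):

curl http://www.lmu.edu | less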

That’s it, really. Using curl, you can get the content of a web page purely at the data level. No images are loaded, no layout is done, no visuals are rendered. This command is like the very first step that a web browser takes when visiting a website, except that it goes no further than that.

Why curl?

One look at this might raise the question of why the command exists at all, especially when we have web browsers that work perfectly fine for daily use. “Daily use” is the operative term here: of course curl is not meant to be a web browser replacement. Instead, as a command line program that performs a simple request/response cycle, curl can be used for scripting, automation, and other types of processing that go beyond the visual consumption of a website’s content.

curl Mimics Requests in the Developer Tools Network Tab

One such use of curl is to trigger requests that you can’t perform just by typing a URL into a web browser. The location bar in web browsers performs only what are called GET requests: requests that are meant to retrieve content from a server. Some requests, however, have a different method, such as POST or PUT; these are meant to submit data to a server. In a web browser, you do this implicitly by filling out forms and clicking on some Submit button. If you don’t want to go through a browser, or would like to do this automatically, you can use curl.
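
As a quick sketch of the difference (the example.com URLs and the field=value data below are placeholders, not real endpoints): a plain curl invocation issues a GET request, exactly like the location bar, while adding -X selects a different method and -d supplies the data being submitted.

curl https://example.com/some/page
curl -X POST -d "field=value" https://example.com/some/form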

For example, interacting with this wiki makes use of POST requests (go ahead, work with this wiki with the Network tab selected in the developer tools and watch the POST requests appear). This means that one can’t edit pages just by typing something into the web browser’s location bar, which is probably a good thing. Instead, you are required to type into an editor area, then click one of the buttons at the bottom in order to process the request.

With curl, you can perform the same request directly:

curl -X POST -d "title=Sample_Page&action=submit" https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php

This sends a POST request to our wiki server with two data items: title, whose value is Sample_Page, and action, whose value is submit. If you read through the response, you’ll eventually see that the wiki server correctly interpreted this request as an attempt to edit a page, but refused to carry it out because a user needs to be logged in. If you search the text, you’ll see that it includes the message “You do not have permission to edit this page”:

curl -X POST -d "title=Sample_Page&action=submit" https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php | grep "permission"
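
One practical note: when its output is piped like this, curl also writes a progress meter to the terminal, and it shows up interleaved with grep’s result. Adding curl’s -s (“silent”) option suppresses the meter; this is the same command as above with that one option added:

curl -s -X POST -d "title=Sample_Page&action=submit" https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php | grep "permission"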

Logging in is a whole other issue, but the point of this example is to show that you can simulate, without going through a web browser, an action that would otherwise require one.

If you can see a web request in the Network tab of a web browser’s developer tools, then you can simulate that request with curl. And to conclude on that note: if you look at all of the information in the Network tab, you will see that the web browser actually sends a lot of data and metadata to the web server with every request. Given the claim that curl can simulate these requests accurately, you can also infer that this page only scratches the surface of what curl can do. From the “need-to-know” perspective, however, this page contains enough hints to help you figure out the applicable assignments in this class, particularly Week 3.
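
As a parting illustration of that metadata, curl’s -v (“verbose”) option prints the request and response headers it exchanges with the server, which correspond roughly to what the Network tab displays for each request. (This reuses the lmu.html example from earlier; the headers appear in the terminal while the page content goes to the file.)

curl -v http://www.lmu.edu > lmu.html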