Python Tutorial

Reading Materials

Introductory Guides

Reference Materials

  • The Standard Library – official documentation of the enormous variety of built-in functionality in the ‘stdlib’
  • Python Module of the Week – community-written tutorials for a number of useful modules (updated for Python 3)

Getting Started

Python is both the name of the language and a terminal command already installed on your Mac. It’s typically capitalized as ‘Python’ in the former context and lowercase when you type it at the command line as python.

Open a terminal window and try typing it now, then hit return. You should see something along the lines of:

% python
Python 2.7.10 (default, Aug 17 2018, 19:45:58) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

This is the REPL, which you can think of as equivalent to the javascript console in a browser. You can enter a line of python code after the >>> prompt and when you hit return it will execute that code and print out the result (if any). Try using it as a simple calculator by entering some of the following:

1 + 2 + 3 + 4
10 / 5 - 3
2**8

The line that gets printed out with the result is the value of the ‘expression’ you entered and can be stored in a variable for future use. For example:

seconds_in_a_day = 24 * 60 * 60
seconds_in_a_year = 365 * seconds_in_a_day

As with javascript, variables can refer to any type of data:

an_integer = 27
a_float = 3.14   # including a decimal point makes it a floating-point number
a_boolean = True # note that True and False are capitalized
some_text = "Twas brillig and the slithy toves did gyre & gimble..."
some_names = ["Alice", "Bob", "Carol"]
misc_attributes = {"height":1.8, "age":64, "vegetarian":True}

Once you’re done interacting with the REPL, you can exit it by entering quit() and hitting return, or typing <CTRL> d.

Hello World

One of your most common activities while writing and debugging will be printing text, variables, etc. to the console so you can sanity-check your program as it executes. In javascript you use the console.log function for this purpose, but in python it's simply called print:

print "Hello, world"
print 14, "iced", "bears"

Note that print is a little atypical since it's a statement rather than a function—meaning that unlike basically every other operation in the language, you don't need to enclose its arguments in parentheses. Also note that you can pass more than one value in a comma-separated sequence and spaces will automatically be added between the values.

If you want more control over how strings are constructed (whether you’re printing them out or storing them in a variable), you can use the % operator to combine a ‘format string’ with some values:

birds = [4, 20]
price = 12.5
pastry = 'pie'
txt = '%i & %i blackbirds baked in a %s: $%0.2f' % (birds[0], birds[1], pastry, price)
print txt

Each spot where there’s a % within the format string will be replaced with the corresponding item from the tuple following the % operator. The letters following each % let you control how that string or number is formatted.
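As a rough illustration of the most common format codes, here is a small sketch. The variable names and values are made up for the example, and print is written with parentheses around a single value so the snippet behaves the same under Python 2 and Python 3:

```python
# Illustrative values only; none of these names come from the example above.
name = 'crow'
count = 7
ratio = 2.0 / 3

as_text = '%s' % name          # 'crow'  (%s inserts any value as text)
as_int = '%i' % count          # '7'     (%i inserts an integer)
padded = '%5i' % count         # '    7' (right-aligned in 5 characters)
zero_padded = '%03i' % count   # '007'   (padded with leading zeros)
rounded = '%0.2f' % ratio      # '0.67'  (float rounded to 2 decimal places)

# Several values go in a tuple after the % operator
line = '%s: %i (%0.2f)' % (name, count, ratio)
print(line)
```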

Saving your work

As handy as it is to be able to get instant feedback in the REPL, you’ll want to do most of your coding in a text editor since, among other things, that way your work doesn’t disappear when you quit the program! A ‘script’ is just a text file (typically with a name ending in .py) that contains one or more lines of python code.

Test this out by creating a file called hello.py containing the text:

name = raw_input("Who's there? ")
print "Hello, %s!" % name

Now, in the terminal, cd into the directory containing your script and type the command:

python hello.py

When we typed python on its own earlier, we saw how that launched the REPL for us to interact with. Here though we've provided the name of our script as an ‘argument’, so rather than going into interactive-mode, the python command will execute all the instructions in our script and then exit.

Indentation, Loops, and Conditionals

Perhaps the biggest visual difference between Python and most languages with C-inspired syntax is the lack of curly braces as a structural element. In javascript, you’re used to if-statements, for-loops, functions, etc. beginning and ending with braces to define their scope, and indentation is used merely to aid readability—it has no meaning as far as the language itself is concerned.

Python turns this relationship on its head. Here, indentation is meaningful—in fact it’s the only way to indicate that we’ve stepped into a loop body, branched into a conditional, or are defining a function. For example, here’s a simple for-each loop in javascript:

[1,2,3,4].forEach(function(num){
    console.log(num)
})
console.log("done")

And here’s that same loop again in Python:

for num in [1,2,3,4]:
    print num
print "done"

Note that in the js version, the loop variable (num) is created as an argument to the function passed off to forEach, whereas Python has specific syntax for defining the variable relative to a sequence of values using the format "for variable in sequence:".

Also important is that the line beginning the loop ends with a colon—this is the indication that the indentation level is about to change. Every indented line following that colon will be part of the loop, and as soon as a line that’s back at the original indentation level is seen, the loop will end.
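To see how indentation levels nest, here is a small sketch that puts an if inside a loop. The variable names are invented for the example, and the comments mark which block each line belongs to:

```python
total = 0
for num in [1, 2, 3, 4]:
    # indented once: runs on every pass through the loop
    if num % 2 == 0:
        # indented twice: runs only when the test above is true
        total = total + num
    # back to one level: still inside the loop, but outside the if
    last_seen = num
# back at the left margin: runs once, after the loop has finished
print("sum of even numbers: %i" % total)
```

The parenthesized single-argument print is used here only so the sketch runs unchanged on Python 2 and Python 3.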

Sometimes rather than iterating over the items in a list directly, you’d rather deal with them in terms of their ‘index’. In javascript you’d use the classic 3-stanza (init/test/increment) for statement for this:

let names = ['alice', 'bob', 'carol']
for (var i=0; i<names.length; i++){
    console.log('element '+i, names[i])
}

Python on the other hand only has the for ___ in ___ form for looping, but it provides a handy built-in function called range that generates a list of integers from 0 up to (but not including) an arbitrary endpoint. For instance range(5) generates the list [0, 1, 2, 3, 4]. We can use this in conjunction with another helper function called len (which measures any kind of sequence) to construct a for-loop similar to the javascript version:

names = ['alice', 'bob', 'carol']
for i in range(len(names)):
    print "element %i: %s" % (i, names[i])

Since it’s such a regular occurrence to need both the index and the value of the item in a list, there’s another helper built-in function called enumerate that takes a list and returns pairs of values: the ‘index’ into the list, and the item itself. This lets us rewrite the loop as follows:

names = ['alice', 'bob', 'carol']
for i,name in enumerate(names):
    print "element %i: %s" % (i, name)

Looping over lists is by far the most common, but Python considers any sort of ‘sequence’ to be a candidate for iteration. For instance, you can iterate over the characters in a string:

for c in 'zork':
    print c

If you’re dealing with a dictionary, you can also iterate over all the keys using the same syntax—though note that the order in which they pop up will be unpredictable:

info = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
for key in info:
    print key, info[key]
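If the order matters, one option is to sort the keys before looping over them. A minimal sketch using the built-in sorted function (print takes a single parenthesized value here so the snippet runs the same on Python 2 and 3):

```python
info = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}

# sorted() returns the dictionary's keys as a new, alphabetically ordered list
ordered_keys = sorted(info)
for key in ordered_keys:
    print("%s %i" % (key, info[key]))
```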

One of the main reasons to iterate over all the items in a sequence is to identify particular elements of interest and treat them differently from the others. The primary mechanism for making behavior conditional on a test is the if statement. For instance, we could add an if to the body of the previous loop to only print out the key that corresponds to a value of 3:

info = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
for key in info:
    if info[key] == 3:
        print key

A series of tests can be chained together using if, elif (short for else-if), and else. As soon as any one of the tests is satisfied, the code indented below it will execute and none of the subsequent tests will be checked. This is called 'short circuiting' logic:

sounds = ['Oops', 'Sigh', 'Biff!', 'Bang!', 'Pow!', 'Click', 'Shuffle', 'Boooooom!']
for sound in sounds:
    if 'ooo' in sound:
        print ' '.join(sound)
    elif sound.endswith('!'):
        print sound.upper()
    else:
        print sound.lower()

Datatypes

  • Ints, Floats, Tuples, Lists, and Dicts (and their corresponding builtins)
  • Slicing syntax
  • Sorting (using the sorted builtin and friends)
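The topics in this outline can be previewed with a short sketch. The sample names below are invented for the example, and the comments show the value each expression produces:

```python
names = ['carol', 'alice', 'dave', 'bob']

# Slicing: sequence[start:stop] copies from start up to (not including) stop
first_two = names[0:2]        # ['carol', 'alice']
from_second = names[1:]       # ['alice', 'dave', 'bob']
last_one = names[-1]          # 'bob' (negative indexes count from the end)
reversed_copy = names[::-1]   # ['bob', 'dave', 'alice', 'carol']

# Slices work on strings and tuples too
prefix = 'brillig'[:4]        # 'bril'
pair = (4, 20)                # a tuple: like a list, but it cannot be modified

# sorted() returns a new list and leaves the original untouched
alphabetical = sorted(names)          # ['alice', 'bob', 'carol', 'dave']
by_length = sorted(names, key=len)    # ['bob', 'dave', 'carol', 'alice']
```

Passing key=len tells sorted to compare the items by their length rather than alphabetically; items of equal length keep their original relative order.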

Modules and the Standard Library

One of the main reasons to use Python (aside from its pleasant syntax) is its voluminous collection of built-in utility code: The Standard Library. Because there’s so much in there, the various functions and objects have been grouped into a hierarchy of Modules which allow you to only include the parts you need in your script. To use a particular module, you need to import either the whole thing or particular subcomponents of it at the top of your script.

For instance, we could use the math module to provide us with some handy constants and utilities:

import math

r = 3
area = math.pi * math.pow(r, 2)
print "A circle with radius %i has an area of %f" % (r, area)

If you’re in the REPL, you can read a manual page-style description of the module by typing help(math), or of any of its contents by adding on a name separated by a dot—e.g., help(math.ceil). To get a listing of everything the module contains, we can use the dir builtin:

print dir(math)

If you know ahead of time which of the contents of a module you’ll be using, you can import those functions by name (which saves you from having to repeat the module name each time you use them). For instance, we could rewrite the circle example as:

from math import pi, pow

r = 3
area = pi * pow(r, 2)
print "A circle with radius %i has an area of %f" % (r, area)

Some Highlights

The stdlib is huge and absurdly useful. Whenever you’re grappling with a problem you suspect someone has dealt with before, check for a module; more often than not one already exists and you can take advantage of someone else having worked out the bugs for you in advance.

As you’re first getting started, here are some modules that provide the biggest benefits:

  • sys – the internal state of the language/interpreter
  • io – reading and writing unicode
  • os – filesystem, path, and process utilities
  • json – ‘load’ or ‘dump’ contents of variables to strings
  • math – trig functions, rounding, logarithms, and handy constants
  • re – regular expressions
  • collections – useful specializations of dict types
  • pprint – pretty-print nested datastructures to console
  • subprocess – run other command line utilities from your script
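As a quick taste of two of these modules, here is a small sketch using json and collections. The data is invented for the example, and Counter is just one of several dict specializations the collections module offers:

```python
import json
from collections import Counter

# json: turn a dictionary into a string and back again
attributes = {"height": 1.8, "age": 64, "vegetarian": True}
as_string = json.dumps(attributes, sort_keys=True)
round_trip = json.loads(as_string)

# collections.Counter: a dict specialized for tallying things up
tally = Counter(['pie', 'tart', 'pie', 'pie'])

# print with one parenthesized value works the same in Python 2 and 3
print(as_string)
print("pie count: %i" % tally['pie'])
```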

Third-party Modules and Package Management

Even more libraries are available from the open source community and have been conveniently collected by the Python Package Index (PyPI) which you can think of as equivalent to NPM in the javascript world. For every project you work on you’ll probably want to make use of a different selection of modules. In order to avoid trouble down the road it’s a good idea to keep your third-party modules alongside your project code rather than installing them globally. To make this as streamlined as possible, I encourage you to use Virtual environments via the virtualenv command when setting up your scripts.

A ‘virtual environment’ is a folder that sits at the top-level of your project typically named env which provides a custom python command and subdirectories that dependencies will be installed into. Check to see whether the virtualenv command already exists on your machine. If not, type this on the command line to install it:

sudo easy_install virtualenv==16.0.0

Once complete, you should never need to use the easy_install command again. Now let’s set up a work directory and create an environment within it. In the terminal, type something like this:

mkdir -p ~/python/test
cd ~/python/test
virtualenv env

Now that the environment has been set up, you can start using it by typing:

source ./env/bin/activate

You should now see the text (env) as part of your command line prompt, letting you know that you’re actively working in a virtualenv. From now on, typing python will use the executable at env/bin/python rather than the global version used by the rest of the system. Most important, this local version of python will let you import any of the PyPI modules we’re about to install.

You can add dependencies to your environment using the pip command. Let’s install a couple of handy web-related modules right now:

pip install requests jinja2

You should now be able to open up the python REPL and type import requests without it triggering an error. Try taking a look at the docs by typing help(requests) to see what it's all about.

When you’re done working on a particular project you can type deactivate at the command line (not within the Python REPL) and note that the (env) in your prompt disappears. From that point on, anytime you type python it will instead use the system version and the modules installed via pip won’t be visible.

Web Scraping with requests

The Requests library has one of the most nicely designed APIs in existence and makes accessing web pages as straightforward as possible. Create a new file called scrape.py and paste in the following code:

import requests

r = requests.get("https://ms2.samizdat.co/2019/static/s+s/")
print r.text

Now try running the script from the command line (making sure you've activate-ed the virtualenv we set up earlier). Once we've pulled some text down from the net, we can easily save a local copy of it to disk using the open function to create a file in writeable mode:

from io import open
import requests

r = requests.get("https://ms2.samizdat.co/2019/static/s+s/")
with open('polls.html', 'w') as f:
    f.write(r.text)

Take a look at the polls.html file and note that it links to a bunch of text files. Copy and paste those URLs into your script, either as a list or a dictionary that has filenames as keys and URLs as values:

years = ["https://ms2.samizdat.co/2019/static/s+s/1952.txt", ...]

or

years = {"1952.txt":"https://ms2.samizdat.co/2019/static/s+s/1952.txt", ... }

Now write a for loop that steps through the items in the list, then:

  1. Uses requests.get to fetch the URL
  2. Uses open and .write to save each file locally
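One piece of that loop can be sketched without touching the network: working out a local filename from each URL. In the sketch below, filename_for is a hypothetical helper (not part of requests or the standard library), and only the one URL shown above is included; the rest would need to be copied in from polls.html. The fetching and saving steps are left out on purpose.

```python
def filename_for(url):
    """Return the last path segment of a URL, e.g. '1952.txt'."""
    return url.rsplit('/', 1)[-1]

# Only the first URL is listed here; add the others from polls.html
years = ["https://ms2.samizdat.co/2019/static/s+s/1952.txt"]

# Build the filename -> URL dictionary form from the plain list
by_filename = {}
for url in years:
    by_filename[filename_for(url)] = url

# Looping over a dictionary's items gives both halves of each pair
for filename, url in by_filename.items():
    print("%s would be saved from %s" % (filename, url))
```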

You should now have a bunch of files in the same directory as your script called 1952.txt, 1962.txt and so forth. To read through the lines of text in one of them we can use the open function again but this time in read-only mode (the default) and then use a for-loop to iterate over the lines. Create a new file called process.py and paste in:

from io import open

with open('1952.txt') as f:
    for line in f:
        print line

You can also unpack the lines into a standard list by simply calling the .readlines method of the file object. This may actually be the more useful approach since you can then loop repeatedly over the list, have random access to the lines, etc. Also note that the os.path module has a ton of functions that let you snap-together and pull-apart portions of filenames. Here we’re using the splitext function to separate the name and file-extension (and then discard the latter):

from io import open
from os.path import splitext

filename = '1952.txt'
with open(filename) as f:
    year = splitext(filename)[0]
    print 'YEAR:', year
    lines = f.readlines()
    print lines
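A few of the other os.path helpers can be sketched the same way. The 'polls' folder name below is an assumption made up for the example, and nothing here reads from or writes to disk:

```python
import os.path

# join glues path segments together with the right separator for your system
path = os.path.join('polls', '1952.txt')

# split pulls the folder and the filename apart again
folder, filename = os.path.split(path)

# splitext separates the name from the file extension
stem, extension = os.path.splitext(filename)

# Handy for deriving a new name from an old one
renamed = stem + '.json'
print("%s -> %s" % (filename, renamed))
```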

Rather than having to type all the individual file names in by hand, it'd be handy if you could read the contents of the directory you’re in and then loop over the results. The glob module provides a helper for just such a task:

from glob import glob

all_files = glob('*.txt') # this will match any filename ending in ".txt"
print all_files

Try using the all_files list returned by glob to set up a for-loop that steps through each of the text files and opens it as above.
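One possible shape for that loop is sketched below. So that it runs anywhere, it first writes two stand-in files into a temporary folder; the file contents are placeholders, not the real poll data, and in your own script you would glob the directory holding the files you downloaded instead.

```python
import shutil
import tempfile
from glob import glob
from io import open
from os.path import basename, join, splitext

# Stand-in data so the sketch is self-contained
workdir = tempfile.mkdtemp()
for name in ['1952.txt', '1962.txt']:
    with open(join(workdir, name), 'w') as f:
        f.write(u'first line\nsecond line\n')

# Step through every .txt file, using the filename (minus extension) as the year
line_counts = {}
for path in sorted(glob(join(workdir, '*.txt'))):
    year = splitext(basename(path))[0]
    with open(path) as f:
        line_counts[year] = len(f.readlines())
    print("%s: %i lines" % (year, line_counts[year]))

shutil.rmtree(workdir)  # clean up the temporary folder
```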

Exercise: Convert the Text Files to JSON

Now that you’ve seen the mechanics for fetching and loading the files in a loop, the next step is to process the contents of the lines in order to extract the different ‘fields’ within. You’ll note that each text file covers a single year, and most of the lines in a given file are of the form:

<whitespace><2-digit-rank>. <Title> | <1-or-more-digits> mentions

There are also some blank lines in each file that will need to be ignored, and a final line that lists off ‘honorable mentions’ using the form:

Closest runners-up: <title-1>, <title-2>, <...>, and <title-n> (<1-or-more-digits> mentions)

Your goal is to create a JSON file that combines the information from these different text files using a structure similar to:

{
    "1952": [
        {"title":"Bicycle Thieves", "rank":1, "mentions":25},
        {"title":"City Lights", "rank":2, "mentions":19},
        ...
    ],
    "1962": [ ... ],
    "1972": [ ... ],
    ...
}

To do this, you’ll want to build up a dictionary using the years from the filenames as keys, and values that are a list of films. Each film in one of these year-specific lists should be a dictionary with keys for title, rank, and mentions. Once you’ve created this nested dictionary in code, you can save it to a file as follows:

import json

polls = {
    "1952": [
        {"title":"Bicycle Thieves", "rank":1, "mentions":25},
        {"title":"City Lights", "rank":2, "mentions":19},
    ]
} 

with open('polls.json', 'w') as f:
    json.dump(polls, f, indent=2)
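Rather than typing the nested dictionary out by hand, you would normally build it up inside a loop. A minimal sketch, assuming the lines have already been parsed into (year, title, rank, mentions) tuples; that intermediate format is an assumption for the example, not something the exercise requires:

```python
import json

# Hypothetical rows, as they might look once the text files have been parsed
parsed = [
    ("1952", "Bicycle Thieves", 1, 25),
    ("1952", "City Lights", 2, 19),
]

polls = {}
for year, title, rank, mentions in parsed:
    film = {"title": title, "rank": rank, "mentions": mentions}
    # setdefault returns the list for this year, creating an empty one on first use
    polls.setdefault(year, []).append(film)

# json.dumps returns a string instead of writing to a file, handy for checking
print(json.dumps(polls, indent=2, sort_keys=True))
```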

The tricky part will be processing the strings within each file as you step over its lines. Some pointers for extracting the data:

  • Use the built-in string methods, particularly strip and split to remove leading/trailing whitespace and break lines at particular characters (such as the | character).
  • Use the re.search function provided by the regular expressions module to match sequences of digits, names, and so forth. Surround the portions of the pattern that you want to ‘capture’ with parentheses and then pull them out using the .group method of the match object.
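The pointers above can be sketched as follows. The pattern is one guess at the film-line format described earlier, the sample line is made up to fit that description rather than copied from the real files, and parse_film is a hypothetical helper name. Runner-up lines are not handled here and simply come back as None.

```python
import re

# Assumed pattern for '<whitespace><rank>. <Title> | <digits> mentions'
FILM_LINE = re.compile(r'^\s*(\d+)\.\s+(.+?)\s*\|\s*(\d+)\s+mentions')

def parse_film(line):
    """Return a film dictionary, or None if the line isn't a film line."""
    if not line.strip():
        return None  # blank line: nothing left once whitespace is stripped
    match = FILM_LINE.search(line)
    if match is None:
        return None
    # Each pair of parentheses in the pattern becomes a numbered group
    return {
        "rank": int(match.group(1)),
        "title": match.group(2),
        "mentions": int(match.group(3)),
    }

sample = "  1. Bicycle Thieves | 25 mentions\n"  # made-up line in the described format
film = parse_film(sample)
print("%(rank)i: %(title)s (%(mentions)i)" % film)
```

Returning None for lines that don't match lets the calling loop skip blank lines and handle the runners-up line separately.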

Extra Credit: Convert the JSON to Templated HTML

Now that you’ve condensed the data from the text files into a cleaned-up JSON file, you can use that as the raw material for generating a new HTML page. Try using the Jinja library to separate the structural parts of your markup into a separate file and then merge in the data loaded from your JSON structure.

To do this you’ll first want to create a ‘template file’, which is mostly plain HTML, but with special tags wrapped in curly braces mixed in. The Jinja Template Reference has some excellent explanation of the various ways to insert values, loop over lists, etc. To start with try pasting this into a file called template.html:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Untitled</title>
</head>
<body>
  <p>Hello {{name}}</p>
</body>
</html>

Now create a separate file called render.py:

from io import open
import json
import jinja2

template = jinja2.Template(open('template.html').read())

# info = json.load(open('polls.json'))
markup = template.render(name="you")

with open('output.html', 'w') as f:
    f.write(markup)

This will ultimately load your polls.json file, perhaps do some pre-processing, and then pass it off to jinja for rendering. Finally, it will write the generated HTML out to a file called output.html. For now the only templating magic occurring is the insertion of the variable called name into the page’s lone <p> tag. It’s up to you to choose how to group and present the poll results. Do you want to simply show the rankings per year? The all-time totals per film? The films that only ever appeared in the ‘runner up’ lists? It’s up to you...
