Tutorial 02: Python Modules

Python modules provide a way of packaging code so that it can be reused. Much later we will see how to create our own modules. For now, the focus will be on loading modules that provide functionality beyond the built-in functions of the base language. These come from two sources: (1) the Python standard library, and (2) third-party software. The standard library is available on every system running Python. Third-party modules need to be installed on top of the basic language.

Standard library

Let's start by loading the sys module, which provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.

To load the module just run this:

In [1]:
import sys

To run a function or access a variable inside an imported module, we use a notation that starts with the module name followed by a dot and then the object name, like this: module.object. For example, sys has an object called version that describes the version of Python that is currently installed. We can access it like this:

In [2]:
sys.version
Out[2]:
'3.6.6 |Anaconda custom (64-bit)| (default, Jun 28 2018, 11:07:29) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'
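The same module.object notation works for every variable and function in a module. As a small sketch (picking sys.version_info and sys.platform is our choice for illustration; they are not needed later in the tutorial), here are two more objects read from sys:

```python
import sys

# version_info holds the version number as separate numeric fields
major = sys.version_info.major
minor = sys.version_info.minor
print("Python", major, minor)

# platform is a short string identifying the operating system
print(sys.platform)
```

The values printed will depend on your own installation.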

Jupyter notebooks provide an easy way of seeing all of the objects in a module. Start by typing the module name followed by a dot, then press the "Tab" key on your keyboard. A menu of all available objects will appear. Try that here and select an element to see what happens.

In [3]:
sys.
  File "<ipython-input-3-57874cbcc37d>", line 1
    sys.
        ^
SyntaxError: invalid syntax

Anaconda modules

One benefit of using Anaconda Python is that it includes by default many scientific modules in addition to the standard library. One library that we will use heavily is called numpy. Let's try to load this library as well. Here we will use a slightly different command that defines an alias for the library (np is a very common alias for numpy that you will see in many scripts and examples):

In [4]:
import numpy as np

Now, to access an object in the numpy module, we type np (rather than numpy) followed by a dot and then the object name. For example, we can test the absolute value function:

In [5]:
np.abs(-100)
Out[5]:
100
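The as keyword works the same way for any module, not just numpy. Here is a minimal sketch using the standard library's statistics module, so that it runs without any extra installation. The alias st is an arbitrary choice for illustration, not a common convention.

```python
import statistics as st

values = [2, 4, 6]

# The alias st stands in for the full module name statistics
average = st.mean(values)
print(average)
```

Once an alias is defined, the full name statistics is not available in the notebook unless you also run a plain import statistics.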

The Python scientific stack, which we will use in most tutorials, consists of the following four modules and their common aliases:

In [6]:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt

Make sure that these load okay on your system. We will learn more about these modules throughout the semester.

Additional modules

Anaconda Python comes with many of the modules needed for general purpose data science work. We need to install several others to help us work with text and image data.

The way that you install these is slightly different depending on your platform.

  • macOS / Linux: open up a terminal window and type which conda. It should print out a path with "anaconda3" in the name.
  • Windows: Anaconda Python should have installed a program called Anaconda Prompt. Open this and type the commands below into the prompt.
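If you would like to make a similar check from inside the notebook, it can be sketched with the standard library. Treat this as a rough equivalent only: shutil.which searches the PATH seen by the notebook process, which may differ from the one in your terminal or prompt.

```python
import shutil
import sys

# Path of the conda executable found on the PATH, or None if there is none
conda_path = shutil.which("conda")
print("conda:", conda_path)

# Path of the Python interpreter that is running this notebook
print("python:", sys.executable)
```

If the first line prints None, that does not necessarily mean conda is missing; it only means the notebook process could not find it.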

Now, the first thing we need to do is to bring everything managed by conda, the package manager for our version of Python, up to date. Do this by typing the following into the terminal or prompt:

conda update --all

Now, install the keras library with the following:

conda install keras

Once that is done, you should be able to load keras within this Python notebook. Try it here:

In [7]:
import keras
Using TensorFlow backend.

Note that it may give a warning or message; this is okay as long as you do not have an error. Now, proceed to install four other libraries (run each line one at a time, and select y if prompted with a question):

conda install gensim
conda install opencv
conda install spacy
conda install networkx

Then, run these using the pip installer:

pip install --upgrade pip
pip install pyLDAvis

Note: I fully expect some of you to have errors with some of these packages. Don't get too frustrated. I am doing this now, well before we need any of these, so that I have plenty of time to help you.

Finally, check that each package you installed can be loaded ok:

In [8]:
import gensim
In [9]:
import cv2       # this is the name of the opencv library
In [10]:
import spacy
In [11]:
import networkx
In [12]:
import pyLDAvis

I just tested all of these on my computer and they worked fine. If you run into any issues, please let me know as soon as possible!
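If you would rather check everything in one cell, here is a minimal sketch using importlib from the standard library. The names module_names and status are made up for this example. A failed import is recorded rather than stopping the cell, so you can see every problem at once.

```python
import importlib

# Names as they are imported (cv2 is the import name for opencv)
module_names = ["gensim", "cv2", "spacy", "networkx", "pyLDAvis"]

status = {}
for name in module_names:
    try:
        importlib.import_module(name)
        status[name] = "ok"
    except Exception as err:  # record any failure instead of stopping
        status[name] = "failed: " + repr(err)

for name, result in status.items():
    print(name, "->", result)
```

Any line that starts with "failed" includes the error message, which is useful to send along when you ask for help.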

Installing data

The keras and spacy modules also need some external data. Let's try to load those now to make sure everything is working as expected.

In [13]:
import keras.applications

vgg_model = keras.applications.vgg16.VGG16(weights='imagenet')

And for spacy, run the following:

In [14]:
spacy.cli.download("en")
    Linking successful
    /anaconda3/lib/python3.6/site-packages/en_core_web_sm -->
    /anaconda3/lib/python3.6/site-packages/spacy/data/en

    You can now load the model via spacy.load('en')

Again, please let me know if you have any trouble with these steps. I'd like to figure them out before we actually need these libraries.


Practice

There is not much to practice here, but let's see how to explore a module in Python. First, import the platform module:

In [15]:
import platform

Now, run the command dir(platform) to see all of the objects in the platform module. This gives the same list as the dot and Tab notation above.

In [16]:
dir(platform)
Out[16]:
['DEV_NULL',
 '_UNIXCONFDIR',
 '_WIN32_CLIENT_RELEASES',
 '_WIN32_SERVER_RELEASES',
 '__builtins__',
 '__cached__',
 '__copyright__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__version__',
 '_default_architecture',
 '_dist_try_harder',
 '_follow_symlinks',
 '_ironpython26_sys_version_parser',
 '_ironpython_sys_version_parser',
 '_java_getprop',
 '_libc_search',
 '_linux_distribution',
 '_lsb_release_version',
 '_mac_ver_xml',
 '_node',
 '_norm_version',
 '_parse_release_file',
 '_platform',
 '_platform_cache',
 '_pypy_sys_version_parser',
 '_release_filename',
 '_release_version',
 '_supported_dists',
 '_sys_version',
 '_sys_version_cache',
 '_sys_version_parser',
 '_syscmd_file',
 '_syscmd_uname',
 '_syscmd_ver',
 '_uname_cache',
 '_ver_output',
 'architecture',
 'collections',
 'dist',
 'java_ver',
 'libc_ver',
 'linux_distribution',
 'mac_ver',
 'machine',
 'node',
 'os',
 'platform',
 'popen',
 'processor',
 'python_branch',
 'python_build',
 'python_compiler',
 'python_implementation',
 'python_revision',
 'python_version',
 'python_version_tuple',
 're',
 'release',
 'subprocess',
 'sys',
 'system',
 'system_alias',
 'uname',
 'uname_result',
 'version',
 'warnings',
 'win32_ver']
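Many of the names in this list start with an underscore, which by convention marks them as internal to the module. As a rough sketch (the variable names public_names and callables are our own), the list can be narrowed to the parts meant for everyday use:

```python
import platform

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(platform) if not name.startswith("_")]
print(public_names)

# Narrow further to names that can be called (functions and classes)
callables = [name for name in public_names if callable(getattr(platform, name))]
print(callables)
```

The second list drops entries such as os, re, and sys, which are other modules that platform itself imports.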

Let's say you are interested in the function platform.system. To find out more about this function, type the command help(platform.system) in the code block below:

In [17]:
help(platform.system)
Help on function system in module platform:

system()
    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'.
    
    An empty string is returned if the value cannot be determined.
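The help function prints its text to the screen. If you want the same documentation as a string that you can work with, one option is the standard library's inspect module, sketched here:

```python
import inspect
import platform

# getdoc returns the cleaned-up docstring as a string
doc = inspect.getdoc(platform.system)
print(doc)

# signature shows the parameters that the function accepts
print(inspect.signature(platform.system))
```
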

Based on the help page, what do you think would be the result of running platform.system on your machine?

Answer: I would expect it to return 'macOS'

Try running the command below:

In [18]:
platform.system()
Out[18]:
'Darwin'

Does the answer match your expectation? If not, try to figure out why!

Answer: Not exactly. It returns 'Darwin', the name of the open source operating system that Apple created and that macOS is built on.
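If you want a more familiar name in your own code, one approach is a small lookup table. This is only a sketch: the dictionary friendly_names is hypothetical and covers just three common values, and anything else is passed through unchanged.

```python
import platform

# Hypothetical lookup table from platform.system() values to familiar names
friendly_names = {"Darwin": "macOS", "Linux": "Linux", "Windows": "Windows"}

system_name = platform.system()

# Fall back to the raw value for systems that are not in the table
print(friendly_names.get(system_name, system_name))
```
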