Python modules provide a way of packaging code in a reusable way. Much later we will see how to create our own modules. For now, the focus will be on loading modules that provide basic functionality beyond the standard functions available in the base language. These consist of (1) the Python standard library, and (2) third-party software. The first are available on all systems running Python. The second need to be installed on top of the basic language.
Let's start by loading the sys module, which provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter. To load the module, just run this:
import sys
To run a function or access a variable inside an imported module, we use a notation that starts with the module name followed by a dot and then the object name, like module.object. For example, sys has an object called version that describes the version of Python that is currently installed. We can access it like this:
sys.version
'3.6.6 |Anaconda custom (64-bit)| (default, Jun 28 2018, 11:07:29) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'
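The version string above is meant for people to read. As a small sketch of the same module.object notation, the snippet below also reads sys.version_info, a tuple-like object in the same module that is easier to use in code. The variable names are chosen here for illustration only.

```python
import sys

# Human-readable description of the running interpreter (a single string).
version_text = sys.version

# The same information as a structured, tuple-like object.
major, minor = sys.version_info.major, sys.version_info.minor

print(version_text)
print("Running Python {}.{}".format(major, minor))
```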
Jupyter notebooks provide an easy way of seeing all of the objects in a module. Start by typing the module name followed by a dot, then press the Tab key on your keyboard. A menu of all available objects will appear. Try that here and select an element to see what happens.
File "<ipython-input-3-57874cbcc37d>", line 1 sys. ^ SyntaxError: invalid syntax
One benefit of using Anaconda Python is that it includes by default many scientific modules in addition to the standard library. One library that we will use heavily is numpy. Let's try to load this library as well. Here we will use a slightly different command that defines an alias for the library (np is a very common alias for numpy that you will see in many scripts and examples):
import numpy as np
Now, to access an object in the numpy module, we type np (rather than numpy) followed by a dot and then the object name. For example, we can test the absolute value function.
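An alias is simply a second name for the same module object, and it works with any module, not just numpy. Here is a minimal sketch using the standard library's math module, picked only so the example runs without any third-party packages; the alias m is arbitrary.

```python
import math        # bind the module to its full name
import math as m   # bind the same module to a shorter alias

# Both names refer to the very same module object.
print(m is math)

# Objects are reached through the alias with the usual dot notation.
print(m.fabs(-100))
```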
The Python scientific stack, which we will use in most tutorials, consists of the following four modules and common aliases:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
Make sure that these load okay on your system. We will learn more about these modules throughout the semester.
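One way to check which of these packages are installed, without fully importing them, is importlib.util.find_spec from the standard library. The sketch below assumes the four top-level package names behind the imports above; the helper name is_installed is made up for this example.

```python
import importlib.util

def is_installed(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None

# Top-level packages behind the four imports above.
for name in ["numpy", "scipy", "pandas", "matplotlib"]:
    status = "found" if is_installed(name) else "MISSING"
    print("{:<12}{}".format(name, status))
```

A package reported as found can still fail when it is actually imported, so this is a quick first check rather than a guarantee.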
Anaconda Python comes with many of the modules needed for general purpose data science work. We need to install several others to help us work with text and image data.
The way that you install these is slightly different depending on your platform. To check that conda is available, open a terminal and run which conda. It should print out a path with "anaconda3" in the name.
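The same lookup can be done from inside a notebook with shutil.which from the standard library, which searches the PATH much like the which command. This is only a sketch; whether the result contains "anaconda3" depends on how and where Anaconda was installed.

```python
import shutil

# Full path of the conda executable, or None if it is not on the PATH.
conda_path = shutil.which("conda")

if conda_path is None:
    print("conda was not found on the PATH")
else:
    print(conda_path)
```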
Now, the first thing we need to do is to update conda, the package manager for our version of Python. Do this by typing the following into the terminal:
conda update --all
Now, install the keras library with the following:
conda install keras
Once that is done, you should be able to load keras within this Python notebook. Try it here:
import keras
Using TensorFlow backend.
Note that it may give a warning or message; this is okay as long as you do not have an error. Now, proceed to install four other libraries (run each line one at a time, answering y if prompted with a question):
conda install gensim
conda install opencv
conda install spacy
conda install networkx
Then, run these using pip:
pip install --upgrade pip
pip install pyLDAvis
Note: I fully expect some of you to have errors with some of these packages. Don't get too frustrated. I am doing this now, well before we need any of these so that I have plenty of time to help you.
Finally, check that each package you installed can be loaded ok:
import cv2 # this is the name of the opencv library
I just tested all of these on my computer and they worked fine. If you run into any issues, please let me know as soon as possible!
The keras and spacy modules also need some external data. Let's try to load those now to make sure everything is working as expected.
import keras.applications
vgg_model = keras.applications.vgg16.VGG16(weights='imagenet')
And for spacy, run the following:
Linking successful
/anaconda3/lib/python3.6/site-packages/en_core_web_sm --> /anaconda3/lib/python3.6/site-packages/spacy/data/en
You can now load the model via spacy.load('en')
Again, please let me know if you have any trouble with these steps. I'd like to figure them out before we actually need these libraries.
There is not much to practice here, but let's see how to explore a module in Python. First, import the platform module. Now, run the command dir(platform) to see all of the names defined in the platform module. This is the same as using the . and Tab completion shown earlier.
['DEV_NULL', '_UNIXCONFDIR', '_WIN32_CLIENT_RELEASES', '_WIN32_SERVER_RELEASES', '__builtins__', '__cached__', '__copyright__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__version__', '_default_architecture', '_dist_try_harder', '_follow_symlinks', '_ironpython26_sys_version_parser', '_ironpython_sys_version_parser', '_java_getprop', '_libc_search', '_linux_distribution', '_lsb_release_version', '_mac_ver_xml', '_node', '_norm_version', '_parse_release_file', '_platform', '_platform_cache', '_pypy_sys_version_parser', '_release_filename', '_release_version', '_supported_dists', '_sys_version', '_sys_version_cache', '_sys_version_parser', '_syscmd_file', '_syscmd_uname', '_syscmd_ver', '_uname_cache', '_ver_output', 'architecture', 'collections', 'dist', 'java_ver', 'libc_ver', 'linux_distribution', 'mac_ver', 'machine', 'node', 'os', 'platform', 'popen', 'processor', 'python_branch', 'python_build', 'python_compiler', 'python_implementation', 'python_revision', 'python_version', 'python_version_tuple', 're', 'release', 'subprocess', 'sys', 'system', 'system_alias', 'uname', 'uname_result', 'version', 'warnings', 'win32_ver']
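Much of the list above consists of names starting with an underscore, which by convention are internal to the module. Here is a short sketch that keeps only the public names, and then only the public callables; the variable names are chosen for this example.

```python
import platform

# Names without a leading underscore are the module's public interface.
public_names = [name for name in dir(platform) if not name.startswith("_")]

# Narrow further to objects that can be called, such as functions.
public_callables = [name for name in public_names
                    if callable(getattr(platform, name))]

print(public_callables)
```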
Let's say you are interested in the function platform.system. To find out more about this function, type the command help(platform.system) in the code block below:
Help on function system in module platform:

system()
    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'.

    An empty string is returned if the value cannot be determined.
Based on the help page, what do you think would be the result of calling platform.system() on your machine?
Answer: I would expect it to return 'macOS'
Try running the command below:
Does the answer match your expectation? If not, try to figure out why!
Answer: Not exactly. It returns 'Darwin', the name of the open-source operating system created by Apple that underlies macOS.
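Because platform.system() reports the underlying system name rather than the familiar product name, scripts often translate it. Here is a small sketch that uses only values mentioned in the help text and the answer above; the mapping and the helper friendly_os_name are illustrative and not part of the platform module.

```python
import platform

# Illustrative mapping from platform.system() values to familiar names.
FRIENDLY_NAMES = {"Darwin": "macOS", "Linux": "Linux", "Windows": "Windows"}

def friendly_os_name(system_name):
    """Translate a platform.system() value; unknown values pass through."""
    return FRIENDLY_NAMES.get(system_name, system_name or "unknown")

print(friendly_os_name(platform.system()))
```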