Tutorial 04: Strings

These notes are adapted from the Python tutorial available at: https://docs.python.org/3/tutorial/.

Here we see how Python works with strings of characters.

String literals

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result.

In [1]:
'spam eggs'  # single quotes
In [2]:
"spam eggs"  # double quotes (notice the output is the same!)

In the interactive interpreter, the output string is enclosed in quotes and special characters are escaped with backslashes. While this might sometimes look different from the input (the enclosing quotes could change), the two strings are equivalent. The string is enclosed in double quotes if it contains a single quote and no double quotes; otherwise it is enclosed in single quotes.
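
As a small illustration of the quoting rule above, the built-in repr() function returns the same text that the interactive interpreter echoes for a value. The variable names are chosen only for this sketch.

```python
# repr() gives the text the interpreter would echo for a value.
plain = 'spam eggs'
with_apostrophe = 'doesn\'t'          # contains a single quote, no double quotes
with_double = "\"Yes,\" they said."   # contains double quotes only

print(repr(plain))            # 'spam eggs'
print(repr(with_apostrophe))  # "doesn't"
print(repr(with_double))      # '"Yes," they said.'
```

Only the second string is shown in double quotes, because it is the only one that contains a single quote and no double quotes.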

The print() function produces a more readable output without the quotes and other special characters:

In [3]:
print("spam eggs")
In [4]:
s = 'First line.\nSecond line.'  # \n means newline
s                                # without print(), \n is included in the output
In [5]:
print(s)

Strings can be concatenated (glued together) with the + operator:

In [6]:
'tea' + 'pot'

This also works if we save the string as a variable:

In [7]:
prefix = 'Py'
prefix + 'thon'

Substrings

Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:

In [8]:
word = 'Python'
word[0]  # character in position 0
In [9]:
word[5]  # character in position 5
Out[9]:
'n'
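
To check the claim that there is no separate character type, a quick sketch can inspect the type and length of a single indexed character. The name first is only illustrative.

```python
word = 'Python'
first = word[0]

print(type(first))                # <class 'str'>
print(len(first))                 # 1
print(type(first) is type(word))  # True: both are ordinary strings
```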

Indices may also be negative numbers, to start counting from the right:

In [10]:
word[-1]  # last character
Out[10]:
'n'
In [11]:
word[-2]  # second-last character
Out[11]:
'o'
In [12]:
word[-6]
Out[12]:
'P'

Note that since -0 is the same as 0, negative indices start from -1.
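
One way to see how negative indices line up with the usual ones is to compare word[-k] with word[len(word) - k]. This is only a sketch; the names n and matches are made up for the example.

```python
word = 'Python'
n = len(word)  # 6

# For k from 1 to n, word[-k] is the same character as word[n - k].
for k in range(1, n + 1):
    print(-k, n - k, word[-k], word[n - k])

matches = all(word[-k] == word[n - k] for k in range(1, n + 1))
print(matches)  # True
```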

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

In [13]:
word[0:2]  # characters from position 0 (included) to 2 (excluded)
Out[13]:
'Py'
In [14]:
word[2:5]  # characters from position 2 (included) to 5 (excluded)
Out[14]:
'tho'

Note how the start is always included, and the end always excluded. This makes sure that s[:i] + s[i:] is always equal to s:

In [15]:
word[:2] + word[2:]
Out[15]:
'Python'
In [16]:
word[:4] + word[4:]
Out[16]:
'Python'
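
The two cells above check only two split points. As a sketch, a loop can try every split point from 0 to len(word) and confirm that the two halves always glue back together. The names pieces and always_equal are illustrative.

```python
word = 'Python'

# Try every split point, including 0 and len(word).
pieces = [(word[:i], word[i:]) for i in range(len(word) + 1)]
for left, right in pieces:
    print(repr(left), '+', repr(right), '->', left + right)

always_equal = all(left + right == word for left, right in pieces)
print(always_equal)  # True
```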

Slice indices have useful defaults: an omitted first index defaults to zero, and an omitted second index defaults to the size of the string being sliced.

In [17]:
word[:2]  # characters from the beginning to position 2 (excluded)
Out[17]:
'Py'
In [18]:
word[4:]  # characters from position 4 (included) to the end
Out[18]:
'on'
In [19]:
word[-2:] # characters from the second-last (included) to the end
Out[19]:
'on'

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

The first row of numbers gives the position of the indices 0…6 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.
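
The length rule can be checked directly with len(). The last line is a sketch of what happens when one index is out of bounds, in which case the rule no longer applies.

```python
word = 'Python'

print(len(word[1:3]))   # 2, the same as 3 - 1
print(len(word[2:5]))   # 3, the same as 5 - 2

# When the end index is past the end of the string, the slice is
# shorter than the difference of the indices.
print(len(word[4:42]))  # 2, not 38
```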

Attempting to use an index that is too large will result in an error:

In [20]:
word[42]  # the word only has 6 characters
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-20-469c6d99b5b2> in <module>()
----> 1 word[42]  # the word only has 6 characters

IndexError: string index out of range

However, out-of-range indices are handled gracefully when used for slicing:

In [21]:
word[4:42]
Out[21]:
'on'
In [22]:
word[42:]
Out[22]:
''
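
The difference between the two behaviors can be put side by side in one cell. This sketch catches the error with try/except so that the cell runs to the end; the flag index_failed is only there to record what happened.

```python
word = 'Python'

# Indexing past the end raises IndexError...
try:
    word[42]
    index_failed = False
except IndexError as err:
    index_failed = True
    print('IndexError:', err)  # IndexError: string index out of range

# ...but slicing clips the bounds to the string instead.
print(repr(word[4:42]))  # 'on'
print(repr(word[42:]))   # ''
```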

Practice

Below, define a variable called my_name and assign it a string with your full name.

In [23]:
my_name = 'Taylor Arnold'

Print out a friendly message that combines your name and a friendly greeting (such as "Hi Taylor! Have a great day!") using the variable you just created and the + operator.

In [24]:
'Hello ' + my_name + '!'
Out[24]:
'Hello Taylor Arnold!'

Consider the following string assigned to the variable food_type:

In [25]:
food_type = "apple pie"

Write code that prints out the first word of the string ("apple") using the slicing syntax above:

In [26]:
food_type[:5]
Out[26]:
'apple'

Write code that prints out the second word of the string ("pie") using the slicing syntax above, counting from the end of the string with a negative index:

In [27]:
food_type[-3:]
Out[27]:
'pie'

Assign the string "pig" to the variable input_word:

In [28]:
input_word = 'pig'

Write code below to produce a plural form of the string in input_word, assuming that we can just add the letter s to the end. Store the value as plural_word. Return the plural form.

In [29]:
plural_word = input_word + 's'  # store the plural form as requested
plural_word
Out[29]:
'pigs'

What are some input words for which the code above would fail to produce the correct plural form in English?

Answer: Irregular nouns such as mouse (mice), moose (moose), and octopus (octopuses).
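
To see the failures concretely, the add-an-s rule can be wrapped in a small function and applied to the words from the answer. The function naive_plural is a hypothetical helper written for this sketch, not part of Python.

```python
def naive_plural(word):
    """Hypothetical helper: form a plural by appending 's' only."""
    return word + 's'

for w in ['pig', 'mouse', 'moose', 'octopus']:
    print(w, '->', naive_plural(w))
# pig -> pigs
# mouse -> mouses
# moose -> mooses
# octopus -> octopuss
```

Only the first result is a correct English plural.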

Pig Latin is a made-up word game common in English-speaking countries. The exact rules can get complex, but the basic idea is to move the starting letter to the end of the word and then add the letters "ay". So, "pig" becomes "igpay".

Write code below that converts the word stored in input_word into Pig Latin.

In [30]:
input_word[1:] + input_word[0] + 'ay'
Out[30]:
'igpay'
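
The same expression works for any non-empty word. As a sketch, it can be wrapped in a function and tried on a few inputs; pig_latin is a hypothetical name, and it implements only the basic rule described above.

```python
def pig_latin(word):
    """Hypothetical helper: apply the basic rule to a non-empty word."""
    return word[1:] + word[0] + 'ay'

for w in ['pig', 'latin', 'banana']:
    print(w, '->', pig_latin(w))
# pig -> igpay
# latin -> atinlay
# banana -> ananabay
```

An empty string would raise an IndexError here, because word[0] does not exist.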

Extra Practice

An "Extra Practice" section contains optional questions meant to stretch your understanding. They are often geared toward students with prior programming experience.

String methods

String objects in Python have various methods associated with them. To call a method, add a . after the string, followed by the method name and a pair of parentheses. For example:

In [31]:
x = "My name is taylor arnold."
x.upper()
Out[31]:
'MY NAME IS TAYLOR ARNOLD.'
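
Note that calling a method such as upper returns a new string and leaves the original unchanged. A minimal sketch, using the illustrative name shouted for the result:

```python
x = "My name is taylor arnold."
shouted = x.upper()  # the result is a new string

print(shouted)  # MY NAME IS TAYLOR ARNOLD.
print(x)        # My name is taylor arnold.
```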

In the blocks below, try the methods lower, title, capitalize, and swapcase. All of these change letters in the string between their uppercase and lowercase forms.

In [32]:
x.lower()
Out[32]:
'my name is taylor arnold.'
In [33]:
x.title()
Out[33]:
'My Name Is Taylor Arnold.'
In [34]:
x.capitalize()
Out[34]:
'My name is taylor arnold.'
In [35]:
x.swapcase()
Out[35]:
'mY NAME IS TAYLOR ARNOLD.'

Now, take the strings x and y here:

In [36]:
x = "to be or not to be"
y = "tO bE oR nOT tO bE"

Use the string methods above to convert x into the same string as y.

In [37]:
z = x.title().swapcase()
print(z == y)
z
True
Out[37]:
'tO bE oR nOT tO bE'
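
Breaking the chained call into two steps shows why it works. The names step1 and step2 are only for illustration.

```python
x = "to be or not to be"

step1 = x.title()         # capitalize the first letter of each word
step2 = step1.swapcase()  # then flip the case of every letter

print(step1)  # To Be Or Not To Be
print(step2)  # tO bE oR nOT tO bE
```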

The is operator

Run the following block of code. You should see that the end result returns the value of True, as you might expect:

In [38]:
x = 42
x is 42
Out[38]:
True

But now, run the following block of code instead. You should find that the result returns False.

In [39]:
x = 420
x is 420
Out[39]:
False

What in the world is going on here? Hint: you might try looking up what the is operator does in Python.

Answer: This is a tricky question that gets at the internals of Python. From the docs:

The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)

Using the id function, which in CPython returns the memory address of an object, you can see that the integers from -5 to 255 (the values covered by range(-5, 256) below) are in fact stored in adjacent memory locations, keeping in mind that each integer object takes 32 bytes in this build of Python.

In [40]:
diff = []
for j in range(-5, 256):
    diff.append((id(j) - id(-5)) / 32)
    
print(diff)
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0, 199.0, 200.0, 201.0, 202.0, 203.0, 204.0, 205.0, 206.0, 207.0, 208.0, 209.0, 210.0, 211.0, 212.0, 213.0, 214.0, 215.0, 216.0, 217.0, 218.0, 219.0, 220.0, 221.0, 222.0, 223.0, 224.0, 225.0, 226.0, 227.0, 228.0, 229.0, 230.0, 231.0, 232.0, 233.0, 234.0, 235.0, 236.0, 237.0, 238.0, 239.0, 240.0, 241.0, 242.0, 243.0, 244.0, 245.0, 246.0, 247.0, 248.0, 249.0, 250.0, 251.0, 252.0, 253.0, 254.0, 255.0, 256.0, 257.0, 258.0, 259.0, 260.0]
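
The practical lesson is that == compares values, while is asks whether two names refer to the very same object. The sketch below builds the integers at run time with int() so that the comparison is not between two literals; the results of the is lines are implementation details of CPython and may differ elsewhere.

```python
# Build the integers at run time so the comparison is not between two literals.
a = int('420')
b = int('420')

print(a == b)  # True: the values are equal
print(a is b)  # implementation-dependent; typically False in CPython

small_a = int('42')
small_b = int('42')
print(small_a == small_b)  # True
print(small_a is small_b)  # typically True in CPython (small-integer cache)
```

For comparing numbers or strings, == is the appropriate choice.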