String operations
Strings as objects
Consider the string "Alice" and the string "Bob". Although both are strings, each is a distinct string object. In programming, we think of objects as specific instances of a class (or data type). We will be a little more thorough with this definition later (in fact, it is not fully correct because we are leaving details out), but this naive understanding will suffice for now.
For many objects, it is possible to invoke the methods associated with that object. Methods are special kinds of functions associated with the data type of an object. As a concrete example, consider the code
myMsg = 'Hello, World!'
myMsgUpper = myMsg.upper()
print(myMsgUpper)
If you run this code, you obtain the following:
HELLO, WORLD!
The .upper() part of the code is an example of a method implemented in Python’s str data type. When it is invoked on myMsg, it works specifically with the contents of myMsg and returns a new, uppercased string; myMsg itself is left unchanged. Thus, we arrive at the string HELLO, WORLD! as the output.
Python’s str data type implements many different methods. For a full list, consult the string methods section of the Python documentation. We will cover a few of these, but not all of them.
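As a brief illustration of what a few other str methods look like in use, here is a minimal sketch. The choice of strip, lower, and replace is ours, not a list of the methods covered later; the point is only that each call returns a new string.

```python
# Each str method below returns a new string; the original is left untouched.
myMsg = '  Hello, World!  '

stripped = myMsg.strip()                        # remove leading/trailing whitespace
lowered = stripped.lower()                      # convert to lowercase
replaced = stripped.replace('World', 'Python')  # swap one substring for another

print(stripped)     # Hello, World!
print(lowered)      # hello, world!
print(replaced)     # Hello, Python!
print(repr(myMsg))  # '  Hello, World!  ' (unchanged)
```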
Indexing strings
There are many cases where we may wish to access a particular single character from a string. To do this, we use the indexing operator:
letters = 'ABCDE'
print(letters[1])
This will output
B
Notice that letters[1] selects the second character from letters, not the first. In Python, as in most programming languages, we start counting from zero, not one. There are a couple of reasons why this is the case:
- Dijkstra’s Why numbering should start at zero
- In lower-level programming languages such as C and C++, indexing is a special case of pointer arithmetic where the 0th index makes the most sense.
Typically, we think of indices as telling us how far to go along the sequence of characters in the string from left to right. There are cases, however, where it may be useful to go from right to left instead. A common example is if we need the last character of a string:
myMsg = 'Hello, World!'
print(myMsg[-1])
This will output
!
That is, the \((-1)\)-index corresponds to the last character in the string. Accordingly, the \((-2)\)-index corresponds to the second to last character in the string, and so on.
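One way to make this correspondence concrete is to compare negative indices against their non-negative counterparts. The sketch below uses the len() function (covered under Length of string) and an assert statement, which simply stops the program if the condition is false.

```python
myMsg = 'Hello, World!'
n = len(myMsg)  # number of characters in myMsg

# For k = 1, 2, ..., n the index -k refers to the same character as index n - k.
for k in range(1, n + 1):
    assert myMsg[-k] == myMsg[n - k]

print(myMsg[-2])     # d
print(myMsg[n - 2])  # d
```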
Slicing strings
Building off of indexing strings, sometimes we don’t want just one character, but rather a substring of our string. To obtain those substrings, we slice the string. For example,
myString = 'ABCDEFG'
mySlice = myString[1:4]
print(mySlice)
This prints BCD. That is, myString[1:4] is interpreted as “return the part of myString from index 1 up to index 4, including the character at index 1 and excluding the character at index 4.”
There are a few variants on this:
myString = 'ABCDEFG'
print(myString[:3]) # prints ABC
print(myString[3:]) # prints DEFG
We can even use negative indices:
myString = 'ABCDEFG'
print(myString[:-2]) # prints ABCDE
print(myString[-2:]) # prints FG
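A useful consequence of including the start and excluding the stop is that two slices sharing a boundary fit together with no overlap and no gap. The following is a small sketch of that property; the split point k is an arbitrary value chosen for illustration.

```python
myString = 'ABCDEFG'
k = 3  # hypothetical split point, chosen only for illustration

left = myString[:k]   # characters at indices 0, 1, ..., k - 1
right = myString[k:]  # characters at indices k, k + 1, ...

print(left)                      # ABC
print(right)                     # DEFG
print(left + right == myString)  # True
print(len(myString[1:4]))        # 3, i.e. stop - start = 4 - 1
```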
Length of string
To find the length of a string (i.e. the number of characters in a string), we use the len() function:
myString = 'ABCDEFG'
print(len(myString)) # prints 7
A common mistake that many programmers (even experienced ones) make is the following:
myString = 'ABCDEFG'
print(myString[len(myString)]) # try to print last character
This gives IndexError: string index out of range. This occurs because Python is \(0\)-indexed: the first character of a string is at index 0, so the last character is at index len(myString) - 1. Accordingly, we need to change myString[len(myString)] to myString[len(myString) - 1].
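The two ways of reaching the last character can be compared side by side. The sketch below also includes a hypothetical helper, char_at, which is not part of Python; it uses try/except (not covered in this section) to show how the IndexError could be handled instead of crashing the program.

```python
myString = 'ABCDEFG'

lastByLength = myString[len(myString) - 1]  # last valid non-negative index
lastByNegative = myString[-1]               # same character via a negative index
print(lastByLength, lastByNegative)         # G G

# Hypothetical helper: returns None instead of raising when the index is invalid.
def char_at(s, i):
    try:
        return s[i]
    except IndexError:
        return None

print(char_at(myString, len(myString)))  # None
```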
Immutability of strings
It is possible that, at some point, we may want to change the content of a string. Python prevents this, however:
myString = 'Alice'
myString[0] = 'B' # error
print(myString)
Instead of printing 'Blice', we are met with the runtime error TypeError: 'str' object does not support item assignment. The precise term for this is that strings are immutable (i.e. they cannot be modified).
Strings are immutable by design. In fact, it would be bad for strings to be mutable. Why? Suppose that strings were mutable and we had something like
class Person:
    def __init__(self, name):
        self.__name = name
    def get_name(self):
        return self.__name
aliceName = 'Alice'
alice = Person(aliceName)
aliceName[0] = 'B'  # hypothetical: only valid if strings were mutable
print(alice.get_name())
We haven’t talked about classes yet, so don’t worry about not knowing exactly what is going on there. For now, interpret that part of the code as a template that defines a custom data type called Person. In any case, if that code were valid, it would print Blice. At this scale, that might seem unimportant, but imagine if we had a project that involved thousands of instances of Person that all crucially depend on aliceName. If we modified aliceName so that its first character is 'B', then we would change the name of thousands of Person objects, which would be bad.
Accordingly, it is simpler to prevent strings from being modified. The best we can do, if we want to obtain 'Blice' from 'Alice', is to build a new string through slicing and string concatenation:
myString = 'Alice'
myString = 'B' + myString[1:]
print(myString)
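It may help to see that this kind of reassignment creates a new string rather than altering the old one. In the sketch below, otherName is a hypothetical second variable that refers to the same original string; it is unaffected when aliceName is rebound.

```python
aliceName = 'Alice'
otherName = aliceName            # both names refer to the same string object

aliceName = 'B' + aliceName[1:]  # builds a new string and rebinds aliceName

print(aliceName)  # Blice
print(otherName)  # Alice (the original string was never modified)
```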
Traversing over strings
In many scenarios, it is useful to be able to traverse a string from left to right. This is easily done via the for loop:
for character in 'Alice':
    print(character)
When run, this code outputs
A
l
i
c
e
Another way to traverse a string is via indices. For instance:
name = 'Alice'
for i in range(len(name)):
    character = name[i]
    print(character)
This approach is useful if we need to have access to the position of a particular character in a string. If we don’t need the position, however, then it is recommended to use the previous approach.
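When both the position and the character are needed, Python's built-in enumerate function offers a third option that is not discussed above. The following is a minimal sketch; the list pairs is only there to make the produced (index, character) pairs visible.

```python
name = 'Alice'

# enumerate yields (index, character) pairs, so no manual indexing is needed.
for i, character in enumerate(name):
    print(i, character)

pairs = list(enumerate(name))
print(pairs)  # [(0, 'A'), (1, 'l'), (2, 'i'), (3, 'c'), (4, 'e')]
```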
The in operator
There are many cases where one may wish to check if a string is a substring of another string. In Python, we check for this by using the in operator.
print('A' in 'Alice') # True
print('a' in 'Alice') # False --- case-sensitive
print('li' in 'Alice') # True
print('il' in 'Alice') # False --- order-sensitive
We can also combine the not operator with in:
print('A' not in 'Alice') # False
print('a' not in 'Alice') # True
print('li' not in 'Alice') # False
print('il' not in 'Alice') # True
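Since in is case-sensitive, a common workaround is to convert both strings to the same case before checking. The helper below, contains_ignore_case, is a hypothetical name introduced for illustration; it is a sketch that assumes plain English text, where lower() is sufficient.

```python
# Hypothetical helper: substring check that ignores upper/lower case.
def contains_ignore_case(text, piece):
    return piece.lower() in text.lower()

print(contains_ignore_case('Alice', 'a'))   # True
print(contains_ignore_case('Alice', 'LI'))  # True
print(contains_ignore_case('Alice', 'il'))  # False (order still matters)
```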
Splitting strings
Suppose we are dealing with data, in the form of a string, that is delimited by commas:
White,Walter,Chemist
We might be interested in splitting the string up into a list of strings so that we can access each entry of the data separately. For instance:
data = 'White,Walter,Chemist'
dataList = data.split(',')
print(dataList)
This outputs
['White', 'Walter', 'Chemist']
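Once the string is split, the pieces can be assigned to separate variables and recombined. The sketch below assumes the three fields mean last name, first name, and occupation (the variable names are ours), and uses the str method join, which is roughly the opposite of split.

```python
data = 'White,Walter,Chemist'

# Assumed field meanings: last name, first name, occupation.
lastName, firstName, occupation = data.split(',')
print(firstName + ' ' + lastName)  # Walter White

# join glues a list of strings back together with the given separator.
rejoined = ';'.join([lastName, firstName, occupation])
print(rejoined)  # White;Walter;Chemist
```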
Of course, in real-life projects we will rarely want to store our data as one gigantic string. Because of this, we will later talk about representing data more sanely using the pandas library.
Formatting strings
At some point, you are going to hate having to write string concatenations that look something like this:
a = -7
b = 2
q = a // b
r = a % b
print('Long division on ' + str(a) + ': ' + str(a) + ' = ' + str(q) + '*' + str(b) + ' + ' + str(r))
Not only are such statements ugly and verbose, but they are also not particularly easy to read. One use of string formatting is precisely to get rid of code that looks like this. The above can be rewritten as
a = -7
b = 2
q = a // b
r = a % b
print('Long division on {}: {} = {}*{} + {}'.format(a, a, q, b, r))
The author can hardly do string formatting justice, so the recommendation here is simply to read the Python documentation. String formatting is quite powerful and is useful for many purposes beyond the one example shown above.
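For comparison, the same message can also be written with a formatted string literal (an f-string), an alternative to .format() available in Python 3.6 and later. This is a sketch of one option, not a replacement for the approach above; the variable message is introduced only for illustration.

```python
a = -7
b = 2
q = a // b  # floor division: -4
r = a % b   # remainder: 1

# Expressions inside {} are evaluated and inserted into the string.
message = f'Long division on {a}: {a} = {q}*{b} + {r}'
print(message)  # Long division on -7: -7 = -4*2 + 1
```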
Exercises
- Write a function count_occurences(searchStr, letter) which returns the number of times letter occurs in the string searchStr.
- One variant of the slice notation not mentioned above is that we can specify a “step” value. What happens when we run the following?
start = 1
stop = 13
step = 2
myStr = 'ABCDEFGHIJKLMNOP'
print(myStr[start:stop:step])
Explain why we get that output. Experiment more with this version of slice notation. What happens if step = -1? Write a function that reverses the order of characters in a string.
- Write a function that removes all occurrences of a substring in a string. (Hint: Use the find method.)