Strings 2 - operators

Download exercises zip

Browse files online

Python offers several operators to work with strings:

Operator       Syntax         Result   Meaning
len            len(str)       int      Returns the length of the string
indexing       str[int]       str      Reads the character at the specified index
concatenation  str + str      str      Concatenate two strings
inclusion      str in str     bool     Checks whether a string is contained inside another one
slice          str[int:int]   str      Extracts a sub-string
equality       ==, !=         bool     Checks whether strings are equal or different
comparisons    <, <=, >, >=   bool     Performs lexicographic comparison
ord            ord(str)       int      Returns the order of a character
chr            chr(int)       str      Given an order, returns the corresponding character
replication    str * int      str      Replicate the string
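
As a quick, informal tour, the sketch below applies each operator from the table to a small example. The sample strings and variable names are chosen only for illustration.

```python
# Sample strings chosen only for illustration
s = 'park'
t = 'ing'

length = len(s)        # len: number of characters
first = s[0]           # indexing: character at position 0
joined = s + t         # concatenation: builds a new string
found = 'ar' in s      # inclusion: is 'ar' inside s?
piece = s[1:3]         # slice: from index 1 included to 3 excluded
same = s == 'park'     # equality
before = s < 'pork'    # lexicographic comparison
code = ord('p')        # position number of a character
char = chr(code)       # character for a given position number
twice = s * 2          # replication

print(length, first, joined, found, piece, same, before, code, char, twice)
```

Each operator is covered in detail in the sections that follow.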

What to do

  1. Unzip the exercises zip into a folder. You should obtain something like this:

strings
    strings1.ipynb
    strings1-sol.ipynb
    strings2.ipynb
    strings2-sol.ipynb
    strings3.ipynb
    strings3-sol.ipynb
    strings4.ipynb
    strings4-sol.ipynb
    strings5-chal.ipynb
    jupman.py

WARNING: to view the notebook correctly, it MUST be in an unzipped folder!

  • Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser. The browser should show a file list: navigate the list and open the notebook strings2.ipynb

  • Go on reading the exercises file. Sometimes you will find paragraphs marked EXERCISE, which will ask you to write Python commands in the following cells.

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

  • If the notebooks look stuck, try to select Kernel -> Restart

Reading characters

A string is a sequence of characters, and often we might want to access a single character by specifying the position of the character we are interested in.

It’s important to remember that the positions of characters in a string start from 0. To read the character at a certain position, we write the string followed by square brackets with the position inside. Examples:

[2]:
'park'[0]
[2]:
'p'
[3]:
'park'[1]
[3]:
'a'
[4]:
#0123
'park'[2]
[4]:
'r'
[5]:
#0123
'park'[3]
[5]:
'k'

If we try to go beyond the last character, we will get an error:

#0123
'park'[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-106-b8f1f689f0c7> in <module>
      1 #0123
----> 2 'park'[4]

IndexError: string index out of range

So far we have specified the string as a literal, but we can also use variables:

[6]:
    #01234
x = 'cloud'
[7]:
x[0]
[7]:
'c'
[8]:
x[2]
[8]:
'o'

How is the character we’ve just read represented? You may have noticed it is shown between quotes, as if it were a string. Let’s check:

[9]:
type(x[0])
[9]:
str

It really is a string. This might come as a surprise, even from a philosophical standpoint: Python strings are made of… strings! Other programming languages may use a specific type for single characters, but Python uses strings, which helps it handle complex alphabets such as Japanese.
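
To make the point about complex alphabets concrete, here is a small sketch with a Japanese word. The sample word is an assumption chosen only for illustration.

```python
word = '日本語'          # a sample word made of three Japanese characters

size = len(word)        # counts characters
first = word[0]         # indexing works exactly as with 'park'
kind = type(first)      # the single character is still a str

print(size, first, kind)
```

Indexing returns a one-character string here too, just as it does for plain English letters.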

QUESTION: Let’s suppose x is any string. If we try to execute this code:

x[0]

we will get:

  1. always a character

  2. always an error

  3. sometimes a character, sometimes an error according to the string

QUESTION: Let’s suppose x is an empty string. If we try to execute this code:

x[len(x)]

we will get:

  1. always a character

  2. always an error

  3. sometimes a character, sometimes an error according to the string at hand

Exercise - alternate

Given two strings, both of length 3, print a string which alternates characters from both strings. Your code must work with any strings of this length.

Example - given:

x="say"
y="hi!"

it should print:

shaiy!
[10]:
# write here


shaiy!

Negative indexes

In Python we can also use negative indexes, which count from the end of the string instead of from the beginning:

[11]:
#4321
"park"[-1]
[11]:
'k'
[12]:
#4321
"park"[-2]
[12]:
'r'
[13]:
#4321
"park"[-3]
[13]:
'a'
[14]:
#4321
"park"[-4]
[14]:
'p'

If we go one step beyond, we get an error:

#4321
"park"[-5]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-126-668d8a13a324> in <module>
----> 1 "park"[-5]

IndexError: string index out of range
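
One way to read a negative index is as a shorthand for counting back from the length of the string. The sketch below checks this on the word used above; the variable names are only illustrative.

```python
x = "park"
n = len(x)               # 4

last = x[-1]
also_last = x[n - 1]     # -1 corresponds to index 3

second = x[-3]
also_second = x[n - 3]   # -3 corresponds to index 1

print(last, also_last)
print(second, also_second)
```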

QUESTION: Suppose x is a NON-empty string. What do we get with the following expression?

x[-len(x)]
  1. always a character

  2. always an error

  3. sometimes a character, sometimes an error according to the string

QUESTION: Suppose x is some string (possibly empty). Are the expressions

x[len(x) - 1]

and

x[-len(x) - 1]

equivalent? What do they do?

QUESTION: If x is a non-empty string, what does the following expression produce? Can we simplify it to a shorter one?

(x + x)[len(x)]

QUESTION: What does the following expression produce? An error? Something else? Can we simplify it?

'park'[0][0]

QUESTION: If x is a non-empty string, what does the following expression produce? An error? Something else? Can we simplify it?

(x[0])[0]

Substitute characters

We said strings in Python are immutable. Suppose we have a string like this:

[15]:
    #01234
x = 'port'

and, for example, we want to change the character at position 2 (in this case, the r) into an s. What do we do?

We might be tempted to write something like the following, but Python would raise an error:

x[2] = 's'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-113-e5847c6fa4bf> in <module>
----> 1 x[2] = 's'

TypeError: 'str' object does not support item assignment

The correct solution is assigning a completely new string to x, obtained by taking pieces from the previous one:

[16]:
x = x[0] + x[1] + 's' + x[3]
[17]:
x
[17]:
'post'

If seeing x on the right of the equal sign baffles you, we can decompose the code like this and it will work the same way:

[18]:
x = "port"
y = x
x = y[0] + y[1] + 's' + y[3]

Try it in Python Tutor:

[19]:
x = "port"
y = x
x = y[0] + y[1] + 's' + y[3]

jupman.pytut()
[19]:
Python Tutor visualization
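
If Python Tutor is not at hand, a couple of prints can show the same thing. This is a minimal sketch reusing the variables from the cell above: after the reassignment x refers to a new string, while the original one, still reachable through y, is untouched.

```python
x = "port"
y = x                          # y refers to the same string as x
x = y[0] + y[1] + 's' + y[3]   # x now refers to a NEW string

print(x)   # the new string
print(y)   # the original string, unchanged
```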

Slices

We might want to read only a subsequence which starts at one position and ends at another. For example, suppose we have:

[20]:
    #0123456789
x = 'mercantile'

and we want to extract the string 'canti', which starts at index 3 (included) and ends at index 8 (excluded). We could extract the single characters and concatenate them with the + sign, but we would write a lot of code. A better option is to use so-called slices: simply write the string followed by square brackets containing the start index (included), a colon, and finally the end index (excluded):

[21]:
    #0123456789
x = 'mercantile'

x[3:8]   # note the : between start and end indexes
[21]:
'canti'

WARNING: Extracting with slices DOES NOT modify the original string!

Let’s see an example:

[22]:
    #0123456789
x = 'mercantile'

print('               x is', x)
print('The slice x[3:8] is', x[3:8])
print('               x is', x)       # note x continues to point to old string!
               x is mercantile
The slice x[3:8] is canti
               x is mercantile
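
Because the end index is excluded, the length of a slice is simply end minus start, as long as both indexes fall inside the string and start is not greater than end. A small sketch on the same word (the names start, end and piece are only illustrative):

```python
x = 'mercantile'
start, end = 3, 8

piece = x[start:end]
piece_length = len(piece)

print(piece, piece_length, end - start)
```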

QUESTION: if x is any string of length at least 5, what does this code produce? An error? Does it work? Can we shorten it?

x[3:4]

Exercise - garalampog

Write some code to extract and print alam from the string "garalampog". Try guessing the correct indexes.

[23]:
x = "garalampog"

# write here


alam

Exercise - ifEweEfav lkSD lkWe

Write some code to extract and print kD from the string "ifE\te\nfav  lkD lkWe". Be careful with spaces and special characters (you might want to print x first). Try guessing the correct indexes.

[24]:
x = "ifE\te\nfav  lkD lkWe"

# write here


kD

Slices - limits

Whenever we use a slice we must be careful with index limits. Let’s see how they behave:

[25]:
#012345
"chair"[0:3]  # from index 0 *included* to 3 *excluded*
[25]:
'cha'
[26]:
#012345
"chair"[0:4]  # from index 0 *included* to 4 *excluded*
[26]:
'chai'
[27]:
#012345
"chair"[0:5]  # from index 0 *included* to 5 *excluded*
[27]:
'chair'
[28]:
#012345
"chair"[0:6]   # if we go beyond string length Python doesn't complain
[28]:
'chair'
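
The contrast with plain indexing is worth seeing side by side. The sketch below uses try / except, a construct not covered in this notebook, only to capture the error without stopping the program.

```python
word = "chair"

clipped = word[0:100]     # slice limits beyond the end are clipped
empty = word[10:20]       # a slice entirely out of range is just empty

try:
    word[10]              # a single index out of range raises IndexError
    raised = False
except IndexError:
    raised = True

print(repr(clipped), repr(empty), raised)
```

So a slice never fails because of limits that are too large, while reading a single character does.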

QUESTION: if x is any string (also empty), what does this expression do? Can it give an error? Does it return something useful?

x[0:len(x)]

Slice - Omitting limits

If we want, we can omit the starting index, in which case Python will assume it is 0:

[29]:
#0123456789
"catamaran"[:3]
[29]:
'cat'

It’s also possible to omit the ending index, in which case Python will extract until the end of the string:

[30]:
#0123456789
"catamaran"[3:]
[30]:
'amaran'

By omitting both indexes we obtain the full string:

[31]:
"catamaran"[:]
[31]:
'catamaran'
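
A useful consequence of these defaults is that cutting a string at any position i and gluing the two parts back together gives the original string. A minimal sketch, with i chosen arbitrarily:

```python
word = "catamaran"
i = 3                    # any cut position works, this one is arbitrary

left = word[:i]          # everything before index i
right = word[i:]         # everything from index i on
rebuilt = left + right

print(left, right, rebuilt == word)
```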

Exercise - ysterymyster

Write some code that given a string x prints the string composed with all the characters of x except the first one, followed by all characters of x except the last one.

  • your code must work with any string

Example 1 - given:

x = "mystery"

must print:

ysterymyster

Example 2 - given:

x = "rope"

must print:

operop

[32]:

x = "mystery"
#x = "rope"

# write here

Slice - negative limits

If we want, it’s also possible to set negative limits, although it’s not always intuitive:

[33]:
#0123456

"vegetal"[3:0]   # from index 3 to positive indexes <= 3 doesn't produce anything
[33]:
''
[34]:
#0123456
"vegetal"[3:1]   # from index 3 to positive indexes <= 3 doesn't produce anything
[34]:
''
[35]:
#0123456
"vegetal"[3:2]  # from index 3 to positive indexes <= 3 doesn't produce anything
[35]:
''
[36]:
#0123456
"vegetal"[3:3]  # from index 3 to positive indexes <= 3 doesn't produce anything
[36]:
''

Let’s see what happens with negative indexes:

[37]:
#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-1]
[37]:
'eta'
[38]:
#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-2]
[38]:
'et'
[39]:
#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-3]
[39]:
'e'
[40]:
#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-4]
[40]:
''
[41]:
#0123456   positive indexes
#7654321   negative indexes
"vegetal"[3:-5]
[41]:
''
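
To predict these results, it can help to translate each negative limit into the positive index it stands for, which is the length of the string plus the negative limit. A small sketch on the same word:

```python
word = "vegetal"
n = len(word)          # 7

a = word[3:-1]
b = word[3:n - 1]      # -1 stands for index 6

c = word[3:-4]
d = word[3:n - 4]      # -4 stands for index 3, so the slice is empty

print(repr(a), repr(b), repr(c), repr(d))
```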

Exercise - javarnanda

Given a string x, write some code to extract and print its first 3 characters followed by its last 3.

  • Your code should work for any string of length greater than or equal to 3

Example 1 - given:

x = "javarnanda"

it should print:

javnda

Example 2 - given:

x = "bang"

it should print:

banang
[42]:
x = "javarnanda"
#x = "bang"

# write here


javnda

Slice - modifying

Suppose we have the string

[43]:
    #0123456789
s = "the table is placed in the center of the room"

and we want to change the assignment of s so that it becomes associated with the string:

#0123456789
"the chair is placed in the center of the room"

Since both strings are similar, we might be tempted to only redefine the character sequence which corresponds to the word "table", which goes from index 4 included to index 9 excluded:

s[4:9] = "chair"   # WARNING! WRONG!

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-0de7363c6882> in <module>
----> 1 s[4:9] = "chair"   # WARNING! WRONG!

TypeError: 'str' object does not support item assignment

Sadly, we would receive an error because, as repeated many times, strings are IMMUTABLE, so we cannot select a chunk of a string and change it in place. What we can do instead is build a NEW string from pieces of the original string, concatenate the desired characters, and associate the result with the variable whose assignment we want to modify:

[44]:
    #0123456789
s = "the table is placed in the center of the room"
s = s[0:4] + "chair" + s[9:]
print(s)
the chair is placed in the center of the room

When Python finds the line

s = s[0:4] + "chair" + s[9:]

FIRST it calculates the result on the right of the =, and THEN associates the result with the variable on the left. The expression on the right only generates NEW strings which, once built, can be assigned to the variable s.
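
The same pattern can be wrapped in a small reusable function. This is only a sketch: replace_span is a hypothetical helper name, not something provided by Python, and functions are outside the scope of this notebook.

```python
def replace_span(text, start, end, new):
    """Return a NEW string where text[start:end] is replaced by new.

    Hypothetical helper written for illustration only.
    """
    return text[:start] + new + text[end:]

original = "the table is placed in the center of the room"
s = replace_span(original, 4, 9, "chair")

print(s)
print(original)   # the original string is still intact
```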

Exercise - the run

Write some code which, given the string s

s = 'The Gold Rush has begun.'

and some variables

what = 'Atom'
happened = 'is over'

substitutes the substring 'Gold' with the string in the variable what and substitutes the substring 'has begun' with the string in the variable happened.

After executing your code, the string associated with s should be

>>> print(s)
The Atom Rush is over.

  • DON’T use constant characters in your code, e.g. dots '.' aren’t allowed!

[45]:
    #01234567890123456789012345678
s = 'The Gold Rush has begun.'
what = 'Atom'
happened = 'is over'

# write here


The Atom Rush is over.

Inclusion operator

To check if a string is included in another one, we use the in operator.

Note the result of this expression is a boolean:

[46]:
'the' in 'Singing in the rain'
[46]:
True
[47]:
'si' in 'Singing in the rain'  # in operator is case-sensitive
[47]:
False
[48]:
'Si' in 'Singing in the rain'
[48]:
True

Do not abuse in

WARNING: in is often used in a wrong / inefficient way

Always ask yourself:

  1. Could the string not contain the substring we’re looking for? Always remember to handle this case too!

  2. in searches the whole string, which might be inefficient: is it really necessary, or do we already know the interval where to search?

  3. If we want to know whether a character is at a position we know a priori (e.g. 3), in is not needed: it’s enough to write my_string[3] == character. With in, Python might find duplicate characters which are before or after the one we want to verify!
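
The third point can be illustrated with a word that contains the same character more than once. The sample word and variable names are assumptions made for the example.

```python
my_string = "banana"
character = "n"

anywhere = character in my_string        # 'n' occurs somewhere in the word
at_three = my_string[3] == character     # position 3 actually holds 'a'
at_two = my_string[2] == character       # position 2 holds 'n'

print(anywhere, at_three, at_two)
```

Here in answers True even though position 3 does not hold the character, so it would be the wrong tool for a question about a specific position.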

Exercise - contained 1

You are given two strings x and y, and a third z. Write some code which prints True if x and y are both contained in z.

Example 1 - given:

x = 'cad'
y = 'ra'
z = 'abracadabra'

it should print:

True

Example 2 - given:

x = 'zam'
y = 'ra'
z = 'abracadabra'

it should print:

False
[49]:

x,y,z = 'cad','ra','abracadabra'   # True
#x,y,z = 'zam','ra','abracadabra'   # False

# write here

Exercise - contained 2

Given three strings x, y, z, write some code which prints True if the string x is contained in at least one of the strings y or z, otherwise prints False

  • your code should work with any set of strings

Example 1 - given:

x = "ope"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"

it should print:

True

Example 2 - given:

x = "nope"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"

it should print:

False

Example 3 - given:

x = "cle"
y = "honesty makes for long friendships"
z = "I hope it's clear enough"

it should show:

True

[50]:

x,y,z = "ope","honesty makes for long friendships","I hope it's clear enough"   # True
#x,y,z = "nope","honesty makes for long friendships","I hope it's clear enough"   # False
#x,y,z = "cle","honesty makes for long friendships","I hope it's clear enough"   # True

# write here

Comparisons

Python offers us the possibility to perform a lexicographic comparison among strings, like we would when placing names in an address book. Although sorting names is something intuitive we often do, we must be careful about special cases.

First, let’s determine when two strings are equal.

Equality operators

To check whether two strings are equal, you can use the operator ==, which produces the boolean True or False as a result

WARNING: == is written with TWO equal signs!

[51]:
"dog" == "dog"
[51]:
True
[52]:
"dog" == "wolf"
[52]:
False

The equality operator is case-sensitive:

[53]:
"dog" == "DOG"
[53]:
False

To check whether two strings are NOT equal, we can use the operator !=, which we can expect to behave exactly as the opposite of ==:

[54]:
"dog" != "dog"
[54]:
False
[55]:
"dog" != "wolf"
[55]:
True
[56]:
"dog" != "DOG"
[56]:
True

As an alternative, we might use the operator not:

[57]:
not "dog" == "dog"
[57]:
False
[58]:
not "wolf" == "dog"
[58]:
True
[59]:
not "dog" == "DOG"
[59]:
True
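
When combining not with ==, keep in mind that == is evaluated first. The sketch below compares the three equivalent forms, plus a parenthesized variant that means something different; the variable names are only illustrative.

```python
a = "dog"
b = "DOG"

r1 = not a == b       # read by Python as: not (a == b)
r2 = not (a == b)     # same thing, with explicit parentheses
r3 = a != b           # usually the most readable choice
r4 = (not a) == b     # different meaning: compares a boolean with a string

print(r1, r2, r3, r4)
```

The last form first negates the string, giving a boolean, and then compares that boolean with "DOG", which is rarely what we want.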

QUESTION: what does the following code print?

x = "river" == "river"
print(x)

QUESTION: for each of the following expressions, try to guess whether it produces True or False

  1. 'hat' != 'Hat'
    
  2. 'hat' == 'HAT'
    
  3. 'choralism'[2:5] == 'contemporary'[7:10]
    
  4. 'AlAbAmA'[4:] == 'aLaBaMa'
    
  5. 'bright'[9:20] == 'dark'[10:15]
    
  6. 'optical'[-1] == 'crystal'[-1]
    
  7. ('hat' != 'jacket') == ('trousers' != 'bow')
    
  8. ('stra' in 'stradivarius') == ('div' in 'digital divide')
    
  9. len('note') in '5436'
    
  10. str(len('note')) in '5436'
    
  11. len('posters') in '5436'
    
  12. str(len('posters')) in '5436'
    

Exercise - statist

Write some code which prints True if a word begins with the same two characters it ends with.

  • Your code should work for any word

[60]:

word = 'statist'  # True
#word = 'baobab'  # False
#word = 'maxima'  # True
#word = 'karma'   # False

# write here

Comparing characters

Characters have an inherent order we can exploit. Let’s see an example:

[61]:
'a' < 'g'
[61]:
True

another one:

[62]:
'm' > 'c'
[62]:
True

These sound like reasonable comparisons! But what about this one (notice the capital 'Z')?

[63]:
'a' < 'Z'
[63]:
False

Maybe this doesn’t look so obvious. And what if we get creative and compare with symbols such as square brackets or Unicode hearts?

[64]:
'a' > '♥'
[64]:
False

To determine how to deal with these special cases, we must remember that ASCII assigns a position number to each character, which in effect defines an ordering among all its characters.

If we want to know the corresponding number of a character, we can use the function ord:

[65]:
ord('a')
[65]:
97
[66]:
ord('b')
[66]:
98
[67]:
ord('z')
[67]:
122

If we want to go the other way, given a position number we can obtain the corresponding character with the chr function:

[68]:
chr(97)
[68]:
'a'
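
Since ord and chr are inverses of each other, we can go from a character to its number, do some arithmetic, and come back. A minimal sketch (variable names are only illustrative):

```python
letter = 'a'

position = ord(letter)             # from character to number
back = chr(position)               # from number to character again
next_letter = chr(position + 1)    # the character right after 'a'

print(position, back, next_letter)
```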

Uppercase characters have different positions:

[69]:
ord('A')
[69]:
65
[70]:
ord('Z')
[70]:
90

EXERCISE: Using the functions above, try to find which characters are between capital Z and lowercase a

[71]:

# write here

The ordering allows us to perform lexicographic comparisons between single characters:

[72]:
'a' < 'b'
[72]:
True
[73]:
'g' >= 'm'
[73]:
False

EXERCISE: Write some code that:

  1. prints the ord values of 'A', 'Z' and a given char

  2. prints True if char is uppercase, and False otherwise

  • Would your code also work with accented capitalized characters such as 'Á'?

  • NOTE: the possible character sets are far too many, so the proper solution would be to use the method isupper, which we will see in the next tutorial.

[74]:

char = 'G'   # True
#char = 'g'  # False
#char = 'Á'  # True ?? Note the accent!

# write here

Also, since the Unicode character set includes ASCII, the ordering of ASCII characters can be used to safely compare them against Unicode characters, so comparing characters or their ord values should always be equivalent:

[75]:
ord('a')   # ascii
[75]:
97
[76]:
ord('♥')   # unicode
[76]:
9829
[77]:
'a' > '♥'
[77]:
False
[78]:
ord('a') > ord('♥')
[78]:
False

Python also offers lexicographic comparisons on strings with more than one character. To understand what the expected result should be, we must distinguish among several cases, though:

  • strings of equal / different length

  • strings with same / mixed case

Let’s begin with same length strings:

[79]:
'mario' > 'luigi'
[79]:
True
[80]:
'mario' > 'wario'
[80]:
False
[81]:
'Mario' > 'Wario'
[81]:
False
[82]:
'Wario' < 'mario'    # capital case is *before* lowercase in ASCII
[82]:
True

Comparing different lengths

A string which is a prefix of a longer one comes first in the ordering:

[83]:
'troll' < 'trolley'
[83]:
True

If two strings only share a prefix, Python compares the characters right after the common prefix. In this case it detects that e precedes the corresponding s:

[84]:
'trolley' < 'trolls'
[84]:
True
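
The rules above can be spelled out by hand. The following is only a sketch to make the procedure explicit, not how Python implements the comparison internally: lex_less is a hypothetical helper name, and it relies on a function and a loop, which are outside the scope of this notebook.

```python
def lex_less(a, b):
    """Return True if string a comes strictly before string b.

    Hypothetical helper written for illustration only.
    """
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            # the first differing position decides the result
            return ord(a[i]) < ord(b[i])
    # no difference found: one string is a prefix of the other,
    # so the shorter one comes first
    return len(a) < len(b)

pairs = [('troll', 'trolley'), ('trolley', 'trolls'),
         ('Wario', 'mario'), ('mario', 'luigi')]

for a, b in pairs:
    # the manual result should agree with the built-in operator
    print(a, b, lex_less(a, b), a < b)
```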

Exercise - Character intervals

You are given a pair of strings i1 and i2 of two characters each.

We suppose they represent character intervals: the first character of an interval always has an order number lower than or equal to the second.

There are five possibilities: the first interval either ‘is contained in’, ‘contains’, ‘overlaps’, ‘is before’ or ‘is after’ the second interval. Write some code which tells which relation holds.

Example 1 - given:

i1 = 'gm'
i2 = 'cp'

Your program should print:

gm is contained in cp

To see why, you can look at this little representation (you don’t need to print this!):

  c   g     m  p
abcdefghijklmnopqrstuvwxyz

Example 2 - given:

i1 = 'mr'
i2 = 'pt'

Your program should print:

mr overlaps pt

because mr is neither contained in, nor contains, nor completely precedes, nor completely follows pt (you don’t need to print this!):

            m  p r t
abcdefghijklmnopqrstuvwxyz
  • if the i1 interval coincides with i2, it is considered as containing i2

  • DO NOT use loops or if

  • HINT: to satisfy the above constraint, think about the evaluation order of boolean expressions, for example the expression

'g' >= 'c' and 'm' <= 'p' and 'is contained in'

produces as result the string 'is contained in'
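
The reason is that and and or do not always produce True or False: they return one of their operands. A small sketch of this behaviour (the variable names are only illustrative):

```python
r1 = True and 'yes'      # all operands are truthy: the last one is returned
r2 = False and 'yes'     # the first falsy operand is returned
r3 = '' or 'fallback'    # or returns the first truthy operand
r4 = 'g' >= 'c' and 'm' <= 'p' and 'is contained in'

print(repr(r1), repr(r2), repr(r3), repr(r4))
```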

[85]:

i1,i2 = 'gm','cp'   # gm is contained in cp
#i1,i2 = 'dh','dh'   # dh contains dh  (special case)
#i1,i2 = 'bw','dq'   # bw contains dq
#i1,i2 = 'ac','bd'   # ac overlaps bd
#i1,i2 = 'mr','pt'   # mr overlaps pt
#i1,i2 = 'fm','su'   # fm is before su
#i1,i2 = 'xz','pq'   # xz is after pq

# write here

Exercise - The Library of Encodicus

In the study room of the algorithmist Encodicus there is a bookshelf divided into 26 alphabetically ordered sections, where he scrupulously keeps his precious alchemical texts. Every section can contain at most 9 books. One day, Encodicus decides to acquire a new tome for his collection: write some code which, given a string representing the bookshelf with the counts of the books and a new book, finds the right position of the book and updates the bookshelf accordingly

  • assume no section contains 9 books

  • assume book names are always lowercase

  • DO NOT use loops, if, or string methods

  • DO NOT manually write strings with 26 characters, or even worse create 26 variables

  • USE ord to find the section position

Example - given:

bookshelf = "|a 7|b 5|c 5|d 8|e 2|f 0|g 4|h 8|i 7|j 1|k 6|l 0|m 5|n 0|o 3|p 7|q 2|r 2|s 4|t 6|u 1|v 3|w 3|x 5|y 7|z 6|"
book = "cycling in the wild"

after your code, bookshelf must be updated with |c 6|:

>>> print(bookshelf)
|a 7|b 5|c 6|d 8|e 2|f 0|g 4|h 8|i 7|j 1|k 6|l 0|m 5|n 0|o 3|p 7|q 2|r 2|s 4|t 6|u 1|v 3|w 3|x 5|y 7|z 6|
[86]:

book = "cycling in the wild"
#book = "algorithms of the occult"
#book = "theory of the zippo"
#book = "zoology of the software developer"
bookshelf = "|a 7|b 5|c 5|d 8|e 2|f 0|g 4|h 8|i 7|j 1|k 6|l 0|m 5|n 0|o 3|p 7|q 2|r 2|s 4|t 6|u 1|v 3|w 3|x 5|y 7|z 6|"

# write here

Replication operator

With the operator * you can replicate a string n times, for example:

[87]:
'beer' * 4
[87]:
'beerbeerbeerbeer'

Note a NEW string is created, without modifying the original:

[88]:
drink = "beer"
[89]:
print(drink * 4)
beerbeerbeerbeer
[90]:
drink
[90]:
'beer'
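
A few edge cases of replication are worth trying as well. The sketch below reuses the drink variable; the other names are only illustrative.

```python
drink = "beer"

none = drink * 0          # zero repetitions give the empty string
negative = drink * -2     # negative counts also give the empty string
swapped = 3 * drink       # the number can come first
total = len(drink * 4)    # length is len(drink) multiplied by 4

print(repr(none), repr(negative), swapped, total)
```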

Exercise - za za za

Given a syllable and a phrase which ends with a digit character n, write some code which prints a string with the syllable repeated n times, separated by spaces.

  • Your code must work with any string assigned to syllable and phrase

Example - given:

phrase = 'the number 7'
syllable = 'za'

after your code, it should print:

za za za za za za za
[91]:

phrase = 'the number 7'
syllable = 'za'         # za za za za za za za
#phrase = 'Give me 5'   # za za za za za

# write here

Continue

Go on reading notebook Strings 3 - basic methods
