Strings 4 - search methods
Download exercises zip
Strings provide methods to search them and transform them into new strings, but beware: power is nothing without control! Sometimes you will feel the need to use them, and they might even work with some small example, but often they hide traps you will regret falling into. So whenever you write code with one of these methods, always ask yourself the questions we will stress.
WARNING: ALL the string methods which return a string ALWAYS generate a NEW string
The original string object is NEVER changed (strings are immutable).
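As a minimal sketch of what this means in practice (the variable names and the value below are made up for illustration): calling a method without storing its result has no visible effect, so the new string must be assigned to a variable.

```python
name = '  Ada  '          # hypothetical example value

name.strip()              # a new string is created and immediately discarded
unchanged = name          # name is still associated to '  Ada  '

cleaned = name.strip()    # store the NEW string in another variable ...
name = name.strip()       # ... or re-assign the same variable to the new string

print(repr(unchanged))    # '  Ada  '
print(repr(cleaned))      # 'Ada'
print(repr(name))         # 'Ada'
```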
| Method | Result | Meaning |
|---|---|---|
| str.strip(str) | str | Remove strings from the sides |
| str.lstrip(str) | str | Remove strings from left side |
| str.rstrip(str) | str | Remove strings from right side |
| str.count(str) | int | Count the number of occurrences of a substring |
| str.find(str) | int | Return the first position of a substring starting from the left |
| str.rfind(str) | int | Return the first position of a substring starting from the right |
| str.replace(str, str) | str | Substitute substrings |
Note: the list is not exhaustive, here we report only the ones we use in the book. For the full list see Python documentation
What to do
- Unzip exercises zip in a folder, you should obtain something like this: 
strings
    strings1.ipynb
    strings1-sol.ipynb
    strings2.ipynb
    strings2-sol.ipynb
    strings3.ipynb
    strings3-sol.ipynb
    strings4.ipynb
    strings4-sol.ipynb
    strings5-chal.ipynb
    jupman.py
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
- Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser. The browser should show a file list: navigate the list and open the notebook strings4.ipynb
- Go on reading the exercises file: sometimes you will find paragraphs marked Exercises which will ask you to write Python commands in the following cells.
Shortcut keys:
- to execute Python code inside a Jupyter cell, press Control + Enter
- to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter
- to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
- If the notebooks look stuck, try to select Kernel -> Restart
strip method
Eliminates white spaces, tabs and linefeeds from the sides of the string. In general, this set of characters is called blanks.
NOTE: it does NOT remove blanks inside the string! It only looks at the sides.
[2]:
x = ' \t\n\n\t carpe diem \t  '   # we put white space, tab and line feeds at the sides
[3]:
x
[3]:
' \t\n\n\t carpe diem \t  '
[4]:
print(x)
         carpe diem
[5]:
len(x)   # remember that special characters like \t and \n occupy 1 character
[5]:
20
[6]:
y = x.strip()
[7]:
y
[7]:
'carpe diem'
[8]:
print(y)
carpe diem
[9]:
len(y)
[9]:
10
[10]:
x      # IMPORTANT: x is still associated to the old string !
[10]:
' \t\n\n\t carpe diem \t  '
Specifying characters to strip
If you only want Python to remove some specific characters, you can specify them in parentheses. Let’s try to specify only one:
[11]:
'salsa'.strip('s')    #  note internal `s` is not stripped
[11]:
'alsa'
If we specify two or more, Python removes all the characters it can find from the sides
Note the order in which you specify the characters does not matter:
[12]:
'caustic'.strip('aci')
[12]:
'ust'
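In other words, the argument is treated as a set of characters, not as a prefix or suffix to remove as a whole. A small sketch with a made-up file name (`filename` is only an illustration) shows why this matters:

```python
filename = 'text.txt'                 # hypothetical file name

# '.txt' is read as the set of characters '.', 't' and 'x':
# Python keeps removing characters from both sides as long as they belong to the set
stripped = filename.strip('.txt')
print(stripped)                       # e

# same set of characters written in a different order: same result
same = filename.strip('xt.')
print(same)                           # e
```

So if the intent was to drop only the extension, strip with an argument is probably not the right tool.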
WARNING: If you specify characters, Python doesn’t try anymore to remove blanks!
[13]:
'bouquet  '.strip('b')    # it won't strip right spaces !
[13]:
'ouquet  '
[14]:
'\tbouquet  '.strip('b')    # ... nor strip left blanks such as tab
[14]:
'\tbouquet  '
According to the same principle, if you specify a space ' ', then Python will only remove spaces and won’t look for other blanks!!
[15]:
'  careful! \t'.strip(' ')   # strips only on the left!
[15]:
'careful! \t'
QUESTION: for each of the following expressions, try to guess which result it produces (or if it gives an error):
- '\ttumultuous\n'.strip() 
- ' a b c '.strip() 
- '\ta\tb\t'.strip() 
- '\\tMmm'.strip() 
- 'sky diving'.strip('sky') 
- 'anacondas'.strip('sad') 
- '\nno way '.strip(' ') 
- '\nno way '.strip('\\n') 
- '\nno way '.strip('\n') 
- 'salsa'.strip('as') 
- '\t ACE '.strip('\t') 
- ' so what? '.strip("") 
- str(-3+1).strip("+"+"-") 
Exercise - Biblio bank
Your dream has just come true: you were hired by the Cyber-Library! Since first enrolling in the Lunar Gymnasium in 2365 you’ve been dreaming of keeping and conveying the human knowledge collected through the centuries. You will have to check the work of an AI which reads and transcribes an interesting chronicle named White Pages 2021.
The Pages have lists of numbers in this format:
Name Surname Prefix-Suffix
Alas, the machine is buggy and in each row inserts some blank characters (spaces, control characters like \t and \n, …)
- sometimes it warms the mobile printhead, causing the reading of numerous blanks before the text 
- sometimes the AI is so impressed by the content it forgets to turn off the reading, adding some blanks at the end 
Instead, it should produce a string with an initial dash and a final dot:
- Name Surname Prefix-Suffix.
Write some code to fix the bungled AI work.
[16]:
row = '      \t   \n  Mario Rossi 0323-454345 \t \t   '  # - Mario Rossi 0323-454345.
#row = '    Ernesto Spadafesso 0323-454345  \n'          # - Ernesto Spadafesso 0323-454345.
#row = '      Gianantonia Marcolina Carla Napoleone 0323-454345 \t'
#row = '\nChiara Ermellino 0323-454345  \n \n'
#row = '  \tGiada Pietraverde 0323-454345\n\t'
# write here
lstrip method
Eliminates white spaces, tabs and line feeds from the left side of the string.
NOTE: does NOT remove blanks between words of the string! Only those on left side.
[17]:
x = '\n \t the street \t '
[18]:
x
[18]:
'\n \t the street \t '
[19]:
len(x)
[19]:
17
[20]:
y = x.lstrip()
[21]:
y
[21]:
'the street \t '
[22]:
len(y)
[22]:
13
[23]:
x       # IMPORTANT: x is still associated to the old string !
[23]:
'\n \t the street \t '
rstrip method
Eliminates white spaces, tabs and line feeds from the right side of the string.
NOTE: does NOT remove blanks between words of the string! Only those on right side.
[24]:
x = '\n \t the lighthouse \t '
[25]:
x
[25]:
'\n \t the lighthouse \t '
[26]:
len(x)
[26]:
21
[27]:
y = x.rstrip()
[28]:
y
[28]:
'\n \t the lighthouse'
[29]:
len(y)
[29]:
18
[30]:
x       # IMPORTANT: x is still associated to the old string !
[30]:
'\n \t the lighthouse \t '
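Like strip, both lstrip and rstrip accept an optional string with the characters to remove. A short sketch, where the value of `code` is made up for illustration:

```python
code = '000420000'                    # hypothetical value

left = code.lstrip('0')               # removes zeros only from the left side
right = code.rstrip('0')              # removes zeros only from the right side
both = code.lstrip('0').rstrip('0')   # chained calls: here same result as code.strip('0')

print(left)     # 420000
print(right)    # 00042
print(both)     # 42
```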
Exercise - Bad to the bone
You have an uppercase string s which contains at the sides some stuff you want to remove: punctuation, a lowercase char and some blanks. Write some code to perform the removal.
Example - given:
char = 'b'
punctuation = '!?.;,'
s = ' \t\n...bbbbbBAD TO THE BONE\n!'
your code should show:
'BAD TO THE BONE'
- use only strip (or lstrip and rstrip) methods (if necessary, you can do repeated calls)
[31]:
char = 'b'
punctuation = '!?.;,'
s = ' \t\n...bbbbbBAD TO THE BONE\n!'
# write here
[31]:
'BAD TO THE BONE'
count method
The method count takes a substring and counts how many occurrences there are in the string before the dot.
[32]:
"astral stars".count('a')
[32]:
3
[33]:
"astral stars".count('A')    # it's case sensitive
[33]:
0
[34]:
"astral stars".count('st')
[34]:
2
Optionally, you can pass two other parameters to indicate an index to start counting from (included) and where to end (excluded):
[35]:
#012345678901
"astral stars".count('a',4)
[35]:
2
[36]:
#012345678901
"astral stars".count('a',4,9)
[36]:
1
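One detail the examples above do not show: count only considers non-overlapping occurrences, scanning from left to right. A quick sketch with made-up strings:

```python
# 'aa' would fit at indexes 0, 1 and 2, but only the ones at 0 and 2 are counted
overlapping = 'aaaa'.count('aa')
print(overlapping)                   # 2

# the two 'ana' in 'banana' share an 'a', so only the first one is counted
ana = 'banana'.count('ana')
print(ana)                           # 1

# when the substring is not present, the result is simply 0
missing = 'banana'.count('z')
print(missing)                       # 0
```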
Do not abuse count
WARNING: count is often used in wrong / inefficient ways
Always ask yourself:
- Could the string contain duplicates? Remember they will get counted! 
- Could the string contain no duplicate? Remember to also handle this case! 
- count performs a search on the whole string, which could be inefficient: is it really needed, or do we already know the interval where to search?
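To make the last point concrete, here is a small sketch which assumes a made-up `record` string with three fields of known width: when the interval is known, we can restrict the count to it instead of scanning everything.

```python
#         01234567890
record = 'aab|abb|aaa'                # hypothetical record: three fields, each 3 characters wide

whole = record.count('a')             # counts the 'a' of ALL the fields
second = record.count('a', 4, 7)      # counts only inside the second field (indexes 4, 5, 6)
sliced = record[4:7].count('a')       # same result, obtained by slicing first
absent = record.count('z', 4, 7)      # no occurrence in the interval: 0

print(whole)     # 6
print(second)    # 1
print(sliced)    # 1
print(absent)    # 0
```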
Exercise - astro money
During the 2020 lockdown, while looking at the stars above you started feeling… waves. After some thinking, you decided THEY wanted to communicate with you, so you set up a dish antenna on your roof to receive messages from aliens. After months of apparently irrelevant noise, one day you finally receive a message you’re able to translate. Aliens are obviously trying to tell you the winning numbers of the lottery!
A message is a sequence of exactly 3 different character repetitions, the number of characters in each repetition is a number you will try at the lottery. You frantically start developing the translator to show these lucky numbers on the terminal.
Example - given:
s = '$$$$€€€€€!!'
it should print:
$ € !
4 5 2
- IMPORTANT: you can assume all sequences have *different* characters 
- DO NOT use cycles nor comprehensions 
- for simplicity assume each character sequence has at most 9 repetitions 
[37]:
    #01234567890      # $ € !
s = '$$$$€€€€€!!'     # 4 5 2
                      # I M Q
#s = 'IIIMMMMMMQQQ'   # 3 6 3
                      # H A L
#s = 'HAL'            # 1 1 1
# write here
$ € !
4 5 2
find method
find returns the index of the first occurrence of some given substring:
[38]:
#0123456789012345
'bingo bongo bong'.find('ong')
[38]:
7
If no occurrence is found, it returns -1:
[39]:
#0123456789012345
'bingo bongo bong'.find('bang')
[39]:
-1
[40]:
#0123456789012345
'bingo bongo bong'.find('Bong')    #  case-sensitive
[40]:
-1
Optionally, you can specify an index from where to start searching (included):
[41]:
#0123456789012345
'bingo bongo bong'.find('ong',10)
[41]:
13
And also where to end (excluded):
[42]:
#0123456789012345
'bingo bongo bong'.find('g',4, 9)
[42]:
-1
rfind method
Like find method, but search starts from the right.
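A small sketch, reusing the string from the find examples (the variable names are ours): note the returned index is still counted from the left, only the direction of the search changes.

```python
#         0123456789012345
phrase = 'bingo bongo bong'

first = phrase.find('ong')            # search from the left: first 'ong'
last = phrase.rfind('ong')            # search from the right: last 'ong'
missing = phrase.rfind('bang')        # like find, -1 when nothing is found
bounded = phrase.rfind('ong', 0, 12)  # optional start (included) and end (excluded)

print(first)      # 7
print(last)       # 13
print(missing)    # -1
print(bounded)    # 7
```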
Do not abuse find
WARNING: find is often used in wrong / inefficient ways
Always ask yourself:
- Could the string contain duplicates? Remember only the first will be found! 
- Could the string not contain the search substring? Remember to also handle this case! 
- find performs a search on the whole string, which could be inefficient: is it really needed, or do we already know the interval where to search?
- If we want to know if a character is in a position we already know, find is useless: it’s enough to write my_string[3] == character. If you used find, it could discover duplicate characters which are before or after the one we are interested in!
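Two of these traps can be sketched as follows (the strings `word` and `tag` are made up for illustration): the -1 returned for a missing substring is still a valid index, and find stops at the first occurrence even when we care about a different position.

```python
word = 'hello'

pos = word.find('z')           # -1: 'z' is not in the string
trap = word[pos]               # -1 is a valid index: we silently get the LAST character
print(pos, trap)               # -1 o

#      012345
tag = 'abcabc'

found = tag.find('a') == 3     # find stops at the first 'a', which is at index 0
direct = tag[3] == 'a'         # direct check of the position we already know
print(found)                   # False
print(direct)                  # True
```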
Exercise - The port of Monkey Island
Monkey Island has a port with 4 piers where ships coming from all the archipelago are docked. The docking point is never precise, and there could be arbitrary spaces between the pier borders. There could also be duplicated ships.
- Suppose each pier can only contain one ship, and we want to write some code which shows True if "The Jolly Rasta" is docked to the pier 2, or False otherwise.
Have a look at the following ports, and for each one of them try to guess whether or not the following code lines produce correct results. Try then writing some code which doesn’t have the problems you will encounter.
- DO NOT use if instructions, loops nor comprehensions
- DO NOT use lists (so no split) 
[43]:
width = 21  # width of a pier,  INCLUDED the right `|`
pier = 2
# piers    :  1                    2                    3                    4
port  =      "The Mad Monkey      |  The Jolly Rasta   |  The Sea Cucumber  |LeChuck's Ghost Ship|"
#port =      "  The Mad Monkey    |                    | The Sea Cucumber   |LeChuck's Ghost Ship|"
#port =      "    The Mad Monkey  |The Jolly Rasta     |   The Sea Cucumber |                    |"
#port =      "The Jolly Rasta     |                    |    The Sea Cucumber|LeChuck's Ghost Ship|"
#port =      "                    | The Mad Monkey     |   The Jolly Rasta  |LeChuck's Ghost Ship|"
#port =      "    The Jolly Rasta |                    | The Jolly Rasta    |   The Jolly Rasta  |"
print('Is Jolly Rasta docked to pier', pier, '?')
print()
print(port)
print()
print('                     in:', 'The Jolly Rasta' in port)
print()
print('     find on everything:', port.find('The Jolly Rasta') != -1)
print()
print(' find since second pier:', port.find('The Jolly Rasta', width*(pier-1)) != -1)
# write here
Is Jolly Rasta docked to pier 2 ?
The Mad Monkey      |  The Jolly Rasta   |  The Sea Cucumber  |LeChuck's Ghost Ship|
                     in: True
     find on everything: True
 find since second pier: True
               Solution: True
- Suppose now every pier can dock more than one ship, even with the same name. Write some code which shows True if only one Grog Ship is docked to the second pier, False otherwise
[44]:
width = 21  # width of a pier,  INCLUDED the right `|`
pier = 2
# piers    :  1                    2                    3                    4
port =       "The Mad Monkey      |The Jolly Rasta     |  The Sea Cucumber  |LeChuck's Ghost Ship|"
#port =      "The Mad Monkey      | Grog Ship Grog Ship| The Jolly Rasta    |   The Sea Cucumber "
#port =      "   The Jolly Rasta  |   Grog Ship        | The Jolly Rasta    |   The Jolly Rasta  "
#port =      "   Grog Ship        |   Grog Ship        |LeChuck's Ghost Ship|    Grog Ship       "
#port =      "LeChuck's Ghost Ship|                    |   Grog Ship        |   The Jolly Rasta  "
#port =      "The Jolly Rasta     | Grog Ship Grog Ship|       Grog Ship    |   The Jolly Rasta  "
print()
print('Is only one Grog Ship docked to pier', pier, '?')
print()
# write here
Is only one Grog Ship docked to pier 2 ?
Solution Grog Ship: False
Exercise - bananas
While exploring a remote tropical region, an ethologist discovers a population of monkeys which appear to have some concept of numbers. They collect bananas in the hundreds which are then traded with coconuts collected by another group. To communicate the quantities of up to 999 bananas, they use a series of exactly three guttural sounds. The ethologist writes down the sequences and formulates the following theory: each sound is made of a sequence of the same character, repeated a number of times. The number of characters in the first sequence is the first digit (the hundreds), the number of characters in the second sequence is the second digit (the tens), while the last sequence represents units.
Write some code which puts in variable bananas an integer representing the number.
For example - given:
s = 'bb bbbbb aaaa'
your code should print:
>>> bananas
254
>>> type(bananas)
int
- IMPORTANT 1: different sequences may use the *same* character! 
- IMPORTANT 2: you cannot assume which characters monkeys will use: you just know each digit is represented by a repetition of the same character 
- DO NOT use cycles nor comprehensions 
- the monkeys have no concept of zero 
[45]:
    #0123456789012
s = 'bb bbbbb aaaa'     # 254
#s = 'ccc cc ccc'       # 323
#s = 'vvv rrrr ww'      # 342
#s = 'cccc h jjj'       # 413
#s = '🌳🌳🌳 🍌🍌🍌🍌🍌🍌 🐵🐵🐵🐵'  # 364  (you could get *any* weird character, also unicode ...)
# write here
replace method
str.replace takes two strings and looks in the string on which the method is called for occurrences of the first string parameter, which are substituted with the second parameter. Note it gives back a NEW string with all substitutions performed.
Example:
[46]:
"the train runs off the tracks".replace('tra', 'ra')
[46]:
'the rain runs off the racks'
[47]:
"little beetle".replace('tle', '')
[47]:
'lit bee'
[48]:
"talking and joking".replace('ING', 'ed')  # it's case sensitive
[48]:
'talking and joking'
[49]:
"TALKING AND JOKING".replace('ING', 'ED')  # here they are
[49]:
'TALKED AND JOKED'
As always with strings, replace DOES NOT modify the string on which it is called:
[50]:
x = "On the bench"
[51]:
y = x.replace('bench', 'bench the goat is alive')
[52]:
y
[52]:
'On the bench the goat is alive'
[53]:
x  # IMPORTANT: x is still associated to the old string !
[53]:
'On the bench'
If you give an optional third argument count, only the first count occurrences will be replaced:
[54]:
"TALKING AND JOKING AND LAUGHING".replace('ING', 'ED', 2)  # replaces only first 2 occurrences
[54]:
'TALKED AND JOKED AND LAUGHING'
QUESTION: for each of the following expressions, try to guess which result it produces (or if it gives an error)
- '$£eat the rich£$'.replace('£','').replace('$','') 
- '$£eat the rich£$'.strip('£').strip('$') 
Do not abuse replace
WARNING: replace is often used in wrong / inefficient ways
Always ask yourself:
- Could the string contain duplicates? Remember they will all get substituted! 
- replace performs a search on the whole string, which could be inefficient: is it really needed, or do we already know the interval where the text to substitute is?
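A sketch of both questions, assuming a made-up `date` string in day/month/year format: a plain replace also hits occurrences we did not mean to touch, while slicing lets us act only on the interval we already know.

```python
#       0123456789
date = '01/01/2001'                  # hypothetical date: day/month/year

# attempt to change only the day: ALL the occurrences of '01' get replaced
wrong = date.replace('01', '15')
print(wrong)                         # 15/15/2015

# the count argument helps here only because the day happens to come first
first_only = date.replace('01', '15', 1)
print(first_only)                    # 15/01/2001

# to change only the month, we can work on the interval we already know (indexes 3 and 4)
month = date[:3] + date[3:5].replace('01', '12') + date[5:]
print(month)                         # 01/12/2001
```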
Exercise - Do not open that door
QUESTION: You have a library of books, with labels like 'C-The godfather', 'R-Pride and prejudice' or 'H-Do not open that door', composed by a character which identifies the type (C crime, R romance, H horror) followed by a - and the title. Given a book, you want to print the complete type name, a colon and then the title, like 'Crime: The godfather'. Look at the following code fragments, and for each one try writing labels, among the proposed ones or new ones of your own, which would give wrong results (if they exist).
book = 'C-The godfather'
book = 'R-Pride and prejudice'
book = 'H-Do not open that door'
- book.replace('C', 'Crime: ').replace('R', 'Romance: ') 
- book[0].replace('C', 'Crime: ').replace('H', 'HORROR: ').replace('R', 'Romance: ') + book[2:] 
- book.replace('C-', 'Crime: ').replace('R-', 'Romance: ') 
- book.replace('C-', 'Crime: ',1).replace('R-', 'Romance: ',1) 
- book[0:2].replace('C-', 'Crime: ').replace('R-', 'Romance: ') + book[2:] 
[55]:
Exercise - The Kingdom of Stringards
Characters Land is ruled with an iron fist by the Dukes of Stringards. The towns managed by them are monodimensional, and can be represented as a string, hosting dukes d, lords s, vassals v and peasants p. To separate the various social circles from improper mingling, some walls |mm| have been erected.
Unfortunately, the Dukes are under siege by the tribe of the hideous Replacerons: with their short-sighted barbarian ways, they are very close to destroying the walls. To defend the town, the Stringards decide to upgrade the walls, transforming them from |mm| to |MM|.
- DO NOT use loops nor list comprehensions 
- DO NOT use lists (so no split) 
Stringards I: upgrading all the walls
Example - given:
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
after your code, it must result:
>>> town
'ppp|MM|vvvvvv|MM|sss|MM|dd|MM|sssss|MM|vvvvvv|MM|pppppp'
[56]:
town =     'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
# result:  'ppp|MM|vvvvvv|MM|sss|MM|dd|MM|sssss|MM|vvvvvv|MM|pppppp'
# write here
Stringards II: Outer walls
Alas, the peasants don’t work hard enough and there aren’t enough coins to upgrade all the walls: upgrade only the outer walls.
- DO NOT use if, loops nor list comprehensions
- DO NOT use lists (so no split) 
Example - given:
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
after your code, it must result:
>>> town
'ppp|MM|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|MM|pppppp'
[57]:
town =    'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
#result:  'ppp|MM|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|MM|pppppp'
#town =   '|mm|vvvvvv|mm||mm|ddddd|mm|ssvvv|mm|pp'
#result:  '|MM|vvvvvv|mm||mm|ddddd|mm|ssvvv|MM|pp'
# write here
Stringards III: Power to the People
An even greater threat plagues the Stringards: democracy.
Following the spread of this dark evil, some cities developed right and left factions, which tend to privilege only some parts of the city. If the dominant sentiment in a city is lefty, all the houses to the left of the Duke are privileged with big gold coins, otherwise, with a righty sentiment, the houses to the right are privileged. When a house is privileged, the corresponding character is upgraded to capital.
- assume a block of d is always present, and it is unique
- DO NOT use if, loops nor list comprehensions
- DO NOT use lists (so no split) 
3.1) privilege only left houses
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
after your code, it must result:
>>> town
'PPP|mm|VVVVVV|mm|SSS|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
[58]:
town =    'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
# result: 'PPP|mm|VVVVVV|mm|SSS|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
#town =   '|p|ppp||p|pp|mm|vvv|vvvv|mm|sssss|mm|ddd|mm|ssss|ss|mm|vvvvvv|mm|'
# result: '|P|PPP||P|PP|mm|VVV|VVVV|mm|SSSSS|mm|ddd|mm|ssss|ss|mm|vvvvvv|mm|'
# write here
3.2) privilege only right houses
Example - given:
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
after your code, it must result:
>>> town
'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|SSSSS|mm|VVVVVV|mm|PPPPPP'
[59]:
town =     'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
#result: 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|SSSSS|mm|VVVVVV|mm|PPPPPP'
#town =    '|p|ppp||p|pp|mm|vvv|vvvv|mm|sssss|mm|ddd|mm|ssss|ss|mm|vvvvvv|p|pp|mm|'
#result: '|p|ppp||p|pp|mm|vvv|vvvv|mm|sssss|mm|ddd|mm|SSSS|SS|mm|VVVVVV|P|PP|mm|'
# write here
Stringards IV: Power struggle
Over time, the Dukes family has expanded and alas ruthless feuds occurred. According to the number of town people to the left/right of the dukes, a corresponding number of royal members to the left/right receives support for playing their power games. A member of the dukes palace who receives support becomes uppercase. Each character 'p', 'v' or 's' contributes support (but not the walls). The royal members who are not reached by support are slaughtered by their siblings, and substituted with a Latin Cross Unicode ✝
- assume a block of d is always present, and it is unique
- assume that for each left/right house, there is at least a left/right duke 
Example - given:
town = "ppp|mm|vv|mm|v|s|mm|dddddddddddddddddddddddd|mm|ss|mm|vvvvv|mm|pppp";
After your code, it must print:
Members of the royal family:24
                       left:7
                      right:11
After the deadly struggle, the new town is
ppp|mm|vv|mm|v|s|mm|DDDDDDD✝✝✝✝✝✝DDDDDDDDDDD|mm|ss|mm|vvvvv|mm|pppp
[60]:
town =   'ppp|mm|vv|mm|v|s|mm|dddddddddddddddddddddddd|mm|ss|mm|vvvvv|mm|pppp'
#result: 'ppp|mm|vv|mm|v|s|mm|DDDDDDD✝✝✝✝✝✝DDDDDDDDDDD|mm|ss|mm|vvvvv|mm|pppp'  tot:24 left:7 right:11
#town =  'ppp|mm|ppp|mm|vv|mm|ss|mm|dddddddddddddddddddd|mm|ss|mm|mm|s|v|mm|p|p|'
#result: 'ppp|mm|ppp|mm|vv|mm|ss|mm|DDDDDDDDDD✝✝✝✝DDDDDD|mm|ss|mm|mm|s|v|mm|p|p|' tot:20 left:10 right:6
# write here
Other exercises
QUESTION: For each following expression, try to find the result
- 'gUrP'.lower() == 'GuRp'.lower() 
- 'NaNo'.lower() != 'nAnO'.upper() 
- 'O' + 'ortaggio'.replace('o','\t \n ').strip() + 'O' 
- 'DaDo'.replace('D','b') in 'barbados' 
Continue
Go on reading notebook Strings 5 - first challenges