Strings 4 - search methods

Download exercises zip

Browse files online

Strings provide methods to search them and transform them into new strings, but beware: power is nothing without control! Sometimes you will feel the need to use them, and they might even work on some small example, but often they hide traps you will regret falling into. So whenever you write code with one of these methods, always ask yourself the questions we will stress.

WARNING: string methods which return a string ALWAYS generate a NEW string

The original string object is NEVER changed (strings are immutable).

Method                      Result  Meaning
str1.strip(str2)            str     Remove characters from both sides
str1.lstrip(str2)           str     Remove characters from the left side
str1.rstrip(str2)           str     Remove characters from the right side
str1.count(str2)            int     Count the number of occurrences of a substring
str1.find(str2)             int     Return the position of the first occurrence of a substring, searching from the left
str1.rfind(str2)            int     Return the position of the first occurrence of a substring, searching from the right
str1.replace(str2, str3)    str     Substitute substrings

Note: the list is not exhaustive; here we report only the methods we use in the book. For the full list, see the Python documentation.

What to do

  1. Unzip the exercises zip into a folder; you should obtain something like this:

strings
    strings1.ipynb
    strings1-sol.ipynb
    strings2.ipynb
    strings2-sol.ipynb
    strings3.ipynb
    strings3-sol.ipynb
    strings4.ipynb
    strings4-sol.ipynb
    strings5-chal.ipynb
    jupman.py

WARNING: to be displayed correctly, the notebook MUST be in an unzipped folder!

  2. Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook strings4.ipynb

  3. Go on reading the exercises file; sometimes you will find paragraphs marked Exercises which will ask you to write Python commands in the following cells.

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

  • If the notebooks look stuck, try to select Kernel -> Restart

strip method

Eliminates white spaces, tabs and linefeeds from the sides of the string. In general, this set of characters is called blanks.

NOTE: it does NOT remove blanks between the words inside the string! It only looks at the sides.

[2]:
x = ' \t\n\n\t carpe diem \t  '   # we put white space, tab and line feeds at the sides
[3]:
x
[3]:
' \t\n\n\t carpe diem \t  '
[4]:
print(x)


         carpe diem
[5]:
len(x)   # remember that special characters like \t and \n occupy 1 character
[5]:
20
[6]:
y = x.strip()
[7]:
y
[7]:
'carpe diem'
[8]:
print(y)
carpe diem
[9]:
len(y)
[9]:
10
[10]:
x      # IMPORTANT: x is still associated to the old string !
[10]:
' \t\n\n\t carpe diem \t  '

Specifying characters to strip

If you only want Python to remove some specific characters, you can specify them in parentheses. Let’s try to specify only one:

[11]:
'salsa'.strip('s')    #  note internal `s` is not stripped
[11]:
'alsa'

If we specify two or more, Python keeps removing from each side every character which belongs to the given set, and stops at the first character which is not in it.

Note the order in which you specify the characters does not matter:

[12]:
'caustic'.strip('aci')
[12]:
'ust'
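
Since the argument is treated as a set of characters and not as a word, trying to use strip to remove a whole prefix or suffix can remove more than expected. The following is a minimal sketch of this pitfall; the file name is a hypothetical example, not taken from the exercises:

```python
# The argument of strip is a *set* of characters: the order does not matter
word = 'caustic'
same_a = word.strip('aci')
same_b = word.strip('ica')
print(same_a, same_b)        # ust ust

# Pitfall: we would like to remove the suffix '.py' ...
filename = 'setup.py'        # hypothetical file name, used only for illustration
stripped = filename.rstrip('.py')
print(stripped)              # setu   (the final 'p' of 'setup' is also in the set!)
```

If you need to remove an exact suffix or prefix, slicing is a safer choice (Python 3.9 and later also offer the methods removeprefix and removesuffix).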

WARNING: If you specify characters, Python doesn’t try anymore to remove blanks!

[13]:
'bouquet  '.strip('b')    # it won't strip right spaces !
[13]:
'ouquet  '
[14]:
'\tbouquet  '.strip('b')    # ... nor strip left blanks such as tab
[14]:
'\tbouquet  '

According to the same principle, if you specify a space ' ', then Python will only remove spaces and won’t look for other blanks!!

[15]:
'  careful! \t'.strip(' ')   # strips only on the left!
[15]:
'careful! \t'
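
If you want to remove only some kinds of blanks, you can list all of them in the argument. A small sketch comparing the three calls on the same string used above:

```python
text = '  careful! \t'

only_spaces = text.strip(' ')         # removes spaces only, stops at the tab
spaces_and_tabs = text.strip(' \t')   # removes both spaces and tabs
default = text.strip()                # removes every kind of blank

print(repr(only_spaces))       # 'careful! \t'
print(repr(spaces_and_tabs))   # 'careful!'
print(repr(default))           # 'careful!'
```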

QUESTION: for each of the following expressions, try to guess which result it produces (or if it gives an error):

  1. '\ttumultuous\n'.strip()
    
  2. ' a b c '.strip()
    
  3. '\ta\tb\t'.strip()
    
  4. '\\tMmm'.strip()
    
  5. 'sky diving'.strip('sky')
    
  6. 'anacondas'.strip('sad')
    
  7. '\nno way '.strip(' ')
    
  8. '\nno way '.strip('\\n')
    
  9. '\nno way '.strip('\n')
    
  10. 'salsa'.strip('as')
    
  11. '\t ACE '.strip('\t')
    
  12. ' so what? '.strip("")
    
  13. str(-3+1).strip("+"+"-")
    

Exercise - Biblio bank

Your dream just came true: you were hired by the Cyber-Library! Since first enrolling at the Lunar Gymnasiuz in 2365 you’ve been dreaming of keeping and conveying the human knowledge collected through the centuries. You will have to check the work of an AI which reads and transcribes an interesting chronicle named White Pages 2021.

The Pages have lists of numbers in this format:

Name Surname Prefix-Suffix

Alas, the machine is buggy and in each row inserts some blank characters (spaces, control characters like \t and \n, …)

  • sometimes it warms up the mobile printhead, causing the reading of numerous blanks before the text

  • sometimes the AI is so impressed by the content it forgets to turn off the reading, adding some blanks at the end

Instead, it should produce a string with an initial dash and a final dot:

- Name Surname Prefix-Suffix.

Write some code to fix the bungled AI work.

[16]:
row = ' \t \n Mario Rossi 0323-454345 \t \t '    # - Mario Rossi 0323-454345.
#row = ' Ernesto Spadafesso 0323-454345 \n'      # - Ernesto Spadafesso 0323-454345.
#row = ' Gianantonia Marcolina Carla Napoleone 0323-454345 \t'
#row = '\nChiara Ermellino 0323-454345 \n \n'
#row = ' \tGiada Pietraverde 0323-454345\n\t'

# write here

lstrip method

Eliminates white spaces, tabs and line feeds from the left side of the string.

NOTE: does NOT remove blanks between words of the string! Only those on left side.

[17]:
x = '\n \t the street \t '
[18]:
x
[18]:
'\n \t the street \t '
[19]:
len(x)
[19]:
17
[20]:
y = x.lstrip()
[21]:
y
[21]:
'the street \t '
[22]:
len(y)
[22]:
13
[23]:
x       # IMPORTANT: x is still associated to the old string !
[23]:
'\n \t the street \t '

rstrip method

Eliminates white spaces, tabs and line feeds from the right side of the string.

NOTE: does NOT remove blanks between words of the string! Only those on right side.

[24]:
x = '\n \t the lighthouse \t '
[25]:
x
[25]:
'\n \t the lighthouse \t '
[26]:
len(x)
[26]:
21
[27]:
y = x.rstrip()
[28]:
y
[28]:
'\n \t the lighthouse'
[29]:
len(y)
[29]:
18
[30]:
x       # IMPORTANT: x is still associated to the old string !
[30]:
'\n \t the lighthouse \t '

Exercise - Bad to the bone

You have an uppercase string s which contains at the sides some stuff you want to remove: punctuation, a lowercase char and some blanks. Write some code to perform the removal.

Example - given:

char = 'b'
punctuation = '!?.;,'
s = ' \t\n...bbbbbBAD TO THE BONE\n!'

your code should show:

'BAD TO THE BONE'
  • use only strip (or lstrip and rstrip) methods (if necessary, you can do repeated calls)

[31]:
char = 'b'
punctuation = '!?.;,'
s = ' \t\n...bbbbbBAD TO THE BONE\n!'

# write here


[31]:
'BAD TO THE BONE'

count method

The method count takes a substring and counts how many occurrences there are in the string before the dot.

[32]:
"astral stars".count('a')
[32]:
3
[33]:
"astral stars".count('A')    # it's case sensitive
[33]:
0
[34]:
"astral stars".count('st')
[34]:
2

Optionally, you can pass two other parameters to indicate an index to start counting from (included) and where to end (excluded):

[35]:
#012345678901
"astral stars".count('a',4)
[35]:
2
[36]:
#012345678901
"astral stars".count('a',4,9)
[36]:
1
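
Two details are worth checking with a quick experiment: count only considers occurrences which do not overlap, and the start / end indexes behave like the ones of a slice. A minimal sketch:

```python
# Occurrences are counted without overlapping:
# in 'aaaa' the substring 'aa' is found at index 0 and at index 2, not at index 1
overlapping = 'aaaa'.count('aa')
print(overlapping)     # 2

    #012345678901
s = 'astral stars'
with_indexes = s.count('a', 4, 9)     # search between index 4 (included) and 9 (excluded)
with_slice = s[4:9].count('a')        # same interval, but a new string is created first
print(with_indexes, with_slice)       # 1 1
```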

Do not abuse count

WARNING: count is often used in wrong / inefficient ways

Always ask yourself:

  1. Could the string contain duplicates? Remember they will get counted!

  2. Could the string contain no duplicate? Remember to also handle this case!

  3. count performs a search on the whole string, which could be inefficient: is it really needed, or do we already know the interval where to search?
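
As a sketch of the questions above, suppose we have a text line in which we ASSUME the first 8 characters are a fixed-width status field (the line and its layout are hypothetical, made up only for this illustration):

```python
# Hypothetical line: we ASSUME the first 8 characters hold a fixed-width status field
line = 'OK!!    user said: wow!!! great!!'

all_marks = line.count('!')            # searches the whole line, message included
status_marks = line.count('!', 0, 8)   # searches only inside the status field

print(all_marks)       # 7
print(status_marks)    # 2

# The substring might also be absent: count gives 0 and raises no error
no_marks = line.count('?')
print(no_marks)        # 0
```

If we only cared about the status field, the first call would give a wrong answer and would also scan characters we already know are irrelevant.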

Exercise - astro money

During the 2020 lockdown, while looking at the stars above you started feeling… waves. After some thinking, you decided THEY wanted to communicate with you, so you set up a dish antenna on your roof to receive messages from aliens. After months of apparently irrelevant noise, one day you finally receive a message you’re able to translate. The aliens are obviously trying to tell you the winning lottery numbers!

A message is a sequence of exactly 3 different character repetitions, the number of characters in each repetition is a number you will try at the lottery. You frantically start developing the translator to show these lucky numbers on the terminal.

Example - given:

s = '$$$$€€€€€!!'

it should print:

$ € !
4 5 2
  • IMPORTANT: you can assume all sequences have *different* characters

  • DO NOT use loops nor comprehensions

  • for simplicity assume each character sequence has at most 9 repetitions

[37]:
    #01234567890      # $ € !
s = '$$$$€€€€€!!'     # 4 5 2

                      # I M Q
#s = 'IIIMMMMMMQQQ'   # 3 6 3

                      # H A L
#s = 'HAL'            # 1 1 1

# write here


$ € !
4 5 2

find method

find returns the index of the first occurrence of some given substring:

[38]:
#0123456789012345
'bingo bongo bong'.find('ong')
[38]:
7

If no occurrence is found, it returns -1:

[39]:
#0123456789012345
'bingo bongo bong'.find('bang')
[39]:
-1
[40]:
#0123456789012345
'bingo bongo bong'.find('Bong')    #  case-sensitive
[40]:
-1

Optionally, you can specify an index from where to start searching (included):

[41]:
#0123456789012345
'bingo bongo bong'.find('ong',10)
[41]:
13

And also where to end (excluded):

[42]:
#0123456789012345
'bingo bongo bong'.find('g',4, 9)
[42]:
-1

rfind method

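Like the find method, but the search starts from the right: it returns the index of the last occurrence of the substring (the index is still counted from the left), or -1 if no occurrence is found.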
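
A short sketch comparing find and rfind on the same string used in the find examples:

```python
    #0123456789012345
s = 'bingo bongo bong'

first = s.find('ong')             # first occurrence, searching from the left
last = s.rfind('ong')             # first occurrence, searching from the right
absent = s.rfind('bang')          # no occurrence at all
bounded = s.rfind('ong', 0, 12)   # search only between index 0 (included) and 12 (excluded)

print(first)     # 7
print(last)      # 13
print(absent)    # -1
print(bounded)   # 7
```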

Do not abuse find

WARNING: find is often used in wrong / inefficient ways

Always ask yourself:

  1. Could the string contain duplicates? Remember only the first will be found!

  2. Could the string not contain the search substring? Remember to also handle this case!

  3. find performs a search on the whole string, which could be inefficient: is it really needed, or do we already know the interval where to search?

  4. If we want to know if a character is in a position we already know, find is useless: it’s enough to write my_string[3] == character. If you used find, it could discover duplicate characters which are before or after the one we are interested in!
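
Questions 2 and 4 can be tried out with a small sketch. The strings below are hypothetical examples: note how the -1 returned by find is a valid index for slices, so a missing substring produces a wrong result without raising any error.

```python
record = 'key=value'
sep = record.find('=')
print(sep)                 # 3
print(record[:sep])        # key

broken = 'no separator here'    # hypothetical malformed record, without '='
missing = broken.find('=')
print(missing)             # -1
print(broken[:missing])    # no separator her   (the last character is silently lost!)

# To check a known position, direct indexing is enough
fruit = 'banana'
at_three = fruit[3] == 'a'
with_find = fruit.find('a') == 3    # find stops at the duplicate at index 1
print(at_three)            # True
print(with_find)           # False
```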

Exercise - The port of Monkey Island

Monkey Island has a port with 4 piers where ships coming from all the archipelago are docked. The docking point is never precise, and there could be arbitrary spaces between the pier borders. There could also be duplicated ships.

  1. Suppose each pier can only contain one ship, and we want to write some code which shows True if "The Jolly Rasta" is docked to the pier 2, or False otherwise.

Have a look at the following ports, and for each one of them try to guess whether or not the following code lines produce correct results. Try then writing some code which doesn’t have the problems you will encounter.

  • DO NOT use if instructions, loops nor comprehensions

  • DO NOT use lists (so no split)

[43]:
width = 21  # width of a pier,  INCLUDED the right `|`
pier = 2


# piers    :  1                    2                    3                    4
port  =      "The Mad Monkey      |  The Jolly Rasta   |  The Sea Cucumber  |LeChuck's Ghost Ship|"
#port =      "  The Mad Monkey    |                    | The Sea Cucumber   |LeChuck's Ghost Ship|"
#port =      "    The Mad Monkey  |The Jolly Rasta     |   The Sea Cucumber |                    |"
#port =      "The Jolly Rasta     |                    |    The Sea Cucumber|LeChuck's Ghost Ship|"
#port =      "                    | The Mad Monkey     |   The Jolly Rasta  |LeChuck's Ghost Ship|"
#port =      "    The Jolly Rasta |                    | The Jolly Rasta    |   The Jolly Rasta  |"

print('Is Jolly Rasta docked to pier', pier, '?')
print()
print(port)

print()
print('                     in:', 'The Jolly Rasta' in port)

print()
print('     find on everything:', port.find('The Jolly Rasta') != -1)

print()
print(' find since second pier:', port.find('The Jolly Rasta', width*(pier-1)) != -1)

# write here


Is Jolly Rasta docked to pier 2 ?

The Mad Monkey      |  The Jolly Rasta   |  The Sea Cucumber  |LeChuck's Ghost Ship|

                     in: True

     find on everything: True

 find since second pier: True

               Solution: True
  2. Suppose now every pier can dock more than one ship, even with the same name. Write some code which shows True if only one Grog Ship is docked to the second pier, False otherwise.

[44]:
width = 21  # width of a pier,  INCLUDED the right `|`
pier = 2

# piers    :  1                    2                    3                    4
port =       "The Mad Monkey      |The Jolly Rasta     |  The Sea Cucumber  |LeChuck's Ghost Ship|"
#port =      "The Mad Monkey      | Grog Ship Grog Ship| The Jolly Rasta    |   The Sea Cucumber "
#port =      "   The Jolly Rasta  |   Grog Ship        | The Jolly Rasta    |   The Jolly Rasta  "
#port =      "   Grog Ship        |   Grog Ship        |LeChuck's Ghost Ship|    Grog Ship       "
#port =      "LeChuck's Ghost Ship|                    |   Grog Ship        |   The Jolly Rasta  "
#port =      "The Jolly Rasta     | Grog Ship Grog Ship|       Grog Ship    |   The Jolly Rasta  "

print()
print('Is only one Grog Ship docked to pier', pier, '?')
print()

# write here



Is only one Grog Ship docked to pier 2 ?

Solution Grog Ship: False

Exercise - bananas

While exploring a remote tropical region, an ethologist discovers a population of monkeys which appear to have some concept of numbers. They collect bananas in the hundreds, which are then traded for coconuts collected by another group. To communicate quantities of up to 999 bananas, they use a series of exactly three guttural sounds. The ethologist writes down the sequences and formulates the following theory: each sound is composed of a sequence of the same character, repeated a number of times. The number of characters in the first sequence is the first digit (the hundreds), the number of characters in the second sequence is the second digit (the tens), while the last sequence represents the units.

Write some code which puts in variable bananas an integer representing the number.

For example - given:

s = 'bb bbbbb aaaa'

your code should print:

>>> bananas
254
>>> type(bananas)
int
  • IMPORTANT 1: different sequences may use the *same* character!

  • IMPORTANT 2: you cannot assume which characters monkeys will use: you just know each digit is represented by a repetition of the same character

  • DO NOT use loops nor comprehensions

  • the monkeys have no concept of zero

[45]:
    #0123456789012
s = 'bb bbbbb aaaa'    # 254
#s = 'ccc cc ccc'      # 323
#s = 'vvv rrrr ww'     # 342
#s = 'cccc h jjj'      # 413
#s = '🌳🌳🌳 🍌🍌🍌🍌🍌🍌 🐵🐵🐵🐵'  # 364   (you could get *any* weird character, also unicode ...)

# write here

replace method

str.replace takes two strings and looks in the string on which the method is called for occurrences of the first string parameter, which are substituted with the second parameter. Note it gives back a NEW string with all substitutions performed.

Example:

[46]:
"the train runs off the tracks".replace('tra', 'ra')
[46]:
'the rain runs off the racks'
[47]:
"little beetle".replace('tle', '')
[47]:
'lit bee'
[48]:
"talking and joking".replace('ING', 'ed')  # it's case sensitive
[48]:
'talking and joking'
[49]:
"TALKING AND JOKING".replace('ING', 'ED')  # here they are
[49]:
'TALKED AND JOKED'

As always with strings, replace DOES NOT modify the string on which it is called:

[50]:
x = "On the bench"
[51]:
y = x.replace('bench', 'bench the goat is alive')
[52]:
y
[52]:
'On the bench the goat is alive'
[53]:
x  # IMPORTANT: x is still associated to the old string !
[53]:
'On the bench'

If you give an optional third argument count, only the first count occurrences will be replaced:

[54]:
"TALKING AND JOKING AND LAUGHING".replace('ING', 'ED', 2)  # replaces only first 2 occurrences
[54]:
'TALKED AND JOKED AND LAUGHING'
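
To get a feeling for how the third argument behaves, here is a small sketch trying different values on the same string:

```python
s = 'TALKING AND JOKING AND LAUGHING'

one = s.replace('ING', 'ED', 1)     # only the first occurrence
none = s.replace('ING', 'ED', 0)    # zero occurrences: nothing is replaced
many = s.replace('ING', 'ED', 10)   # more than the available occurrences: all get replaced

print(one)     # TALKED AND JOKING AND LAUGHING
print(none)    # TALKING AND JOKING AND LAUGHING
print(many)    # TALKED AND JOKED AND LAUGHED
```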

QUESTION: for each of the following expressions, try to guess which result it produces (or if it gives an error)

  1. '$£eat the rich£$'.replace('£','').replace('$','')
    
  2. '$£eat the rich£$'.strip('£').strip('$')
    

Do not abuse replace

WARNING: replace is often used in wrong / inefficient ways

Always ask yourself:

  1. Could the string contain duplicates? Remember they will all get substituted!

  2. replace performs a search on the whole string, which could be inefficient: is it really needed, or do we already know the interval where the text to substitute is?
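
As a sketch of both questions, suppose we have a timestamp in which we ASSUME the hours always occupy the first two characters (the timestamp is a hypothetical example), and we want to set the hours to 00:

```python
# Hypothetical timestamp: we ASSUME the hours always occupy the first two characters
ts = '12:30:12'

everywhere = ts.replace('12', '00')                 # the seconds get substituted as well
hours_only = ts[:2].replace('12', '00') + ts[2:]    # replace works only on the known interval

print(everywhere)    # 00:30:00
print(hours_only)    # 00:30:12
```

Restricting the operation to a slice avoids touching duplicates elsewhere in the string, and also avoids searching text we already know is not relevant.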

Exercise - Do not open that door

QUESTION: You have a library of books, with labels like 'C-The godfather', 'R-Pride and prejudice' or 'H-Do not open that door', composed of a character which identifies the type (C crime, R romance, H horror) followed by a - and the title. Given a book, you want to print the full type name, a colon and then the title, like 'Crime: The godfather'. Look at the following code fragments, and for each one try to find labels, among the proposed ones or new ones you create, which would give wrong results (if they exist).

book = 'C-The godfather'
book = 'R-Pride and prejudice'
book = 'H-Do not open that door'
  1. book.replace('C', 'Crime: ').replace('R', 'Romance: ')
    
  2. book[0].replace('C', 'Crime: ')  \
            .replace('H', 'HORROR: ')  \
            .replace('R', 'Romance: ') + book[2:]
    
  3. book.replace('C-', 'Crime: ').replace('R-', 'Romance: ')
    
  4. book.replace('C-', 'Crime: ',1).replace('R-', 'Romance: ',1)
    
  5. book[0:2].replace('C-', 'Crime: ').replace('R-', 'Romance: ') + book[2:]
    
[55]:

Exercise - The Kingdom of Stringards

Characters Land is ruled with an iron fist by the Dukes of Stringards. The towns they manage are one-dimensional, and can be represented as a string, hosting dukes d, lords s, vassals v and peasants p. To keep the various social circles from improper mingling, some walls |mm| have been erected.

Unfortunately, the Dukes are under siege by the tribe of the hideous Replacerons: with their short-sighted barbarian ways, they are very close to destroying the walls. To defend the town, the Stringards decide to upgrade the walls, transforming them from |mm| to |MM|.

  • DO NOT use loops nor list comprehensions

  • DO NOT use lists (so no split)

Stringards I: upgrading all the walls

Example - given:

town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'

after your code, it must result:

>>> town
'ppp|MM|vvvvvv|MM|sss|MM|dd|MM|sssss|MM|vvvvvv|MM|pppppp'
[56]:
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
# result: 'ppp|MM|vvvvvv|MM|sss|MM|dd|MM|sssss|MM|vvvvvv|MM|pppppp'

# write here

Stringards II: Outer walls

Alas, the peasants don’t work hard enough and there aren’t enough coins to upgrade all the walls: upgrade only the outer walls.

  • DO NOT use if, loops nor list comprehensions

  • DO NOT use lists (so no split)

Example - given:

town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'

after your code, it must result:

>>> town
'ppp|MM|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|MM|pppppp'
[57]:
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
#result: 'ppp|MM|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|MM|pppppp'
#town = '|mm|vvvvvv|mm||mm|ddddd|mm|ssvvv|mm|pp'
#result: '|MM|vvvvvv|mm||mm|ddddd|mm|ssvvv|MM|pp'

# write here

Stringards III: Power to the People

An even greater threat plagues the Stringards: democracy.

Following the spread of this dark evil, some cities developed right and left factions, which tend to privilege only some parts of the city. If the dominant sentiment in a city is lefty, all the houses to the left of the Duke are privileged with big gold coins; otherwise, with righty sentiment, the houses to the right get privileged. When a house is privileged, the corresponding character is upgraded to a capital letter.

  • assume that at least a block with d is always present, and it is unique

  • DO NOT use if, loops nor list comprehensions

  • DO NOT use lists (so no split)

3.1) privilege only left houses

town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'

after your code, it must result:

>>> town
'PPP|mm|VVVVVV|mm|SSS|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
[58]:
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
# result: 'PPP|mm|VVVVVV|mm|SSS|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
#town = '|p|ppp||p|pp|mm|vvv|vvvv|mm|sssss|mm|ddd|mm|ssss|ss|mm|vvvvvv|mm|'
# result: '|P|PPP||P|PP|mm|VVV|VVVV|mm|SSSSS|mm|ddd|mm|ssss|ss|mm|vvvvvv|mm|'

# write here

3.2) privilege only right houses

Example - given:

town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'

after your code, it must result:

>>> town
'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|SSSSS|mm|VVVVVV|mm|PPPPPP'
[59]:
town = 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|sssss|mm|vvvvvv|mm|pppppp'
#result: 'ppp|mm|vvvvvv|mm|sss|mm|dd|mm|SSSSS|mm|VVVVVV|mm|PPPPPP'
#town = '|p|ppp||p|pp|mm|vvv|vvvv|mm|sssss|mm|ddd|mm|ssss|ss|mm|vvvvvv|p|pp|mm|'
#result: '|p|ppp||p|pp|mm|vvv|vvvv|mm|sssss|mm|ddd|mm|SSSS|SS|mm|VVVVVV|P|PP|mm|'

# write here

Stringards IV: Power struggle

Over time, the Dukes' family has expanded and, alas, ruthless feuds have broken out. According to the number of town people to the left/right of the dukes, a corresponding number of royal members on the left/right receives support for playing their power games. A member of the dukes' palace who receives support becomes uppercase. Each character 'p', 'v' or 's' contributes support (but not the walls). The royal members who are not reached by support are slaughtered by their siblings, and substituted with the Latin Cross Unicode character ✝.

  • assume at least a block of d is always present, and it is unique

  • assume that for each left/right house, there is at least a left/right duke

Example - given:

town = "ppp|mm|vv|mm|v|s|mm|dddddddddddddddddddddddd|mm|ss|mm|vvvvv|mm|pppp";

After your code, it must print:

Members of the royal family:24
                       left:7
                      right:11

After the deadly struggle, the new town is

ppp|mm|vv|mm|v|s|mm|DDDDDDD✝✝✝✝✝✝DDDDDDDDDDD|mm|ss|mm|vvvvv|mm|pppp
[60]:
town = 'ppp|mm|vv|mm|v|s|mm|dddddddddddddddddddddddd|mm|ss|mm|vvvvv|mm|pppp'
#result: 'ppp|mm|vv|mm|v|s|mm|DDDDDDD✝✝✝✝✝✝DDDDDDDDDDD|mm|ss|mm|vvvvv|mm|pppp'   tot:24 left:7 right:11
#town = 'ppp|mm|ppp|mm|vv|mm|ss|mm|dddddddddddddddddddd|mm|ss|mm|mm|s|v|mm|p|p|'
#result: 'ppp|mm|ppp|mm|vv|mm|ss|mm|DDDDDDDDDD✝✝✝✝DDDDDD|mm|ss|mm|mm|s|v|mm|p|p|'   tot:20 left:10 right:6

# write here

Other exercises

QUESTION: For each following expression, try to find the result

  1. 'gUrP'.lower() == 'GuRp'.lower()
    
  2. 'NaNo'.lower() != 'nAnO'.upper()
    
  3. 'O' + 'ortaggio'.replace('o','\t \n     ').strip() + 'O'
    
  4. 'DaDo'.replace('D','b') in 'barbados'
    

Continue

Go on reading notebook Strings 5 - first challenges