Strings 1 - introduction

Download exercises zip

Browse files online

Strings are immutable character sequences, and one of the basic Python types. In this notebook we will see how to manipulate them.

What to do

  1. Unzip the exercises zip into a folder; you should obtain something like this:

strings
    strings1.ipynb
    strings1-sol.ipynb
    strings2.ipynb
    strings2-sol.ipynb
    strings3.ipynb
    strings3-sol.ipynb
    strings4.ipynb
    strings4-sol.ipynb
    strings5-chal.ipynb
    jupman.py

WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder!

  2. Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser. The browser should show a file list: navigate the list and open the notebook strings1.ipynb

  3. Keep reading the exercises file: sometimes you will find paragraphs marked Exercises, which ask you to write Python commands in the following cells.

Shortcut keys:

  • to execute Python code inside a Jupyter cell, press Control + Enter

  • to execute Python code inside a Jupyter cell AND select next cell, press Shift + Enter

  • to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter

  • If the notebook looks stuck, try selecting Kernel -> Restart

Creating strings

There are several ways to define a string.

Double quotes, in one line

[2]:
a = "my first string, in double quotes"
[3]:
print(a)
my first string, in double quotes

Single quotes, in one line

This way is equivalent to the previous one.

[4]:
b = 'my second string, in single quotes'
[5]:
print(b)
my second string, in single quotes

Three double quotes, many lines

[6]:
c = """my third string
in triple double quotes
so I can put it

on many rows"""
[7]:
print(c)
my third string
in triple double quotes
so I can put it

on many rows

Three single quotes, many lines

[8]:
d = '''my fourth string,
in triple single quotes
also can be put

on many lines
'''
[9]:
print(d)
my fourth string,
in triple single quotes
also can be put

on many lines
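A minimal sketch to check the claims above: single and double quotes are just two spellings of the same string literal, while triple quotes let the line breaks become part of the string as '\n' characters. The variable names here are illustrative only.

```python
# Single and double quotes produce exactly the same string
a = "same text"
b = 'same text'
print(a == b)                           # True

# Triple quotes keep the line breaks as '\n' characters
c = """first line
second line"""
print(c == "first line\nsecond line")   # True

# The closing quotes are on their own line, so the string ends with '\n'
d = '''one
'''
print(d.endswith('\n'))                 # True
```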

Printing - the cells

To print a string we can use the function print:

[10]:
print('hello')
hello

Note that the quotes are not shown in the printed output.

If we write the string without print, we do see the quotes:

[11]:
'hello'
[11]:
'hello'

What happens if we write the string with double quotes?

[12]:
"hello"
[12]:
'hello'

Notice that by default Jupyter shows single quotes.

The same applies if we assign a string to a variable:

[13]:
x = 'hello'
[14]:
print(x)
hello
[15]:
x
[15]:
'hello'
[16]:
y = "hello"
[17]:
print(y)
hello
[18]:
y
[18]:
'hello'
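What Jupyter displays for a bare expression is the value's representation, which we can also obtain with the built-in function repr. A small sketch: Python prefers single quotes, but switches to double quotes when the text itself contains a single quote (which is why single quotes are only the default).

```python
plain = "hello"
with_apostrophe = "it's"

print(repr(plain))            # 'hello'
print(repr(with_apostrophe))  # "it's"
```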

The empty string

The string of zero length is represented with two double quotes "" or two single quotes ''.

Note that even if we write two double quotes, Jupyter shows a string beginning and ending with single quotes:

[19]:
""
[19]:
''

The same applies if we associate an empty string to a variable:

[20]:
x = ""
[21]:
x
[21]:
''

Note that even if we ask Jupyter to use print, we won’t see anything:

[22]:
print("")

[23]:
print('')

Printing many strings

There are different ways to print many strings on a single line; let's start with the simplest one, print:

[24]:
x = "hello"
y = "Python"

print(x,y)   # note that in the printed characters Python inserted a space:
hello Python

We can pass print as many arguments as we want, and they can also be mixed with other types like numbers:

[25]:
x = "hello"
y = "Python"
z = 3

print(x,y,z)
hello Python 3
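By default print separates its arguments with a space and ends with a line feed. As an aside, the built-in print also accepts the keyword arguments sep (the separator) and end (what is written at the end); in this sketch the output goes to an io.StringIO buffer only so that we can inspect it afterwards.

```python
import io

x = "hello"
y = "Python"
z = 3

buffer = io.StringIO()
print(x, y, z, file=buffer)                  # default separator: a space
print(x, y, z, sep="-", file=buffer)         # custom separator
print(x, y, sep="", end="!\n", file=buffer)  # no separator, custom ending

output = buffer.getvalue()
print(output)
```

Without file=..., the same calls would simply print to the screen.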

Length of a string

To obtain the length of a string (or any sequence in general), we can use the function len:

[26]:
len("ciao")
[26]:
4
[27]:
len("")   # empty string
[27]:
0
[28]:
len('')   # empty string
[28]:
0
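A short sketch of what len counts: every character, including spaces and punctuation, so a string made only of spaces is not empty.

```python
print(len("hello world"))   # 11: the space counts too
print(len("  "))            # 2: two spaces, not an empty string
print(len("a,b!"))          # 4: punctuation counts as well
```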

QUESTION: Can we write something like this?

"len"("hello")
[29]:
# write here


QUESTION: Can we write something like this? What does it produce? An error? A number? Which one?

len("len('hello')")

QUESTION: What do we obtain if we write this?

len(((((("ciao"))))))
  1. an error

  2. the length of the string

  3. something else


Counting escape sequences: note that some special sequences, called escape sequences, such as \t, take up less space than it seems (len counts each of them as 1 character), but when printed they may occupy even more than 2 characters!!

Let’s see an example (in the next paragraph we will delve into the details):

[30]:
len('a\tb')
[30]:
3
[31]:
print('a\tb')
a       b
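To see why a single \t can take up several columns when printed, here is a minimal sketch using the str method expandtabs, which replaces each tab with enough spaces to reach the next tab stop (every 8 columns by default). Terminals and notebooks may use different tab widths, so what you see on screen can differ.

```python
s = 'a\tb'
print(len(s))                  # 3: the tab is a single character

expanded = s.expandtabs()      # tab stops every 8 columns
print(repr(expanded))          # 'a' followed by 7 spaces, then 'b'
print(len(expanded))           # 9

print(repr(s.expandtabs(4)))   # 'a   b' with tab stops every 4 columns
```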

Printing - escape sequences

Some character sequences, called escape sequences, are special: instead of showing characters, they make the printing do particular things, like starting a new line (line feed) or inserting extra space. These sequences always begin with the backslash character \:

Description               Escape sequence
Linefeed                  \n
Tabulation (ASCII tab)    \t

Example - line feed

[32]:
print("hello\nworld")
hello
world

Note that the line feed happens only when we use print; if we instead put the string directly into the cell, we see it verbatim:

[33]:
"ciao\nmondo"
[33]:
'ciao\nmondo'

In a string you can put as many escape sequences as you like:

[34]:
print("Today is\na great day\nisn't it?")
Today is
a great day
isn't it?

Example - tabulation

[35]:
print("hello\tworld")
hello   world
[36]:
print("hello\tworld\twith\tmany\ttabs")
hello   world   with    many    tabs

EXERCISE: Since escape sequences are special, we might ask ourselves how long they are. Use the function len to print the string length. Do you notice anything strange?

  • 'ab\ncd'

  • 'ab\tcd'

[37]:
# write the code here


EXERCISE: Try selecting with the mouse the character sequence printed in the previous cell. What do you obtain: a sequence of spaces, or a single tab character? Note that this can vary depending on the program that actually printed the string.

EXERCISE: find a SINGLE string which, when printed with print, is shown as follows:

This    is
an

apparently  simple      challenge
  • USE ONLY combinations of \t and \n

  • DON’T use spaces

  • start and end the string with a single quote

[38]:
# write here


This    is
an

apparently      simple          challenge

EXERCISE: try to find a string which, when printed with print, is shown as follows:

At  te
n

    t   ion
    please!
  • USE ONLY combinations of \t and \n

  • DON’T use any space

  • DON’T use triple quotes

[39]:
# write here


At      te
n

        t       ion
        please!

Special characters: if we want special characters like the single quote ' or the double quote " inside a string, we can create a so-called escape sequence, that is, we first write the backslash character \ and then follow it with the special character we're interested in:

Description     Escape sequence   Printed result
Single quote    \'                '
Double quote    \"                "
Backslash       \\                \
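A small check of the table above, as a sketch: each two-symbol escape sequence produces one single character, and choosing the other kind of quote as delimiter lets us avoid the backslash altogether.

```python
single = '\''
double = "\""
backslash = '\\'

print(single, double, backslash)                  # the three characters, separated by spaces
print(len(single), len(double), len(backslash))   # 1 1 1

# Both spellings give the same string
print("There's no problem" == 'There\'s no problem')   # True
```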

Example

Let’s print a string containing a single quote ' and a double quote ":

[40]:
my_string = "This way I put \'single quotes\' and \"double quotes\" in strings"
[41]:
print(my_string)
This way I put 'single quotes' and "double quotes" in strings

If a string begins with double quotes, we can freely use single quotes inside it, even without the backslash \:

[42]:
print("There's no problem")
There's no problem

If the string begins with single quotes, we can freely use double quotes inside it, even without the backslash \:

[43]:
print('It Is So "If You Think So"')
It Is So "If You Think So"

EXERCISE: Find a string which, printed with print, shows the following sequence:

  • the string MUST start and finish with single quotes '

This "genius" of strings wants to /\\/ trick me \//\ with atrocious exercises O_o'
[44]:
# write here


This "genius" of strings wants to /\\/ trick me \//\ with atrocious exercises O_o'

Encodings

ASCII characters

When using strings in your daily programs you typically don’t need to care much about how characters are physically represented as bits in memory, but sometimes it does matter. This representation is called encoding, and it must be taken into account in particular when you read data from external sources such as files and websites.

One of the most famous and widely used character encodings is ASCII (American Standard Code for Information Interchange), which offers 128 slots (codes 0 to 127) made of basic printable characters from the English alphabet (a-z, A-Z, punctuation like .;,! and characters like (, @ …) and control characters (like \t, \n).
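A minimal sketch of how characters map to their codes, using the built-in functions ord (character to code) and chr (code to character). The str method isascii used at the end exists only in Python 3.7 and later.

```python
print(ord('A'))       # 65
print(ord('a'))       # 97
print(chr(97))        # a
print(ord('\n'))      # 10: control characters have codes too
print(ord('\t'))      # 9

print('hello, world!'.isascii())   # True
print('café'.isascii())            # False: 'é' is not an ASCII character
```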

Since the original ASCII table lacks support for non-English languages (for example, it lacks Italian accented letters like è, à, …), many extensions were made to support other languages; for examples, see the Extended ASCII page on Wikipedia.
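To see that the same text becomes different bytes depending on the encoding, here is a short sketch with the str method encode and the bytes method decode. Here 'latin-1' (ISO 8859-1) is just one example of an extended ASCII encoding, and 'utf-8' is a common Unicode encoding; the word is written with the \u notation described in the next section, so the result does not depend on how this file was saved.

```python
word = "caff\u00e8"                  # the Italian word 'caffè'

utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")

print(utf8_bytes)                    # b'caff\xc3\xa8': 'è' takes two bytes in UTF-8
print(latin1_bytes)                  # b'caff\xe8': a single byte in Latin-1
print(len(word), len(utf8_bytes), len(latin1_bytes))   # 5 6 5

# Decoding with the wrong encoding produces garbled text
print(utf8_bytes.decode("latin-1"))  # caffÃ¨
```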

Unicode characters

Whenever we need particular characters like ✪, which are not available on the keyboard, we can look at Unicode characters. There are a lot of them, and in Python 3 we can often use them by simply copy-pasting. For example, if you go to this page you can copy-paste the character ✪. In other cases a character might be so special that it can’t even be displayed correctly; in these cases you can use a more complex sequence in the format \uxxxx, where xxxx is the character's code written with four hexadecimal digits, like this:

Description                                Escape sequence   Printed result
Star in a circle, in the format \uxxxx     \u272A            ✪
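A sketch of the relation between the \uxxxx notation and the character's code: xxxx is the code in hexadecimal, so ord and chr with a hexadecimal number give the same character. For codes that need more than four hexadecimal digits, Python also accepts \U followed by eight digits.

```python
star = '\u272A'
print(star)                   # ✪
print(hex(ord(star)))         # 0x272a
print(chr(0x272A) == star)    # True
print(len(star))              # 1: still a single character

smile = '\U0001F600'          # a code beyond four hex digits needs \U and eight digits
print(len(smile))             # 1
```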

EXERCISE: Search Google for Unicode heart and try printing a heart in Python, both by directly copy-pasting the character and by using the notation \uxxxx

[45]:
# write here


I ♥ Python, with copy-paste
I ♥ Python, also in format \uxxxx

Unicode references: Unicode is a complex topic that we have only mentioned; if you ever need to deal with complex character sets like Japanese, or with heterogeneous text encodings, here are a couple of references you should read:

Strings are immutable

Strings are immutable objects, so once they are created you cannot change them anymore. This might appear restrictive, but it’s not so tragic, because we still have these alternatives available:

  • generate a new string composed from other strings

  • if we have a variable to which we assigned a string, we can assign another string to that variable

Let’s generate a new string starting from previous ones, for example by joining two of them with the operator +:

[46]:
x = 'hello'
[47]:
y = x + 'world'
[48]:
x
[48]:
'hello'
[49]:
y
[49]:
'helloworld'

When executed on strings, the + operation joins them by creating a NEW string. This means that the association of x did not change at all: the only change we can observe is in the variable y, which is now associated with the string 'helloworld'. Make sure of this in Python Tutor by repeatedly clicking the Next button:

[50]:
# WARNING: before using the function jupman.pytut() which follows,
# it is necessary to first execute this cell with Shift+Enter

# it's sufficient to execute it only once; you will also find it in the first cell of all the other notebooks

import jupman
[51]:
x = 'hello'
y = x + 'world'

print(x)
print(y)

jupman.pytut()
hello
helloworld
[51]:
Python Tutor visualization
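Immutability can also be checked directly. In this minimal sketch, trying to overwrite a single character in place raises a TypeError, while building a new string works; the indexing notation x[0] is used here only for illustration, and the try/except construct just catches the error so we can look at its message.

```python
x = 'hello'
message = None

try:
    x[0] = 'H'               # try to change the first character in place
except TypeError as err:
    message = str(err)
    print('TypeError:', message)   # 'str' object does not support item assignment

y = 'H' + 'ello'             # building a NEW string is allowed
print(x, y)                  # hello Hello
```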

Reassign variables

Other changes to the memory state can be obtained by reassigning the variables, for example:

[52]:
x = 'hello'
[53]:
y = 'world'
[54]:
x = y        # we assign to x the same string contained in y
[55]:
x
[55]:
'world'
[56]:
y
[56]:
'world'

If a string is created and at some point no variable refers to it anymore, Python automatically takes care of removing it from memory. In the case above, the string 'hello' is never actually changed: at some point no variable is associated with it anymore, and so Python removes the string from memory. Have a look at what happens in Python Tutor:

[57]:
x = 'hello'
y = 'world'
x = y

jupman.pytut()
[57]:
Python Tutor visualization
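The arrows drawn by Python Tutor can also be checked in code. As a sketch, the is operator tells whether two variables are associated with the very same object in memory:

```python
x = 'hello'
y = 'world'
x = y

print(x is y)    # True: x and y now refer to the same string object
print(x)         # world
```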

Reassign a variable to itself

We may ask ourselves what happens when we write something like this:

[58]:
x = 'hello'

x = x
[59]:
print(x)
hello

No big changes: x remained associated with the same string, without alterations.

But what happens if we put a more complex expression to the right of the =?

[60]:
x = 'hello'

x = x + 'world'

print(x)
helloworld

Let’s try to carefully understand what happened.

In the first line, Python generated the string 'hello' and assigned it to the variable x. So far, nothing extraordinary.

Then, in the second line, Python did two things:

  1. it calculated the result of the expression x + 'world', by generating a NEW string helloworld

  2. it assigned the generated string helloworld to the variable x

It is fundamental to understand that whenever a reassignment is performed both steps occur, so it’s worth repeating them:

  • FIRST the result of the expression to the right of = is calculated (that is, while the old value of x is still available)

  • THEN the result is associated with the variable to the left of the = symbol

If we check out what happens in Python Tutor, these two steps are executed in a single shot:

[61]:
x = 'hello'
x = x + 'world'

jupman.pytut()
[61]:
Python Tutor visualization
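Python also offers the shorthand += for this pattern: x += 'world' first computes x + 'world' and then reassigns the result to x. A minimal sketch, where original is just an extra variable kept to show that the old string is untouched:

```python
x = 'hello'
original = x        # a second reference to the original string

x += 'world'        # same as x = x + 'world'

print(x)            # helloworld
print(original)     # hello: the original string was not modified
```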

EXERCISE: Write some code that changes the memory state so that in the end the following is printed:

z =  This
w =  was
x =  a problem
y =  was
s =  This was a problem
  • to write the code, USE ONLY the symbols =,+,z,w,x,y,s AND NOTHING ELSE

  • feel free to use as many lines of code as you deem necessary

  • feel free to use any symbol as many times as you deem necessary

[62]:
# these variables are given

z = "This"
w = 'is'
x = 'a problem'
y = 'was'
s = ' '

# write here the code


[63]:
print("z = ", z)
print("w = ", w)
print("x = ", x)
print("y = ", y)
print("s = ", s)

Strings and numbers

Python strings have the type str:

[64]:
type("hello world")
[64]:
str

In strings we can insert characters which represent digits:

[65]:
print("The character 5 represents the digit five, the character 3 represents the digit three")
The character 5 represents the digit five, the character 3 represents the digit three

Obviously, we can also write a sequence of digits, to obtain something which looks like a number:

[66]:
print("The sequence of characters 7583 represents the number seven thousand five hundred eighty-three")
The sequence of characters 7583 represents the number seven thousand five hundred eighty-three

Having said that, we can ask ourselves how Python behaves when we have a string which contains only a sequence of characters representing a number, like for example '254'.

Can we use 254 (which we wrote as if it were a string) also as a number? For example, can we add 3 to it?

'254' + 3

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-d39aa62a7e3d> in <module>
----> 1 '254' + 3

TypeError: can only concatenate str (not "int") to str

As you see, Python immediately complains, because we are trying to mix different types.

SO:

  • by writing '254' between quotes we create a string of type str

  • by writing 254 we create a number of type int

[67]:
type('254')
[67]:
str
[68]:
type(254)
[68]:
int
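Besides type, a couple of built-in checks can help when we are not sure what a value is. As a sketch: isinstance asks whether a value has a given type, and the str method isdigit asks whether a string is made only of digit characters (so it is False for a minus sign or a decimal point).

```python
print(isinstance('254', str))   # True
print(isinstance(254, int))     # True
print(isinstance('254', int))   # False

print('254'.isdigit())          # True
print('25a'.isdigit())          # False
print('-5'.isdigit())           # False: the minus sign is not a digit
```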

BEWARE OF print !!

If you try to print a string which only contains digits, Python will show it without quotes, and this might mislead you about its true nature !!

[69]:
print('254')
254
[70]:
print(254)
254

Only in Jupyter, to show constants, variables or results of calculations, you can directly write an expression in the cell as an alternative to print. In this case we are simply showing a constant, and whenever it is a string you will see quotes:

[71]:
'254'
[71]:
'254'
[72]:
254
[72]:
254

The same reasoning applies also to variables:

[73]:
x = '254'
[74]:
x
[74]:
'254'
[75]:
y = 254
[76]:
y
[76]:
254

So, only in Jupyter, when you need to show a constant, a variable or a calculation, it’s often more convenient to write it directly in the cell without using print.
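Outside Jupyter (for example in a script run from a terminal) there is no cell display, but we can obtain the same information by printing the repr of the value. A minimal sketch:

```python
x = '254'
y = 254

print(x, y)               # 254 254: print hides the difference
print(repr(x), repr(y))   # '254' 254: repr shows the quotes of the string
```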

Conversions - from string to number

Let’s go back to the problem of summing '254' + 3. The first one is a string, the second a number. If they were both numbers the sum would surely work:

[77]:
254 + 3
[77]:
257

So we can try to convert the string '254' into an authentic integer. To do it, we can use int as if it were a function, and pass as argument the string to be converted:

[78]:
int('254') + 3
[78]:
257
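A few more conversions, as a sketch of what int accepts: a leading sign and surrounding spaces are allowed. For a string with a decimal point we use float instead.

```python
print(int('254') + 3)     # 257
print(int('-7'))          # -7
print(int('  42 '))       # 42: surrounding spaces are ignored
print(float('2.5') + 1)   # 3.5
```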

WARNING: strings and numbers are immutable !!

This means that by writing int('254') a new number is generated, without affecting in the least the string '254' we started from. Let’s see an example:

[79]:
x = '254'     # assign to variable x the string '254'
[80]:
y = int(x)    # assign to variable y the number obtained by converting '254' to int
[81]:
x             # variable x is still assigned to string '254'
[81]:
'254'
[82]:
y             # in y there is now a number instead (note we don't have quotes here)
[82]:
254

It might be useful to see again the example in Python Tutor:

[83]:
x = "254"

y = int(x)

print(y + 3)


jupman.pytut()
257
[83]:
Python Tutor visualization

EXERCISE: Try to convert into an int a string which represents an ill-formed number (for example a number with a letter inside: '43K12'). What happens?

[84]:
# write here


Conversions - from number to string

Any object can be converted to a string by using str as if it were a function and passing it the object to convert. Let’s then try to convert a number into a string.

[85]:
str(5)
[85]:
'5'
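A short sketch of str applied to other kinds of objects, and of converting back and forth between numbers and strings:

```python
print(repr(str(1.6)))          # '1.6'
print(repr(str(True)))         # 'True'
print(int(str(254)) == 254)    # True: converting back gives the same number
print(str(5) + ' days')        # 5 days
```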

Note the quotes in the result, which show we actually obtained a string.

If by chance we want to obtain a string which is the concatenation of objects of different types we need to be careful:

x = 5
s = 'Workdays in a week are ' + x
print(s)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-154-5951bd3aa528> in <module>
      1 x = 5
----> 2 s = 'Workdays in a week are ' + x
      3 print(s)

TypeError: can only concatenate str (not "int") to str

A way to circumvent the problem (even if not the most convenient) is to convert into string each of the objects we’re using in the concatenation:

[86]:
x = 3
y = 1.6
s = "This week I've been jogging " + str(x) + " times running at an average speed of " + str(y) + " km/h"
print(s)
This week I've been jogging 3 times running at an average speed of 1.6 km/h

QUESTION: Having said that, after executing the code in the previous cell, is variable x going to be associated with a number or a string?

If you have doubts, use Python Tutor.


Formatting strings

Concatenating strings with the plus sign as above is cumbersome and error-prone. There are several better solutions; for a thorough review we refer to the Real Python website.

Formatting with %

Here we see how to format strings with the % operator. This solution is not the best one, but it’s widely used and supported in all Python versions, so we adopted it throughout the book:

[87]:
x = 3
"I jumped %s times" % x
[87]:
'I jumped 3 times'

Notice we put a so-called placeholder %s inside the string, which tells Python to replace it with a variable. To feed Python the variable, after the string we have to put a % symbol followed by the variable, in this case x.

If we want to place more than one variable, we just add more %s placeholders, and after the external % we place the required variables in round parentheses, separated by commas:

[88]:
x = 3
y = 5
"I jumped %s times and did %s sprints" % (x,y)
[88]:
'I jumped 3 times and did 5 sprints'
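One thing to keep in mind, shown here as a sketch: the number of %s placeholders must match the number of values in the parentheses, otherwise Python raises a TypeError. The try/except construct is used only to catch the error and look at its message.

```python
x = 3
y = 5
message = None

print("I jumped %s times and did %s sprints" % (x, y))   # 2 placeholders, 2 values: works

try:
    "I jumped %s times and did %s sprints" % (x,)         # 2 placeholders, only 1 value
except TypeError as err:
    message = str(err)
    print('TypeError:', message)   # not enough arguments for format string
```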

We can put as many variables as we want, also non-numerical ones:

[89]:
x = 3
y = 5
prize = 'Best Athlete in Town'
"I jumped %s times, did %s sprints and won the prize '%s'" % (x,y,prize)
[89]:
"I jumped 3 times, did 5 sprints and won the prize 'Best Athlete in Town'"

Formatting with f-strings

f-strings allow you to insert expressions directly into the string, between curly brackets {}. To tell Python to evaluate the expressions and convert them into strings, the string must be preceded by the letter f. Note that the moment you add the f, your editor should show the expressions between curly brackets in a different color.

Warning: f-strings are only available in Python 3.6 and later

[90]:
title = "King of Great Britain"
start = 1760
end = 1801

s1 = f"George III was {title.upper()} from {start} until {end}."
print(s1)

s2 = f"He ruled for {end - start} years."
print(s2)
George III was KING OF GREAT BRITAIN from 1760 until 1801.
He ruled for 41 years.
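For comparison, a sketch of the same sentence built in the three ways seen so far: concatenation with str, formatting with %, and an f-string. Inside the curly brackets of an f-string any expression is allowed, such as end - start.

```python
start = 1760
end = 1801

s_plus = "He ruled for " + str(end - start) + " years."
s_percent = "He ruled for %s years." % (end - start)
s_fstring = f"He ruled for {end - start} years."

print(s_plus == s_percent == s_fstring)   # True
print(s_fstring)                          # He ruled for 41 years.
```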

Exercise - supercars

You’ve got some money, so you decide to buy two models of supercars. Since you already know accidents are on the way, for each model you will buy as many cars as there are characters in the model's name.

Write some code which stores the number of cars you will buy into the strings:

  • sa formatted with %s placeholders

  • sb formatted as f-string

Example - given:

car1 = 'Jaguar'
car2 = 'Ferrari'

After your code, it should show:

>>> sa
'I will buy 6 Jaguar and 7 Ferrari supercars'
>>> sb
'I will buy 6 Jaguar and 7 Ferrari supercars'
[91]:

car1, car2 = 'Jaguar','Ferrari'              # I will buy 6 Jaguar and 7 Ferrari supercars
#car1, car2 = 'Porsche','Lamborghini'        # I will buy 7 Porsche and 11 Lamborghini supercars

# write here

Continue

Go on reading the notebook Strings 2 - operators
