This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Introduction to string manipulation

Python offers many convenient features, libraries and functions that make it easy to do many things with strings.

I have deliberately not bombarded you with all these convenient features before this, as I wanted you to concentrate on the fundamentals of programming first.

But now that you have (hopefully) mastered programming basics, it is time to let you all loose on all the fancy things you can easily do with Python strings!

So, firstly, to recap,

  • str is a Python built-in data type.
  • A str literal is represented by surrounding a sequence of characters with 'single quotes', "double quotes", '''triple quotes''' or even """triple double-quotes""".
  • In Python’s eyes, 'abc' == "abc" == '''abc''' == """abc"""
  • A str is immutable.
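As a quick check of the recap points above, here is a minimal sketch. The variable names (same, greeting, mutated, new_greeting) are made up for illustration only.

```python
# All four literal styles produce equal str objects.
same = 'abc' == "abc" == '''abc''' == """abc"""
print(same)  # True

# Strings are immutable: assigning to an index raises a TypeError.
greeting = "hello"
try:
    greeting[0] = "J"
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False

# To "change" a string, build a new one instead; the original is untouched.
new_greeting = "J" + greeting[1:]
print(new_greeting)  # Jello
print(greeting)      # hello
```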

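Like everything in Python, str is also an object. To be specific, str itself is an instance of the class type, while an individual string is an instance of the class str.

>>> type(str)
<class 'type'>
>>> type(str())
<class 'str'>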

Python provides many convenient string methods and attributes. We will look at these in a bit.

A string is also a sequence, more specifically a text sequence. So you can access individual characters, perform slicing and iterate over strings just as you would with lists, as well as use the +, * and in operators.
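The sequence operations mentioned above can be sketched as follows. The example word and variable names are arbitrary choices for illustration.

```python
word = "python"

first = word[0]              # indexing: 'p'
last = word[-1]              # negative indexing: 'n'
middle = word[1:4]           # slicing: 'yth'
backwards = word[::-1]       # slicing with a step: 'nohtyp'

letters = []
for ch in word:              # iterating character by character
    letters.append(ch)

joined = word + "!"          # concatenation: 'python!'
repeated = "ab" * 3          # repetition: 'ababab'
has_th = "th" in word        # membership test: True

print(first, last, middle, backwards)
print(letters)
print(joined, repeated, has_th)
```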

str is a sequence because it implements an abstract class called Sequence common to all sequences (like list and tuple). We briefly touched on abstract classes when talking about Polymorphism.

from collections.abc import Sequence
# str counts as a Sequence, so both checks print True
print(issubclass(str, Sequence))
print(isinstance("a string", Sequence))

Sequence is a Python Abstract Base Class (ABC), which says that any class that inherits from it must implement the special methods __getitem__() (to enable indexing, i.e. item[10]) and __len__() (to allow len() to work correctly). It also provides default implementations for __contains__, __iter__, __reversed__, index, and count, but subclasses can override these. Therefore, str has all these methods implemented.
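To see what the Sequence ABC gives you, here is a minimal sketch of a custom sequence. Countdown is a hypothetical class invented for this example; it only implements __getitem__() and __len__() (integer indices only, no slicing), and inherits the rest from Sequence.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical read-only sequence holding start, start-1, ..., 1."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        # Simplification for this sketch: integer indices only.
        if not isinstance(index, int):
            raise TypeError("Countdown supports integer indices only")
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("Countdown index out of range")
        return self._start - index


c = Countdown(5)

# These all come from the default implementations provided by Sequence.
as_list = list(c)                 # uses __iter__
contains_three = 3 in c           # uses __contains__
backwards = list(reversed(c))     # uses __reversed__
position_of_two = c.index(2)
how_many_fours = c.count(4)

print(as_list, contains_three, backwards, position_of_two, how_many_fours)
```

If the two required methods are missing, Python refuses to create an instance of the class, which is how the ABC enforces its contract.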

Important note (a.k.a. Josiah corrects himself)

Seen from this point of view, Python actually sort of performs duck typing - “if it walks like a duck and quacks like a duck, then it must be a duck”. Here, if a class has __getitem__, __len__, __contains__, __iter__, __reversed__, index and count as members, then it can be treated as a Sequence.

This is contrary to what I said in the pre-sessional video, but there I was using the term duck typing in the context of strong vs. weak typing (5 != "5"). I have since found that duck typing is nowadays more often used to describe this kind of polymorphic “dynamic typing”, rather than a language being weakly typed, which is the sense in which I first encountered the term. Therefore I stand corrected and will now use the term “duck typing” in this sense. And in this sense, Python does perform duck typing!
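To make the duck typing idea concrete, here is a small sketch. Playlist and first_and_last are hypothetical names invented for this example. Playlist deliberately does not inherit from Sequence, yet code that only needs len() and indexing works with it just as it does with str and list.

```python
from collections.abc import Sequence


class Playlist:
    """Hypothetical class that does NOT inherit from Sequence."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        return len(self._songs)

    def __getitem__(self, index):
        return self._songs[index]


def first_and_last(items):
    """Works on anything that supports len() and indexing."""
    return items[0], items[len(items) - 1]


results = [
    first_and_last("python"),
    first_and_last([1, 2, 3]),
    first_and_last(Playlist(["intro", "verse", "outro"])),
]
print(results)  # [('p', 'n'), (1, 3), ('intro', 'outro')]

# Iteration works too: Python falls back to __getitem__ with 0, 1, 2, ...
songs = [song for song in Playlist(["intro", "outro"])]
print(songs)  # ['intro', 'outro']

# Caveat: a formal isinstance() check is stricter than duck typing.
is_sequence = isinstance(Playlist([]), Sequence)
print(is_sequence)  # False
```

Note the last line: duck typing is about what your code can do with an object, not about what isinstance() reports. For isinstance() to return True, the class would need to inherit from Sequence or be registered with it.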