Introduction to string manipulation
Python offers many convenient features, libraries and functions that make it easy to do many things with strings.
I have deliberately not bombarded you with all these convenient features before this, as I wanted you to concentrate on the fundamentals of programming first.
But now that you have (hopefully) mastered programming basics, it is time to let you loose on all the fancy things you can easily do with Python strings!
So, firstly, to recap:
- str is a Python built-in data type.
- A str literal is represented by surrounding a sequence of characters in 'single quotes', "double quotes", '''triple quotes''' or even """triple double-quotes""".
- In Python’s eyes, 'abc' == "abc" == '''abc''' == """abc""".
- A str is immutable.
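To make the recap concrete, here is a minimal sketch of the quoting styles and of immutability. The variable names are purely illustrative and not part of the course material.

```python
# Four ways to write the same string literal.
single = 'abc'
double = "abc"
triple_single = '''abc'''
triple_double = """abc"""
all_equal = single == double == triple_single == triple_double
print(all_equal)  # True

# Strings are immutable: assigning to an index raises TypeError.
greeting = "hello"
try:
    greeting[0] = "J"
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False

# "Changing" a string really means building a new one.
new_greeting = "J" + greeting[1:]
print(new_greeting)  # Jello
print(greeting)      # hello (the original is unchanged)
```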
Like everything in Python, a str is also an object. To be specific, the str class itself is of class type, while each individual string is of class str.
>>> type(str)
<class 'type'>
>>> type(str())
<class 'str'>
Python provides many convenient string methods and attributes. We will look at these in a bit.
A string is also a sequence, more specifically a text sequence. So you can access individual characters, perform slicing and iterate over strings just as you would with lists, as well as use the +, * and in operators.
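The sequence operations mentioned above look roughly like this on a string. This is just a quick sketch; the example word and variable names are my own.

```python
word = "python"

first = word[0]          # indexing -> 'p'
last = word[-1]          # negative indexing -> 'n'
middle = word[1:4]       # slicing -> 'yth'
backwards = word[::-1]   # slicing with a negative step -> 'nohtyp'

# Iterating over a string visits one character at a time.
letters = [ch for ch in word]

joined = word + "!"      # concatenation -> 'python!'
repeated = "ab" * 3      # repetition -> 'ababab'
has_th = "th" in word    # substring test -> True

print(first, last, middle, backwards)
print(letters)
print(joined, repeated, has_th)
```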
str is a sequence because it implements an abstract class called Sequence, common to all sequences (like list and tuple). We briefly touched on abstract classes when talking about Polymorphism.
from collections.abc import Sequence
print(issubclass(str, Sequence))  # True
print(isinstance("a string", Sequence))  # True
Sequence is a Python Abstract Base Class (ABC), which says that any class that inherits from it must implement the special methods __getitem__() (to enable indexing, i.e. item[10]) and __len__() (to allow len() to work correctly). It also provides default implementations for __contains__, __iter__, __reversed__, index and count, but subclasses can override these. Therefore, str supports all of these operations.
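To see what the Sequence ABC gives you, here is a small sketch of a class that inherits from it and implements only __getitem__ and __len__. The Countdown class is a made-up example for illustration (and, to keep it short, I assume it does not need to support slicing); everything else comes for free from Sequence.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical read-only sequence: start, start - 1, ..., 1."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is not supported in this sketch")
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("Countdown index out of range")
        return self._start - index


c = Countdown(5)

# None of the following are defined in Countdown itself;
# they are the default implementations provided by Sequence.
as_list = list(c)                # uses __iter__     -> [5, 4, 3, 2, 1]
contains_three = 3 in c          # uses __contains__ -> True
in_reverse = list(reversed(c))   # uses __reversed__ -> [1, 2, 3, 4, 5]
position = c.index(2)            # -> 3
occurrences = c.count(4)         # -> 1

print(as_list, contains_three, in_reverse, position, occurrences)
```

The default implementations are all built on top of __getitem__ and __len__, which is why those two are the only methods a subclass is forced to write.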
Important note (a.k.a. Josiah corrects himself)
Seen from this point of view, Python actually sort of performs duck typing: “if it walks like a duck and quacks like a duck, then it must be a duck”. Here, if a class has __getitem__, __len__, __contains__, __iter__, __reversed__, index and count as members, then it can be treated as a Sequence.
This is contrary to what I said in the pre-sessional video, but there I was using the term duck typing in the context of strong vs. weak typing (5 != “5”). I have since found that duck typing is nowadays more often used to describe polymorphic “dynamic typing” like the above, rather than weak typing, which is the sense in which I first encountered the term. Therefore I stand corrected and will now use the term “duck typing” in this sense. And in this sense, Python does perform duck typing!
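The duck-typing idea can be sketched as follows. Both Playlist and first_and_last are hypothetical names I made up for this illustration. Note that Playlist does not inherit from Sequence at all; the function only cares that its argument supports len() and indexing.

```python
class Playlist:
    """Hypothetical class: does NOT inherit from Sequence."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        return len(self._songs)

    def __getitem__(self, index):
        return self._songs[index]


def first_and_last(seq):
    """Work with anything that supports len() and indexing."""
    return seq[0], seq[len(seq) - 1]


# The same function handles a str, a list and our own class.
from_str = first_and_last("duck")                # ('d', 'k')
from_list = first_and_last([1, 2, 3])            # (1, 3)
from_playlist = first_and_last(Playlist(["intro", "verse", "outro"]))

print(from_str, from_list, from_playlist)
```

The function never checks the type of its argument. If the object "quacks" (here, responds to len() and indexing), it works.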