Exercises
I am not prescribing any formal exercises for this topic.
But if you want to practise manipulating files, I have provided you with two “real-world” files to play with.
Here are some possible things you can do:
- Generate a frequency dictionary for the words in the text. Remember that you have to deal with punctuation (just deleting it should be sufficient, although this won’t be perfect). And will you treat capitalised, upper-case and lower-case words as the same word?
- Try to split the file into multiple files (one per chapter).
- Find out whether there are any words with no vowels (I don’t know!)
- Find the ten most frequent words ending with “ing” (I’m sure there are!)
- Find out how many three-letter words there are in the texts. Four-letter words?
- What is the longest word in the texts? How long?
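If you would like a starting point, here is a minimal sketch of how a few of these ideas might look. It assumes you are working in Python, and all the function names are my own inventions for illustration. It deletes only ASCII punctuation and lower-cases everything, so curly quotes and dashes will survive, and words like “don’t” or “well-known” will not come out perfectly.

```python
import string
from collections import Counter


def word_frequencies(text):
    """Return a Counter of lower-cased words with ASCII punctuation deleted."""
    # Deleting punctuation is crude: "well-known" becomes "wellknown".
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return Counter(words)


def words_without_vowels(freq):
    """Distinct alphabetic words containing none of a, e, i, o, u."""
    vowels = set("aeiou")
    return sorted(w for w in freq if w.isalpha() and not vowels & set(w))


def most_frequent_ing(freq, n=10):
    """The n most frequent words ending in 'ing', with their counts."""
    ing = Counter({w: c for w, c in freq.items() if w.endswith("ing")})
    return ing.most_common(n)


def count_words_of_length(freq, length):
    """Number of distinct words with exactly `length` characters."""
    return sum(1 for w in freq if len(w) == length)


def longest_word(freq):
    """One of the longest words (the first one seen, if there is a tie)."""
    return max(freq, key=len)


# A tiny made-up sample, just to show the functions in action.
sample = "Sing, sing a song! The rhythm of singing: my, my."
freq = word_frequencies(sample)

print(freq.most_common(3))
print(words_without_vowels(freq))
print(most_frequent_ing(freq))
print(count_words_of_length(freq, 3), count_words_of_length(freq, 4))
print(longest_word(freq), len(longest_word(freq)))
```

To try this on one of the provided files, you could replace the sample with something like `text = open("book.txt", encoding="utf-8").read()`, where `book.txt` is just a placeholder for whatever the file is actually called. Note that `count_words_of_length` counts distinct words; counting every occurrence instead is a small change that I leave to you.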
I just came up with these off the top of my head, without really checking the texts. So some of these might be stupid! Please feel completely free to come up with questions of your own!
Remember that these are real texts, so you should not aim for perfection: handling them properly is a Text and Natural Language Processing task and well beyond the scope of our course!
Have fun!
This is an entirely optional exercise, so feel free to skip it or come back to it later in your spare time.