I just revisited the classic C textbook K&R and read Exercise 1-11:
How would you test the word count program? What kinds of input are most likely to uncover bugs if there are any?
So far my only idea is to count the words of an existing paragraph by hand and compare that number with the count the program reports.
Is there anything I've missed? And what is the trick to testing it?
EDIT
Summary of the answers:
Semantic definition of a word; some special cases:
- hyphenated words: "cat-walk"
- short words: a, b,c
- biiiiiig words: "a fooooooooo<40MILLIONLETTERS>ooooooo a" has 3 words
Boundary conditions:
- Texts with multiple spaces between words.
- Texts bigger than 2GB
- Words which contain a dash but no whitespace.
- Non-ASCII words.
- Files in a different encoding (if your program supports that)
- Tokens surrounded by whitespace that contain no word characters (e.g. the "-" in "hello - world")
- Texts without any words
- Texts with all words on a single line