Count number of words in string

Counting the number of words in a string can be tricky depending on your requirements: should you split words on whitespace or at a specified length, or should you count only a distinct set of words? This example shows how to count the number of words in a string by splitting it on whitespace. A related example shows how to count distinct words in a file.

Straight up Java

StringTokenizer splits a string using a predefined set of delimiters: the space, tab, newline, carriage-return, and form-feed characters (" \t\n\r\f"). countTokens() returns the number of times nextToken() can be called before it throws an exception, using the current delimiter set. On a freshly created tokenizer, that is the number of words in the string. Note that the delimiters themselves are never returned as tokens.
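To make the tokenizer behavior concrete, here is a small standalone sketch. The class name TokenizerWordCount and the method countWords are hypothetical, chosen only for illustration. It shows that runs of mixed whitespace collapse into a single separator, and that countTokens() reports only the tokens not yet consumed.

```java
import java.util.StringTokenizer;

class TokenizerWordCount {

    // Hypothetical helper: counts words using StringTokenizer's default delimiters " \t\n\r\f".
    static int countWords(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        // On a fresh tokenizer, nothing has been consumed yet, so this is the total token count.
        return tokenizer.countTokens();
    }

    public static void main(String[] args) {
        // Leading, trailing, and repeated whitespace of any kind never yields an empty token.
        String sample = "  one\ttwo\n\nthree   four\r\nfive ";
        System.out.println(countWords(sample)); // prints 5

        // countTokens() counts only the tokens that remain after earlier nextToken() calls.
        StringTokenizer tokenizer = new StringTokenizer("alpha beta gamma");
        tokenizer.nextToken(); // consumes "alpha"
        System.out.println(tokenizer.countTokens()); // prints 2
    }
}
```

Because countTokens() depends on the tokenizer's current position, call it before consuming any tokens if you want the total word count.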

@Test
public void count_words_in_string_java() {
    // phrase is an eight-word String defined elsewhere in the test class
    StringTokenizer stringTokenizer = new StringTokenizer(phrase);
    assertEquals(8, stringTokenizer.countTokens());
}

Java 8

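Calling String.split() with a single space as the delimiter breaks the string into an array of words. Arrays.stream(...).count() is a terminal reduction operation that counts the elements of the stream; alternatively, you could read the array's length field directly. Keep in mind that split(" ") splits on exactly one space character: consecutive spaces produce empty strings that are counted as words, and tabs or newlines are not treated as separators at all.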
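To show where a single-space delimiter diverges from whitespace-based splitting, here is a small sketch. The class SplitWordCount and both method names are hypothetical. The second method, which splits on the regex "\\s+" after trimming, is one common alternative and not the approach used in the test below.

```java
import java.util.Arrays;

class SplitWordCount {

    // Splits on exactly one space, the same way the Java 8 test does.
    static long countBySingleSpace(String text) {
        return Arrays.stream(text.split(" ")).count();
    }

    // Hypothetical alternative: trim the ends, then split on runs of any whitespace,
    // so repeated spaces, tabs, and newlines do not create empty "words".
    static int countByWhitespaceRuns(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split(...) returns [""], which would otherwise count as 1
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // The double space produces an empty string between "one" and "two".
        System.out.println(countBySingleSpace("one  two three"));    // prints 4
        System.out.println(countByWhitespaceRuns("one  two three")); // prints 3

        // A tab is not a space, so split(" ") sees a single token here.
        System.out.println(countBySingleSpace("a\tb"));    // prints 1
        System.out.println(countByWhitespaceRuns("a\tb")); // prints 2
    }
}
```

If your input is guaranteed to use single spaces, split(" ") is fine; otherwise a whitespace regex is the safer choice.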

@Test
public void count_words_in_string_java8() {
    assertEquals(8, Arrays.stream(phrase.split(" ")).count());
}

Google Guava

Guava's Splitter splits the string on whitespace, and calling size() on the resulting list gives the number of words in the string.

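@Test
public void count_words_in_string_guava() {
    // CharMatcher.whitespace() replaces the deprecated CharMatcher.WHITESPACE constant;
    // omitEmptyStrings() keeps consecutive whitespace from producing empty "words"
    List<String> splitWords = Splitter.on(CharMatcher.whitespace())
            .omitEmptyStrings()
            .splitToList(phrase);
    assertEquals(8, splitWords.size());
}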
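If you want similar whitespace splitting without a Guava dependency, the sketch below uses java.util.regex.Pattern.splitAsStream (available since Java 8). The class PatternWordCount and the method countWords are hypothetical. The behavior only roughly mirrors Splitter with omitEmptyStrings(): the regex \s matches ASCII whitespace only, while Guava's CharMatcher.whitespace() recognizes a broader set of Unicode whitespace characters.

```java
import java.util.regex.Pattern;

class PatternWordCount {

    // One or more whitespace characters, compiled once and reused.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Split on whitespace runs, then drop the empty string that leading
    // whitespace (or an empty input) produces; trailing empties are already discarded.
    static long countWords(String text) {
        return WHITESPACE.splitAsStream(text)
                .filter(word -> !word.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countWords("  The quick\tbrown\nfox  ")); // prints 4
    }
}
```

Compiling the Pattern once as a constant avoids recompiling the regex on every call, which String.split does for multi-character patterns.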