Divide on words and length

BreakIterator is one of those classes that nobody remembers until they need it, and then it comes in handy. Apache Commons WordUtils and Guava's Splitter offer many ways to split or divide text based on characters, but in some cases they might be too much. I answered a question on JavaRanch where someone wanted to divide a string into substrings of a maximum length without breaking in the middle of a word. Each substring should end with the last character of a word or with a period.
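To get a feel for what a word BreakIterator reports, here is a minimal sketch that collects every element between consecutive boundaries. The class name WordBoundaryDemo and the helper elements are made up for this illustration; only BreakIterator itself comes from the JDK. Note that whitespace and punctuation come back as their own elements, not just the words.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundaryDemo {

    // Hypothetical helper: returns every element the word iterator reports,
    // including whitespace and punctuation, in their original order.
    static List<String> elements(String text) {
        List<String> parts = new ArrayList<>();
        BreakIterator boundary = BreakIterator.getWordInstance(Locale.ENGLISH);
        boundary.setText(text);

        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE;
                start = end, end = boundary.next()) {
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Print each element in brackets so spaces are visible.
        for (String part : elements("Hello, world.")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Because nothing is dropped, joining the elements back together always reproduces the original string, which is what makes it safe to rebuild substrings piece by piece.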

Below we will use BreakIterator to loop through each element of the String in order, appending each one to a StringBuilder. When the length of the StringBuilder plus the length of the next element would exceed the maximum defined length (180 characters in this example), we break the string there and add what has accumulated so far to an ArrayList.

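import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// val is the text to split; it is assumed to be a String field of the
// enclosing test class (not shown here). @Test comes from JUnit.
@Test
public void splitWordAndLength() {

    StringBuilder sb = new StringBuilder();
    List<String> brokenStrings = new ArrayList<String>();

    BreakIterator boundary = BreakIterator.getWordInstance();
    boundary.setText(val);

    // Walk each [start, end) pair; words, spaces and punctuation are separate elements
    int start = boundary.first();
    for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary
            .next()) {

        int lengthOfNext = end - start;

        // flush the current chunk if the next element would push it past 180 characters;
        // the empty check avoids adding a blank string when a single element is too long
        if (sb.length() > 0 && (sb.length() + lengthOfNext) > 180) {
            brokenStrings.add(sb.toString());
            sb.setLength(0); // reuse the same builder
        }

        sb.append(val, start, end);

        // if last element, flush whatever remains
        if (end == val.length()) {
            brokenStrings.add(sb.toString());
        }
    }

    for (String x : brokenStrings) {
        System.out.println(x);
    }
}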

Output

Start at position 179 of the string (or at the end if it isn't that long) and work backwards until you hit the end of a word or a period. Chop that segment off and throw it into
your pot, then repeat until there's nothing left. Note that the end of a word can be followed by a space, or by a non-period punctuation mark like a comma, or by the end of the
string. Be careful not to go past the end of the string in the latter case.
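
If you need this outside of a test, the same loop can be pulled into a small utility. The following is only a sketch under a few assumptions: the class name WordLengthSplitter, the method name split and the maxLength parameter are invented for this example, and a single element longer than maxLength is kept whole rather than cut, so such a chunk can exceed the limit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class WordLengthSplitter {

    // Hypothetical helper: splits text into chunks of at most maxLength
    // characters without cutting through a word. An element that is longer
    // than maxLength on its own is kept whole (assumption of this sketch).
    static List<String> split(String text, int maxLength) {
        List<String> chunks = new ArrayList<>();
        StringBuilder sb = new StringBuilder();

        BreakIterator boundary = BreakIterator.getWordInstance();
        boundary.setText(text);

        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE;
                start = end, end = boundary.next()) {

            // Flush before appending if the next element would not fit.
            if (sb.length() > 0 && sb.length() + (end - start) > maxLength) {
                chunks.add(sb.toString());
                sb.setLength(0);
            }
            sb.append(text, start, end);
        }

        // Flush whatever is left after the last boundary.
        if (sb.length() > 0) {
            chunks.add(sb.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Start at position 179 of the string and work backwards "
                + "until you hit the end of a word or a period.";
        for (String chunk : split(text, 40)) {
            System.out.println(chunk);
        }
    }
}
```

Whitespace is preserved exactly as the iterator reports it, so a chunk may end with a trailing space; call trim() on each chunk if that matters for your output.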