Java's String.split()

Just ran into a "gotcha" in Java that had me scratching my head. I had a string of the form "abc.def.ghi" and I wanted to split it on the dot (".") character. So I had the following code:

String test = "abc.def.ghi";
String[] parts = test.split(".");

Pop quiz: what does the parts array contain?

  • Did you say {"abc", "def", "ghi"}? Sorry, no. 
  • Maybe {"abc.def.ghi"}? Nope. 
  • "Oh, I got it!" you think, and happily announce the solution is {"a", "b", "c", "d", "e", "f", "g", "h", "i"}. Bad news, friend: you are still wrong.
The real solution? parts = {}. That's right, parts is an empty array.

Huh?

Let's start by looking at the API Docs for String's split function. The first thing to realize is that the parameter it takes is a regular expression, and in regex syntax, the dot matches any non-whitespace character. So instead of matching the dots like I wanted, I was matching everything in the string. To actually match a dot character, I needed to escape it with backslashes: test.split("\\."). 

That's a simple enough error, but why would it result in an empty array? If the dot matches every character, shouldn't I have gotten an array where each character is a separate entry? That certainly feels like the right answer for anyone used to matching things with regex, but we have to remember the regex plays a slightly different role in the split function: it's the delimiter. It's the value on which to split the String and is NOT included in the resulting array. 

So my delightful code snippet ended up matching every character in the input ... and then throwing it away. A fun example of simple code that looks right, but isn't.