
How to split a string on whitespace when there is quoted stuff in it

When you are dealing with strings such as the lines from an Apache or other web server log, it is common to find fields that are enclosed in quotes.

Let's say you want to split such a line into its items, e.g. referrer, user agent and so on. Since some of those items are inside quotes and may contain spaces, simply splitting on whitespace won't work. The trick is to split on the quotes first!

That way you get an array/list in which, counting from one, every odd element is guaranteed to be outside any quotes and every even element is guaranteed to be inside a pair of quotes (as long as the quotes are balanced). Just iterate through the list: when you are at an odd item, split it on whitespace and store the pieces; when you are at an even item, store it as it is, since it is quoted.
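
Here is a minimal sketch of that idea in Python. The function name split_quoted and the sample line are made up for illustration, and it assumes the quotes are plain double quotes that are balanced and never escaped. Note that Python counts list positions from zero, so the "odd" items described above are the ones at even indexes.

```python
def split_quoted(line):
    """Split a line on whitespace, keeping double-quoted parts whole."""
    items = []
    for index, chunk in enumerate(line.split('"')):
        if index % 2 == 0:
            # Outside quotes: split on whitespace (empty chunks add nothing).
            items.extend(chunk.split())
        else:
            # Inside a pair of quotes: keep the text exactly as it is.
            items.append(chunk)
    return items


# Hypothetical, simplified log line for illustration only.
sample = '127.0.0.1 - - "GET / HTTP/1.0" 200 "Mozilla/4.0 (compatible)"'
print(split_quoted(sample))
```

A real Apache log line also has a bracketed timestamp with a space in it, which this sketch would split in two; that part needs its own handling.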

Posted by jorgen on 2007-12-21 13:33

Escape

Posted by Mikael Ståldal at 2007-12-24 00:18
What if a quoted string contains an escaped quote?

"foo bar" "foo \"bar\" baz" other_stuff

Not in an apache log file

Posted by jorgen at 2007-12-24 00:23
I have never seen that in an Apache log file. If the source is HTTP, the quote would be encoded instead of escaped. Otherwise I suppose you could do a search and replace on the string as a first pass and replace each \" with an encoding.
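
A rough sketch of that first pass in Python, using the example line from the comment above. The function name split_quoted_escaped and the choice of %22 (the URL encoding of a double quote) as the replacement are assumptions made for illustration; any marker that cannot occur in the input would do.

```python
def split_quoted_escaped(line, placeholder="%22"):
    """Split on whitespace, keeping quoted parts whole, tolerating \\" inside them."""
    # First pass: replace each backslash-escaped quote with the placeholder
    # so that it no longer looks like a delimiter.
    protected = line.replace('\\"', placeholder)
    items = []
    for index, chunk in enumerate(protected.split('"')):
        if index % 2 == 0:
            # Outside quotes: split on whitespace.
            items.extend(chunk.split())
        else:
            # Inside quotes: keep as is (escaped quotes stay encoded).
            items.append(chunk)
    return items


line = r'"foo bar" "foo \"bar\" baz" other_stuff'
print(split_quoted_escaped(line))
```

The escaped quotes stay encoded in the result. This sketch does not handle an escaped backslash directly in front of a closing quote.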

