35.21 Using IFS to Split Strings

It might not be obvious why the Bourne shell has an IFS (internal field separator) shell variable. By default, it holds three characters: SPACE, TAB, and NEWLINE. These are the characters at which the shell splits a command line into words. So what?

If you have a line of text - say, from a database - and you want to split it into fields, the IFS variable can help. Put the field separator into IFS temporarily, use the shell's set (44.19) command to store the fields in command-line parameters, and then restore the old IFS.
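The three steps can be tried on a fixed string before pointing them at real data. This is only a minimal sketch, assuming a POSIX-style shell; the record and the variable names (record, name, color, count) are made up for illustration, and the leading x is a placeholder that keeps set from being called with no arguments.

```shell
#!/bin/sh
# Hypothetical colon-separated record, standing in for a database line.
record='widget:blue:42'

oldifs="$IFS"      # 1. save the current separators
IFS=:              #    ...and split on colons only
set x $record      # 2. x lands in $1, the fields in $2, $3 and $4
IFS="$oldifs"      # 3. restore IFS before doing anything else

shift              # drop the placeholder x
name=$1
color=$2
count=$3
echo "name=$name color=$color count=$count"
```

Note that $record is deliberately left unquoted so the shell will split it; that also means any wildcard characters in the data would be expanded as filenames.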

For example, the chunk of a shell script below gets the current terminal settings from stty -g (42.4), whose output looks like this:

2506:5:bf:8a3b:3:1c:8:15:4:0:0:0:11:13:1a:19:12:f:17:16:0:0

The shell splits the line that the backquotes (9.16) return from stty. The set command stores x in $1. This trick stops errors if stty fails for some reason: without the x, if stty wrote nothing to its standard output, the shell's set command would print a list of all shell variables. Then 2506 goes into $2, 5 into $3, and so on. The original Bourne shell can only handle nine parameters (through $9); if your input lines may have more than nine fields, this isn't a good technique. But this script uses the Korn shell, which (along with bash) doesn't have that limit.

    #!/bin/ksh
    oldifs="$IFS"
    # Change IFS to a colon:
    IFS=:
    # Put x in $1, stty -g output in $2 thru ${23}:
    set x `stty -g`
    IFS="$oldifs"
    # Window size is in 16th field (not counting the first "x"):
    echo "Your window has ${17} rows."

Because you don't need a subprocess to parse the output of stty, this can be faster than using an external command like cut (35.14) or awk (33.11).
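To make the comparison concrete, here is a rough sketch that pulls the same field out both ways. It assumes a shell that accepts ${17} (so not the original Bourne shell), and it uses the sample stty -g line shown earlier rather than a live terminal. The variable names settings, rows_ifs, and rows_cut are invented for this example.

```shell
#!/bin/sh
# Sample stty -g output copied from the text above (not a live terminal).
settings='2506:5:bf:8a3b:3:1c:8:15:4:0:0:0:11:13:1a:19:12:f:17:16:0:0'

# Built-in approach: split on colons, with x as the placeholder in $1,
# so the 16th field ends up in ${17}. No extra process is started.
oldifs="$IFS"
IFS=:
set x $settings
IFS="$oldifs"
rows_ifs=${17}

# External approach: a pipeline that runs cut to extract the 16th field.
rows_cut=`echo "$settings" | cut -d: -f16`

echo "IFS: $rows_ifs  cut: $rows_cut"
```

Both variables should end up holding the same string. The difference is that the second form pays for a pipeline and a cut process each time it runs, which can add up inside a loop.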

There are places where IFS can't be used because the shell separates command lines at spaces before it splits at IFS. It doesn't split the results of variable substitution or command substitution (9.16) at spaces, though. Here's an example - three different ways to parse a line from /etc/passwd:

    % cat splitter
    #!/bin/sh
    IFS=:
    line='larry:Vk9skS323kd4q:985:100:Larry Smith:/u/larry:/bin/tcsh'
    set x $line
    echo "case 1: \$6 is '$6'"
    set x `grep larry /etc/passwd`
    echo "case 2: \$6 is '$6'"
    set x larry:Vk9skS323kd4q:985:100:Larry Smith:/u/larry:/bin/tcsh
    echo "case 3: \$6 is '$6'"
    % ./splitter
    case 1: $6 is 'Larry Smith'
    case 2: $6 is 'Larry Smith'
    case 3: $6 is 'Larry'

Case 1 used variable substitution and case 2 used command substitution; in both, $6 kept the space. In case 3, though, with the colons typed on the command line, the shell broke the line at the space before it split at the colons: $6 became Larry and $7 was Smith. Another problem would have come up if any of the fields had been empty (as in larry::985:100: etc...): the shell would "eat" the empty field and $6 would contain /u/larry. Using sed with its escaped parentheses (34.10) to do the searching and the parsing could solve the last two problems.
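One way the sed approach might look is sketched below. This is not taken from the original script: the sample line (with an empty password field) and the variable names fullname and homedir are assumptions made for illustration. Each [^:]*: skips one field, empty or not, and the escaped parentheses remember the field that is wanted.

```shell
#!/bin/sh
# Hypothetical passwd-style line with an empty second field.
line='larry::985:100:Larry Smith:/u/larry:/bin/tcsh'

# Search and parse in one step: act only on the line that starts with
# "larry:", skip four fields, remember the fifth, and print it (-n ... p).
fullname=`echo "$line" |
    sed -n '/^larry:/s/^[^:]*:[^:]*:[^:]*:[^:]*:\([^:]*\):.*/\1/p'`

# Same idea for the sixth field: skip five fields, remember the next one.
homedir=`echo "$line" |
    sed 's/^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:\([^:]*\):.*/\1/'`

echo "full name: '$fullname'"
echo "home directory: '$homedir'"
```

Because sed counts the colons itself, an empty field cannot shift the others, and the space in the name is never seen by the shell's word splitting. Shells also differ in how they treat empty fields when splitting at IFS, so counting separators explicitly like this avoids depending on that behavior.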

- JP

