Contents:
Streams
Files
Serialization
Data compression
In this chapter, we'll continue our exploration of the Java API by looking at many of the classes in the java.io package. These classes support a number of forms of input and output; I expect you'll use them often in your Java applications. Figure 8.1 shows the class hierarchy of the java.io package.
We'll start by looking at the stream classes in java.io; these classes are all subclasses of the basic InputStream, OutputStream, Reader, and Writer classes. Then we'll examine the File class and discuss how you can interact with the filesystem using classes in java.io. Finally, we'll take a quick look at the data compression classes provided in java.util.zip.
All fundamental I/O in Java is based on streams. A stream represents a flow of data, or a channel of communication with (at least conceptually) a writer at one end and a reader at the other. When you are working with terminal input and output, reading or writing files, or communicating through sockets in Java, you are using a stream of one type or another. So you can see the forest without being distracted by the trees, I'll start by summarizing the different types of streams.
Abstract classes that define the basic functionality for reading or writing an unstructured sequence of bytes. All other byte streams in Java are built on top of the basic InputStream and OutputStream.
Abstract classes that define the basic functionality for reading or writing an unstructured sequence of characters. All other character streams in Java are built on top of Reader and Writer.
"Bridge" classes that convert bytes to characters and vice versa.
Specialized stream filters that add the ability to read and write simple data types like numeric primitives and String objects.
Specialized streams that incorporate buffering for additional efficiency.
A specialized character stream that makes it simple to print text.
"Double-ended" streams that always occur in pairs. Data written into a PipedOutputStream or PipedWriter is read from its corresponding PipedInputStream or PipedReader.
Implementations of InputStream, OutputStream, Reader, and Writer that read from and write to files on the local filesystem.
Streams in Java are one-way streets. The java.io input and output classes represent the ends of a simple stream, as shown in Figure 8.2. For bidirectional conversations, we use one of each type of stream.
InputStream and OutputStream are abstract classes that define the lowest-level interface for all byte streams. They contain methods for reading or writing an unstructured flow of byte-level data. Because these classes are abstract, you can never create a "pure" input or output stream. Java implements subclasses of these for activities like reading and writing files, and communicating with sockets. Because all byte streams inherit the structure of InputStream or OutputStream, the various kinds of byte streams can be used interchangeably. For example, a method often takes an InputStream as an argument. This means the method accepts any subclass of InputStream. Specialized types of streams can also be layered to provide higher-level functionality, such as buffering or handling larger data types.
In Java 1.1, new classes based around Reader and Writer were added to the java.io package. Reader and Writer are very much like InputStream and OutputStream, except that they deal with characters instead of bytes. As true character streams, these classes correctly handle Unicode characters, which was not always the case with the byte streams. However, some sort of bridge is needed between these character streams and the byte streams of physical devices like disks and networks. InputStreamReader and OutputStreamWriter are special classes that use an encoding scheme to translate between character and byte streams.
We'll discuss all of the interesting stream types in this section, with the exception of FileInputStream, FileOutputStream, FileReader, and FileWriter. We'll postpone the discussion of file streams until the next section, where we'll cover issues involved with accessing the filesystem in Java.
The prototypical example of an InputStream object is the standard input of a Java application. Like stdin in C or cin in C++, this object reads data from the program's environment, which is usually a terminal window or a command pipe. The java.lang.System class, a general repository for system-related resources, provides a reference to standard input in the static variable in. System also provides objects for standard output and standard error in the out and err variables, respectively. The following example shows the correspondence:
InputStream stdin = System.in; OutputStream stdout = System.out; OutputStream stderr = System.err;
This example hides the fact that System.out and System.err aren't really OutputStream objects, but more specialized and useful PrintStream objects. I'll explain these later, but for now we can reference out and err as OutputStream objects, since they are a kind of OutputStream by inheritance.
We can read a single byte at a time from standard input with the InputStream's read() method. If you look closely at the API, you'll see that the read() method of the base InputStream class is actually an abstract method. What lies behind System.in is an implementation of InputStream, so it's valid to call read() for this stream:
try { 
    int val = System.in.read(); 
    ... 
} 
catch ( IOException e ) { 
} 
As is the convention in C, read() provides a byte of information, but its return type is int. A return value of -1 indicates a normal end of stream has been reached; you'll need to test for this condition when using the simple read() method. If an error occurs during the read, an IOException is thrown. All basic input and output stream commands can throw an IOException, so you should arrange to catch and handle them as appropriate.
To retrieve the value as a byte, perform the cast:
byte b = (byte) val;
Of course, you'll need to check for the end-of-stream condition before you perform the cast. An overloaded form of read() fills a byte array with as much data as possible up to the limit of the array size and returns the number of bytes read:
byte [] bity = new byte [1024]; int got = System.in.read( bity );
We can also check the number of bytes available for reading on an InputStream with the available() method. Once we have that information, we can create an array of exactly the right size:
int waiting = System.in.available(); 
if ( waiting > 0 ) { 
    byte [] data = new byte [ waiting ]; 
    System.in.read( data ); 
    ... 
} 
InputStream provides the skip() method as a way of jumping over a number of bytes. Depending on the implementation of the stream and if you aren't interested in the intermediate data, skipping bytes may be more efficient than reading them. The close() method shuts down the stream and frees up any associated system resources. It's a good idea to close a stream when you are done using it.
The InputStream and OutputStream subclasses of Java 1.0.2 included methods for reading and writing strings, but most of them operated by assuming that a sixteen-bit Unicode character was equivalent to an eight-bit byte in the stream. This only works for Latin-1 (ISO8859-1) characters, so the character stream classes Reader and Writer were introduced in Java 1.1. Two special classes, InputStreamReader and OutputStreamWriter, bridge the gap between the world of character streams and the world of byte streams. These are character streams that are wrapped around an underlying byte stream. An encoding scheme is used to convert between bytes and characters. An encoding scheme name can be specified in the constructor of InputStreamReader or OutputStreamWriter. Another constructor simply accepts the underlying stream and uses the system's default encoding scheme. For example, let's parse a human-readable string from the standard input into an integer. We'll assume that the bytes coming from System.in use the system's default encoding scheme.
try {
    InputStreamReader converter = new InputStreamReader(System.in);
    BufferedReader in = new BufferedReader(converter);
    
    String text = in.readLine();
    int i = NumberFormat.getInstance().parse(text).intValue();
} 
catch ( IOException e ) { }
catch ( ParseException pe ) { } 
First, we wrap an InputStreamReader around System.in. This object converts the incoming bytes of System.in to characters using the default encoding scheme. Then, we wrap a BufferedReader around the InputStreamReader. BufferedReader gives us the readLine() method, which we can use to retrieve a full line of text into a String. The string is then parsed into an integer using the techniques described in Chapter 7.
We could have programmed the previous example using only byte streams, and it would have worked for users in the United States, at least. So why go to the extra trouble of using character streams? Character streams were introduced in Java 1.1 to correctly support Unicode strings. Unicode was designed to support almost all of the written languages of the world. If you want to write a program that works in any part of the world, in any language, you definitely want to use streams that don't mangle Unicode.
So how do you decide when you need a byte stream and when you need a character stream? If you want to read or write character strings, use some variety of Reader or Writer. Otherwise a byte stream should suffice. Let's say, for example, that you want to read strings from a file that was written out by a Java 1.0.2 application. In this case you could simply create a FileReader, which will convert the bytes in the file to characters using the system's default encoding scheme. If you have a file in a specific encoding scheme, you can create an InputStreamReader with that encoding scheme and read characters from it. Another example comes from the Internet. Web servers serve files as byte streams. If you want to read Unicode strings from a file with a particular encoding scheme, you'll need an appropriate InputStreamReader wrapped around the socket's InputStream.
What if we want to do more than read and write a mess of bytes or characters? Many of the InputStream, OutputStream, Reader, and Writer classes wrap other streams and add new features. A filtered stream takes another stream in its constructor; it delegates calls to the underlying stream while doing some additional processing of its own.
In Java 1.0.2, all wrapper streams were subclasses of FilterInputStream and FilterOutputStream. The character stream classes introduced in Java 1.1 break this pattern, but they operate in the same way. For example, BufferedInputStream extends FilterInputStream in the byte world, but BufferedReader extends Reader in the character world. It doesn't really matter--both classes accept a stream in their constructor and perform buffering. Like the byte stream classes, the character stream classes include the abstract FilterReader and FilterWriter classes, which simply pass all method calls to an underlying stream.
The FilterInputStream, FilterOutputStream, FilterReader, and FilterWriter classes themselves aren't useful; they must be subclassed and specialized to create a new type of filtering operation. For example, specialized wrapper streams like DataInputStream and DataOutputStream provide additional methods for reading and writing primitive data types.
As we said, when you create an instance of a filtered stream, you specify another stream in the constructor. The specialized stream wraps an additional layer of functionality around the other stream, as shown in Figure 8.3. Because filtered streams themselves are subclasses of the fundamental stream types, filtered streams can be layered on top of each other to provide different combinations of features. For example, you could wrap a PushbackReader around a LineNumberReader that was wrapped around a FileReader.
DataInputStream and DataOutputStream are filtered streams that let you read or write strings and primitive data types that comprise more than a single byte. DataInputStream and DataOutputStream implement the DataInput and DataOutput interfaces, respectively. These interfaces define the methods required for streams that read and write strings and Java primitive types in a machine-independent manner.
You can construct a DataInputStream from an InputStream and then use a method like readDouble() to read a primitive data type:
DataInputStream dis = new DataInputStream( System.in ); double d = dis.readDouble();
The above example wraps the standard input stream in a DataInputStream and uses it to read a double value. readDouble() reads bytes from the stream and constructs a double from them. All DataInputStream methods that read primitive types also read binary information.
The DataOutputStream class provides write methods that correspond to the read methods in DataInputStream. For example, writeInt() writes an integer in binary format to the underlying output stream.
The readUTF() and writeUTF() methods of DataInputStream and DataOutputStream read and write a Java String of Unicode characters using the UTF-8 "transformation format." UTF-8 is an ASCII-compatible encoding of Unicode characters commonly used for the transmission and storage of Unicode text.[1]
[1] Check out the URL http://www.stonehand.com/unicode/standard/utf8.html for more information on UTF-8.
We can use a DataInputStream with any kind of input stream, whether it be from a file, a socket, or standard input. The same applies to using a DataOutputStream, or, for that matter, any other specialized streams in java.io.
The BufferedInputStream, BufferedOutputStream, BufferedReader, and BufferedWriter classes add a data buffer of a specified size to the stream path. A buffer can increase efficiency by reducing the number of physical read or write operations that correspond to read() or write() method calls. You create a buffered stream with an appropriate input or output stream and a buffer size. Furthermore, you can wrap another stream around a buffered stream so that it benefits from the buffering. Here's a simple buffered input stream:
BufferedInputStream bis = new BufferedInputStream(myInputStream, 4096); ... bis.read();
In this example, we specify a buffer size of 4096 bytes. If we leave off the size of the buffer in the constructor, a reasonably sized one is chosen for us. On our first call to read(), bis tries to fill the entire 4096-byte buffer with data. Thereafter, calls to read() retrieve data from the buffer until it's empty.
A BufferedOutputStream works in a similar way. Calls to write() store the data in a buffer; data is actually written only when the buffer fills up. You can also use the flush() method to wring out the contents of a BufferedOutputStream before the buffer is full.
Some input streams like BufferedInputStream support the ability to mark a location in the data and later reset the stream to that position. The mark() method sets the return point in the stream. It takes an integer value that specifies the number of bytes that can be read before the stream gives up and forgets about the mark. The reset() method returns the stream to the marked point; any data read after the call to mark() is read again.
This functionality is especially useful when you are reading the stream in a parser. You may occasionally fail to parse a structure and so must try something else. In this situation, you can have your parser generate an error (a homemade ParseException) and then reset the stream to the point before it began parsing the structure:
BufferedInputStream input; 
... 
try { 
    input.mark( MAX_DATA_STRUCTURE_SIZE ); 
    return( parseDataStructure( input ) ); 
} 
catch ( ParseException e ) { 
    input.reset(); 
    ... 
} 
The BufferedReader and BufferedWriter classes work just like their byte-based counterparts, but operate on characters instead of bytes.
Another useful wrapper stream is java.io.PrintWriter. This class provides a suite of overloaded print() methods that turn their arguments into strings and push them out the stream. A complementary set of println() methods adds a newline to the end of the strings. PrintWriter is the more capable big brother of the PrintStream byte stream. PrintWriter is an unusual character stream because it can wrap either an OutputStream or another Writer. The System.out and System.err streams are PrintStream objects; you have already seen such streams strewn throughout this book:
System.out.print("Hello world...\n"); 
System.out.println("Hello world..."); 
System.out.println( "The answer is: " + 17 ); 
System.out.println( 3.14 ); 
In Java 1.1, the PrintStream class has been enhanced to translate characters to bytes using the system's default encoding scheme. Although PrintStream is not deprecated in Java 1.1, its constructors are. For all new development, use a PrintWriter instead of a PrintStream. Because a PrintWriter can wrap an OutputStream, the two classes are interchangeable.
When you create a PrintWriter object, you can pass an additional boolean value to the constructor. If this value is true, the PrintWriter automatically performs a flush() on the underlying OutputStream or Writer each time it sends a newline:
boolean autoFlush = true; PrintWriter p = new PrintWriter( myOutputStream, autoFlush );
When this technique is used with a buffered output stream, it corresponds to the behavior of terminals that send data line by line.
Unlike methods in other stream classes, the methods of PrintWriter and PrintStream do not throw IOExceptions. Instead, if we are interested, we can check for errors with the checkError() method:
System.out.println( reallyLongString ); if ( System.out.checkError() ) // Uh oh
Normally, our applications are directly involved with one side of a given stream at a time. PipedInputStream and PipedOutputStream (or PipedReader and PipedWriter), however, let us create two sides of a stream and connect them together, as shown in Figure 8.4. This provides a stream of communication between threads, for example.
To create a pipe, we use both a PipedInputStream and a PipedOutputStream. We can simply choose a side and then construct the other side using the first as an argument:
PipedInputStream pin = new PipedInputStream(); PipedOutputStream pout = new PipedOutputStream( pin );
Alternatively :
PipedOutputStream pout = new PipedOutputStream( ); PipedInputStream pin = new PipedInputStream( pout );
In each of these examples, the effect is to produce an input stream, pin, and an output stream, pout, that are connected. Data written to pout can then be read by pin. It is also possible to create the PipedInputStream and the PipedOutputStream separately, and then connect them with the connect() method.
We can do exactly the same thing in the character-based world, using PipedReader and PipedWriter in place of PipedInputStream and PipedOutputStream.
Once the two ends of the pipe are connected, use the two streams as you would other input and output streams. You can use read() to read data from the PipedInputStream (or PipedReader) and write() to write data to the PipedOutputStream (or PipedWriter). If the internal buffer of the pipe fills up, the writer blocks and waits until more space is available. Conversely, if the pipe is empty, the reader blocks and waits until some data is available. Internally, the blocking is implemented with wait() and notifyAll(), as described in Chapter 6, Threads.
One advantage to using piped streams is that they provide stream functionality in our code, without compelling us to build new, specialized streams. For example, we can use pipes to create a simple logging facility for our application. We can send messages to the logging facility through an ordinary PrintWriter, and then it can do whatever processing or buffering is required before sending the messages off to their ultimate location. Because we are dealing with string messages, we use the character-based PipedReader and PipedWriter classes. The following example shows the skeleton of our logging facility:
import java.io.*; 
 
class LoggerDaemon extends Thread { 
    PipedReader in = new PipedReader();  
 
    LoggerDaemon() { 
        setDaemon( true ); 
        start(); 
    } 
 
    public void run() { 
        BufferedReader din = new BufferedReader( in ); 
        String s; 
  
        try { 
           while ( (s = din.readLine()) != null ) { 
                // process line of data 
                // ... 
            } 
        }  
        catch (IOException e ) { } 
    } 
 
    PrintWriter getWriter() throws IOException { 
        return new PrintWriter( new PipedWriter( in ) ); 
    } 
} 
 
class myApplication { 
    public static void main ( String [] args ) throws IOException { 
        PrintWriter out = new LoggerDaemon().getWriter(); 
 
        out.println("Application starting..."); 
        // ... 
        out.println("Warning: does not compute!"); 
        // ... 
    } 
} 
LoggerDaemon is a daemon thread, so it will die when our application exits. LoggerDaemon reads strings from its end of the pipe, the PipedReader in. LoggerDaemon also provides a method, getWriter(), that returns a PipedWriter that is connected to its input stream. Simply create a new LoggerDaemon and fetch the output stream to begin sending messages.
In order to read strings with the readLine() method, LoggerDaemon wraps a BufferedReader around its PipedReader. For convenience, it also presents its PipedWriter as a PrintWriter, rather than a simple Writer.
One advantage of implementing LoggerDaemon with pipes is that we can log messages as easily as we write text to a terminal or any other stream. In other words, we can use all our normal tools and techniques. Another advantage is that the processing happens in another thread, so we can go about our business while the processing takes place.
There is nothing stopping us from connecting more than two piped streams. For example, we could chain multiple pipes together to perform a series of filtering operations.
The StringReader class is another useful stream class. The stream is created from a String; StringReader essentially wraps stream functionality around a String. Here's how to use a StringReader:
String data = "There once was a man from Nantucket..."; StringReader sr = new StringReader( data ); char T = (char)sr.read(); char h = (char)sr.read(); char e = (char)sr.read();
Note that you will still have to catch IOExceptions thrown by some of the StringReader's methods.
The StringReader class is useful when you want to read data in a String as if it were coming from a stream, such as a file, pipe, or socket. For example, suppose you create a parser that expects to read tokens from a stream. But you want to provide a method that also parses a big string. You can easily add one using StringReader.
Turning things around, the StringWriter class lets us write to a character string through an output stream. The internal string grows as necessary to accommodate the data. In the following example, we create a StringWriter and wrap it in a PrintWriter for convenience:
StringWriter buffer = new StringWriter(); 
PrintWriter out = new PrintWriter( buffer ); 
 
out.println("A moose once bit my sister."); 
out.println("No, really!"); 
 
String results = buffer.toString(); 
First we print a few lines to the output stream, to give it some data, then retrieve the results as a string with the toString() method. Alternately, we could get the results as a StringBuffer with the getBuffer() method.
The StringWriter class is useful if you want to capture the output of something that normally sends output to a stream, such as a file or the console. A PrintWriter wrapped around a StringWriter competes with StringBuffer as the easiest way to construct large strings piece by piece. While using a StringBuffer is more efficient, PrintWriter provides more functionality than the normal append() method used by StringBuffer.
Before we leave streams, let's try our hand at making one of our own. I mentioned earlier that specialized stream wrappers are built on top of the FilterInputStream and FilterOutputStream classes. It's quite easy to create our own subclass of FilterInputStream that can be wrapped around other streams to add new functionality.
The following example, rot13InputStream, performs a rot13 operation on the bytes that it reads. rot13 is a trivial algorithm that shifts alphanumeric letters to make them not quite human-readable; it's cute because it's symmetric. That is, to "un-rot13" some text, simply rot13 it again. We'll use the rot13InputStream class again in the crypt protocol handler example in Chapter 9, Network Programming, so we've put the class in the example.io package to facilitate reuse. Here's our rot13InputStream class:
package example.io; 
import java.io.*; 
 
public class rot13InputStream extends FilterInputStream { 
 
    public rot13InputStream ( InputStream i ) { 
        super( i ); 
    } 
 
    public int read() throws IOException { 
        return rot13( in.read() ); 
    } 
 
    private int rot13 ( int c ) { 
        if ( (c >= 'A') && (c <= 'Z') )            c=(((c-'A')+13)%26)+'A';        if ( (c >= 'a') && (c <= 'z') ) 
            c=(((c-'a')+13)%26)+'a'; 
        return c; 
    } } 
The FilterInputStream needs to be initialized with an InputStream; this is the stream to be filtered. We provide an appropriate constructor for the rot13InputStream class and invoke the parent constructor with a call to super(). FilterInputStream contains a protected instance variable, in, where it stores the stream reference and makes it available to the rest of our class.
The primary feature of a FilterInputStream is that it overrides the normal InputStream methods to delegate calls to the InputStream in the variable in. So, for instance, a call to read() simply turns around and calls read() on in to fetch a byte. An instance of FilterInputStream itself could be instantiated from an InputStream; it would pass its method calls on to that stream and serve as a pass-through filter. To make things interesting, we can override methods of the FilterInputStream class and do extra work on the data as it passes through.
In our example, we have overridden the read() method to fetch bytes from the underlying InputStream, in, and then perform the rot13 shift on the data before returning it. Note that the rot13() method shifts alphabetic characters, while simply passing all other values, including the end of stream value (-1). Our subclass now acts like a rot13 filter. All other normal functionality of an InputStream, like skip() and available() is unmodified, so calls to these methods are answered by the underlying InputStream.
Strictly speaking, rot13InputStream only works on an ASCII byte stream, since the underlying algorithm is based on the Roman alphabet. A more generalized character scrambling algorithm would have to be based on FilterReader to handle Unicode correctly.