Rapid Deployment Data Streams
VERSION 4.1

[The following sections are generated directly from the source file: RapidCpc.h
Revision: 1.6 Date: 1996/12/26 04:01:26
©1996 Cartesian Products, Inc. All rights reserved.]

The CPC Rapid Deployment API defines a framework for building CPC compression and decompression applications in which the raw CPC data is stored in files. While files serve as a suitable storage framework for many environments, there are certain situations in which the application requires greater control of the storage and retrieval process. For example, in a network environment, there may not exist a file-based abstraction of network I/O, instead requiring the application to use special network-defined functions for the transfer of data. Similarly, a database environment is likely to require that the application use special database-defined functions for storage and retrieval. To provide this level of application control, CPC defines a polymorphic abstraction of a file, known as a Data Stream, and performs all data storage and retrieval by manipulating application-supplied instances of this abstraction. This document describes the Rapid Deployment Data Streams API, which allows a developer to implement application-specific Data Streams, and thereby take complete control of the storage and retrieval of CPC data.

Note: This document assumes that the reader is already familiar with the Rapid Deployment API. Cartesian recommends that developers use that API for situations in which a file-based API is appropriate. It is simpler, and the CPC-specific portions of the application will be smaller.


1. Data Streams

A Data Stream can be viewed abstractly as a contiguous array of bytes that can be read or written at arbitrary offsets. The methods used to manipulate a Data Stream are polymorphic in that each Data Stream supplies its own implementation. The Stream user (e.g., the CPC library) is totally unaware of the actual type of storage that sits behind each Data Stream. It could be a local disk file, a memory buffer, a transactional database, an RPC circuit, or some other entity. The Stream user merely tells the Stream that it wants to read or write data; the actual procedures used to perform the operation are totally up to the specific Stream's implementation.

The CPC library provides a set of functions for the generic manipulation of Data Streams. The following sections describe these manipulation interfaces.

Note: This document describes only those functions that may be needed for interacting with the Rapid Deployment API.

Opaque Type: DataStr
The opaque data type used to represent a Data Stream to a Stream user.
typedef struct DataStr DataStr;

1.1. Error Handling

A Data Stream maintains a latched error state, which the application can use to provide descriptive errors to the end-user. The error state consists of a pointer to a string that describes the error, and a categorization of the error as a hard error or a soft error. A soft error can be cleared, and is overridden by a subsequent hard error. A hard error cannot be cleared and is not overridden by any subsequent error.

Note: After an error is latched, it is up to the Stream's implementation as to whether or not subsequent I/Os fail.

Function Definition: strGetError
Returns 0 if the Stream, str, is not in error. Otherwise, returns a zero-terminated ASCII string describing the nature of the error. The returned string will remain valid even after the Stream is closed.
Warning: The returned string is owned by the CPC library and should not be modified or deallocated by the application.
char const * strGetError(DataStr *str);

Function Definition: strSetError
Put the Data Stream, str, into the error state described by err. If isHard is non-zero, the error is a permanent hard error. Otherwise, the error is a transient soft error. This specific error will be latched if the Stream is not currently in error, or if the Stream is in a soft error state and isHard is non-zero.
Warning: err must be a valid pointer even after str is closed. Typically, this restriction is met by using string constants.
void strSetError(DataStr *str, char const* err, unsigned isHard);

Function Definition: strClearError
If the Stream, str, is in a soft error state, clears the error state of the Stream. Otherwise, this function is a no-op.
void strClearError(DataStr *str);
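
The latching rules described in this section can be modeled in a few lines of C. The following is only a minimal sketch, not the CPC library's implementation: ErrLatch, latch_set, latch_clear, and latch_get are hypothetical stand-ins that mirror the documented behavior of strSetError, strClearError, and strGetError.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a Stream's error state (not the CPC DataStr). */
typedef struct {
    char const *error;   /* NULL when the Stream is not in error */
    unsigned    isHard;  /* non-zero when the latched error is hard */
} ErrLatch;

/* Mirrors strSetError: latch err if no error is latched, or if a soft
   error is latched and the new error is hard. */
static void latch_set(ErrLatch *l, char const *err, unsigned isHard)
{
    assert(l != NULL);
    if (l->error == NULL || (!l->isHard && isHard)) {
        l->error  = err;
        l->isHard = (isHard != 0);
    }
}

/* Mirrors strClearError: only a soft error can be cleared. */
static void latch_clear(ErrLatch *l)
{
    assert(l != NULL);
    if (l->error != NULL && !l->isHard) {
        l->error = NULL;
    }
}

/* Mirrors strGetError: 0 when not in error, else the description. */
static char const *latch_get(ErrLatch const *l)
{
    assert(l != NULL);
    return l->error;
}
```

In this model, a second soft error does not replace a latched soft error, a hard error replaces a latched soft error, and nothing replaces or clears a latched hard error.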

1.2. Closing a Stream

When the application is done with a Stream, the Stream must be closed.

Note: This is typically done automatically when the application invokes cpcEnc_destroy or cpcDec_destroy.

Function Definition: strClose
Close the Data Stream, str, deallocating or detaching all of its resources. If propagateClose is non-zero, any subsidiary resources owned by the Stream are also closed and deallocated. Otherwise, the Stream merely detaches itself from any subsidiary resources. (The definition of subsidiary resources is determined by the implementation of the Stream.)
Returns 0 if the Stream was closed without error (and the Stream was not in error at the time of the close). Otherwise, returns a zero-terminated ASCII string describing the nature of the error.
Warning: On return from this routine, str is no longer valid and should not be used on any subsequent operations.
Warning: The returned string is owned by the CPC library. It should not be modified or deallocated by the application.
char const *strClose(DataStr *str, unsigned propagateClose);


2. Stream Agents

A Data Stream Agent is the entity that implements the behavior of a Data Stream. Typically, there is a many-to-one relationship between Data Streams and Data Stream Agents. For example, there can be several open FooBar Data Streams, but there is only one FooBar Agent. Borrowing from object-oriented terminology, we refer to a set of Data Streams that share a common Agent as a Class of Data Stream.

2.1. Methods

A Data Stream Agent defines seven methods, of which two are optional. This section describes the methods.

Note: As described below, some of the methods are allowed to return errors. In addition to returning an error, the Agent can also call strSetError to provide an Agent-specific message describing the error. (If the Agent does not call strSetError, a generic error description is used.)

2.1.1. Seek Pointer

The Agent must maintain an independent seek pointer for each Stream. The seek pointer determines the offset within the Stream at which to perform the next read or write.

DataStrAgent Method: GetPos
Prototype: int (*GetPos)(DataStr *str)
Example: GetPos_File
Returns the current position of the seek pointer for str, or a negative value if str is in error and the current position of the seek pointer can not be ascertained.

DataStrAgent Method: SetPos
Prototype: unsigned (*SetPos)(DataStr *str, unsigned long pos)
Example: SetPos_File
Set the current position of the seek pointer for str to pos.
Returns non-zero if the operation was successful, zero otherwise.
Note: The Agent is not required to return an error on a failed seek; it can wait until the I/O is attempted and return an error there.

2.1.2. Data Transfer

The following two methods are used to perform I/O on the Stream. After each I/O, the seek pointer should automatically advance by the number of bytes transferred.

DataStrAgent Method: Read
Prototype: int (*Read)(DataStr *str, void *buf, unsigned long cnt)
Example: Read_File
Read cnt bytes of data from the Stream, str, into buf. The data should be read from the current position of the Stream's seek pointer.
Returns the number of bytes of data transferred to buf, or a negative value if an error occurred. On return, the Agent should advance the seek pointer by the number of bytes transferred.
Note: A CpcEncoder will never attempt to read from its output Data Stream. Hence, an encode-only application need not implement this method.
Warning: If all of the requested data is not yet available, the Agent should block the caller until it is available. The Agent should only return a value less than cnt if an error occurs or the end of the Stream has been reached.
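
When the underlying transport can return partial reads (a socket, for example), the blocking rule above usually means the Agent's Read method has to loop. The following is a minimal sketch of such a loop. PartialReadFn, read_fully, ChunkSrc, and chunk_read are hypothetical names introduced for illustration; they are not part of the CPC API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport primitive: returns the number of bytes read
   (possibly fewer than requested), 0 at end of data, negative on error. */
typedef long (*PartialReadFn)(void *ctx, void *buf, unsigned long cnt);

/* Keep reading until cnt bytes arrive, the data ends, or an error occurs.
   Returns the bytes transferred, or -1 on error, following the return
   convention of the Read method. */
static int read_fully(PartialReadFn rd, void *ctx, void *buf,
                      unsigned long cnt)
{
    unsigned char *dst = buf;
    unsigned long total = 0;

    assert(rd != NULL && buf != NULL);
    while (total < cnt) {
        long n = rd(ctx, dst + total, cnt - total);
        if (n < 0)  { return -1; }  /* transport error */
        if (n == 0) { break; }      /* end of Stream: short count allowed */
        total += (unsigned long)n;
    }
    return (int)total;
}

/* Toy transport for demonstration: serves a memory buffer, delivering
   at most 3 bytes per call. */
typedef struct {
    unsigned char const *data;
    unsigned long len, pos;
} ChunkSrc;

static long chunk_read(void *ctx, void *buf, unsigned long cnt)
{
    ChunkSrc *s = ctx;
    unsigned long left = s->len - s->pos;
    unsigned long n = cnt < 3 ? cnt : 3;
    if (n > left) { n = left; }
    if (n > 0) { memcpy(buf, s->data + s->pos, n); }
    s->pos += n;
    return (long)n;
}
```

A Read method built this way returns fewer than cnt bytes only at the end of the Stream or on an error, even though each individual transport call may deliver less than was requested.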

DataStrAgent Method: Write
Prototype: int (*Write)(DataStr *str, void *buf, unsigned long cnt)
Example: Write_File
Write cnt bytes of data from buf into the Stream, str. The data should be written at the current position of the Stream's seek pointer.
Returns the number of bytes of data transferred from buf, or a negative value if an error occurred. On return, the Agent should advance the seek pointer by the number of bytes transferred.
Note: A CpcDecoder will never attempt to write to its input Data Stream. Hence, a decode-only application need not implement this method.
Warning: The Agent should only return a value less than cnt if an error occurs.

2.1.3. Close

The following method is invoked when a Stream is closed, allowing the Agent to deallocate the resources which the Stream is consuming. This includes deallocation of the DataStr structure itself (since the structure is allocated by the Agent).

The method is passed a parameter indicating whether or not subsidiary resources of the Stream should also be closed. The definition of a subsidiary resource is up to the Agent. The general idea is that a subsidiary resource is one that could potentially continue to be used after the Stream is closed. For example, an Agent that interacts with an open file might consider the open file to be a subsidiary resource. This would allow the Stream user to close the Stream without closing the open file.

Note: There is no requirement that the Agent consider any of the Stream's resources to be subsidiary.

DataStrAgent Method: Close
Prototype: char const *(*Close)(DataStr *str, unsigned propagateClose)
Example: Close_File
Close the Data Stream, str, deallocating its resources. If propagateClose is non-zero, any subsidiary resources owned by the Stream are also closed and deallocated. Otherwise, the Stream is merely detached from its subsidiary resources.
Returns 0 if the Stream was closed without error. Otherwise, returns a pointer to a zero-terminated ASCII string describing the nature of the error.
Note: On return from this routine, str is assumed to be deallocated and will not be passed to any subsequent method invocations.
Warning: Since on return from this routine the Stream is closed, the returned string must be valid after the Stream is deallocated. Typically, this restriction is met by using string constants.

2.1.4. Optional Methods

The remaining methods are optional.

Error Notification

The Agent can (optionally) provide a method that is invoked on any change to the error state of a Stream. The Agent can use strGetError to retrieve the specific error.

DataStrAgent Method: OnErrorChange
Prototype: void (*OnErrorChange)(DataStr *str)
Invoked by the CPC Library on any change to the error state of the Stream, str.

The Stream Length

The Agent can (optionally) provide a method to query the total number of bytes contained in the Stream.

Note: This method is not used in the Rapid Deployment API. There is no reason to implement it.

DataStrAgent Method: GetLen
Prototype: int (*GetLen)(DataStr *str)
Returns the total number of bytes of data contained in str, or a negative value if the length of the Stream is not known.

2.2. Agent Definition

An Agent is defined by the DataStrAgent structure, which contains pointers to the functions which implement the Agent's methods.

Type Definition: DataStrAgent
Defines the Agent implementations of the methods for a Data Stream Class. The semantics of the methods were discussed in the preceding sections. If an Agent does not implement an optional method, it should set the corresponding pointer to zero.
typedef struct DataStrAgent {
  char const *(*Close)(DataStr *, unsigned);

  int (*GetPos)(DataStr *);
  unsigned (*SetPos)(DataStr *, unsigned long);

  /* This is an optional method. It is not used in the Rapid Deployment API. */
  int (*GetLen)(DataStr *);

  int (*Read)(DataStr *, void *, unsigned long);
  int (*Write)(DataStr *, void *, unsigned long);

  /* This is an optional method. */
  void (*OnErrorChange)(DataStr *);
} DataStrAgent;
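
Because an optional method pointer may be zero, any code that dispatches through an Agent table has to test the pointer before calling it. The following sketch shows one way a generic caller might do this. It does not use the real CPC headers: DemoStr, DemoAgent, Agent_Demo, demo_getPos, and demo_getLen are hypothetical, trimmed-down stand-ins, and how the CPC library itself handles an absent optional method is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoStr DemoStr;

/* Trimmed-down stand-in for DataStrAgent: one required method and
   one optional method. */
typedef struct DemoAgent {
    int (*GetPos)(DemoStr *);   /* required */
    int (*GetLen)(DemoStr *);   /* optional: may be 0 */
} DemoAgent;

struct DemoStr {
    DemoAgent const *agent;     /* method table shared by the Class */
    int pos;                    /* per-Stream seek pointer */
};

static int DemoGetPos(DemoStr *s) { return s->pos; }

/* An Agent that leaves the optional GetLen method unimplemented. */
static const DemoAgent Agent_Demo = { DemoGetPos, 0 };

/* Required method: dispatch directly through the table. */
static int demo_getPos(DemoStr *s)
{
    assert(s != NULL && s->agent != NULL);
    return s->agent->GetPos(s);
}

/* Optional method: report -1 ("length not known") when it is absent. */
static int demo_getLen(DemoStr *s)
{
    assert(s != NULL && s->agent != NULL);
    return s->agent->GetLen ? s->agent->GetLen(s) : -1;
}
```

All Streams of a Class point at the same table, which is what gives the many-to-one relationship between Data Streams and their Agent.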

2.3. Instances

Typically, the Agent will provide some sort of factory function for creating Data Streams of its Class (for example, see fileStr_open). The Agent is responsible for allocating the necessary instance memory for the Stream, including a DataStr to represent the Stream generically. The following function must be invoked by the Agent to initialize a newly allocated Stream.

Function Definition: strInit
Initialize the newly allocated Data Stream, str, which is controlled by the Data Stream Agent, agent.
void strInit(DataStr *str, DataStrAgent const *agent);

Each of the Agent methods is passed a DataStr pointer identifying the Stream instance on which to perform the method. (This is the same pointer that was passed by the Agent to strInit.) If the Agent maintains any per-Stream data structures, it will need some mechanism for mapping the DataStr pointer to its own per-Stream structure. One simple solution to this problem is to embed the DataStr as the first field of a larger structure which contains the Agent-specific Stream information. This allows the Agent to map the DataStr pointer to its own structure by direct pointer coercion (since both structures always start at the same address).

Note: There is no requirement that an Agent be implemented in this manner.
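
As an illustration of the embedding technique, the following sketch outlines the instance data and three methods of a read-only, memory-backed Stream. It is a sketch under stated assumptions rather than working CPC code: GenStr is a stand-in for DataStr (whose real definition comes from the CPC headers), and MemStr, MemSubClass, MemGetPos, MemSetPos, and MemRead are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the generic DataStr (assumption for this sketch). */
typedef struct GenStr { char const *error; } GenStr;

/* Agent-specific instance: the generic part must be the first field. */
typedef struct MemStr {
    GenStr               gen;
    unsigned char const *data;  /* backing memory (not owned) */
    unsigned long        len;   /* number of bytes in data */
    unsigned long        pos;   /* per-Stream seek pointer */
} MemStr;

/* Map the generic pointer back to the enclosing MemStr. Valid because
   gen is the first member, so both structures share one address. */
static MemStr *MemSubClass(GenStr *str) { return (MemStr *)str; }

static int MemGetPos(GenStr *str) { return (int)MemSubClass(str)->pos; }

/* A seek past the end is accepted here; the next read transfers 0 bytes. */
static unsigned MemSetPos(GenStr *str, unsigned long pos)
{
    MemSubClass(str)->pos = pos;
    return 1;
}

/* Copy up to cnt bytes from the seek position and advance the pointer. */
static int MemRead(GenStr *str, void *buf, unsigned long cnt)
{
    MemStr *m = MemSubClass(str);
    unsigned long left = m->pos < m->len ? m->len - m->pos : 0;
    unsigned long n = cnt < left ? cnt : left;

    assert(buf != NULL);
    if (n > 0) { memcpy(buf, m->data + m->pos, n); }
    m->pos += n;
    return (int)n;
}
```

A complete Agent would also supply Write and Close methods and a factory function that allocates the MemStr and calls strInit on its generic part.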

Type Definition: DataStr
The generic structure of a Data Stream. The Agent should never have to directly manipulate the fields of this structure. The structure is exposed to allow Agents to embed it in their own internal structures.
Warning: The semantics and fields of this structure are not defined by this API. They are subject to change without notice.
struct DataStr {
    DataStrAgent const *agent; 
    unsigned char normalizeValues, 
      exceptionsEnabled, errorIsHard;
    char const *error;
};


3. CPC Coding

The procedures for using Data Streams to compress and decompress CPC images are virtually identical to the procedures described in the Rapid Deployment API. The only difference is the function used to create the CpcEncoder or CpcDecoder.

3.1. Encoding

Applications that compress CPC image data use the CpcEncoder object of the Rapid Deployment API. The only difference from the procedures described there is the function used to create the encoder:

Function Definition: cpcEnc_createFromStream
Create a CPC encoder that sends its compressed CPC data to the Data Stream, sink, starting at the Stream's current seek position. If progressive is non-zero, the document is encoded using the CPC-Progressive format. Otherwise, it is encoded using the CPC-Normal format.
Returns a pointer to the encoder, or 0 if the encoder could not be created. The only reason for failure is that memory could not be allocated.
Note: The CPC encoder never tries to read from sink.
CpcEncoder *cpcEnc_createFromStream(DataStr *sink, unsigned progressive);

3.2. Decoding

Applications that decompress CPC image data use the CpcDecoder object of the Rapid Deployment API. The only difference from the procedures described there is the function used to create the decoder:

Function Definition: cpcDec_createFromStream
Create a CPC decoder that reads its compressed CPC data from the Data Stream, data, starting at the Stream's current seek position. If sequential is non-zero, the decoder is configured for sequential access to the pages. Otherwise, the decoder is configured for random access. (On large documents, random access uses an additional 750k of memory.)
Returns a pointer to the decoder, or 0 if the decoder could not be created. The only reason for failure is that memory could not be allocated.
Note: The CPC decoder never tries to write to data.
CpcDecoder *cpcDec_createFromStream(DataStr *data, unsigned sequential);

3.3. Data Signatures

In applications that deal with multiple image formats, it is often desirable to determine the particular format of an incoming Data Stream by examining the contents of the Stream. The following function can be used to determine if the data contained in a Stream appears to be CPC-formatted image data.

Function Definition: cpcDec_checkSignature
Returns non-zero if the data at the current seek position of the Data Stream, data, appears to be CPC data. Otherwise, returns zero.
unsigned cpcDec_checkSignature(DataStr *data);


4. An Example

[The following sections are generated directly from the source file: CpcStr.c
Revision: 1.4 Date: 1996/12/26 21:57:31
©1996 Cartesian Products, Inc. All rights reserved.]

In this section, we develop a full sample application, which compresses and decompresses CPC data using the Rapid Deployment Data Streams API.

Note: The code in this example is taken directly from the CpcStr sample application of the Rapid Deployment SDK.

4.1. File Streams

In order to use the Data Stream API, we need to implement a Stream Agent. In this section, we implement an Agent that uses files for the underlying Stream storage and retrieval, referred to as the File Agent.

Note: The File Agent is provided for expository purposes only. It provides no additional functionality over the file-based Rapid Deployment API. (In fact, the File Agent is a simplification of the Agent used to implement the file-based API.)

4.1.1. Instance Data

We use the ANSI stdio API for manipulating the files. Hence, in order to process the Stream methods, we will need to know the FILE pointer corresponding to each open Stream. As recommended in § 2.3 (Instances), the File Agent augments the generic Data Stream with additional information by embedding a DataStr as the first field of its Agent-specific data structure.

Type Definition: FileStr
Describes an open File Stream. gen contains the generic description of the stream. fp is a pointer to the open file that stands behind the stream.
typedef struct { DataStr gen; FILE *fp; } FileStr;

Mapping

Since the first field of a FileStr is the DataStr, both structures will start at the same address. Hence, the mapping of a DataStr pointer to a FileStr pointer can be performed by simple pointer coercion.

Function Definition: SubClass
Convert the generic Data Stream, str, to the corresponding FileStr.
Warning: str must truly be a File Stream.
static FileStr *SubClass(DataStr *str) { return (FileStr *)str; }

Errors

When an error occurs in a stdio operation, we set the error state of the Stream to be the stdio description of the error.

Function Definition: SetError
Set the error state for str to be the error contained in the underlying stdio file. dfltMsg is used as the error if the underlying file does not contain a known error code (or does not appear to be in error).
static void SetError(DataStr *str, char const *dfltMsg) 
{ 
  FILE *fp = SubClass(str)->fp; 
  if(ferror(fp) && errno < sys_nerr) {
    dfltMsg = sys_errlist[errno];
  }
  strSetError(str, dfltMsg, 0);
}

4.1.2. Method Implementations

This section provides the implementation of the Agent methods.

The Seek Pointer

A seek pointer (with the appropriate semantics) is maintained by the underlying stdio file. Hence, we use the file's seek pointer to implement the Stream's seek pointer.

Function Definition: GetPos_File
Returns the current position of the seek pointer for str.
static int GetPos_File(DataStr *str) 
{ 
  return ftell(SubClass(str)->fp);
}

Function Definition: SetPos_File
Set the current position of the seek pointer for str to pos.
Returns non-zero if the seek was successful, zero otherwise.
static unsigned 
SetPos_File(DataStr *str, unsigned long pos)
{
  FILE *fp = SubClass(str)->fp;
  unsigned worked = !fseek(fp, pos, SEEK_SET);
  if(!worked) { SetError(str, "Seek error"); }
  return worked;
}

Transferring Data

The following two functions implement the File Agent's data transfer methods.

Function Definition: Read_File
Read cnt bytes of data from str into buf. The data is read from the current position of the Stream's seek pointer.
Returns the number of bytes of data transferred to buf, or a negative value if an error occurred. On return, the seek pointer is advanced by the number of bytes transferred.
static int 
Read_File(DataStr *str, void *buf, unsigned long cnt)
{
  /* fread returns a short count on an error or end-of-file. We use
     ferror to disambiguate the two cases. */
  FILE *fp = SubClass(str)->fp;
  size_t numRead = fread(buf, 1, cnt, fp);
  if(numRead==cnt || !ferror(fp)) { return numRead; }

  /* Set the error state of the stream. */
  SetError(str, "fread error");
  return -1;
}

Function Definition: Write_File
Write cnt bytes of data from buf into the stream, str. The data is written at the current position of the Stream's seek pointer.
Returns the number of bytes of data transferred from buf, or a negative value if an error occurred. On return, the seek pointer is advanced by the number of bytes transferred.
static int 
Write_File(DataStr *str, void *buf, unsigned long cnt)
{
  /* We always consider a short write to be an error. */
  FILE *fp = SubClass(str)->fp;
  size_t numWritten = fwrite(buf, 1, cnt, fp);
  if(numWritten == cnt) { return numWritten; }

  /* Set the error state of the stream. */
  SetError(str, "fwrite error");
  return -1;
}

Closing the Stream

The File Agent defines the file to be a subsidiary resource, and hence, will optionally leave the file open after the Stream is closed.

Function Definition: Close_File
Close the Data Stream, str, deallocating its resources. If closeFp is non-zero, the underlying file owned by the Stream is also closed. Otherwise, the Stream merely detaches itself from the underlying file.
Returns 0 if the Stream was closed without error. Otherwise, returns a pointer to a zero-terminated ASCII string describing the nature of the error.
static char const *
Close_File(DataStr *str, unsigned closeFp)
{
  /* Flush the file and latch any final error. */
  char const *err;
  FILE *fp = SubClass(str)->fp;
  if(fflush(fp)) { SetError(str, "fflush error"); }

  /* Close the file if requested. */
  if(closeFp) { fclose(fp); }

  /* Deallocate the str (which we allocated in fileStr_open). */
  err = strGetError(str);
  free(str);
  return err;
}

File Agent Definition

The Agent is defined by initializing a DataStrAgent structure with pointers to our internal methods.

Variable: Agent_File
Defines the interfaces to the File Stream Agent.
static const DataStrAgent Agent_File = { 
  Close_File, GetPos_File, SetPos_File,
  0, // Optional method: GetLen
  Read_File, Write_File,
  0, // Optional method: OnErrorChange
};

4.1.3. Creating a Stream

The File Agent provides the following function to create a File Stream and attach it to an open stdio file.

Function Definition: fileStr_open
Create a new instance of a File Stream and attach it to the open file, fp.
Returns a pointer to the newly created Stream, or zero if an error occurred. The Stream's seek pointer is positioned at the same offset as the seek pointer for fp.
DataStr *fileStr_open(FILE *fp)
{
  FileStr *str;

  /* Fail if the file is invalid. */
  if(!fp) { return 0; }

  /* Allocate a new FileStr. */
  str = malloc(sizeof(*str));
  if(!str) { return 0; }

  /* Initialize the generic portion. */
  strInit(&str->gen, &Agent_File);

  /* Attach it to the file and return the generic pointer. */
  str->fp = fp;
  return &str->gen;
}

4.2. The Application

Now that we have a Stream Agent, we are ready to implement the application. The sample program will decode a CPC input file and re-encode each input page to produce a CPC output file. Since CPC is a lossy compression algorithm, the resultant output will (potentially) differ from the input.

Note: Regenerative CPC codings typically stabilize after 2-4 generations. Once the coding has stabilized, subsequent generations are bitwise identical.

4.2.1. Opening the Decoder

First, we implement a function to create a CpcDecoder that uses a File Stream as its input.

Function Definition: OpenDecoder
Returns a pointer to a CpcDecoder that reads its raw CPC data from the file named name, or zero if the CpcDecoder cannot be created.
static CpcDecoder *OpenDecoder(char const *name)
{
  /* Open the source file and attach it to a File Stream. If we are
     unable to open the Stream, give up. */
  CpcDecoder *cpc;
  DataStr *str = fileStr_open(fopen(name, "rb"));
  if(!str) {
    fprintf(stderr, "Unable to open <%s>\n", name);
    return 0;
  }

  /* Check for a CPC signature. (This is not really necessary, since
     the CpcDecoder would detect the error.) */
  if(!cpcDec_checkSignature(str)) {
    fprintf(stderr, "<%s> does not contain CPC data\n", name);
    strClose(str, 1/*closeInput*/);
    return 0;
  }

  /* Create the CPC decoder, using the File Stream as the data source. */
  cpc = cpcDec_createFromStream(str, 1/*sequential*/);
  if(!cpc) {
    fprintf(stderr, "Unable to create CPC decoder\n");
    strClose(str, 1/*closeInput*/);
    return 0;
  }
  return cpc;
}

4.2.2. Opening the Encoder

Next, we implement a function to create a CpcEncoder that uses a File Stream as its output.

Function Definition: OpenEncoder
Returns a pointer to a CpcEncoder that writes its raw CPC data to the file named name, or zero if the CpcEncoder can not be created.
static CpcEncoder *OpenEncoder(char const *name)
{
  /* Create the output file and attach it to a File Stream. If we are
     unable to open the Stream, give up. */
  CpcEncoder *cpc;
  DataStr *str = fileStr_open(fopen(name, "wb"));
  if(!str) {
    fprintf(stderr, "Unable to create <%s>\n", name);
    return 0;
  }

  /* Create the CPC encoder, using the File Stream as the data sink. */
  cpc = cpcEnc_createFromStream(str, 1/*CPC-Progressive*/);
  if(!cpc) {
    fprintf(stderr, "Unable to create CPC encoder\n");
    strClose(str, 1/*closeOutput*/);
    return 0;
  }
  return cpc;
}

4.2.3. Entry Point

Finally, we implement the entry-point for the application, main. The application accepts two command-line parameters. The first specifies the name of the CPC input file. The second specifies the name of the CPC output file. The application decompresses each page of the input file, and writes it to the output file.

Function Definition: main
The entry point for the application.
int main(int argc, char **argv)
{
  CpcEncoder *encoder; CpcDecoder *decoder; 
  char const *err; unsigned long i;

  /* There must be exactly two arguments. */
  if(argc != 3) {
    fprintf(stderr, "Usage: %s <inCpcFile> <outCpcFile>\n", argv[0]);
    return -1;
  }

  /* Open the encoder and decoder. If either fails, give up. */
  decoder = OpenDecoder(argv[1]);
  encoder = OpenEncoder(argv[2]);
  if(!decoder || !encoder) { return -1; }

  /* Iterate over the pages in the input document. */
  for(i=0; i<cpcDec_getPageCount(decoder); i++) {

    /* Retrieve the page from the decoder. (i is unsigned long, so the
       page number is printed with %lu.) */
    ImBitMap *ibm = cpcDec_getPage(decoder, i);
    if(!ibm) {
      fprintf(stderr, "Get Page %lu failed (%s)\n", i+1,
              cpcDec_getError(decoder));
      return -1;
    }

    /* Add the page to the encoder. */
    cpcEnc_addPage(encoder, ibm);

    /* Destroy the page. */
    ibm_destroy(ibm);
  }

  /* Destroy the decoder, checking for errors. */
  err = cpcDec_destroy(decoder, 1/*close stream*/);
  if(err) {
    fprintf(stderr, "Cpc decoder error: <%s>\n", err);
    return -1;
  }

  /* Destroy the encoder, checking for errors. */
  err = cpcEnc_destroy(encoder, 1/*close stream*/);
  if(err) {
    fprintf(stderr, "Cpc encoder error: <%s>\n", err);
    return -1;
  }
  return 0;
}



THE FINE PRINT (regarding copyrights and trademarks)

Cartesian Products, Inc.
cpi@cartesianinc.com