
7. Making an IR Interface for Your Database with YAZ

7.1 Introduction

NOTE: If you aren't into documentation, a good way to learn how the backend interface works is to look at the backend.h file. Then, look at the small dummy-server in server/ztest.c. Finally, you can have a look at the seshigh.c file, which is where most of the logic of the frontend server is located. The backend.h file also makes a good reference, once you've chewed your way through the prose of this file.

If you have a database system that you would like to make available by means of Z39.50/SR, YAZ basically offers you two options. You can use the APIs provided by the ASN, ODR, and COMSTACK modules to create and decode PDUs, and exchange them with a client. Using this low-level interface gives you access to all fields and options of the protocol, and you can construct your server as close to your existing database as you like. It is also a fairly involved process, requiring you to set up an event-handling mechanism, protocol state machine, etc. Alternatively, to simplify server implementation, we have implemented a compact and simple, but reasonably full-functioned server-frontend that will handle most of the protocol mechanics, while leaving you to concentrate on your database interface.

NOTE: The backend interface was designed in anticipation of a specific integration task, while still attempting to achieve some degree of generality. We realise fully that there are points where the interface can be improved significantly. If you have specific functions or parameters that you think could be useful, send us a mail (or better, sign on to the mailing list referred to in the toplevel README file). We will try to fit good suggestions into future releases, to the extent that it can be done without requiring too many structural changes in existing applications.

7.2 The Database Frontend

We refer to this software as a generic database frontend. Your database system is the backend database, and the interface between the two is called the backend API. The backend API consists of a small number of function prototypes and structure definitions. You are required to provide the main() routine for the server (which can be quite simple), as well as functions to match each of the prototypes. The interface functions that you write can use any mechanism you like to communicate with your database system: you might link the whole thing together with your database application and access it by function calls; you might use IPC to talk to a database server somewhere; or you might link with third-party software that handles the communication for you (like a commercial database client library). At any rate, the functions will perform the tasks of initialization, searching, fetching records, deleting result sets, and scanning the database index (more functions will be added in time to support as much of Z39.50-1995 as possible).

Because the model where pipes or sockets are used to access the backend database is a fairly common one, we have added a mechanism that allows this communication to take place asynchronously. In this mode, the frontend server doesn't have to block while the backend database is processing a request, but can wait for additional PDUs from the client.

7.3 The Backend API

The header files that you need to use the interface are in the include/ directory. They are called statserv.h and backend.h. They will include other files from the include directory, so you'll probably want to use the -I option of your compiler to tell it where to find the files. When you run make in the toplevel YAZ directory, everything you need to create your server is put in the lib/libyaz.a library. If you want OSI as well, you'll also need to link in the libmosi.a library from the xtimosi distribution (see the mosi.txt file), as well as the lib/librfc.a library (to provide OSI transport over RFC1006/TCP).

7.4 Your main() Routine

As mentioned, your main() routine can be quite brief. If you want to initialize global parameters, or read global configuration tables, this is the place to do it. At the end of the routine, you should call the function

int statserv_main(int argc, char **argv);

statserv_main() will establish listening sockets according to the parameters given. When connection requests are received, the event handler will typically fork() to handle the new request. If you do use global variables, you should be aware that these cannot be shared between associations, unless you explicitly disallow forking by command-line parameters (we advise against this for any purposes except debugging, as a crash or hang in the server process will affect all users currently signed on to the server).

The server provides a mechanism for controlling some of its behavior without using command-line options. The function

statserv_options_block *statserv_getcontrol(void);

will return a pointer to a struct statserv_options_block describing the current default settings of the server. The structure contains these elements:

int dynamic

A boolean value, which determines whether the server will fork on each incoming request (TRUE), or not (FALSE). Default is TRUE.

int loglevel

Set this by ORing the constants defined in include/log.h.

char logfile[ODR_MAXNAME+1]

File for diagnostic output ("": stderr).

char apdufile[ODR_MAXNAME+1]

Name of file for logging incoming and outgoing APDUs ("": don't log APDUs, "-": stderr).

char default_listen[1024]

Same form as the command-line specification of listener address. "": no default listener address. Default is to listen at "tcp:@:9999". You can only specify one default listener address in this fashion.

enum oid_proto default_proto;

Either PROTO_SR or PROTO_Z3950. Default is PROTO_Z3950.

int idle_timeout;

Maximum session idle time, in minutes. Zero indicates no (infinite) timeout. Default is 120 minutes.

int maxrecordsize;

Maximum permissible record (message) size. Default is 1 MB. This amount of memory will only be allocated if a client requests a very large number of records in one operation (or a big record). Set it to a lower number if you are worried about resource consumption on your host system.

char configname[ODR_MAXNAME+1]

Passed to the backend when a new connection is received.

char setuid[ODR_MAXNAME+1]

Set user id to the user specified, after binding the listener addresses.

The pointer returned by statserv_getcontrol points to a static area. You are allowed to change the contents of the structure, but the changes will not take effect before you call

void statserv_setcontrol(statserv_options_block *block);

Note that you should generally update this structure before calling statserv_main().
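The get/modify/set sequence described above might look roughly like the following. This is only a sketch: so that it compiles without YAZ, it carries its own stand-ins for statserv_options_block (a subset of the members), statserv_getcontrol() and statserv_setcontrol(), modelled on the description in this section. The value of ODR_MAXNAME, the port number, the log file name, and the helper name configure_server() are assumptions made for illustration.

```c
#include <string.h>

/* Stand-ins so the sketch compiles without YAZ. In a real server the type
 * and both functions come from statserv.h; only the members used below
 * are reproduced, and the value of ODR_MAXNAME is an assumption. */
#define ODR_MAXNAME 256

typedef struct statserv_options_block
{
    int dynamic;
    char logfile[ODR_MAXNAME+1];
    char default_listen[1024];
    int idle_timeout;
} statserv_options_block;

/* The settings the server acts on, with the defaults listed above. */
static statserv_options_block active_settings =
    { 1, "", "tcp:@:9999", 120 };

statserv_options_block *statserv_getcontrol(void)
{
    static statserv_options_block copy;   /* the "static area" */

    copy = active_settings;
    return &copy;
}

void statserv_setcontrol(statserv_options_block *block)
{
    active_settings = *block;
}

/* The part you would write yourself (hypothetical helper name):
 * call it from main() before handing control to statserv_main(). */
void configure_server(void)
{
    statserv_options_block *opts = statserv_getcontrol();

    strcpy(opts->default_listen, "tcp:@:2100");   /* hypothetical port */
    strcpy(opts->logfile, "server.log");          /* hypothetical file name */
    opts->idle_timeout = 30;                      /* minutes */

    /* Nothing takes effect until the block is handed back. */
    statserv_setcontrol(opts);
}
```

In a real server, main() would call configure_server() and then finish with statserv_main(argc, argv), so that command-line options can still override these defaults.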

7.5 The Backend Functions

For each service of the protocol, the backend interface declares one or two functions. You are required to provide implementations of the functions representing the services that you wish to implement.

bend_initresult *bend_init(bend_initrequest *r);

This function is called once for each new connection request, after a new process has been forked, and an initRequest has been received from the client. The parameter and result structures are defined as

typedef struct bend_initrequest
{
    char *configname;
} bend_initrequest;

typedef struct bend_initresult
{
    int errcode;       /* 0==OK */
    char *errstring;   /* system error string or NULL */
    void *handle;      /* private handle to the backend module */
} bend_initresult;

The configname of bend_initrequest is currently always set to "default-config". We haven't had use for putting anything special in the initrequest yet, but something might go there if the need arises (account/password info would be obvious).

In general, the server frontend expects that the bend_*result pointer that you return is valid at least until the next call to a bend_* function. This applies to all of the functions described herein. The parameter structure passed to you in the call belongs to the server frontend, and you should not make assumptions about its contents after the current function call has completed. In other words, if you want to retain any of the contents of a request structure, you should copy them.

The errcode should be zero if the initialization of the backend went well. Any other value will be interpreted as an error. The errstring isn't used in the current version, but one option would be to stick it in the initResponse as a VisibleString. The handle is the most important parameter. It should be set to some value that uniquely identifies the current session to the backend implementation. It is used by the frontend server in any future calls to a backend function. The typical use is to set it to point to a dynamically allocated state structure that is private to your backend module.
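As an illustration of the handle mechanism, here is a minimal sketch of a bend_init() that allocates a private state structure, together with a bend_close() that releases it. The two structures are copied from the definitions above so that the fragment compiles on its own; session_state and its single member are hypothetical and stand for whatever your backend needs to remember per session.

```c
#include <stdlib.h>

/* Copied from the definitions above; a real server includes backend.h. */
typedef struct bend_initrequest
{
    char *configname;
} bend_initrequest;

typedef struct bend_initresult
{
    int errcode;       /* 0==OK */
    char *errstring;   /* system error string or NULL */
    void *handle;      /* private handle to the backend module */
} bend_initresult;

/* Hypothetical per-session state, private to the backend. */
typedef struct session_state
{
    int searches_run;
} session_state;

bend_initresult *bend_init(bend_initrequest *r)
{
    /* Static, so the pointer stays valid after we return. */
    static bend_initresult result;
    session_state *state = malloc(sizeof(*state));

    (void) r;                  /* configname is not used in this sketch */
    result.errstring = NULL;
    result.handle = state;
    if (!state)
    {
        result.errcode = 1;    /* any nonzero value signals failure */
        return &result;
    }
    state->searches_run = 0;
    result.errcode = 0;
    return &result;
}

/* Release whatever bend_init() allocated for this session. */
void bend_close(void *handle)
{
    free(handle);
}
```

Returning a pointer to a static result is enough here because the frontend only needs it to remain valid until the next call to a bend_* function.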

bend_searchresult *bend_search(void *handle, bend_searchrequest *r,
                               int *fd);
bend_searchresult *bend_searchresponse(void *handle);

typedef struct bend_searchrequest
{
    char *setname;       /* name to give to this set */
    int replace_set;     /* replace set, if it already exists */
    int num_bases;       /* number of databases in list */
    char **basenames;    /* databases to search */
    Z_Query *query;      /* query structure */
} bend_searchrequest;

typedef struct bend_searchresult
{
    int hits;            /* number of hits */
    int errcode;         /* 0==OK */
    char *errstring;     /* system error string or NULL */
} bend_searchresult;

The first thing to notice about the search request interface (as well as all of the following requests), is that it consists of two separate functions. The idea is to provide a simple facility for asynchronous communication with the backend server. When a searchRequest comes in, the server frontend will fill out the bend_searchrequest structure, and call the bend_search function. The fd argument will point to an integer variable. If you are able to do asynchronous I/O with your database server, you should set *fd to the file descriptor you use for the communication, and return a null pointer. The server frontend will then select() on the *fd, and will call bend_searchresponse when it sees that data is available. If you don't support asynchronous I/O, you should return a pointer to the bend_searchresult immediately, and leave *fd untouched. This construction is common to all of the bend_ functions (except bend_init). Note that you can choose to support this facility in none, any, or all of the bend_ functions, and you can respond differently on each request at run-time. The server frontend will adapt accordingly.
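The asynchronous calling convention can be sketched as follows, using a POSIX pipe in place of a real connection to a database server. The structures are copied from the definitions above, Z_Query is left opaque, and session_state, the pipe, and the fixed hit count are all hypothetical. The "answer" is written to the pipe immediately, which a real backend would of course not do.

```c
#include <stddef.h>
#include <unistd.h>

typedef struct Z_Query Z_Query;   /* opaque here; see include/proto.h */

typedef struct bend_searchrequest
{
    char *setname;       /* name to give to this set */
    int replace_set;     /* replace set, if it already exists */
    int num_bases;       /* number of databases in list */
    char **basenames;    /* databases to search */
    Z_Query *query;      /* query structure */
} bend_searchrequest;

typedef struct bend_searchresult
{
    int hits;            /* number of hits */
    int errcode;         /* 0==OK */
    char *errstring;     /* system error string or NULL */
} bend_searchresult;

/* Hypothetical session state: a pipe stands in for the database link. */
typedef struct session_state
{
    int db_fd[2];        /* [0] is the read end, [1] the write end */
} session_state;

static bend_searchresult result;

bend_searchresult *bend_search(void *handle, bend_searchrequest *r, int *fd)
{
    session_state *s = handle;
    int fake_hits = 42;  /* a real backend would send the query instead */

    (void) r;
    if (write(s->db_fd[1], &fake_hits, sizeof(fake_hits))
        != (ssize_t) sizeof(fake_hits))
    {
        /* Could not reach the database: answer synchronously with an error. */
        result.hits = 0;
        result.errcode = 2;          /* BIB-1: temporary system error */
        result.errstring = NULL;
        return &result;
    }
    *fd = s->db_fd[0];   /* the frontend will select() on this descriptor */
    return NULL;         /* the result follows via bend_searchresponse() */
}

bend_searchresult *bend_searchresponse(void *handle)
{
    session_state *s = handle;
    int hits;

    result.errstring = NULL;
    if (read(s->db_fd[0], &hits, sizeof(hits)) == (ssize_t) sizeof(hits))
    {
        result.hits = hits;
        result.errcode = 0;
    }
    else
    {
        result.hits = 0;
        result.errcode = 2;          /* BIB-1: temporary system error */
    }
    return &result;
}
```

Note that the same bend_search() falls back to a synchronous answer when it cannot reach the database, which is the per-request choice the text describes.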

The bend_searchrequest is a fairly close approximation of a protocol searchRequest PDU. The setname is the resultSetName from the protocol. You are required to establish a mapping between the set name and whatever your backend database likes to use. Similarly, the replace_set is a boolean value corresponding to the replaceIndicator field in the protocol. The num_bases and basenames members give the length of, and a pointer to, an array of character pointers to the database names provided by the client. The query is the full query structure as defined in the protocol ASN.1 specification. It can be any of the possible query types, and it's up to you to determine if you can handle the provided query type. Rather than reproduce the C interface here, we'll refer you to the structure definitions in the file include/proto.h. If you want to look at the attributeSetId OID of the RPN query, you can either match it against your own internal tables, or you can use the oid_getentbyoid function provided by YAZ.

The result structure contains a number of hits, and an errcode/errstring pair. If an error occurs during the search, or if you're unhappy with the request, you should set the errcode to a value from the BIB-1 diagnostic set. The value will then be returned to the user in a nonsurrogate diagnostic record in the response. The errstring, if provided, will go in the addinfo field. Look at the protocol definition for the defined error codes, and the suggested uses of the addinfo field.
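A synchronous bend_search() that reports a BIB-1 diagnostic might be sketched like this. The structures are copied from the definitions above and Z_Query is left opaque; the database name "books" and the fixed hit count are hypothetical. Code 109 is the BIB-1 diagnostic "database unavailable", and the name of the rejected database is passed back as the additional information.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct Z_Query Z_Query;   /* opaque here; see include/proto.h */

typedef struct bend_searchrequest
{
    char *setname;       /* name to give to this set */
    int replace_set;     /* replace set, if it already exists */
    int num_bases;       /* number of databases in list */
    char **basenames;    /* databases to search */
    Z_Query *query;      /* query structure */
} bend_searchrequest;

typedef struct bend_searchresult
{
    int hits;            /* number of hits */
    int errcode;         /* 0==OK */
    char *errstring;     /* system error string or NULL */
} bend_searchresult;

/* Synchronous variant: always returns a result and never touches *fd. */
bend_searchresult *bend_search(void *handle, bend_searchrequest *r, int *fd)
{
    static bend_searchresult result;
    static char addinfo[128];   /* our own copy; the request is not ours */
    int i;

    (void) handle;
    (void) fd;
    result.hits = 0;
    result.errcode = 0;
    result.errstring = NULL;

    for (i = 0; i < r->num_bases; i++)
        if (strcmp(r->basenames[i], "books") != 0)  /* hypothetical database */
        {
            snprintf(addinfo, sizeof(addinfo), "%s", r->basenames[i]);
            result.errcode = 109;        /* BIB-1: database unavailable */
            result.errstring = addinfo;  /* goes in the addinfo field */
            return &result;
        }

    result.hits = 3;   /* placeholder: a real backend would run r->query */
    return &result;
}
```

The database name is copied into a buffer owned by the backend because the request structure belongs to the frontend and should not be referenced once the call has returned.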

bend_fetchresult *bend_fetch(void *handle, bend_fetchrequest *r,
                             int *fd);
bend_fetchresult *bend_fetchresponse(void *handle);

typedef struct bend_fetchrequest
{
    char *setname;       /* set name */
    int number;          /* record number */
    oid_value format;
} bend_fetchrequest;

typedef struct bend_fetchresult
{
    char *basename;      /* name of database that provided record */
    int len;             /* length of record */
    char *record;        /* record */
    int last_in_set;     /* is it?  */
    oid_value format;
    int errcode;         /* 0==success */
    char *errstring;     /* system error string or NULL */
} bend_fetchresult;

NOTE: The bend_fetchresponse() function is not yet supported in this version of the software. Your implementation of bend_fetch() should always return a pointer to a bend_fetchresult.

The frontend server calls bend_fetch when it needs database records to fulfill a searchRequest or a presentRequest. The setname is simply the name of the result set that holds the reference to the desired record. The number is the offset into the set (with 1 being the first record in the set). The format field is the record format requested by the client (See section Object Identifiers). The value VAL_NONE indicates that the client did not request a specific format. The stream argument is an ODR stream which should be used for allocating space for structured data records. The stream will be reset when all records have been assembled, and the response package has been transmitted. For unstructured data, the backend is responsible for maintaining a static or dynamic buffer for the record between calls.

In the result structure, the basename is the name of the database that holds the record. Len is the length of the record returned, in bytes, and record is a pointer to the record. Last_in_set should be nonzero only if the record returned is the last one in the given result set. Errcode and errstring, if given, will currently be interpreted as a global error pertaining to the set, and will be returned in a nonSurrogateDiagnostic.

NOTE: This is silly. Add a flag to say which is which.

If the len field has the value -1, then record is assumed to point to a constructed data type. The format field will be used to determine which encoder should be used to serialize the data.

NOTE: If your backend generates structured records, it should use odr_malloc() on the provided stream for allocating data: This allows the frontend server to keep track of the record sizes.

The format field is mapped to an object identifier in the direct reference of the resulting EXTERNAL representation of the record.

NOTE: The current version of YAZ only supports the direct reference mode.
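For unstructured records, a bend_fetch() along the following lines would satisfy the rules above. It is a sketch only: the structures are copied from this section, oid_value is replaced by a two-member stand-in enum (the real one lives in the YAZ OID module), and the in-memory record table and database name are hypothetical. The error codes used are BIB-1 13 (present request out of range) and 238 (record not available in requested syntax).

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for YAZ's oid_value; only two members are reproduced. */
typedef enum { VAL_NONE, VAL_SUTRS } oid_value;

typedef struct bend_fetchrequest
{
    char *setname;       /* set name */
    int number;          /* record number */
    oid_value format;
} bend_fetchrequest;

typedef struct bend_fetchresult
{
    char *basename;      /* name of database that provided record */
    int len;             /* length of record */
    char *record;        /* record */
    int last_in_set;     /* is it?  */
    oid_value format;
    int errcode;         /* 0==success */
    char *errstring;     /* system error string or NULL */
} bend_fetchresult;

/* Hypothetical result set of three plain-text records. */
static char *records[] = { "First record", "Second record", "Third record" };
#define NUM_RECORDS 3

bend_fetchresult *bend_fetch(void *handle, bend_fetchrequest *r, int *fd)
{
    static bend_fetchresult result;

    (void) handle;
    (void) fd;                      /* synchronous: *fd is left untouched */
    result.basename = "books";      /* hypothetical database name */
    result.len = 0;
    result.record = NULL;
    result.last_in_set = 0;
    result.format = VAL_SUTRS;
    result.errcode = 0;
    result.errstring = NULL;

    if (r->number < 1 || r->number > NUM_RECORDS)
    {
        result.errcode = 13;        /* BIB-1: present request out of range */
        return &result;
    }
    if (r->format != VAL_NONE && r->format != VAL_SUTRS)
    {
        result.errcode = 238;       /* BIB-1: not available in this syntax */
        return &result;
    }
    result.record = records[r->number - 1];     /* numbering starts at 1 */
    result.len = (int) strlen(result.record);   /* length in bytes */
    result.last_in_set = (r->number == NUM_RECORDS);
    return &result;
}
```

The records here live in static storage, which is one way of meeting the requirement that the backend keeps the buffer alive between calls.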

bend_deleteresult *bend_delete(void *handle, bend_deleterequest *r,
                               int *fd);
bend_deleteresult *bend_deleteresponse(void *handle);

typedef struct bend_deleterequest
{
    char *setname;
} bend_deleterequest;

typedef struct bend_deleteresult
{
    int errcode;         /* 0==success */
    char *errstring;     /* system error string or NULL */
} bend_deleteresult;

NOTE: The "delete" function is not yet supported in this version of the software.

NOTE: The delete set function definition is rather primitive, mostly because we have had no practical need for it as of yet. If someone wants to provide a full delete service, we'd be happy to add the extra parameters that are required. Are there clients out there that will actually delete sets they no longer need?

bend_scanresult *bend_scan(void *handle, bend_scanrequest *r,
    int *fd);
bend_scanresult *bend_scanresponse(void *handle);

typedef struct bend_scanrequest
{
    int num_bases;      /* number of elements in databaselist */
    char **basenames;   /* databases to search */
    Z_AttributesPlusTerm *term;
    int term_position;  /* desired index of term in result list */
    int num_entries;    /* number of entries requested */
} bend_scanrequest;

typedef struct bend_scanresult
{
    int num_entries;
    struct scan_entry
    {
        char *term;
        int occurrences;
    } *entries;
    int term_position;
    enum
    {
        BEND_SCAN_SUCCESS,
        BEND_SCAN_PARTIAL
    } status;
    int errcode;
    char *errstring;
} bend_scanresult;

NOTE: The bend_scanresponse() function is not yet supported in this version of the software. Your implementation of bend_scan() should always return a pointer to a bend_scanresult.
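A very reduced bend_scan() could look like the sketch below. The structures are copied from the definitions above, with Z_AttributesPlusTerm left opaque. Because the term is opaque here, the sketch ignores it and simply returns the start of a hypothetical three-term index, reporting BEND_SCAN_PARTIAL when fewer entries are available than were requested; a real implementation would position the list around the requested term.

```c
#include <stddef.h>

typedef struct Z_AttributesPlusTerm Z_AttributesPlusTerm;  /* opaque here */

typedef struct bend_scanrequest
{
    int num_bases;      /* number of elements in databaselist */
    char **basenames;   /* databases to search */
    Z_AttributesPlusTerm *term;
    int term_position;  /* desired index of term in result list */
    int num_entries;    /* number of entries requested */
} bend_scanrequest;

typedef struct bend_scanresult
{
    int num_entries;
    struct scan_entry
    {
        char *term;
        int occurrences;
    } *entries;
    int term_position;
    enum
    {
        BEND_SCAN_SUCCESS,
        BEND_SCAN_PARTIAL
    } status;
    int errcode;
    char *errstring;
} bend_scanresult;

/* Hypothetical index: three terms with their occurrence counts. */
static struct scan_entry index_terms[] =
{
    { "apple", 4 }, { "banana", 1 }, { "cherry", 7 }
};

bend_scanresult *bend_scan(void *handle, bend_scanrequest *r, int *fd)
{
    static bend_scanresult result;
    int available = (int) (sizeof(index_terms) / sizeof(index_terms[0]));
    int wanted = r->num_entries < 0 ? 0 : r->num_entries;

    (void) handle;
    (void) fd;                       /* synchronous: *fd is left untouched */
    result.entries = index_terms;
    result.num_entries = wanted < available ? wanted : available;
    result.term_position = 1;        /* simplification: list starts at term */
    result.status = result.num_entries < wanted
        ? BEND_SCAN_PARTIAL : BEND_SCAN_SUCCESS;
    result.errcode = 0;
    result.errstring = NULL;
    return &result;
}
```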

7.6 Application Invocation

The finished application has the following invocation syntax (by way of statserv_main()):

appname [-szSu -a apdufile -l logfile -v loglevel]
[listener ...]

The options are

-a

APDU file. Specify a file for dumping PDUs (for diagnostic purposes). The special name "-" sends output to stderr.

-S

Don't fork on connection requests. This is good for debugging, but not recommended for real operation: Although the server is asynchronous and non-blocking, it can be nice to keep a software malfunction (okay then, a crash) from affecting all current users.

-s

Use the SR protocol.

-z

Use the Z39.50 protocol (default). These two options complement each other. You can use both multiple times on the same command line, between listener-specifications (see below). This way, you can set up the server to listen for connections in both protocols concurrently, on different local ports.

-l

The logfile.

-v

The log level. Use a comma-separated list of members of the set {fatal,debug,warn,log,all,none}.

-u

Set user ID. Sets the real UID of the server process to that of the given user. It's useful if you aren't comfortable with having the server run as root, but you need to start it as such to bind a privileged port.

-w

Working directory.

-i

Use this when running from the inetd server.

-t

Idle session timeout, in minutes.

-k

Maximum record size/message size, in kilobytes.

A listener specification consists of a transport mode followed by a colon (:) followed by a listener address. The transport mode is either osi or tcp.

For TCP, an address has the form

hostname | IP-number [: portnumber]

The port number defaults to 210 (standard Z39.50 port).

For osi, the address form is

[t-selector /] hostname | IP-number [: portnumber]

The transport selector is given as a string of hex digits (with an even number of digits). The default port number is 102 (RFC1006 port).

Examples

tcp:dranet.dra.com

osi:0402/dbserver.osiworld.com:3000

In both cases, the special hostname "@" is mapped to the address INADDR_ANY, which causes the server to listen on any local interface. To start the server listening on the registered ports for Z39.50 and SR over OSI/RFC1006, and to drop root privileges once the ports are bound, execute the server like this (from a root shell):

my-server -u daemon tcp:@ -s osi:@

You can replace daemon with another user, e.g. your own account, or a dedicated IR server account. my-server should be the name of your server application. You can test the procedure with the ztest application.
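To make the listener syntax concrete, the following sketch splits a specification into its parts and applies the default ports given above (210 for tcp, 102 for osi). It is not the parser used by statserv_main(); the listener_spec structure and the parse_listener() function are hypothetical names introduced for illustration, and the sketch assumes the specification is written without embedded spaces, as in the examples.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a listener specification. */
typedef struct listener_spec
{
    char mode[4];       /* "tcp" or "osi" */
    char tsel[64];      /* transport selector (osi only), "" if absent */
    char host[256];     /* hostname, IP number, or "@" */
    int port;
} listener_spec;

/* Returns 0 on success, -1 if the specification is malformed. */
int parse_listener(const char *spec, listener_spec *out)
{
    const char *addr, *slash, *colon;
    size_t n;
    int is_osi;

    if (strncmp(spec, "tcp:", 4) == 0)
        is_osi = 0;
    else if (strncmp(spec, "osi:", 4) == 0)
        is_osi = 1;
    else
        return -1;
    memcpy(out->mode, spec, 3);
    out->mode[3] = '\0';
    out->port = is_osi ? 102 : 210;      /* defaults from the text above */
    addr = spec + 4;

    /* Optional transport selector: an even number of hex digits. */
    out->tsel[0] = '\0';
    if (is_osi && (slash = strchr(addr, '/')) != NULL)
    {
        n = (size_t) (slash - addr);
        if (n == 0 || n % 2 != 0 || n >= sizeof(out->tsel))
            return -1;
        if (strspn(addr, "0123456789abcdefABCDEF") < n)
            return -1;
        memcpy(out->tsel, addr, n);
        out->tsel[n] = '\0';
        addr = slash + 1;
    }

    /* Host, then an optional port after a colon. */
    colon = strchr(addr, ':');
    n = colon ? (size_t) (colon - addr) : strlen(addr);
    if (n == 0 || n >= sizeof(out->host))
        return -1;
    memcpy(out->host, addr, n);
    out->host[n] = '\0';
    if (colon)
    {
        out->port = atoi(colon + 1);
        if (out->port <= 0)
            return -1;
    }
    return 0;
}
```

Applied to the two examples above, this gives host dranet.dra.com on port 210, and transport selector 0402 with host dbserver.osiworld.com on port 3000.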

7.7 Summary and Synopsis

#include <backend.h>

bend_initresult *bend_init(bend_initrequest *r);

bend_searchresult *bend_search(void *handle, bend_searchrequest *r,
                                 int *fd);

bend_searchresult *bend_searchresponse(void *handle);

bend_fetchresult *bend_fetch(void *handle, bend_fetchrequest *r,
                               int *fd);

bend_fetchresult *bend_fetchresponse(void *handle);

bend_scanresult *bend_scan(void *handle, bend_scanrequest *r, int *fd);

bend_scanresult *bend_scanresponse(void *handle);

bend_deleteresult *bend_delete(void *handle, bend_deleterequest *r,
                                  int *fd);

bend_deleteresult *bend_deleteresponse(void *handle);

void bend_close(void *handle);

