mvmf: mfl built-in functions

This is a subsection of the mfl (mvmf language) document. It describes some of the built-in functions available in the language. Each application that incorporates MFL may supply additional built-in functions, so check the documentation for those applications.

Be sure to see the overview following the functions table as well as other introductory sections.

Functions list

Following is a list of functions, organized by category. Note that mvmf is in a development state and function names, syntax, and operations are all in a potential state of flux.
$admin and $control -- operational controls
These relate to operational modes and permissions.

$cdb and $cdbu_ -- access to 'cdb' database files.
These provide quick access to cdb-format keyed data files.

$cusp and $cuspu_ -- interface to external service programs
These provide an interface to commonly used service programs.

$dns -- some DNS access functions.
These provide some trivial access to DNS information.

$env -- about the execution environment.
These functions provide some interfaces to the application environment.

$mfl -- functions relating to MFL code and control.
These are strictly related to MFL code execution.

$msg -- message-related.
These have some relation to the open mail message, in general.

$msgpart -- relating to message parts.
These have to do with the structure of the currently open message.

$msns -- message store namespace-related.
These are related to namespaces. Namespaces define how folder names are to be interpreted.

$msst -- message store specific type-related.
These are related to MSSTs (Message Store Specific Type).

$str -- String operations
Functions that do things with strings.

Unclassified
General functions that don't belong to one of the classes above.

 


Overview

A built-in function is an mfl function that's implemented as part of the basic language (i.e. in C code).

Naming conventions: built-in function names begin with a dollarsign character '$' (all variable and other names beginning with '$' should be reserved for mvmf development/implementation use). In general, the dollarsign is followed by a word that indicates the category the function belongs to, then an underscore, and other name parts. We try to use a hierarchical naming convention with words separated by underscores, each successive word indicating a class.

   $class_subclass_subclass2_etc()
But that's just a guideline. Unclassified functions may follow no naming convention other than the '$' one.

An interesting aspect of built-in functions is that they are invoked with the parse tree representing the actual function arguments, rather than the evaluated arguments. Each built-in function is responsible for interpreting that parse tree. Because of this, it's possible to provide the function '$mfl_parse' (formerly '$_parse'), which prints out a representation of the parse tree. Another side effect is that, though most built-in functions are described with typed arguments, the arguments really are typeless until they are interpreted by the function. A further side effect is that a built-in function has access to a variable's lvalue even if the variable is only mentioned by name and not by address.

 


Controls

The $control_ and $admin_ functions allow setting of various operational controls and limits. Every control value has a name and a value type. The control value types are listed below, along with some of the standard controls (and their names). Each application may also have its own controls; see the specific application for any information about those.
Integer controls
These controls have numeric values, which also include true/false flags. Each integer control has a current value, a low and high limit for user mode setting, and a low and high absolute limit for admin control.

The standard integer controls include the following. Others may be defined by each specific application.

String controls
These are strings and specify some operating value, as described for each specific control name. Strings may be restricted to setting by the administrator.

The standard string controls include the following. Others may be defined by each specific application.

 


Data Types

MFL defines some data types used to communicate with some built-in functions. With a few noted exceptions, all mvmf-defined names (types, variables, etc) will begin with a dollarsign.


$CDB$ -- cdb file access

Some of the cdb access functions make use of a $CDB$ structure. This is a handle that MFL code can make use of for cdb file operations. There are no MFL-visible elements inside of this structure, so its definition is not important.


$CUSP$ -- cusp access

A cusp is a Commonly Used Service Program that the system administrator (or users with the correct control enabled) can make available to mfl. Some of the cusp functions make use of a $CUSP$ structure, which is a handle for interacting with the service program. There are as yet no useful MFL-visible elements in this structure, although there are likely to be at some point.


$HDR$ -- header information

The $HDR$ type is a structure with information about a header. Some $msgpart functions take and/or return a pointer to a $HDR$ type.
typedef struct $hdr$ {
    string      *h_nameP;       // Header name
    string      *h_tailP;       // "body" of header: the part after the name
}  $HDR$;

The h_nameP string will always contain the terminating character (usually a colon).

These elements point to substrings within the same string. The $str_union function can be used to get a pointer to the string containing both:

    $HDR$ *hdrP;
    string *strP;
    /* .. some code to set hdrP .. */
    strP = $str_union( *hdrP->h_nameP, *hdrP->h_tailP );
Or one could use string concatenation to get a copy:
    $HDR$ *hdrP;
    string str;
    /* .. */
    str = *hdrP->h_nameP + *hdrP->h_tailP;

 


$admin_ and $control_ functions: relating to operating modes


int $control_int_get( string name )

Returns the current value of the specified integer control.

name is the name of the control; the associated value is returned.

Noisily returns -1 if the control name is invalid or if there is any other problem.
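
As a minimal sketch, a script might read a control before relying on it. The "pipe_allow" name is borrowed from the examples later in this section; whether a given application defines that control is an assumption.

    int allowed;

    // -1 means the name was invalid or something else went wrong
    //  (the failure is also reported noisily)
    allowed = $control_int_get( "pipe_allow" );
    if ( allowed == 1 ) {
        /* the control is currently enabled */
        ;
    }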


$admin_int_set( string name, int value )

$control_int_set( string name, int value )

Sets an integer control to a new value, within the bounds allowed. $control_int_set sets the value on behalf of the end user (the normal case); the new value must lie between the limits allowed for that value for setting by an end user. $admin_int_set sets the value on behalf of the administrator; the new value must lie between the limits allowed for that value for setting by the administrator. The admin function is only permitted to execute if administrator mode is set, which is itself a control value.

The function returns true (nonzero) if it succeeds, and false (zero) if it fails.

    $admin_int_set( "pipe_allow", 1 );
    $admin_int_set( "admin", 0 );


$admin_int_setlim( string name, int low, int high )

$control_int_setlim( string name, int low, int high )

Sets new user limits for the named integer control value. The end user (that is, with the $control_int_setlim function) can only change the limits to be the same as, or more restrictive than, the current limits. The admin call, which will succeed only if admin mode is set, can only change the user limits to be within the absolute bounds predefined for that value.
    $admin_int_setlim( "pipe_allow", 0, 1 );
    $control_int_set( "pipe_allow", 1 );

 


string $control_string_get( string name )

Returns the current value of the specified string control.

name is the name of the control; the associated value is returned.

Noisily returns an empty string if the control name is invalid or if there is any other problem.


$admin_string_set( string name, string value )

$control_string_set( string name, string value )

Sets a string control to a new value. $control_string_set sets the value on behalf of the end user (the normal case); the named control must be settable by an end user. $admin_string_set sets the value on behalf of the administrator. The admin function is only permitted to execute if administrator mode is set, which is itself a control value.

The function returns true (nonzero) if it succeeds, and false (zero) if it fails.

    $control_string_set( "log_private_name", "~/mvmda.log" );

 


$cdb_ and $cdbu_ functions: access to cdb files

A 'cdb' file is a "constant database" in a format defined by D. J. Bernstein. It allows for rapid access to indexed data, and these functions provide access to those indexed files. The mvmf package includes a cdbgen utility to create a cdb file from a text file.

At this writing, the cdb files being accessed must lie in the data directory that is associated with the program you are running. The data directory for mvmda, for example, is ".mvmda" in the user's home directory, but this default may be overridden by a control setting. (See the $control_ functions.)

Some of the functions make use of a $CDB$ structure, which the mfl application will define.

Some of the functions provide a "unified," all-in-one open, access, and close operation. These functions do not require the maintenance of a $CDB$ handle. Their names begin with "$cdbu_".


$cdb_close( $CDB$ *cdbP )

Closes a previously opened cdb file. cdbP is a pointer to a $CDB$ type, as previously returned by a cdb opening function (e.g., $cdb_open).

No value is returned.


string *$cdb_get_string( $CDB$ *cdbP, string key )

Looks up a key in a previously opened cdb file. cdbP is a pointer to a $CDB$ type, as previously returned by a cdb opening function. key is a string; the key to look up in the cdb file.

Returns a string pointer (i.e., a (string *) ) to the value associated with the key in the cdb file. If the key is not found in the cdb file, then a NULL pointer is returned.

    $CDB$ *cdbP;
    string *valP;

    cdbP = $cdb_open( "from-folders.cdb" );
    valP = $cdb_get_string( cdbP, "mem@mvmf.org" );
    $cdb_close( cdbP );
    if ( valP != 0 ) {
	sieve { fileinto [ *valP ]; stop; }
    }


$CDB$ *$cdb_open( string cfname )

Opens a cdb file for later access. cfname is a string, and is the name of the cdb file to open.

Returns a pointer to a $CDB$ type. This pointer must be used when calling other $cdb_ functions that require a handle for an open cdb file.


string *$cdbu_get_string( string cfname, string key )

Provides a unified cdb open, lookup, and close function. cfname is a string, and is the name of the cdb file to open. key is a string, and is the key to look up in the cdb file.

Returns a string pointer (i.e., a (string *) ) to the value associated with the key in the cdb file. If the key is not found in the cdb file, then a NULL pointer is returned.

    string *valP;

    valP = $cdbu_get_string( "from-folders.cdb", "mem@mvmf.org" );
    if ( valP != 0 ) {
	sieve { fileinto [ *valP ]; stop; }
    }

 


$cusp_ and $cuspu_ functions: access to external utilities

A CUSP is a commonly used service program that can be invoked within an MFL script. Definitions for CUSPs can be established by the system administrator or by users who have the appropriate control enabled.

NB: The CUSP interface is an experimental work in progress; these functions provide some of the building blocks and/or high-level accesses. As the experiment progresses, the interfaces may change.

Some of the functions make use of a $CUSP$ structure, which the mfl application will define.

Some of the functions provide a "unified," all-in-one open, access, and close operation. These functions do not require the maintenance of a $CUSP$ handle. Their names begin with "$cuspu_" (for example, $cuspu_message).


int $cusp_close( $CUSP$ *cuspP )

Closes a previously opened CUSP interface. cuspP is a pointer to a $CUSP$ type, as previously returned by a cusp-invoking function (e.g., $cusp_open).

The return value is the exit code of the CUSP, or one of the special internal status codes defined by the mvmf interface. These latter include:


int $cusp_define( string name, string path )

Make a CUSP known to the MFL program.

name is the name that the CUSP will be known by in future references; path is the absolute path to the executable that provides the external service.

This function may only be used when MFL is in admin mode, or if the user has the appropriate control enabled. This allows the administrator to establish restricted CUSP access and/or to delegate such access to trusted users (or all users).


$CUSP$ *$cusp_open( string name )

Accesses (runs) a CUSP by its name. The name must have been previously defined, e.g. via $cusp_define().

Returns a pointer to a $CUSP$ handle that may be used in further interactions with the utility.


string *$cusp_read_line( $CUSP$ *cuspP )

Reads and returns a line of text from the CUSP's stdout.

cuspP is a handle for the CUSP, e.g. as returned by $cusp_open().

If no text is available, a NULL pointer is returned.


int $cusp_write_end( $CUSP$ *cuspP )

At a low level, CUSP interaction is accomplished by sending data to the service program in one or more parts, and at some point using this function to tell the application that the data has ended. This function flushes all pending output and closes the output channel to the CUSP. The CUSP must be prepared for its stdin to be closed in this way.

cuspP is the handle for the open CUSP.

This function returns a TRUE (non-zero) result if it completed successfully, otherwise it returns 0.


int $cusp_write_message( $CUSP$ *cuspP )

Writes the currently open mail message to the CUSP. All headers and body parts (except any MIME part flagged as deleted) will be written to the service program.

cuspP is the handle for the open CUSP.

This function returns a TRUE (non-zero) result if it completed successfully, otherwise it returns 0.

This example shows the use of several of the underlying $cusp functions.

    int i;
    string *sP;			// scratch string pointer.
    $CUSP$ *cP;			// CUSP pointer

    // Define the "clamdif" interface to clamav/clamd daemon
    //  (note: clamdif is provided with mvmf)
    $cusp_define( "clamdif", "/usr/local/share/mvmf/cusp/clamdif" );

    // A q&d function to pass a message through the clamdif CUSP
    int clamcheck() {
       if ( ( cP = $cusp_open( "clamdif" ) ) == NULL )
           return -1;
       $cusp_write_message( cP );
       $cusp_write_end( cP );
       sP = $cusp_read_line( cP );
       i = $cusp_close( cP );

       return ( i );
    };

    /* One might use the clamcheck function thusly */
    i = clamcheck();
    sieve { addheader "X-CLAMAV-Exitcode" [(string)i]; }
    if ( ( sP != NULL ) && ( *sP != "" ) ) {
        sieve { addheader "X-CLAMAV" [*sP]; }
        sieve { fileinto :copy "Spam/clam"; }
    }


int $cusp_write_string( $CUSP$ *cuspP, string str )

Writes a string to the CUSP.

cuspP is the handle for the open CUSP.

This function returns a TRUE (non-zero) result if it completed successfully, otherwise it returns 0.

Note that all output to CUSPs is buffered. The service program may not see this string until it is flushed (e.g. via $cusp_write_end() ).
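
The buffering described above might play out as in the following sketch. The "lookup" CUSP and its path are hypothetical, invented only for illustration, and the query string is arbitrary; the sketch also assumes the service program replies with a single line once its stdin is closed.

    string *replyP;                 // reply line, if any
    $CUSP$ *cP;                     // CUSP pointer

    // "lookup" and its path are hypothetical
    $cusp_define( "lookup", "/usr/local/share/mvmf/cusp/lookup" );

    if ( ( cP = $cusp_open( "lookup" ) ) != NULL ) {
        $cusp_write_string( cP, "mem@mvmf.org" );  // buffered, may not be sent yet
        $cusp_write_end( cP );                     // flushes and closes the output
        replyP = $cusp_read_line( cP );            // NULL if no text is available
        $cusp_close( cP );
    }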


string $cuspu_message( string name )

A "unified" CUSP function that opens a CUSP, sends the current message to it, and reads and returns a result. This is essentially a synthesis of $cusp_open, $cusp_write_message, $cusp_write_end, $cusp_read_line, and $cusp_close. It provides less discrete control over the CUSP operation than using those functions separately, but may be more convenient.

name is the name of a CUSP. The name must have been previously defined e.g. via $cusp_define().

The normal result is a string that contains the CUSP exit code, a space, and the first line (if any) of any CUSP output to its stdout. If there is a failure to execute the CUSP, the return is a blank string.

This example shows how this function might be used with the clamdif CUSP.

    string s;			// scratch string variable
    string *ecP;		// String pointer for exitcode
    string *rsP;		// String pointer for result string

    // Define the "clamdif" interface to clamav/clamd daemon
    //  (note: clamdif is provided with mvmf)
    $cusp_define( "clamdif", "/usr/local/share/mvmf/cusp/clamdif" );

    // Pass this message through clamdif.
    s = $cuspu_message( "clamdif" );

    // Let's say that s now contains "0 Worm.Gibe.F".  The next two lines
    //  will make ecP point to "0" and rsP point to "Worm.Gibe.F".  Note
    //  that if s is "", both pointers will be to ""
    ecP = $str_find_token_delimited( s, "", " " );
    rsP = $str_find_token_delimited( s, "", " " );

An alternative to fetching tokens is to simply use string matching, e.g.:
    int exitcode;
    string result;

    // s is still "0 Worm.Gibe.F"
    if ( s =? "* *" ) {		// Anything with a space
        exitcode = (int)$str_match(1);
	result = $str_match(2);
    }

 


$dns_ functions: DNS access

These functions provide some fairly simple DNS access. They are not meant to implement generic resolver functionality, but are intended to contribute to the goal of processing mail. For example, the $dns_query function will follow at least one CNAME indirection, whereas one would not expect a generic resolver function to do this.


string *$dns_query( string name, string qtype )

Queries the DNS, returns result as a string pointer. If the underlying result of the query is a CNAME, that CNAME pointer will be followed and the result for its target will be returned instead.

name is the name to query, e.g. "www.mvmf.org"; qtype is the query type, e.g. "mx". Only a limited number of query types are supported.

The returned string may contain more than one piece of information; in this case, the string is formatted for (hopefully) easy dissection. The results for each query type are listed here.

 

    Query Type   Returned String   Comments
    A            n.n.n.n           free-form string with 4 decimal values, e.g. 127.0.0.1
    MX           nnn sssss         3-digit MX priority, a space, and the domain name target
    PTR          sssss             domain name target string as found in the DNS
    TXT          sssss             literal string as found in the DNS


string *$dns_result()

Returns the last query result, again.


string *$dns_result_next()

Returns the next result from the last DNS query. A query for an "A" record might, for example, return multiple records. The initial query will return the first record, and subsequent calls to $dns_result_next will return the others.
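
A sketch of walking all results of a query follows. It assumes, and this is not stated above, that both $dns_query and $dns_result_next return a NULL pointer when there is no (further) result. The "nnn sssss" MX format is taken from the table above, and the matching idiom is the one shown for $cuspu_message.

    string *resP;
    int prio;
    string target;

    resP = $dns_query( "mvmf.org", "mx" );
    while ( resP != NULL ) {        // assumption: NULL means no more results
        // An MX result is a priority, a space, and the target name
        if ( *resP =? "* *" ) {
            prio = (int)$str_match(1);
            target = $str_match(2);
        }
        resP = $dns_result_next();
    }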

 


$env_ functions: about the execution environment.

These functions provide some interfaces between the application and the environment that it finds itself in.


string $env_cwd()

Returns the path to the current working directory. An empty string is returned if there is a problem getting the working directory path.


int $env_cwd_set( string path )

Sets the current working directory.

path is the path to the new directory to change into.

Returns 1 if successful, 0 otherwise.

    string cwd;

    if ( $env_cwd_set( "/tmp" ) )
	cwd = $env_cwd();		// should be "/tmp"


string $env_homedir()

Returns the path to the home directory. An empty string is returned if there is a problem getting the home directory path.

    // look up something in a .cdb file from my homedir
    string *valP;

    valP = $cdbu_get_string( $env_homedir() + "/file.cdb", "key1" );


string $env_variable( string name )

Returns the value of an environment variable.

name is the name of the environment variable.

Returns the environment variable's value, or a blank string if there is no such environment variable. If you need to see if the variable exists, you can use $env_variable_assign instead.


$env_variable_assign( result, envname )

NB: name changed from $_getenv

Provides access to an environment variable.

result is the name of a variable which will accept the environment variable contents. envname is the name of the environment variable wanted.

Returns non-zero if the environment variable is present, 0 if not. This differs from traditional getenv().

    string user;
    if ( ! $env_variable_assign( user, "USER" ) )
	user = "";

This was an old function that has been through several revisions in name and function. The $env_variable function is likely much more useful in almost all cases.

 


$mfl_ functions: MFL code related

These functions relate to the execution of MFL code and the MFL environment.


int $mfl_hookdir_add( string path )

Adds a directory path to the hooks directory search list.

path is the directory path to add.

An MFL hook is a specially-named function that an application will run at a certain point in its execution. The user or administrator can supply that function in order to influence the operation of the application at that point.

If the function isn't defined, MFL goes looking for it. It will look in each directory along the hooks search list for a file named after the hook (the filename will be the hook name without the $hook_ prefix). If the file is found, it is presumed to contain the hook function definition, and it is run as MFL code. If the hook still isn't defined, the search continues.

    string s;
    s = $env_variable( "MVMF_LIBDIR" );
    $mfl_hookdir_add( s + "/mvmda/hooks" );

or

    $mfl_hookdir_add( $env_variable( "MVMF_LIBDIR" ) + "/mvmda/hooks" );

Directories are searched in the order that they have been supplied.

The application may or may not initialize one or more standard directories to look in: check the application documentation.


int $mfl_incdir_add( string type, string path )

Adds a directory path to the list of directories that MFL looks in for include files. When you use a @include directive in an MFL script, the application looks for the specified file in a list of directories. This function allows you to add a directory to that list.

type is the type of include file, either "user" or "system". There are two sorts of include file searches, as indicated by the syntax. If the target filename is enclosed in angle brackets, the application looks for the file in the search list designated as system-wide. If the target filename is enclosed in double quotes, the application looks for the file first relative to the current directory, and then in each directory in the system list.

path is the path for the directory to add to the list.

Returns 0 if there was some failure, non-zero otherwise.

    $mfl_incdir_add( "system", "/usr/local/scripts" );
    @include <x.mfl>	// search will now include "/usr/local/scripts/x.mfl"

    $mfl_incdir_add( "user", "/tmp" );
    @include "x.mfl"	// search will now include "/tmp/x.mfl"

Directories are searched in the order that they have been supplied.

The application may or may not initialize one or more standard system include directories to look in: check the application documentation.


$mfl_parse( mfl-expr )

NB: name changed from $_parse

Shows the parse tree corresponding to the MFL expression passed to it.

mfl-expr is any MFL expression fragment. A comma-separated list of values or expressions is, for example, an expression fragment, as is a compound statement enclosed in curly braces.

This function exists mainly to test the implementation of built-in functions. It's conceivably useful in interactive mode, where it effectively duplicates the ":parse" command.

    mvmda> $mfl_parse( (string)33 + 2 );
      2. PN_INTEGER: 2
    1. PN_OPR: "+"  prio=12
        3. PN_INTEGER: 33
      2. PN_CAST:string
Turn the output 90 degrees clockwise to get an idea of the tree structure.


$mfl_protect_admin( mfl-code )

Executes mfl-code in non-admin mode while protecting the current admin state.

mfl-code is any MFL expression fragment. A comma-separated list of values or expressions is, for example, an expression fragment, as is a compound statement enclosed in curly braces.

This is a special function that saves the value of the "admin" integer control, temporarily sets that value to 0 (non-admin), runs the passed mfl code, and then restores the saved value. The "admin" control is a special control that enables or disables some application facilities. It's sometimes useful for MFL code to revoke its admin state (while, for example, invoking user-supplied MFL code) and regain that state later.

Returns the value of the executed mfl-code.

Note again that the mfl-code argument is an expression fragment. This is often enough, as in, for example:

    $mfl_protect_admin( $mfl_run_fname( "some-user-file", "" ) );

But if you need to run more complex code, you can include it in a compound statement, e.g.:

    $mfl_protect_admin( {
	int argnum;
	for ( argnum = $ArgN; argnum < $ArgC; ++argnum )
	    $mfl_run_fname( $ArgV[ argnum ], "Ow" );
    } );

If you find this kind of calling sequence to $mfl_protect_admin confusing: remember that arguments to built-in functions aren't evaluated during the function call. Each built-in function is handed the results of a parse, not the results of execution, and is responsible for interpreting that parse result however it wants.


int $mfl_run_fname( string fname, string modes )

Opens the named file (fname), and runs the MFL code that's inside of it.

modes is a string of characters. Within this string, an uppercase character indicates a category, and any lowercase letters immediately following it apply to that category. The 'O' category applies to file opening, with following lowercase letters meaning:

If none of those mode characters are given, none of those things will be done. Other characters in the modes string will be interpreted by the mfl runner, in the same way as for $mfl_run_string.

Returns -1 if the file could not be run (no access to file), 1 if the file is run and returns an OK status, or 0 otherwise. Note that a 0 might be returned simply if the file ended with a "break;" or "sieve { stop; }", so the important return at this time is the -1 value.

    /* Run "one.mfl" and don't complain or abort if it can't be done */
    $mfl_run_fname( "one.mfl", "" );

    /* Run "two.mfl" and give a warning and stop if it can't be done. */
    $mfl_run_fname( "two.mfl", "Oew" );

    /* Run "xxx.sv" starting in sieve mode, warning if not available. */
    $mfl_run_fname( "xxx.sv", "MsOw" );
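
Since -1 is the one return value that reliably signals a problem, a caller might test for it explicitly. This is only a sketch; keeping the message is just one possible fallback.

    /* Run "one.mfl"; if it can't be run at all, just keep the message */
    if ( $mfl_run_fname( "one.mfl", "" ) == -1 )
        sieve { keep; }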


int $mfl_run_string( string mflstr, string modes )

Executes the MFL code from the passed string. mflstr is a string containing MFL code to parse and execute.

modes is a string of characters. Within this string, an uppercase character indicates a category, and any lowercase letters immediately following it apply to that category. The 'M' category applies to the running of MFL code, with following lowercase letters meaning:

If neither the 'c' nor the 's' M mode character is given, the parsing will be dictated by the setting of the mfl_mode_sieve control (0 for C-like, 1 for Sieve).

Returns 0 if there was some failure, non-zero otherwise.

Let's say you have a CDB file 'froms.cdb' that has a number of sender addresses (on the From line) that you know you want to keep. The data associated with each sender might optionally be some special MFL code to handle mail for that sender. The input file 'froms' might look like:

    mem@mvmf.org:
    mem@geezer.org:
    mem@example.com:sieve { discard; }
Some code to consult that file might look like:
    string *dataP;

    // Get address from From header
    if ( sieve { address :matches "From" "*" } ) {

	// Look up address in cdb file
	if ( ( dataP = $cdbu_get_string( "froms.cdb", $str_match(1) ) ) != 0 ) {

	    /* If blank data, just keep the message, otherwise execute the
	       data as MFL code.
	    */
	    if ( *dataP == "" )
		sieve { keep; }
	    else
		$mfl_run_string( *dataP, "Mc" );

	    sieve { stop; }
	}
    }


$mfl_trace( int flag )

Enables or disables tracing of MFL script execution.

flag is passed as 0 to disable tracing, and non-zero to enable it. This function simply enables or disables tracing on a global basis: specific types of things to trace must also be enabled via control strings. The global enable/disable of tracing is provided to speed up normal execution when tracing is disabled, as the constant checking of the individual tracing control values would add overhead.

Trace messages go into the private logfile.

You could use several techniques to trace an MFL script. Here's one that you might use for mvmda under a UNIX-like shell environment.

  1. Save a message into a plain file (say, "testmsg"), and edit out any extra fluff that might have been added during delivery.

  2. Create a startfile for mvmda that looks something like this:
    // Where I want private log messages
    $control_string_set( "log_private_name", "~/mfl-trace.log" );
    
    // Enable/disable tracing
    $mfl_trace( 1 );
    
    // Enable specific traces
    $control_int_set( "mfl_trace_bif", 1 );
    $control_int_set( "mfl_trace_functions", 1 );
    

  3. Tell mvmda to use this file as its startfile, e.g. by setting the MVMDA_RC environment variable to point to it:
    $ MVMDA_RC=`pwd`/tracerc.mfl
    $ export MVMDA_RC
    

  4. Invoke mvmda in report-only mode, telling it the name of the script you wish to trace, and feeding on its stdin the mail message you saved:
    $ mvmda -n myscript.mfl < testmsg
    

  5. Look in the private log file (the one that you specified in the startfile) for trace lines.

 


$msg_ functions: relating to the open mail message


string $msg_address( string s )

Extracts the next email address from a string.

s is the string that may contain one or more addresses, in the form that might be found in an RFC2822 address-list header field body. The address is extracted from the passed string's current access point (e.g. as returned by $str_bx). If the access point is at a peculiar place, this function might return peculiar results.

Returns the next email address as a string if present, or an empty string if not present.

Let's say you have an email message that contains this header field:

To: "mem@mv.com, Mark Mallett <mem@geezer.org>"

You might extract addresses thusly:

    string field_body;
    string addr;

    // One could walk the headers with $msgpart_hdr_xxx functions,
    //  but this will work on the single To: field
    if ( sieve { header :matches "To" "*" } ) {
        field_body = $str_match(1);

        while ( (addr = $msg_address( field_body ) ) != "" ) {
            /* do stuff here...
                 the first time, addr will have "mem@mv.com"
                 the second time, addr will have "mem@geezer.org"
            */
            ;
        }
    }
It's a toss-up whether this should have been a $str_ function or a $msg_ function.


string $msg_envelope( string thing )

Accesses envelope information if available.

thing is the envelope information wanted, either "sender" or "from" for the sender, or "recipient" or "to" for the recipient. Note that this information can only be accessed if it's provided by the MTA in use, and if the application is invoked in a way that it knows how to access it.

Returns the envelope name as a string if present, or an empty string if not present.

    string rcpt;
    if ( ( rcpt = $msg_envelope( "sender" ) ) == "" )
	rcpt = "";


int $msg_open_fname( string fname, long bodycache )

Opens a file as a message for processing.

fname is the name of a file that contains the message. If this is a blank string (""), stdin will be opened. bodycache is the maximum number of bytes of message body to cache in memory, more or less. This is somewhat fuzzy since some portion of each MIME part will also have some amount of data cached, regardless of whether the maximum has been used up by other parts. The value will be silently constrained to lie between a minimum and maximum size specified when the application is built. Passing a value of zero will use a default (normally the configured minimum).

Returns 0 on failure, non-zero on success.

    if ( ! $msg_open_fname( "", 0 ) )
	sieve { stop; }


int $msg_reject( int rejectonly )

Flags a reject/bounce request for the current message.

rejectonly is a flag: if it's non-zero (true), then the only kind of rejection that is wanted is an SMTP-time reject. Otherwise, if the flag is zero (false), this means that a non-SMTP-time bounce may be issued.

This is intended to be used inside of a delivery hook, so that the mvmf user can have some control over whether a message is returned due to an error. (mvmda does not (yet!) operate in an SMTP-time environment, but is expected to at some point.)

Note: Generating bounces post-SMTP-time is a very problematic thing to do. Such bounces will often go to a wrong address, usually that of some innocent third party. Take extreme care when requesting a bounce at other than SMTP time; make sure you vet such bounces properly if you do them at all.

Note also: this only flags a request for rejection; this rejection happens when the processing of the control script is done, and bounces are only done subject to further anti-bounce logic.

A sample $hook_delivery_error hook might employ this function:

    int $hook_delivery_error( string dtype, string err ) {
        if ( ( dtype == "quota" ) || ( dtype == "hard" ) ) {
            // non-SMTP bounce is OK to people that I know personally-- they
	    //  can come after me if they want.
            if ( sieve { address "Return-Path" "bill@example.com" } ) {
                $msg_reject( 0 );
		return 1;
            }
        }
	// Otherwise reject only at SMTP time.
	$msg_reject( 1 );
	return 1;
    };


$msg_rip_add( string ip )

Adds to the list of "responsible IP addresses." There's a list of responsible IP addresses associated with the open mail message. Each is the IP address of a system that is believed to have been in some way responsible for sending the message. This list is used by default in some operations (like the sieve "dnsbl" statement) or can conceivably be accessed for other purposes.

Initially, this list is empty. An mfl script can add an IP address to the list. This function is used, for example, in an mfl function hook that's called after the message has been opened and scanned.


int $msg_rip_n()

Returns the number of "responsible IP addresses" that have been collected.


string $msg_rip_nth( int ripN )

Returns the nth "responsible IP address" that has been collected. If the number is out of bounds, or there is no message open, an empty string is returned.

 


$msgpart_ functions: relating to the message structure

These are provided, perhaps on an interim basis, as a way to have some script access to MIME message parts, before more elaborate script elements or functions are invented and provided. I say "interim" but they (at least the $msgpart_go_xxx functions) will probably stay around.


string *$msgpart_cur_body()

Returns a pointer to the cached body of the current message part. In mvmf terms, the body is the portion of a MIME part (or top level of the message) which is not the message part's header and which is not a child MIME part of the current message part. A message part that has a content type of "multipart" will have MIME children, but there may also be a non-MIME body, which is any free text that appears before the first MIME boundary. Mailers will often use this to convey something outside of the multipart children: for example, some helpful plain text saying "this is a multipart attachment that should be viewed with a MIME-capable reader" (or some arrogant plain text saying "get a capable mail program").

Body text can be very large and sometimes may be partly stored on disk. mvmf always caches some portion of each MIME part's body in memory, but not all of it is guaranteed to be available. Most scripts will not want to look at massive amounts of text anyway; however, be prepared for this to return a pointer to a lot of data.

Note well that this returns a pointer to the cached body text -- not a copy.

Let's say you want to check the first line of the current MIME part to see if its first 9 bytes match some known bad base64-encoded content:


    string *sP, *s1P;
    string s;
    @ifndef NULL
    @ define NULL 0
    @endif /* NULL */

    if ( sieve { allof (
	    not header :matches "content-type" "*multipart/*",
	    header :contains "Content-Transfer-Encoding" "base64" ) } ) {
	sP = $msgpart_cur_body();	// Access the body
	$str_bx_set( *sP, 0 );		// and start at the beginning.

	// Skip to and get pointer to the first non-blank line.
        while ( ( s1P = $str_find_token_delimited( *sP, "", "\n", "s" ) ) != NULL ) {
	    if ( *s1P != "" ) {
		// Test the first 9 bytes against some known strings.
		s = *$str_sub( *s1P, 0, 9 );
		if ( ( s == "TVqQAAMAA" ) ||
		     ( s == "TVpQAAIAA" ) ||
		     ( s == "UEsDBBQAA" ) ) {
		    sieve {
			fileinto "bad-attachment";
			stop;
		    }
		}
	    }
	}
    }
One could replace the explicit tests with a cdb file lookup:
    // Test the first 9 bytes against some known strings.
    if ( $cdbu_get_string( "badsigs.cdb", *$str_sub( *s1P, 0, 9 ) ) ) {
	sieve {
	    fileinto "bad-attachment";
	    stop;
	}
    }
or have some more elaborate tests against variable-length initial substrings, e.g. by having the cdb file lookup return more bytes that also have to match.

Note: example patterns above taken from Russell Nelson's 'viruscan' patch for qmail.
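The check above can be modeled outside of MFL to make the logic easier to see. The following is only a Python sketch, not mvmf code: the names BAD_PREFIXES, first_nonblank_line and looks_bad are made up for illustration, and the body is assumed to be available as one string with "\n" line endings.

```python
# Illustrative model only; these names are hypothetical and not part of mvmf.
BAD_PREFIXES = {"TVqQAAMAA", "TVpQAAIAA", "UEsDBBQAA"}  # patterns from the example above


def first_nonblank_line(body: str) -> str:
    """Return the first non-empty line of body, or "" if there is none."""
    for line in body.split("\n"):
        if line != "":
            return line
    return ""


def looks_bad(body: str) -> bool:
    """True if the first 9 bytes of the first non-blank line are a known prefix."""
    return first_nonblank_line(body)[:9] in BAD_PREFIXES
```

For reference, the first two patterns are the base64 encoding of data that begins with the bytes "MZ", and the third is the encoding of data that begins with "PK" followed by bytes 3 and 4.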


$msgpart_cur_delete()

Flags the currently selected message part for deletion. Parts marked for deletion will not be written when the message is stored or forwarded, nor will any of the message part's MIME children.


string *$msgpart_cur_epilog()

Returns a pointer to the cached part of the MIME epilog of the current message part, if there is any epilog. The epilog consists of any bytes that follow the final MIME boundary in any multipart message part. This function is just like $msgpart_cur_body, except that it returns the epilog rather than the body/prolog.

Most message parts will not have any epilog; where one is present, it may be one or a few blank lines, or it may be the result of a broken MIME structure of the message. (Or, gasp, it may indicate a MIME parsing bug in mvmf!) Few (if any) MUAs let you do anything with the epilog; an epilog might contain hash-busting text as part of spam.


long $msgpart_size_body( int childrenF )

Returns the number of bytes in the current messagepart's body.

childrenF specifies whether to count the message part's children as well.

When childrenF is false (zero), the size returned is that of the data directly associated with the message part only. This data consists of the body/prolog and any epilog part -- i.e., anything outside of any MIME children.

When childrenF is true (non-zero), the size returned includes the directly associated data (the body/prolog and epilog) plus the total size of any child MIME parts if the current message part is a multipart type. The child totals include both their header size and their body size (and that of their children).

In any case, the returned size may not match exactly the size of the message when it is stored. The size may vary due to differences in line ending sequences or in the way child headers are folded when stored.
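The two modes of childrenF can be pictured with a small model. This is a Python sketch under stated assumptions, not the mvmf implementation: the Part class and its field names (hdr, body, epilog, children) are invented here, and each size is simply a byte count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """Hypothetical stand-in for a message part; all sizes are in bytes."""
    hdr: int = 0       # size of this part's header
    body: int = 0      # size of the body/prolog
    epilog: int = 0    # size of any epilog
    children: List["Part"] = field(default_factory=list)


def size_body(part: Part, children_f: bool) -> int:
    """Model of the sizes described above for the two childrenF modes."""
    total = part.body + part.epilog          # data directly associated with the part
    if children_f:
        for child in part.children:
            # Each child contributes its header plus its own body and descendants.
            total += child.hdr + size_body(child, True)
    return total
```

With a top-level part holding 40 bytes of prolog, a 2-byte epilog and two children, the false mode would report 42 and the true mode would add each child's header and body on top of that.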


long $msgpart_size_hdr()

Returns the number of bytes in the current messagepart's header.

The returned size represents the size of the headers in memory (including one byte each for a line terminator), and may be different from the size that would be on disk or on the wire. The size may vary due to differences in line ending sequences or in the way headers are folded when emitted.


$msgpart_cur_undelete()

Removes any deletion flag for the currently selected message part.


$msgpart_go_child()

Selects the first MIME child of the currently selected message part as the new currently selected part. Returns TRUE if successful, and FALSE if not (e.g. no child of this part).


$msgpart_go_next()

Selects the next sibling of the currently selected message part as the new currently selected part. Returns TRUE if successful, and FALSE if not (e.g. no next sibling, or current part is not a child of a multipart parent).


$msgpart_go_parent()

Selects the parent of the currently selected message part as the new currently selected part. Returns TRUE if successful, and FALSE if not (e.g. no parent of this part).


$msgpart_go_top()

Selects the top message part of the current message as the new currently selected part. Returns TRUE if successful, and FALSE if not (e.g. no message is open).

Here's an example script fragment that will use msgpart selection functions to walk the MIME structure, and mark any image/* or text/html types for deletion:


int ws;					// flag for walking
string *ctP;				// ptr to content type header value

    ws = $msgpart_go_top();		// Start at the top
    while ( ws ) {
        if ( sieve { header :matches "content-type" "*" } ) {
            ctP = $str_match(1);	// Remember the match
            if ( ( *ctP =?^ "image/*" ) ||
                 ( *ctP =?^ "text/html*" ) ) {
                $msgpart_cur_delete();	// Flag a deletion
            }
        }

	/* Iterative step to next in messagepart tree.
           child first, sibling next, then parent's sibling.
	   Note we could have done this recursively, but why
	   bother..
        */
        if ( !$msgpart_go_child() )
            while ( ws && !$msgpart_go_next() )
                ws = $msgpart_go_parent();
    }
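To see why the iterative step above visits every part exactly once, here is a Python model of the same walk. It is only a sketch: Node and Cursor are hypothetical stand-ins for the message tree and the "currently selected part", and the content-type test is approximated with a case-insensitive prefix check.

```python
class Node:
    """Hypothetical message part: a content type plus child parts."""
    def __init__(self, ctype, children=()):
        self.ctype = ctype
        self.children = list(children)
        self.parent = None
        self.deleted = False
        for child in self.children:
            child.parent = self


class Cursor:
    """Models the currently selected part and the four selection moves."""
    def __init__(self, top):
        self.top = top
        self.cur = None

    def go_top(self):
        self.cur = self.top
        return self.cur is not None

    def go_child(self):
        if self.cur.children:
            self.cur = self.cur.children[0]
            return True
        return False

    def go_next(self):
        parent = self.cur.parent
        if parent is None:
            return False
        i = parent.children.index(self.cur)
        if i + 1 < len(parent.children):
            self.cur = parent.children[i + 1]
            return True
        return False

    def go_parent(self):
        if self.cur.parent is None:
            return False
        self.cur = self.cur.parent
        return True


def flag_unwanted(top):
    """Walk the tree as in the fragment above; return the visiting order."""
    cursor = Cursor(top)
    visited = []
    ws = cursor.go_top()
    while ws:
        visited.append(cursor.cur.ctype)
        ctype = cursor.cur.ctype.lower()
        if ctype.startswith("image/") or ctype.startswith("text/html"):
            cursor.cur.deleted = True
        # Child first, then sibling, then climb until some ancestor has a sibling.
        if not cursor.go_child():
            while ws and not cursor.go_next():
                ws = cursor.go_parent()
    return visited
```

The walk ends when climbing from the top part fails, which is what clears the ws flag.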


$HDR$ *$msgpart_hdr_first()

Returns a ($HDR$ *) pointer to the first header in the currently selected messagepart. (See $HDR$ type definition elsewhere.)

$HDR$ *hP;

    if ( ( hP = $msgpart_hdr_first() ) != 0 ) {
	// .. some code .. 
    }


$HDR$ *$msgpart_hdr_last()

Returns a ($HDR$ *) pointer to the last header in the currently selected messagepart. (See $HDR$ type definition elsewhere.)

$HDR$ *hP;

    if ( ( hP = $msgpart_hdr_last() ) != 0 ) {
	// .. some code .. 
    }


$HDR$ *$msgpart_hdr_next( $HDR$ *hP )

Given a ($HDR$ *) pointer to a header, as previously returned by a $msgpart_hdr function, returns a ($HDR$ *) pointer to the next header in the currently selected messagepart. (See $HDR$ type definition elsewhere.)

$HDR$ *hP;

    hP = $msgpart_hdr_first();
    if ( ( hP = $msgpart_hdr_next( hP ) ) != 0 ) {
	// .. some code .. 
    }


$HDR$ *$msgpart_hdr_prev( $HDR$ *hP )

Given a ($HDR$ *) pointer to a header, as previously returned by a $msgpart_hdr function, returns a ($HDR$ *) pointer to the previous header in the currently selected messagepart. (See $HDR$ type definition elsewhere.)

$HDR$ *hP;

    hP = $msgpart_hdr_last();
    if ( ( hP = $msgpart_hdr_prev( hP ) ) != 0 ) {
	// .. some code .. 
    }

 


$msns_ functions: dealing with namespaces

All folder references are done via namespaces. Namespaces are better documented elsewhere, but fundamentally a namespace identifies how a folder is handled, and has attributes that include:

Each namespace has a set of characteristics and attributes. Some namespace attributes are associated with its specific type, and others with its general type. You never actually make reference to the general type: only to the namespace prefix (and, when defining a namespace, to its specific type). It can help to know about the grouping of specific types according to their underlying general type, as the grouping will imply some things about the specific type. For example, all namespaces of a general "file" type have an associated path, which is the filesystem path to the root of the namespace. However, only a specific type "maildir" will have an attribute that controls how its IMAP keywords are stored. Thus you can (and must) set a path attribute for every "file" type namespace, but you can't set a keyword storage attribute on a "mbox" specific type namespace. (Clear as mud?)

It's expected that most namespaces will be predefined by the system administrator.


$msns_attr_int_set( string prefix, string attr, int value )

This function sets the value of an integer attribute that is associated with a namespace.

prefix is the prefix string that identifies the namespace.

attr is the name of the attribute to set, and value is the new value to give to that attribute.


$msns_attr_string_set( string prefix, string attr, string value )

This function sets the value of a string attribute that is associated with a namespace.

prefix is the prefix string that identifies the namespace.

attr is the name of the attribute to set, and value is the new value to give to that attribute.


$msns_define( string prefix, string stype )

Defines a new namespace. A namespace controls how a mailbox is accessed (e.g., when it is written to).

prefix is the prefix string that identifies the namespace.

stype is the specific message store type that, by default, a message will have when it is created within that namespace.

A namespace essentially provides a view into a message store area. When a message is stored, the tail part of the name is used by the namespace as a specific folder handle. (You can use a tailhook function to edit the tail before it's used by the namespace.)

You can define a namespace with an empty prefix: if defined, this will be the namespace of last resort. Namespace examples:

    $msns_define( "VARSPOOL/", "mbox" );
    $msns_attr_string_set( "VARSPOOL/", "path", "/var/spool" );
    $msns_attr_string_set( "VARSPOOL/",
    			       "must-match",
			       $env_variable( "USER" ) );

    $msns_define( "HOME/", "mbox" );
    $msns_attr_string_set( "HOME/", "path", $env_homedir() + "/Mail" );
    $msns_attr_string_set( "HOME/", "layout", "fs" );

    $msns_define( "/", "maildir" );
    $msns_attr_string_set( "/", "path", $env_homedir() + "/Maildir" );
    $msns_attr_string_set( "/", "layout", "maildir" );
A script could then reference a folder named "VARSPOOL/mem" to indicate the /var/spool/mem box; a folder named "HOME/work" to reference a ~/Mail/work folder, and a folder named "/something" to reference a subfolder "something" within the Maildir in their home directory.
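The mapping from folder name to namespace can be sketched as a prefix lookup. The Python below is a guess at the general idea rather than a description of how mvmf does it: in particular, trying the longest prefix first is an assumption, the resolve function is invented for illustration, and the root paths are placeholders.

```python
def resolve(folder, namespaces):
    """Split folder into (prefix, root path, tail), or return None if no prefix fits.

    namespaces maps a prefix string to the root path of that namespace.
    Assumption: the longest matching prefix wins, so an empty prefix (which
    matches every name) is only reached as a last resort.
    """
    for prefix in sorted(namespaces, key=len, reverse=True):
        if folder.startswith(prefix):
            return prefix, namespaces[prefix], folder[len(prefix):]
    return None


# Placeholder paths standing in for the definitions shown above.
NAMESPACES = {
    "VARSPOOL/": "/var/spool",
    "HOME/": "/home/mem/Mail",
    "/": "/home/mem/Maildir",
}
```

Under these assumptions, "VARSPOOL/mem" splits into the "VARSPOOL/" namespace and the tail "mem", and it is the tail that a tailhook function would get a chance to edit.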


string $msns_tailhook( string prefix, string hookname )

Associates a "tailhook" function with a namespace. This is a way to adjust the tail foldername before it is used in creating the folder access name (e.g. the filename).

prefix is the prefix associated with the namespace, and hookname is the name of an MFL function that you have written to edit the tail part of the foldername.

NB: This is kind of hackish, and is done this way because MFL does not yet have any sort of lambda notation. If it gains that capability, this function may be phased out or changed.

The MFL function used for the tailhook must exist before the namespace is used. It should be written to accept a string (the tail part of the folder after the namespace prefix is removed) and to return a string that is to be used instead of that original tail. Say, for example, that you always want your folders to be prefixed with the word "INBOX." . Write a tailhook function like:

    string inbox_tail( string tail ) {
	if ( tail !=? "INBOX.*" )
	    return ( (string)"INBOX." + tail );
	return ( tail );
    };
and then associate this function with your namespace:
    $msns_tailhook( "INBOX/", "inbox_tail" );

 


$msst_ functions: dealing with message store specific types

Any given message store belongs to a message store specific type, through which it is manipulated and parameterized. These functions provide access to MSSTs.

MSSTs are identified via their name: "mbox" and "maildir" are names of two of the specific types.


$msst_attr_int_set( string name, string attr, int value )

This function sets the value of an integer attribute that is associated with a specific messagestore type.

name is the name of the specific type.

attr is the name of the attribute to set, and value is the new value to give to that attribute.


$msst_attr_string_set( string name, string attr, string value )

This function sets the value of a string attribute that is associated with a specific messagestore type.

name is the name of the specific type.

attr is the name of the attribute to set, and value is the new value to give to that attribute.

 


$str_ functions: dealing with strings


int $str_bx( string str )

Returns the byte index associated with the passed string.

Every string has an associated byte index. This index is mostly used internally by the MFL processor, but it has some uses in the MFL function interface as well, e.g. to keep track of the point where the next piece of data will be extracted from or inserted into the string, or to provide some context information about an operation that was performed on the string.

The special byte index value of -1 means that the byte index is uninitialized, and will be interpreted according to context. In general the uninitialized value will refer to either the beginning or the end of the string, whichever makes sense.

This function returns the byte index for the passed string str.


int $str_bx_set( string str, int n )

Sets the byte index associated with a string (see $str_bx().)

str is the string to associate the new index with; n is the new byte index.

The special byte index value of -1 means that the byte index is uninitialized, and will be interpreted according to context. In general the uninitialized value will refer to either the beginning or the end of the string, whichever makes sense.

For values other than -1, the index is silently constrained to lie within the number of bytes in the string.

The return value is the previous byte index.


int $str_byte( string str, int n )

Returns the character value of the nth byte of a string, counting from 0.

    string ss = "hello there";
    int ch;
    ch = $str_byte(ss, 4 );
'ch' will contain the integer value of the character 'o'.


int $str_count_control( string str [, int max ] )

Counts the number of control characters in a string.

max, if given and positive, is an upper limit for the count; if the count reaches max, max is returned.

Returns the number of control characters found, or max if the count reached that limit.

Printable characters in the control range, such as space and tab, do not count as control characters for this test.


int $str_count_digit( string str [, int max ] )

Counts the number of ASCII digit characters in a string.

max, if given and positive, is an upper limit for the count; if the count reaches max, max is returned.

Returns the number of digit characters found, or max if the count reached that limit.
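The max argument is common to all of the $str_count_ functions, so a small model may help. This Python sketch only mirrors the description above: count_digit and max_count are illustrative names, and a max of zero is treated the same as no limit, which is an assumption.

```python
def count_digit(s: str, max_count: int = 0) -> int:
    """Count ASCII digits in s; a positive max_count caps the result."""
    count = 0
    for ch in s:
        if "0" <= ch <= "9":
            count += 1
            if max_count > 0 and count >= max_count:
                return max_count      # limit reached: no need to look further
    return count
```

A cap is handy when a script only cares whether a string has "at least N" of something, since counting can stop as soon as the limit is reached.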


int $str_count_highbit( string str [, int max ] )

Counts the number of bytes in a string that have the high bit (0x80) set.

max, if given and positive, is an upper limit for the count; if the count reaches max, max is returned.

Returns the number of high-bit bytes found, or max if the count reached that limit.


int $str_count_lower( string str [, int max ] )

Counts the number of lowercase ASCII characters in a string.

max, if given and positive, is an upper limit for the count; if the count reaches max, max is returned.

Returns the number of lowercase characters found, or max if the count reached that limit.


int $str_count_upper( string str [, int max ] )

Counts the number of uppercase ASCII characters in a string.

max, if given and positive, is an upper limit for the count; if the count reaches max, max is returned.

Returns the number of uppercase characters found, or max if the count reached that limit.


$str_debug( string str )

Prints some debugging information about a string and its related references.

    string s = "a string";
    string *sp = $str_sub( s, 4, 4 );

    $str_debug( *sp );
This will show some probably-useless information about the substring "ring" as well as the related string "a string" .


string *$str_find( string str, string target [, int nth] )

Finds an occurrence of one string within another.

str is the string to search in, and target is the string being looked for. nth is an optional argument; if given, it specifies the count of times the target should be found (i.e., find the nth occurrence). If nth is negative, the search begins from an end position and searches backwards (find the nth last occurrence).

The search is always bound by the current byte position (see $str_bx et al) and whichever end of the string the search is moving towards. i.e. a positive search starts at the current byte position and ends at the end of the string; a negative search starts at the current byte position and ends at the beginning of the string. When the nth target string is successfully located, the base string str's byte position is updated to exclude the found string. On a positive search, this position is one past the target found, while on a negative search, it is at the beginning of the found string. This allows successive calls to $str_find to locate successive strings.

This function returns a pointer to the substring that matches the target. If no nth target was located, a NULL pointer is returned.

    string s = "now we are all here in one place";
    string *sp;

    sp = $str_find( s, "re", 2 );	// points to the "re" in "here"
    sp = $str_find( s, "one", -1 );	// null, since the previous $str_find
					//  left the bx just past the "re" in "here"

    $str_bx_set( s, -1 );
    sp = $str_find( s, "are", -1 );	// Now points to "are"

See also $str_findi for case-insensitive version.
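The interaction between the byte index and the search direction is easier to follow in a model. The Python below is a sketch of the rules as described above and nothing more: IndexedString and str_find are invented names, matches are assumed not to overlap, and a failed search is assumed to leave the byte index alone.

```python
class IndexedString:
    """Hypothetical string with an associated byte index (bx)."""
    def __init__(self, text):
        self.text = text
        self.bx = -1          # -1 means "uninitialized"


def str_find(s, target, nth=1):
    """Return (start, end) of the nth occurrence, or None; updates s.bx on success."""
    if nth == 0 or target == "":
        return None
    if nth > 0:
        pos = 0 if s.bx < 0 else s.bx            # uninitialized: start of string
        start = -1
        for _ in range(nth):
            start = s.text.find(target, pos)
            if start < 0:
                return None
            pos = start + len(target)
        s.bx = start + len(target)               # one past the found target
    else:
        limit = len(s.text) if s.bx < 0 else s.bx  # uninitialized: end of string
        start = -1
        for _ in range(-nth):
            start = s.text.rfind(target, 0, limit)
            if start < 0:
                return None
            limit = start
        s.bx = start                             # at the beginning of the found target
    return start, start + len(target)
```

Running the example above through this model gives the same outcomes: the second "re" is found in "here", the backward search for "one" then fails because the index sits before it, and resetting the index to -1 lets a backward search find "are".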


int $str_find_byte( string str, int target [, int nth] )

Finds an occurrence of a byte in a string.

str is the string to search in, and target is the byte being looked for. nth is an optional argument; if given, it specifies the count of times the target should be found (i.e., find the nth occurrence). If nth is negative, the search begins from an end position and searches backwards (find the nth last occurrence).

The search is always bound by the current byte position (see $str_bx et al) and whichever end of the string the search is moving towards. i.e. a positive search starts at the current byte position and ends at the end of the string; a negative search starts at the current byte position and ends at the beginning of the string. When the nth target byte is successfully located, the base string str's byte position is updated to exclude the found byte. On a positive search, this position is one past the target found, while on a negative search, it is at the position of the found byte. This allows successive calls to $str_find_byte to locate successive bytes.

This function returns the offset of the found byte in the underlying string, with an offset of 0 being the first position. If no nth target was located, -1 is returned.

    int i;
    string s = "I'm a sushi zealot";
    string *sp;

    i = $str_find_byte( s, 'a' );	// returns 4, sets s's bx to 5

    $str_bx_set(s, -1);
    i = $str_find_byte( s, 'a', -2 );	// returns 4, sets s's bx to 4

    // The substring from a to z, inclusive
    sp = $str_sub( s, i, $str_find_byte( s, 'z' ) +1 - i);
    

See also $str_findi_byte for case-insensitive version.


string *$str_find_token_delimited( string str, string vsep, string isep [, string flags] )

Finds a token within a string, based on delimiters.

str is the string to find the token in. The token-finding starts at the current byte position associated with the string (see $str_bx() and friends).

The characters in vsep are visible separators. Each of these characters will separate tokens, but will also be returned as a token themselves.

The characters in isep specify invisible separators. Each of these characters will separate tokens, and will not be returned as a token themselves.

flags are an optional set of characters that affect the tokenizing. These are:

This function returns a string pointer to the substring within the source string. Note that the token is not extracted, it is merely located. When no more tokens are present, a NULL pointer is returned.

    string s = "one  'two' three/four";

    while ( $str_find_token_delimited( s, "/", " \t" ) != 0 )
         // Returns pointers to "one", "two", "three", "/", "four", 0

    while ( $str_find_token_delimited( s, "", " \t", "s" ) != 0 )
         // Returns pointers to "one", "", "two", "three/four", 0

    while ( $str_find_token_delimited( s, "", "/ \t", "q" ) != 0 )
         // Returns pointers to "one", "'two'", "three", "four", 0
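The difference between visible and invisible separators can be shown with a rough model. This Python sketch covers only the no-flags case and deliberately ignores quote handling and the flags argument; the tokens function is invented here, and collapsing runs of invisible separators is an assumption drawn from the first example above.

```python
def tokens(text, vsep, isep):
    """Split text into tokens; vsep characters are also returned as tokens."""
    out = []
    cur = ""
    for ch in text:
        if ch in vsep:
            if cur:
                out.append(cur)
                cur = ""
            out.append(ch)        # a visible separator is a token in its own right
        elif ch in isep:
            if cur:
                out.append(cur)   # an invisible separator only ends the token
                cur = ""
        else:
            cur += ch
    if cur:
        out.append(cur)
    return out
```

Unlike the real function, which locates one token per call and tracks its place with the byte index, this model returns all tokens at once.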


string *$str_findi( string str, string target [, int nth] )

Finds an occurrence of one string within another, ignoring case.

Functions just like $str_find except that case (uppercase vs lowercase) is ignored when matching the target.


int $str_findi_byte( string str, int target [, int nth] )

Finds an occurrence of a byte in a string, ignoring case.

Functions just like $str_find_byte except that case (uppercase vs lowercase) is ignored when matching the target.


int $str_has_control( string str )

Tests whether a string contains any ASCII control characters.

Returns non-zero if so; 0 if not.

Printable characters in the control range, such as space and tab, do not count as control characters for this test.


int $str_has_digit( string str )

Tests whether a string contains any ASCII digit characters.

Returns non-zero if so; 0 if not.


int $str_has_highbit( string str )

Tests whether a string contains any bytes with the high bit (0x80) set.

Returns non-zero if so; 0 if not.


int $str_has_lower( string str )

Tests whether a string contains any lowercase ASCII characters.

Returns non-zero if so; 0 if not.


int $str_has_upper( string str )

Tests whether a string contains any uppercase ASCII characters.

Returns non-zero if so; 0 if not.


int $str_is_control( string str )

Tests whether a string consists entirely of ASCII control characters.

Returns non-zero if so; 0 if not.

Printable characters in the control range, such as space and tab, do not count as control characters for this test.


int $str_is_digit( string str )

Tests whether a string consists entirely of ASCII digit characters.

Returns non-zero if so; 0 if not.


int $str_is_highbit( string str )

Tests whether a string consists entirely of bytes with the high bit (0x80) set.

Returns non-zero if so; 0 if not.


int $str_is_lower( string str )

Tests whether a string consists entirely of lowercase ASCII characters.

Returns non-zero if so; 0 if not.


int $str_is_upper( string str )

Tests whether a string consists entirely of uppercase ASCII characters.

Returns non-zero if so; 0 if not.


int $str_length( string str )

Returns the number of bytes in the passed string.

    string ss = "one two three";
    string *sP;
    sP = $str_sub( ss, 0, 4 );   // sP now points to "one "
    sP += $str_length( *sP );    // sP now points to "two "


string $str_lower( string str )

Makes a lowercased copy of a string.

    string s, sl;
    s = "Hello There";
    sl = $str_lower( s );

'sl' will contain the string "hello there" .

Note that this makes sense only for strings composed of ASCII characters.


string $str_match( int matchnum )

Returns the string corresponding to a matched [sub]string.

matchnum is the submatch number, or 0 for the entire match.

For regular expression matching, a submatch corresponds to a parenthesized regexp grouping. For glob-style wildcard matching, a submatch is the nth occurrence of a sequence of wildcard characters.

    string domain, subdomain;
    if ( sieve { address :domain :matches "To" "*.example.com" } )
	subdomain = $str_match(1);
    else
	subdomain = "";
    domain = $str_match(0);
or:
    subdomain = sieve { address :domain :matches "To" "*.example.com" } ?
	$str_match(1) : "";
    domain = $str_match(0);

Submatches may also be produced by standard operators on string types. For example:

    string addr;
    string prefix;
    string domain;
    int n;

    addr = $msg_envelope( "recipient" );

    /* Use regexp comparison to test for prefix-nnn format localpart,
       e.g. mem-333@example.org
       Note: "=." is regexp comparison; "=?" styles may also be used
         for wildcard type matches and may be preferable in many cases.
    */
    if ( addr =. "\(.+\)-\([0-9]+\)@\(.*\)" ) {
        prefix = $str_match(1);    // "mem" in the given case
        n = $str_match(2);         // 333, converted to int
        domain = $str_match(3);    // "example.org"
    }
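For readers more used to other regular expression dialects, the same split can be written in Python. This is only a comparison sketch: split_recipient is an invented name, Python writes groups with plain parentheses where the MFL pattern above uses backslashed ones, and whether "=." anchors the pattern is not stated here, so an unanchored search is assumed.

```python
import re


def split_recipient(addr):
    """Return (prefix, n, domain) for a prefix-nnn@domain address, else None."""
    m = re.search(r"(.+)-([0-9]+)@(.*)", addr)
    if m is None:
        return None
    # group(0) would be the whole match, like submatch number 0.
    return m.group(1), int(m.group(2)), m.group(3)
```

The numbered groups play the same role as the submatch numbers passed to $str_match.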


string $str_quote_regex( string str )

Makes a copy of a string, with special regular expression characters quoted.

str is the source string to quote.

The returned value will be a string with special regular expression characters quoted via a backslash. This is useful when you would like to use an unknown string (e.g. the result of some prior match) in a new regular expression comparison, such as sieve's ":regex" match type tests, or MFL's "=." -style operators. In order for this string to be a literal match with strings in the new target, any special characters need to be quoted.

Let's say, for example, that you are looking to see if the domain given in the "Return-path" header is the same as the right-hand part of the message-id (after the '@'). (This does not need regular expressions, but it's a simple case to show why quoting can be useful in more elaborate cases.) A first-pass approach might be:

    sieve {
    	if allof (
	    address :matches :domain "return-path" "*",
	    header :regex "message-id"
	    		   [ (string)"<.*@" + $str_match(1) + ">" ] ) {
		/* matches ... do something here */
	}
    }
However, a message containing:
    Return-Path: mem@a.b.example.com
    Message-Id:  <junk1234@arb.example.com>
will pass this test, since the "a.b.example.com" result returned in the first match string will match, regular-expression-wise, the string "arb.example.com" in the message-id string. Using unquoted unknown strings in regular expression tests can also result in regular expression syntax errors. The above test can be improved with quoting functions:
    sieve {
    	if allof (
	    address :matches :domain "return-path" "*",
	    header :regex "message-id"
	    		   [ (string)"<.*@" +
			       $str_quote_regex( $str_match(1) ) + ">" ] ) {
		/* matches ... do something here */
	}
    }

For the example given above, the matched domain string "a.b.example.com" will be turned into something like "a\.b\.example\.com" which is suitable for use as a regular expression pattern.
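The false match is easy to reproduce in any regular expression engine. The Python sketch below uses re.escape as a stand-in for $str_quote_regex; that substitution is an assumption made for illustration, and the exact set of characters mvmf quotes may differ. The function names are made up.

```python
import re


def msgid_matches_unquoted(domain, msgid):
    """Build the pattern from the raw domain: '.' matches any character."""
    return re.search("<.*@" + domain + ">", msgid) is not None


def msgid_matches_quoted(domain, msgid):
    """Build the pattern from the quoted domain: '.' matches only a literal dot."""
    return re.search("<.*@" + re.escape(domain) + ">", msgid) is not None
```

With the domain "a.b.example.com", the unquoted version accepts a message-id ending in "@arb.example.com>", while the quoted version accepts only the literal domain.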


string $str_quote_wild( string str )

Makes a copy of a string, with special wildcard matching characters quoted.

str is the source string to quote.

This is analogous to the $str_quote_regex function, except that it prepares the quoted string for use in glob-style matching (e.g. sieve's ":matches" test or MFL's "=?" -style operators).


string *$str_sub( string s, int start, int len )

Creates a reference to a portion of a string. String s is the source string being referenced; start is the character position of the start of the substring, starting with 0; len is the number of characters in the substring. The following magic values may also be given for start and len:

 

For start:
-1: always anchored at the start of the base string
-2: the current start of the base string
-3: one character past the current end of the reference string

For len:
-1: always anchored at the end of the base string
-2: the current end of the base string
-3: one character before the beginning of the reference string

 

$str_sub returns a (string *) (i.e., a string pointer).

    string s = "An example";
    string *sp = $str_sub(s, 3, 2);	// returns pointer to "ex"

    $str_sub( *sp, -2, -3 );            // "An " -- the string prior to *sp
    $str_sub( *sp, -3, -2 );            // "ample" -- string after *sp

    *$str_sub( *sp, 1, 1 ) = "gg s";	// replace "x" with "gg s"
					// s is now "An egg sample"
Note, as shown above:


string *$str_supe( string s, int sd, int ed )

Return a string reference with adjusted bounds.

s is the source string; sd and ed specify deltas for the start and end positions. The return is a new reference string with adjusted boundaries. sd and ed are simply added to the start and end points of the source string.

Returns a pointer to the substring as adjusted.

    string s;
    string *s1P;

    s = "give me some money";
    s1P = $str_find( s, "me" );		// s1P points to "me"
    s1P = $str_supe( *s1P, -1, 1 );	// s1P now points to " me "

    // Find the string between the first two "m" characters, removing
    //   the "m"s at the ends:
    $str_bx_set( s, 0 );		// Start at the beginning
    s1P = $str_supe(
    	      *$str_union( *$str_find( s, "m" ), *$str_find( s, "m" ) ),
	      1, -1 );

    // s1P now points to "e so"

NB: $str_supe is probably a bad name, it's supposed to suggest "superstring". Got another?


string *$str_union( string s1, string s2 )

Find a substring that overlaps two related strings.

s1 and s2 are two strings that are references to the same underlying base string (e.g. as created by $str_sub). The result is a pointer to a substring that contains both related strings.

    string s;
    string *s1P, *s2P;
    s = "this mail is not very interesting";
    s1P = $str_sub( s, 5, 4 );		// ptr to "mail"
    s2P = $str_union( *s1P, *$str_find( s, "y " ) );
	    // s2P now points to "mail is not very "

    s2P = $str_supe( *s2P, 0, -1 );	// "mail is not very"
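One way to think about $str_sub, $str_supe and $str_union is that each reference is just a pair of boundaries into a shared base string. The Python below is a sketch of that mental model only: Ref, sub, supe and union are invented names, the magic values -1, -2 and -3 are not handled, and no bounds checking is done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference: a [start, end) window onto a base string."""
    base: str
    start: int
    end: int          # exclusive

    def text(self):
        return self.base[self.start:self.end]


def sub(base, start, length):
    """Reference to length characters of base, beginning at start."""
    return Ref(base, start, start + length)


def supe(ref, sd, ed):
    """Same base, with the deltas added to the start and end boundaries."""
    return Ref(ref.base, ref.start + sd, ref.end + ed)


def union(a, b):
    """Smallest window that contains both references (same base assumed)."""
    return Ref(a.base, min(a.start, b.start), max(a.end, b.end))
```

In this model, the union example above becomes a window from the start of "mail" to the end of "y ", and the final $str_supe call simply pulls the end boundary back by one.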


string $str_upper( string str )

Makes an uppercased copy of a string.

    string s, su;
    s = "Hello There";
    su = $str_upper( s );

'su' will contain the string "HELLO THERE" .

Note that this makes sense only for strings composed of ASCII characters.


Unclassified functions


string $ip_rev( string ip )

Reverses an IP address string.

ip is an IP address string. This function returns the IP address with its octets in the reverse order, e.g. for use in a DNSBL lookup. If the input is invalid or the operation can not be performed for some reason, an empty string is returned.
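As a rough illustration of the reversal, here is a Python model. It is a sketch under assumptions, not the mvmf code: only dotted-quad IPv4 input is handled, validation is delegated to Python's ipaddress module, and ip_rev is simply named after the function it imitates.

```python
import ipaddress


def ip_rev(ip: str) -> str:
    """Return the IPv4 address with its octets reversed, or "" if invalid."""
    try:
        ipaddress.IPv4Address(ip)      # raises ValueError for malformed input
    except ValueError:
        return ""
    return ".".join(reversed(ip.split(".")))
```

A DNSBL lookup name is then typically formed by appending the list's zone to the reversed address, for example "1.2.0.192." followed by a zone name such as the placeholder "dnsbl.example.org".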