AFC - Abacus Formula Compiler for Java

Database Functions

I have started implementing the database functions like DSUM(). Here I intend to sketch out the cases SEJ is going to support, and how.

General Remarks

At least in the first implementation, I will not support constant folding of the D...() functions. Their arguments will, of course, be folded normally. This is because I don’t think we will, in practice, see any sheets with constant applications of these functions.

Base Data

In the following examples, I will always consider references to this sheet:

A B C D E F G H I J K
2 Test table Tree Height Yield Profit Date
3 Apple 18 14 105.00 8/2/05 14:15
4 Pear 12 10 96.00 8/2/05 14:16
5 Cherry 13 9 105.00 9/3/05
6 Apple 14 10 75.00 8:45
7 Pear 9 8 76.80 8:44
8 Apple 8 6 45.00 8:43
9
10 Criteria sets Tree Height Height Tree Height
11 Apple >10.0 <16 Nonexistent <1.4e6
12 Pear <1'000
13
14 Tree Height Height RefTree RefHeight RefHeight
15 Apple
=E15
>10
=">"&F15
<16
=CONCATENATE("<",G15)
Apple 10 16
16
17 Date Date Date RefDate Date Date RefDate Date Date RefDate
18 8/2/05 14:16 >=2.9.2005 >=38597
=">="&E18
9/2/05
(ValDate1)
>=2.8.05 14:16 >=38566.5944444444
=">="&H18
8/2/05 14:16
(ValDate2)
>=8:45 >=0.364583333333333
=">="&K18
8:45
(ValDate3)
19
20 Free Form
21 false
=AND(B3="Apple",C3>$F$15,C3<$G$15)

B3:E4 (Data1)
B11:B12 (Vals1)
E15:G15 (Vals2)

Simple Sum

Consider

DSUM(Table1;"Yield";Crit2)

selecting all Apple trees. I shall generate the methods (with names properly made unique):

boolean isMatch( String Bx, String B11 ) {
    return Bx.equals( B11 );
}
double computeDSUM() {
    String B11 = getB11();
    double r = 0.0;
    if (isMatch( getB3(), B11 )) r+= getE3();
    ...
    if (isMatch( getB8(), B11 )) r+= getE8();
    return r;
}

Now consider

DSUM(Table1;"Yield";Crit5)

which a different match function

boolean isMatch( String Bx, String B11, String B12 ) {
    return Bx.equals( B11 ) || Bx.equals( B12 );
}

and an analogous computeDSUM method.

Section Sum

If the table is or contains a dynamic range, the code becomes (assuming a dynamic section in B4:F7):

double computeDSUM() {
    String B11 = getB11();
    double r = 0.0;
    if (isMatch( getB3(), B11 )) r+= getE3();
    for (SectionObj e : getSection()) {
        if (isMatch( e.getB4(), B11 )) r+= e.getE4();
    }
    if (isMatch( getB8(), B11 )) r+= getE8();
    return r;
}

Conditional Fold

Consider

DSUM(Table1;"Yield";Crit2)

selecting all Apple trees. This could be rewritten to

_LET( b11: B11;
    _DFOLD( col: `col0 = `b11; r: 0; xi: `r + `xi; 3; `table ))

Where col: means that the columns in the range are accessible as col0, col1, ..., and the 3 indicates the column to be summed – this could also be an expression. Likewise,

DSUM(Table1;"Yield";Crit5)

would become

_LET( b11: B11;
_LET( b12: B12;
    _DFOLD( col: OR(`col0 = `b11, `col0 = `b12); r: 0; xi: `r + `xi; 3; `table )))

Now, _DFOLD() is the thing that gets compiled to the two helper functions, the matcher and the folder. The normal closure computation is useful for the matcher. We will also need _DREDUCE() for _DMIN() and _DMAX().

The rewrite rules given above will, of course, have to be implemented in plain Java, not through rewrite templates, as they involve some quite complex logic.

In the first version, I will not try to detect multiple references to the same cell in the criteria.

Comparisons

Consider

DSUM(Table1;D22;Crit6)

which has computed criteria. I will support this as long as the computation follows the pattern

CONCATENATE( "comparison", value )

where comparison must be one of =, <>, <, <=, >, >=. Given the possibility for free form criteria, one might question whether SEJ has to support this. I believe it is worthwhile, though, because it is a simple extension of the the base criteria support that will likely be more familiar to users than free form criteria.

Criteria computed by any other expression are assumed to mean equality, even if they return a string starting with one of the comparison operators. This is different from Excel and unfortunate, as it means you may get different results without any warning. One possibilty to catch this, at least for numbers, would be to analyze the type of the referenced cell. If it’s a string used for a numeric column, SEJ might reject it.

Missing Things

  • Wildcards with = (=?u*)
  • String searches are prefix searches
  • String searches are case-insensitive
  • Test free-form criteria with missing label
  • DSUM needs shaped arguments. However, SEJ currently does not support dynamic sections in shaped arguments.
  • Maybe rewrite to two ranges, one for all used criteria, one for the data. Might even rewrite to n ranges for all used cols.