AFC - Abacus Formula Compiler for Java

Adding A New Primitive Spreadsheet Function To AFC

As a simple, but probably typical example, we will look at how I added ABS() to the list of supported spreadsheet functions. So for the rest of this page, we will assume ABS() has not been implemented yet.

Rule 1:
Document first

Documentation, well, documents, but it also focuses our thinking. This is why we start with it – and, of course, so we don’t forget it. For larger, novel features, it can be quite elaborate. In our case, we may safely assume that people know why we would want ABS() to be supported, and that they will consult the Excel help file for information on it.

Release Notes

Nevertheless, we still need to document that ABS() is, in fact, supported. First, we announce the improvement in the release notes in src/doc/releasenotes.rextile. Just look at the file and you’ll see how (as per the house rule mimick existing code).

Rextile

AFC’s documentation is written in Textile, a very concise and text editor-friendly format for writing XHTML. To convert the Textile sources to XHTML, we use Rextile—a Ruby based tool. You will need to install it. Once it’s installed, you can simply run ant doc to regenerate the documentation in doc/ from the sources in src/doc/. (It is quicker to run ant -q doc-main-update if you just want see the updated release notes, and can live with some warnings and missing citations.)

Rule 2:
Cite tests

So far, we simply said: ABS() is supported. How do we know it is? How does the reader know? This is where test-driven development gets into the picture – with a twist. We prove what we say directly to our reader. The proof becomes an integral part of the documentation (which is, after all, what we are saying). So, to integrate the proof, we always give examples which are cited from automated tests.

Automated Formula Tests

AFC’s existing testing and citing infrastructure makes this easy, especially for adding a new spreadsheet function. The spreadsheet files in

components/system/src/test-reference/java/org/formulacompiler/tests/reference

contain formula tests. So, to prove we implemented ABS(), all we still need to do is add test cases to one of the test sheets.

For ABS(), we extend the sheet called NumericFunctions.xls. By citing it, I can show it right here (only the relevant subset is shown):

A B C D E F G H I J K L M N O P Q
1 Expected
=IF(Q1,"Expected","FAILED!")
Actual Inputs # of Inputs Name Highlight Excel says Skip for Custom check true
=AND(Q2:Q10000)
2 1 1
=ABS(C2)
-1 1 ABS ABS
3 0 0
=ABS(C3)
0 1
4 1 1
=ABS(C4)
1 1
5 2 2
=ABS(C5)
-2 1
6 2 2
=ABS(C6)
2 1
7 0 0
=ABS(C7)
false 1
8 1 1
=ABS(C8)
true 1
9 3 3
=ABS(C9)
-3 1
10 4 4
=ABS(C10)
-4 1

And here is the documentation produced from it. Note also how the list of supported functions in the reference index automatically includes ABS.

See here for details on exactly what the columns of this sheet mean.

Rule 3:
Mimick existing code

This rule is the monkey see, monkey do rule from the book on contributing to Eclipse. It means we look for an already implemented function closely matching ABS(), and then mimick its implementation and code style (aka copy/paste). We will mimick ROUND() for ABS(). In fact, we already did. When documenting ABS() in the list of supported functions, remember? This rule basically states something we would do anyway.

So, to add the test cases for ABS(), I already copied lines from ROUND() and adjusted them. You can see them in the sheet above. (Copying the lines made sure that the conditional formatting for input values was copied too.)

As explained in the topic on the reference test sheets, you also have to update the corresponding .ods and .yaml files. If you forget, running the tests will tell you.

Rule 4:
Implement only what the tests require

This is test-driven development. We implement ABS() by just adding missing code until the tests run. The first step is, then, to run the tests just added (which will fail, of course). Every formula test sheet corresponds to a test class in

components/system/src/test-reference/java/org/formulacompiler/tests/reference

For the sheet NumericFunctions.xls, we need to run the test class NumericFunctionsTest. Doing this now (ant -q test-ref) returns something like:

org.formulacompiler.spreadsheet.SpreadsheetException$UnsupportedExpression:
Unsupported function ABS encountered in expression ABS( <<? C2); error location indicated by <<?.
Cell containing expression is B2.
    at org.formulacompiler.spreadsheet.internal.CellWithLazilyParsedExpression.getExpression(CellWithLazilyParsedExpression.java:71)
...

Running these tests can take quite a while. See here for how to speed this up.

Extending The Parser

Unsupported function encountered means AFC’s Excel formula parser cannot parse something. So we must make it support ABS() as our first implementation step. The parser is generated from a special parser description language using JavaCC. Its input file is

components/compiler/src/build/javacc/ExpressionParser.jj

ROUND looks like this there:

|	"ROUND" fun2( Function.ROUND )

So we simply add the following line (we keep the list sorted alphabetically and ABS comes first, so it starts with { instead of |):

{	"ABS"	fun1( Function.ABS )

Look through the list there to see the different standard options for defining the argument syntax. In particular, fun1() is a unary function, fun2() a binary function, etc. For details on how JavaCC works, please refer to its grammar documentation.

We run ant -q build to regenerate the parser. Now, the Java symbol org.formulacompiler.compiler.Function.ABS is undefined. Extending this enumeration is straightforward. We do it, and the code compiles again.

Implementing The Function

Running the test again, we get:

org.formulacompiler.compiler.CompilerException$UnsupportedExpression:
Function ABS is not supported for double engines.
In expression  >> ABS( C2 ) << ; error location indicated by >>..<<.
Cell containing expression is B2.
Referenced by cell B2.
    at ...

This is from the heart of AFC, the byte code compiler. To implement ABS(), let’s look at how ROUND() is implemented. This is where I have to introduce you to one of the niftier parts of AFC: the JVM byte-code decompiler.

Template Methods

AFC is, like javac, a byte-code compiler for the JVM. But, luckily, you don’t have to know the JVM byte code instruction set to add primitive functions like ABS() to AFC. Instead, you write simple template methods in plain Java, like this:

public double fun_ROUND( double a, double b )
{
  return RuntimeDouble_v2.round( a, (int) b );
}

Any method you put into the class

components/compiler/src/impl/java/org/formulacompiler/compiler/internal/templates/ExpressionTemplatesForDoubles.java

that is called fun_XY is automatically decompiled by AFC’s build process to generate a byte-code compiler for expression nodes of the kind new ExpressionNodeForFunction( Function.XY ).

Expression Rewriting

There is another approach for functions that can be rewritten to expressions involving only other, more primitive expression functions. This is shown in the topic on adding high-level functions.

Runtime Support

As you can see, ROUND() is not implemented directly, but mapped to a runtime support function. The runtime is subclassed for each supported numeric type. It is also versioned so AFC can easily provide backwards compatibility with older compiled engines. Here is the implementation from org.formulacompiler.runtime.internal.RuntimeDouble_v2:

public static double round( final double _val, final int _maxFrac )
{
  final double shift = pow10( _maxFrac );
  if (0 > _val) {
    return Math.ceil( _val * shift - 0.5 ) / shift;
  }
  else {
    return Math.floor( _val * shift + 0.5 ) / shift;
  }
}

Numeric Types

This would be very easy to do for ABS() too. But ABS() is really so basic, we will implement it directly instead:

public double fun_ABS( double a )
{
  return Math.abs( a );
}

This covers double. We also need to handle BigDecimal and scaled long. This is done in the template class ...templates.ExpressionTemplatesForScaledLongs:

public long fun_ABS( long a )
{
  return (a < 0) ? -a : a;
}

and in ...templates.AbstractExpressionTemplatesForBigDecimals:

@ReturnsAdjustedValue
public BigDecimal fun_ABS( BigDecimal a )
{
  return a.abs();
}

The latter is OK if you don’t need to take the MathContext or scaling into account. If you do, as does the + operator, then you have to use ...templates.ExpressionTemplatesForPrecisionBigDecimals:

@ReturnsAdjustedValue
public BigDecimal op_PLUS( BigDecimal a, BigDecimal b )
{
  return a.add( b, this.mathContext );
}

and ...templates.ExpressionTemplatesForScaledBigDecimals:

@ReturnsAdjustedValue
public BigDecimal op_PLUS( BigDecimal a, BigDecimal b )
{
  return a.add( b );
}

The annotation ReturnsAdjustedValue means that the method result already has the proper scale. If you omit it, AFC will generate rescaling code after your method body.

Once all this is done, run ant -q build to rebuild the compiler code from the template methods.

Note
With Eclipse, it is crucial to use ant -q build to update the compiler code rather than running org.formulacompiler.compiler.internal.build.bytecode.PatternCompiler directly from within the IDE. This is because Eclipse’s compiler produces different byte-code sequences for the pattern methods than javac as invoked by AFC’s build script. While Eclipse’s version is also correct, it breaks the binary comparisons of generated engines with reference versions in some of the unit tests for AFC (see org.formulacompiler.tests.utils.AbstractTestBase.checkEngine().

Strings

To implement functions working on strings, we need to differentiate two cases.

Arguments

String arguments are straightforward:

public double fun_VALUE( String _text )
{
  return RuntimeDouble_v2.fun_VALUE( _text, this.environment, this.computationMode );
}

Returns

String return values are different. Firstly, you need to implement them in org.formulacompiler.compiler.internal.templates.ExpressionTemplatesForStrings (rather than the numeric templates). Note how you can access numeric arguments as int. AFC will handle the conversion for you automatically:

public String fun_MID( String s, int start, int len )
{
  return Runtime_v2.fun_MID( s, start, len );
}

Secondly, you have to teach AFC’s type annotator that the function is String-valued. You do this in org.formulacompiler.compiler.internal.model.analysis.TypeAnnotator:

private DataType typeOf( ExpressionNodeForFunction _expr ) throws CompilerException
{
  annotateArgs( _expr );
  switch (_expr.getFunction()) {
    // ...
    case CONCATENATE:
    case MID:
      // ...
      return DataType.STRING;

    case ERROR:
      return null; // must be provided by surrounding context, like a CASE statement

    default:
      return DataType.NUMERIC;
  }
}

Conclusion

That’s it. All the tests run green. Running ant -q doc updates the documentation and—lo and behold—ABS() is now included and fully documented.

Now read on to see that other approach using expression rewriting: adding high-level functions.