AFC - Abacus Formula Compiler for Java

AFC Developer’s Journal

These are oldish entries of my journal regarding the development of AFC. The newer entries are on the main journal page. Note that the former name of AFC was SEJ (Spreadsheet Engine for Java).

September 12, 2006

Code Coverage

I have added code coverage analysis to SEJ using cobertura. Overall results right now:

  • 84% line coverage
  • 93% branch coverage

Not bad, eh? Nevertheless, this already helped identify a few untested areas:

  • value conversions in bytecode type compilers
  • lots of error handling in the bytecode compiler
  • compilation of comparison in IF with inverse branch target (TestCompilerBranchingWhenTrue)
  • literal pattern chars in string pattern expressions
  • some sheet builder constructs are untested
  • no test of auto-mark-support of engine loader input stream

Some leaks are caused by JRE 1.4, which is not run during coverage analysis right now. Many others are due to untested error cases (throw).

August 30, 2006

Reference Tests

I have restructured the Excel function tests. They are no longer a single big sheet, but split into multiple sheets. The code which runs these tests has also seen some improvements:

  • BigDecimal tests now check for the proper scale of returned values.
  • Tests with dynamic inputs are now run with all possible permutations of bound and unbound input cells (which, alas, makes the running time of a full system test significant even on my fast machine).
  • The values of bound input cells are temporarily modified so the tests would detect if SEJ still uses the constant cell value instead of the bound input.
  • The test code is itself tested for whether it really tests the way it should.

In addition to this, the tests automatically emit HTML fragment files which document the test cases. They are then included in the reference, again making sure I cite proven results.

August 23, 2006

Constant Folding

I am implementing string-valued expressions. Pondering how to implement CONCATENATE and the & operator efficiently, I started thinking again about a still pending constant folder optimization:

  1 + 2 + x + 3  =  6 + x

Currently, I do (3 + x) + 3. Then again, what about this one:

  999..9 + x + 10  =  (999..9 + 10) + x

assuming that 999..9 is the highest number representable by the given numeric type (for example, the maximum long value). All x less than -9 will make this expression work. But folding prematurely to (999..9 + 10) + x will lead to an overflow in the folder. Meaning this type of optimization is only useful for freely scaled BigDecimal. Not your typical case.

How about the optimizer only handling consecutive runs of terms, as in:

  x + 1 + 2  =  x + 3

which currently is represented internally as (x + 1) + 2 and thus not optimized at all? Or:

  x + 1 - 2 + 3  =  x + 2

which I want to first transform into (x + 1 + (-2) + 3) and then fold. Is this safe? Let’s see:

  x + 99..9 + 10  =  x + (99..9 + 10)

again overflows in the folder, but x <= -10 would make it work. With double, I can construct pathological cases, too:

  x + 1e10 + 1e-10  =  x  + (1e10 + 1e-10)

The folder will reduce this (due to rounding) to x + 1e10, but for x = -1e10, the result of the original expression would be 1e-10. For the folded expression it is 0.

This got me wondering. And I found that the Java compiler does not, in fact, fold such expressions either. It does not even optimize things like:

  x + "a" + "b"  =  x + "ab" 

while it does optimize x + ("a" + "b") and "a" + "b" + x. This seems to imply that the Java folder is just as dumb as my folder currently is, following the normal parse tree structure which would render the latter as ("a" + "b") + x and the first, non-optimized one as (x + "a") + "b" which now contains no foldable term.

Given all this, I shall, for the moment, confine myself to handling only string concatenation smartly. There can be no overflow or rounding conditions there that would not occur anyway because concatenation never takes data away from the string.

Side Note

This means that things like:

  private static final String A = "A";
  private static final String B = "B";
  private final String fn( String x ) {
    return x + A + B;
  }

should always be coded as:

  private final String fn( String x ) {
    return x + (A + B);
  }

if fn is being used in a performance critical region.

August 17, 2006

I have fixed the JRE 1.4 tests and, at the same time, upgraded to the latest version of Retrotranslator. This allowed me to finally change the way I handle things like BigDecimal.precision(), which neither the JRE 1.4 nor Retrotranslator support. I now compile helper classes giving access to such functions and run Retrotranslator on them in unverified mode. Then, at runtime, I simply do not call them on the JRE 1.4 (by checking the system version). This approach was suggested by the author of Retrotranslator, who is very helpful.

August 15, 2006

Marcel and Dejan have found a problem with Abacus’s own custom class loader and the way SEJ instantiates its engines. I need to add a configuration option to SEJ so you can specify the parent class loader for the engines. The easiest way would be to add static member to SEJRuntime. However, I generally don’t like static config, because the decision of what is a singleton environment and what not should really reside with the application, not the library. So that leaves me with two options:

  • pass the parent class loader to the engine compilation and loading methods, or
  • make SEJRuntime and, consequently, SEJ instantiable classes instead of namespaces for a bunch of static methods.

Since I don’t like cluttering the compilation and loading methods with this parameter, and since there might be even more config values in the future, I attempted to take the second route. This meant an API change. But with only two external projects using SEJ so far, I decided to do it. Seems I was wrong, though.

I quickly realized that this would mean having to make all the top-level doer classes newly hold a reference to the SEJ instance that created them. This, however, would couple together the entire range of subsystems of SEJ – clearly a bad thing. So I shall revert to adding a parameter.

...

Did this. And discovered, to my horror, that all my ant builds never ran the tutorial tests. The include filter did not catch them. I have changed that now and found that some of my type conversions do not work with JRE 1.4. So I need to fix this before I can get on with new features.

July 11, 2006

Citing Excel

Picking up on the ideas from June 30, I added support for citing Excel spreadsheets to JCite. Now the SEJ documentation has example spreadsheets in it that are automatically tested and nicely formatted with every release. Cool.

July 6, 2006

Been busy implementing repeating sections. Read-only version is nearly finished. Just realized that engines with repeating sections will need to be resettable because the section collections are always cached.

June 30, 2006

Authoring Sheets

I have been thinking some more about the people who will have to author custom computations as spreadsheets. While SEJ must be able to work with almost any sheet, we should still strive to provide a more pleasurable experience to sheet authors than just basic Excel. Here are my current ideas:

  • Write an Excel add-in that makes managing names easier. Might be a task-pane that lists all names and translates mangled parametrized calls to proper syntax. (For example: valueFor.1994 becomes valueFor( 1994 ).)
  • Write an Excel add-in that reads an i/o definition generated from the respective Java interfaces for the sheet. Then allow the user to paste the proper names onto cells. Could also support validation of existing names (much like the binder later does, too).
  • Use color to distinguish input and output cells in the automatic, reflective binder.
  • Write a tool that generates end-user documentation for the defined i/o names given the respective Java interfaces.
  • Write a tool that can embed sample Excel sheets in HTML documentation (and maybe FO), complete with name annotations and formula display. This would also come in handy in the documentation of SEJ itself. In fact, what I need to document for my sample use-cases is exactly of the same sort as what applications will have to document for the specific computations they make customizable.

Feedback welcome!

Dynamic Parameters

I have started the use-case tests and tutorial on basic repeating sections. The first example is computing a customer rating given the totals of the orders for the last 3 months. With the current version of SEJ, the best we can do is something like:

public OrderData[] ordersForLastNDays( int _days );

However, it would be more natural to have something like:

public OrderData[] ordersByDate( Date _from, Date _to );

This is not something we can bind to with SEJ today. And even if we could, we would have to bind the date arguments statically. But they need to be dynamic, computed relative to the current date. This could be done in Excel. So we would need a way to bind an input call so that its parameters are bound to other cells. For example:

   A               B
1  Today           =Date()
2  StartDate       =... (formula that computes the start of the 3 month period)
3  EndDate         =... (formula that computes the end of the 3 month period)
4
5  Order Date      Order Total
6  1.1.2000        1000.00
7  1.2.2000        1300.00

We would want to bind the range A6:B7 as a repeating section to ordersByDate( StartDate, EndDate ). For example, by naming it, in Excel, ordersByDate.__StartDate.__EndDate.

This is quite feasible. But something for a future version of SEJ. And I am not sure your typical Excel user could handle the complexity.

June 28, 2006

I am currently focusing on things needed by the two first Abacus projects that will use SEJ. It’s great to finally have real projects use SEJ. Generates lots of feedback regarding usability of the API and functionality.

Estimates

I was a bit disappointed that my initial estimate of the project was so hopelessly wrong – thank God Claudio had a much better idea of how long it would take. On closer inspection, though, I believe I can somewhat defend my initial guess. I did, after all, change requirements for the project in mid-flight more than once, and quite fundamentally (think byte-code, think sections). Given that I am practically the only one on the change board right now, such things happen a little informally and, thus, are sometimes hard to discern for what they are. Nevertheless, the laurels for the most realistic estimate for the big picture of the project go to Claudio. Thanks, man!

API Redesign

As I said, I much redesigned the API for release 0.5.0. Why was that? It really was a culmination of various separate nagging doubts and ideas that I’d been harboring for a while (and which Igor reinforced):

Standardization

  • Establish and document a set of best practices on how to use SEJ. This includes using cell and range names, reflection for binding cells to methods, and warning users when something looks fishy (like unbound named cells).
  • Encourage people to follow to the best practices established by said use-case by way of making it the simplest to do. This might help to attain a certain unformity of uses of SEJ, which, in turn, will make it easier to change particulars of the lower-level API in the future.

API Improvements

  • Introduce a straightforward API for the standard use-case, following said best practices. People don’t want to see complexity when they don’t need it.
  • Add support for application-defined computation factories to reduce the application’s dependence on SEJ.
  • Offer the lower-level, more complex API to people needing the added control and power.
  • Make the lower-level API more flexible by separating phases (load sheet, bind cells, compile to internal model, compile to engine).
  • Expose the internal computation model as a separate way of specifying an engine. I have not done this yet, but prepared the API so that it would be easy to do so.

Defensible API

  • Enforce a strict separation between internal and public parts. Given Java’s insufficient visiblity control (no friends), I copied the approach taken by Sun and Eclipse by moving all internals into the sej.internal package.
  • Expose only interfaces to strictly control what features users can see.

What finally prompted me to do this redesign was when, in order to implement automatic type conversions, I needed the introductory example to SEJ. This example, of course, would have to be the standard use-case for SEJ, showcasing the best practices!

June 20, 2006

Release 0.5.0 is out the door. This has been a major API redesign, caused by the need for a simplified initial user experience with SEJ. The standard use-case is now just a few lines.

I shall have to take some time to write down all the thoughts that led to this redesign.

Even Older Entries

...can be found here.