Re: [Orekit Developers] Changing conditions to a tolerance

On Thu, Apr 27, 2017 at 2:03 PM, Ward, Evan <Evan.Ward@nrl.navy.mil> wrote:

Hi Hank,

On Wed, 2017-04-26 at 20:28 -0400, Hank Grabowski wrote:

During some V&V tests I was running when I tried to get the tolerance of a computation down to millisecond or very tight levels I sometimes ran into a condition where the root finder in the propagators went a bit haywire (it says it doesn't have roots in the interval).

Thanks for finding this bug and investigating it. I thought I had shaken out most of the bugs from the event detection code so I really value any new ones you find. Do you have a test case that you can share? I definitely want to add a test case so we make sure the bug stays fixed. Does this issue just occur with the analytical propagators, or with the NumericalPropagator as well? If it is just the analytic propagators then it is probably a precisions issue related to AbsoluteDate containing ~ quad precision but differences between dates only containing double precision.

Many of the existing test cases in CloseEvents*Test use a event finding tolerance of 1e-6, and some use 1e-18, so the event finding algorithm should work with an arbitrarily small tolerance, even down to adjacent AbsoluteDates. Breaking at the millisecond level is definitely a bug in the algorithm.

The check() statements in EventState are to verify certain pre-conditions for the following code. I found it useful because there are some complex interactions between EventState and AbstractAnalyticalPropagator. Similar to the assert statements in the Java Collections, they're there to catch bugs in the algorithm before they break user's code and could be ignored if the algorithm is working correctly. These checks would also fail if the user supplies a g function that is not well behaved. The invariant that g(s) = g(s) for all s must be true if the event detection algorithm has not processed a RESET_STATE or RESET_DERIVATIVES between the calls to g(s). I.e. same inputs results in the same outputs.

A way we were able to get past this for testing purposes were changes as follows to orekit/propagation/events/EventState.java:

Changed assert comparison on line 266

OLD check(ga == 0.0 || gb == 0.0 || (ga > 0.0 && gb < 0.0) || (ga < 0.0 && gb > 0.0));

NEW check(equalsWithinTolerance(ga, 0.0) || equalsWithinTolerance(gb, 0.0) || (ga > 0.0 && gb < 0.0) || (ga < 0.0 && gb > 0.0));

This checks that there is a root in the interval [a, b]. If this check fails then the following algorithm, which assumes there is a root in [a, b], may not behave correctly. Adding a tolerance on that check would invite the situation where, for example g(a) = 1 and g(b) = 1e-20 to pass, even though there is no guarantee of a root in [a, b] as it could just be a close approach.

Changed assert comparison on line 386

OLD check(increasing == gb >= ga);

NEW check(increasing == gb >= ga || (equalsWithinTolerance(ga, 0.0) && equalsWithinTolerance(gb, 0.0)));

This checks that the increasing flag is set correctly. Failing this means that, for example two increasing events were found in a row, which is a bug in the algorithm, or possibly due to the check on line 266 failing.

Changed assert comparison on line 493

OLD check(g0 == 0.0 || gPositive == (g0 > 0.0));

NEW check(equalsWithinTolerance(g0, 0.0) || gPositive == (g0 > 0.0));

Fails for the same reason as the last check, two increasing or deceasing events in a row.

In general comparing a double to exactly zero is obviously bad but I wasn't sure if this was done because even if it was within machine epsilon error we'd want these conditions to be evaluated as false. If you'd like me to make the change to EventState to accommodate this we could make the tolerance a configurable parameter with a default so that users could set it all the way to machine epsilon if they want (perhaps that's the default value) but expand it if they wanted to via something like a programmatic override.

I agree that equality comparisons with floats are generally bad, but if you look closely at the boolean _expression_ these are all really >= or <= comparisons. For example g0 == 0.0 || gPositive == (g0 > 0.0) is a short way of writing (gPositive == true && g0 >=0) || (gPositive == false && g0 <= 0).

Thoughts?

It appears that loosening the checks allows the algorithm to find two increasing events or decreasing events in a row. So we should try to figure out why the algorithm is trying to do that because that will lead to the root cause.

Best Regards,

Evan

Hank