[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Orekit Developers] Changing conditions to a tolerance





On Thu, 2017-04-27 at 16:34 -0400, Hank Grabowski wrote:
I believe the problem was with numerically integrated propagators but I'd honestly have to dive into it.   Unfortunately I don't think I'll be able to get an isolated test case that I could add to the code, it was pretty deep in a test system.  Let me see what I can do on both questions though...

OK, it's also helpful to have a back trace, and a plot of the g function around the time of interest so we can understand what the algorithm is trying to do.

Best Regards,
Evan


I agree that this is a bit more than just checking a double for zero equality, which is why I was a bit hesitant to bring this forward.  I too am concerned about potential odd combinations by adding this tolerance but it was the mechanism that got all of the V&V tests working and passing.

On Thu, Apr 27, 2017 at 2:03 PM, Ward, Evan <Evan.Ward@nrl.navy.mil> wrote:
Hi Hank,

On Wed, 2017-04-26 at 20:28 -0400, Hank Grabowski wrote:
During some V&V tests I was running when I tried to get the tolerance of a computation down to millisecond or very tight levels I sometimes ran into a condition where the root finder in the propagators went a bit haywire (it says it doesn't have roots in the interval).

Thanks for finding this bug and investigating it. I thought I had shaken out most of the bugs from the event detection code so I really value any new ones you find. Do you have a test case that you can share? I definitely want to add a test case so we make sure the bug stays fixed. Does this issue just occur with the analytical propagators, or with the NumericalPropagator as well? If it is just the analytic propagators then it is probably a precisions issue related to AbsoluteDate containing ~ quad precision but differences between dates only containing double precision.

Many of the existing test cases in CloseEvents*Test use a event finding tolerance of 1e-6, and some use 1e-18, so the event finding algorithm should work with an arbitrarily small tolerance, even down to adjacent AbsoluteDates. Breaking at the millisecond level is definitely a bug in the algorithm.

The check() statements in EventState are to verify certain pre-conditions for the following code. I found it useful because there are some complex interactions between EventState and AbstractAnalyticalPropagator. Similar to the assert statements in the Java Collections, they're there to catch bugs in the algorithm before they break user's code and could be ignored if the algorithm is working correctly. These checks would also fail if the user supplies a g function that is not well behaved. The invariant that g(s) = g(s) for all s must be true if the event detection algorithm has not processed a RESET_STATE or RESET_DERIVATIVES between the calls to g(s). I.e. same inputs results in the same outputs.

  A way we were able to get past this for testing purposes were changes as follows to orekit/propagation/events/EventState.java:

Changed assert comparison on line 266
OLD check(ga == 0.0 || gb == 0.0 || (ga > 0.0 && gb < 0.0) || (ga < 0.0 && gb > 0.0));
NEW check(equalsWithinTolerance(ga, 0.0) || equalsWithinTolerance(gb, 0.0) || (ga > 0.0 && gb < 0.0) || (ga < 0.0 && gb > 0.0));

This checks that there is a root in the interval [a, b]. If this check fails then the following algorithm, which assumes there is a root in [a, b], may not behave correctly. Adding a tolerance on that check would invite the situation where, for example g(a) = 1 and g(b) = 1e-20 to pass, even though there is no guarantee of a root in [a, b] as it could just be a close approach.


Changed assert comparison on line 386
OLD check(increasing == gb >= ga);
NEW check(increasing == gb >= ga || (equalsWithinTolerance(ga, 0.0) && equalsWithinTolerance(gb, 0.0))); 

This checks that the increasing flag is set correctly. Failing this means that, for example two increasing events were found in a row, which is a bug in the algorithm, or possibly due to the check on line 266 failing. 


Changed assert comparison on line 493
OLD check(g0 == 0.0 || gPositive == (g0 > 0.0));
NEW check(equalsWithinTolerance(g0, 0.0) || gPositive == (g0 > 0.0));

Fails for the same reason as the last check, two increasing or deceasing events in a row.


In general comparing a double to exactly zero is obviously bad but I wasn't sure if this was done because even if it was within machine epsilon error we'd want these conditions to be evaluated as false.  If you'd like me to make the change to EventState to accommodate this we could make the tolerance a configurable parameter with a default so that users could set it all the way to machine epsilon if they want (perhaps that's the default value) but expand it if they wanted to via something like a programmatic override.  

I agree that equality comparisons with floats are generally bad, but if you look closely at the boolean _expression_ these are all really >= or <= comparisons. For example g0 == 0.0 || gPositive == (g0 > 0.0) is a short way of writing (gPositive == true && g0 >=0) || (gPositive == false && g0 <= 0).


Thoughts?

It appears that loosening the checks allows the algorithm to find two increasing events or decreasing events in a row. So we should try to figure out why the algorithm is trying to do that because that will lead to the root cause.

Best Regards,
Evan


Hank