
Re: [Orekit Users] Test failure



Hi Luc, Walter,

On Mon, 2018-06-04 at 18:10 +0200, MAISONOBE Luc wrote:
Walter Grossman <w.grossman@ieee.org> wrote:

Thanks for the prompt response. I will do my best. Let me also add that there was a warning that 2 tests were skipped.
The skipped tests are expected; they correspond to one of the classes considered experimental as of 9.2.
I cloned the repository using git; the jar is orekit-9.2.jar.

Ubuntu 16.04 LTS
Intel® Core™ i5-3320M CPU @ 2.60GHz × 4 (Ivy Bridge Mobile, 64-bit)
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

I'm also seeing this test error, as well as one with NetworkCrawlerTest, when building the 9.2 tag from git. The NetworkCrawlerTest issue may be unrelated. Here is my system information:

$ mvn clean test
...
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   NetworkCrawlerTest.compressed:82 expected:<2> but was:<0>
[ERROR]   OrbitDeterminationTest.testW3B:384 expected:<0.687998> but was:<0.6880143632396981>
[INFO] 
[ERROR] Tests run: 2790, Failures: 2, Errors: 0, Skipped: 2

commit: 5da7febcc2769477c4522d7ec4ed42e8169c6e39

javac 1.8.0_171
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

Linux B259-LINUX4 4.4.0-127-generic #153-Ubuntu SMP Sat May 19 10:58:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description:	Ubuntu 16.04.4 LTS
model name	: Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz

The NetworkCrawlerTest issue seems to be related to the order the tests are run in, because when I run `mvn clean test -Dtest=NetworkCrawlerTest#compressed` the test passes. So it is probably a data loading order/caching issue. I think the order in which JUnit tests are run is non-deterministic.
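If ordering really is the culprit, pinning Surefire's class run order should make the failure reproducible from run to run. Here is a sketch of a pom.xml fragment; the `runOrder` parameter exists in maven-surefire-plugin 2.7 and later, and can also be passed on the command line as `-Dsurefire.runOrder=alphabetical`:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- run test classes in a fixed, reproducible order -->
    <runOrder>alphabetical</runOrder>
  </configuration>
</plugin>
```

With a fixed order, the failure should either appear every time or never, which would confirm or rule out the ordering hypothesis.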

Luc, if you run `mvn clean test -Dtest=OrbitDeterminationTest#testW3B`, does it still pass on your machine? Perhaps it is a test ordering issue for that test as well.

Let me know if you would like me to try something to debug the issue.

Best Regards,
Evan

Maybe I should switch to Oracle Java?
No, most of the Orekit developers use Linux and OpenJDK. I'll have a quick look at this, but it may be a numerical glitch. Increasing the tolerance seems fine to me.

best regards,
Luc
On Mon, Jun 4, 2018 at 9:54 AM, MAISONOBE Luc <luc.maisonobe@c-s.fr> wrote:
Hi Walter,

Walter Grossman <w.grossman@ieee.org> wrote:

I am a newbie to Orekit. I ran the tests and got a "near-miss" failure, which I resolved by relaxing the precision. How do I know if I am OK?

OrbitDeterminationTest.testW3B:384 expected:<0.687998> but was:<0.6880143632396981>

I found this line:

Assert.assertEquals(0.687998, covariances.getEntry(6, 6), 1.0e-5);

Is the problem that the acceptance criterion is too tight? Why?
The test tolerance is intentionally extremely small; see below for the rationale for this stringent choice. The test should however succeed with the current settings. Could you tell us which version of Orekit you use (development version from the git repository, or a released version?) and which Java environment (OS, JVM version, processor)?

Some tests in Orekit are built in several stages. First the test is created without any thresholds and only outputs its results, which the developer compares against whatever is available to gain confidence in them. That may be runs of other reference programs if available, another independent implementation using different algorithms, or a sensitivity analysis with the program under test itself. This validation phase may be quite long.

Once developers are convinced the implementation is good, they run the test one last time and register its output as the reference values, with a stringent threshold, in order to turn the test into a non-regression test. The threshold is therefore not an indication that the results are very good; it is only a way for us to ensure that any change in the code that affects this part will break the test and force developers to look at this code again and decide what to do. They can decide that the changes that broke the test are valid and only changed the results in an acceptable way (sometimes even improving them), in which case they change either the reference value or the threshold. Or they can decide that the changes in fact triggered something unexpected and that they should fix their new code so the test passes again without being changed.

So, as a summary: thresholds for non-regression tests are small so they act as a fuse; people notice when it blows and can take a decision.

best regards,
Luc
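To make the fuse behavior concrete, here is a minimal standalone sketch using the reference value and the failing value from the report above. This is not Orekit code; `withinTolerance` is a hypothetical helper mirroring what `Assert.assertEquals(expected, actual, tol)` checks.

```java
public class ToleranceCheck {

    // Hypothetical helper: same check as JUnit's assertEquals(double, double, double).
    static boolean withinTolerance(double expected, double actual, double tol) {
        return Math.abs(expected - actual) <= tol;
    }

    public static void main(String[] args) {
        double reference = 0.687998;           // value registered after validation
        double actual    = 0.6880143632396981; // value from the failing run
        // The difference is about 1.64e-5, just above the 1.0e-5 fuse:
        System.out.println(withinTolerance(reference, actual, 1.0e-5)); // prints "false"
        // A slightly looser threshold would hide the change:
        System.out.println(withinTolerance(reference, actual, 1.0e-4)); // prints "true"
    }
}
```

The point of the tight threshold is exactly that the first check fails: a drift of roughly 1.6e-5 is enough to blow the fuse and force a human decision, even though both values are numerically very close.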