
Re: [Orekit Users] design of the DataProviders




"JEANDROZ, Yannick [FR]" <yannick.jeandroz@airbus.com> wrote:

Hello,

Hi Yannick,


I think I have encountered a use case that, to my knowledge, Orekit cannot
handle. This has led me to investigate the inner workings of DataProviders. I
believe that the design, while very robust for "one-shot" simulations, is not
well suited for applications where the model data are somewhat "dynamic" during
execution. Could someone please confirm what I think I have understood?



My starting point is that I need to execute Orekit code in a
multi-threaded context, with thread-specific data providers. My actual
use case is about solar activity, but for the sake of simplicity I
will provide examples based on the 'tai-utc.dat' file.

Basically, I have 2 different versions of the data file "a.dat" and
"b.dat", and I need to run simultaneously thread A using a.dat,
and thread B using b.dat. Since DataProvidersManager is a
singleton, I have not found a way to do that.

You are right, DataProvidersManager is a global singleton. We could
probably make its internal fields ThreadLocal quite easily, but this would
probably not work as expected; see the reasoning below about the
interaction between caches and threads.





I understand that my need is very specific. But in the process of simplifying my
test case as much as possible, I have found another strange behaviour of
DataProvidersManager, even in a single-threaded application. I find this more
problematic. Since model data are usually cached by the factory classes (for
instance TimeScalesFactory), it seems virtually impossible to change the
data providers during execution (even in a single-threaded context).

I have an example to illustrate this. I have built two data sets:
- dataset 1 uses a "correct" tai-utc.dat
- dataset 2 uses a modified tai-utc.dat, where I have added a 0.5s shift in TAI-UTC values

Now I perform a simple computation on each of them with the following method:

    private void displayUTC(String datapath) throws OrekitException {
        DataProvidersManager.getInstance().addProvider(new DirectoryCrawler(new File(datapath)));
        TimeScale utc = TimeScalesFactory.getUTC();
        AbsoluteDate date = new AbsoluteDate("1999-08-22T00:00:00", utc);
        System.out.println(date.durationFrom(AbsoluteDate.GALILEO_EPOCH));
        System.out.println(DataProvidersManager.getInstance().getLoadedDataNames());
    }

My test code looks like this. Each method must be run in a separate
process, to make sure nothing is kept in memory between executions.

RUN1 :
    public void test1() throws OrekitException {
        System.out.println("Test 1");
        displayUTC("C:\\dataset1");
    }
OUTPUT :
                Test 1
                0.0
                [C:\dataset1\tai-utc.dat]

This is the expected output.


RUN2 :
    public void test2() throws OrekitException {
        System.out.println("Test 2");
        displayUTC("C:\\dataset2");
    }
OUTPUT :
                Test 2
                0.5
                [C:\dataset2\tai-utc.dat]

This is the expected output. Notice the 0.5s shift that I have
introduced in the data file.

RUN3 :
    public void testBoth() throws OrekitException {
        System.out.println("Test both");
        displayUTC("C:\\dataset1");
        DataProvidersManager.getInstance().clearProviders();
        displayUTC("C:\\dataset2");
    }
OUTPUT :
                Test both
                0.0
                [C:\dataset1\tai-utc.dat]
                0.0
                [C:\dataset1\tai-utc.dat]

As you can see, the results for the second test change when it is executed right
after the first, in the same process. The dataset 1 is used twice, despite
clearing the DataProviders and reloading dataset 2. I believe this is because
the data is cached in the TimeScalesFactory. I think it would make more sense to
cache the data in the DataProviders (or maybe DataLoaders) instead of the
factories.

You are right about the cause: the factory caches data. In fact, when we run
the unit tests for Orekit, we do this kind of thing thousands of times, so
we had to circumvent our own caches. We have set up a clearFactories method
in the class org.orekit.Utils just for this purpose. Beware, this is only
in the test part of the sources and is *not intended* to be put in the
library part. It is an ugly hack only suitable for tests. For the sake of
information, we do this using introspection, accessing private fields and
modifying them (we also reset the singletons this way).
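
To make the mechanism concrete, here is a minimal, self-contained sketch of
both effects: a factory that keeps loaded data in a private static field, and
a test-only reset through reflection (introspection). This is not Orekit code.
ToyFactory, PROVIDERS, cachedOffset and clearCache are hypothetical names
invented for the illustration, and the real clearFactories method may well
work differently.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a factory that caches loaded data (not Orekit code). */
final class ToyFactory {

    /** Stand-in for the providers manager: file name -> TAI-UTC shift it contains. */
    static final Map<String, Double> PROVIDERS = new HashMap<>();

    /** Private static cache; null means "nothing loaded yet". */
    private static Double cachedOffset = null;

    private ToyFactory() {
    }

    /** Reads from the providers on the first call only, then serves the cached value. */
    static synchronized double getOffset() {
        if (cachedOffset == null) {
            cachedOffset = PROVIDERS.get("tai-utc.dat");
        }
        return cachedOffset;
    }
}

final class FactoryCacheSketch {

    /** Test-only hack (assumption): null out the private static cache via reflection. */
    static void clearCache() {
        try {
            Field cache = ToyFactory.class.getDeclaredField("cachedOffset");
            cache.setAccessible(true);
            cache.set(null, null); // static field: no instance, new value is null
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyFactory.PROVIDERS.put("tai-utc.dat", 0.0);   // plays the role of dataset 1
        System.out.println(ToyFactory.getOffset());     // loaded from dataset 1

        ToyFactory.PROVIDERS.clear();                   // analogous to clearProviders()
        ToyFactory.PROVIDERS.put("tai-utc.dat", 0.5);   // plays the role of dataset 2
        System.out.println(ToyFactory.getOffset());     // still the cached dataset 1 value

        clearCache();                                   // reset the factory cache
        System.out.println(ToyFactory.getOffset());     // now reloaded from dataset 2
    }
}
```

In this toy model, clearing the providers has no effect on the second call
because the factory never goes back to them; only resetting the cached field
forces a reload.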

I am not sure about the solution. The caches were designed to be used by
several threads, and it took quite some time to achieve this (look at
the GenericTimeStampedCache class for an idea of what it looks like). This
design addressed the following use case: a server application runs
somewhere and answers requests (typically web services) coming from
the network. The application uses a pool of threads to handle requests.
The important part is here: as the threads in the pool are reused, even
if remote clients A and B each always use the same data set (for example
computing next week's maneuvers and therefore using data around next
week for client A, and post-processing last month's data for client B),
requests from client A and client B will not always be served by the same
thread on the server. Threads are picked up from the pool, serve
one request and are returned to the pool. The next incoming request from the
same client may pick up a different thread. So ThreadLocal does *not* work
in this context. This would be even worse if threads were created and
destroyed continuously to handle only one request: each new request
would use a newly created thread. This is why we currently have caches
that have the following properties:
 - they are thread safe
 - they can handle data from different time ranges
 - thread association with time range can change for each request

Obviously, the use case we addressed is different from the one you need.
In our case, one Orekit process (possibly using different threads)
corresponds to one data set (i.e. a set of data loaders and the caches
containing the data loaded from them).
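
The mismatch between pooled threads and clients can be shown with a small
standalone sketch. It assumes a hypothetical ThreadLocal holding the name of
the data set in use, and a pool reduced to a single thread so that the
behaviour is deterministic. Nothing here is Orekit API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalPoolSketch {

    /** Hypothetical per-thread "current data set" (an assumption, not Orekit API). */
    private static final ThreadLocal<String> DATA_SET = new ThreadLocal<>();

    /** Serves one request: binds a data set to the thread on first use, then reports it. */
    static String serve(String client) {
        if (DATA_SET.get() == null) {
            DATA_SET.set("dataset-" + client);
        }
        return client + " served with " + DATA_SET.get();
    }

    /** Runs one request per client on a pool that reuses a single worker thread. */
    static List<String> run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> requestA = () -> serve("A");
            Callable<String> requestB = () -> serve("B");
            String first = pool.submit(requestA).get();
            String second = pool.submit(requestB).get();
            return Arrays.asList(first, second);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints "A served with dataset-A" then "B served with dataset-A"
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

The second request belongs to client B but inherits what the reused thread
stored for client A: thread-local state follows the thread, not the client.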

In your case each of your threads has a dedicated predefined meaning and
should use its own data set.

I don't know yet how to manage this. Do your different threads really
need to be threads within the same process? Could you simply use different
processes, possibly exchanging data through inter-process communication
if needed (or not exchanging data at all if they don't need to)?




Finally, a very minor nitpick: the data loading mechanism is
based on data file names. This is a bit confusing when working with a
non-file-based storage, typically a database of some sort. Asking for data by
"type" (solar activity, earth orientation parameters...) would seem more
intuitive to me.

I fully agree. We had a SOCIS intern in 2014 for this. The work done
is available here: <https://www.orekit.org/forge/projects/socis-2014-database/>
It has never been integrated as it needs more work.

The "filename" could be considered simply as a key or table name. I don't know
if this would be a large API change or simply a documentation and parameter
name change.


After re-reading this email, I feel like I am bashing the data loading
mechanism. Please do not interpret my feedback this way: I have used Orekit for
several years now, and this is the first time I feel like I have hit a hard
limitation. This is a testament to the overall design of the library.

Thanks for the kind words, it is appreciated.



I have started thinking about possible refactorings of the model data
management. I have a somewhat similar behaviour somewhere else in my software,
and I have used a dependency inversion based on Java services to solve it. So
far, it seems to work quite well (but my software is not that big yet, so it
might be a bit early to tell). Maybe something like this could be implemented
for Orekit data management? I would gladly share a very basic draft of my ideas
if it can be of any help.
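
As a rough illustration of what such a dependency inversion might look like,
here is a hedged sketch in which the computation receives its data source as
an argument instead of reading a global singleton. LeapSecondSource, compute
and runBoth are hypothetical names, and the sources are wired by hand. In a
service-based design the implementations could instead be discovered with
java.util.ServiceLoader, which is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class InjectedDataSketch {

    /** Hypothetical abstraction the computation depends on (not Orekit API). */
    interface LeapSecondSource {
        double taiMinusUtcShift();
    }

    /** The computation gets its data source explicitly; no global state is involved. */
    static double compute(LeapSecondSource source) {
        return source.taiMinusUtcShift();
    }

    /** Runs two tasks concurrently, each with its own data source. */
    static double[] runBoth() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Double> taskA = () -> compute(() -> 0.0); // stands for a.dat
            Callable<Double> taskB = () -> compute(() -> 0.5); // stands for b.dat
            Future<Double> a = pool.submit(taskA);
            Future<Double> b = pool.submit(taskB);
            return new double[] { a.get(), b.get() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] results = runBoth();
        System.out.println(results[0]); // value from the source given to task A
        System.out.println(results[1]); // value from the source given to task B
    }
}
```

Because each task carries its own source, the result does not depend on which
pool thread happens to run it.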

Sure! We can speak about this on the developers list (and also during
the Orekit day at the end of the month!).

best regards,
Luc




Thank you for your time.

Yannick Jeandroz

TESOA2 - Flight Dynamics
T    +33 (0)5 62 19 51 71
E    yannick.jeandroz@airbus.com

www.airbusdefenceandspace.com<http://www.airbusdefenceandspace.com/>



