Author Topic: Improving performance of loading large data into an EA Model

Andrew Mitchell

  • EA Novice
  • *
  • Posts: 11
  • Karma: +1/-0
Improving performance of loading large data into an EA Model
« on: December 04, 2020, 05:26:08 am »
We have a big model to load, so we ran some performance tests; the results are summarised below.
It would be really helpful if people could tell us whether these results match their experience, and share any tips for further improvement.

We have a model that we are trying to load into EA, currently using an MS Access EA project; as we reach the limits of Access we will move to a SQL Server EA project. The model has: 14,284 classes, 42,725 attributes, 57,065 connectors and 28,483 proxy connectors.
Our first attempt, using EA Automation, took 18 hours to load the full model.  We then found this thread: https://sparxsystems.com/forums/smf/index.php?topic=4143.0 and by using the performance flags (BatchAppend, EnableUIUpdates) we got the time down to 6 hours.  We also noticed that we were calling back into the interface to retrieve objects we had just created (e.g. Repository.GetElementByGUID()); by storing those objects locally instead we got the time down to 3 hours.  There appears to be no difference between using internal VBScript and external EA Automation.
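To illustrate the caching change: instead of calling Repository.GetElementByGUID() after every create, we keep the handle we already have in a local dictionary. There is no live EA instance here, so the sketch below uses stand-in classes for the repository objects; only the caching pattern itself is the point. (With the real API the flags from the linked thread are set on the repository, e.g. Repository.BatchAppend = True and Repository.EnableUIUpdates = False.)

```python
# Sketch of the "store created objects locally" change.  FakeRepository and
# FakeElement are stand-ins for EA's COM objects (assumption: no live EA
# here); the dictionary cache is the technique being shown.

class FakeElement:
    def __init__(self, guid, name):
        self.ElementGUID = guid
        self.Name = name

class FakeRepository:
    """Stand-in for EA.Repository; GetElementByGUID is the slow call."""
    def __init__(self):
        self._db = {}
        self.round_trips = 0  # counts expensive repository lookups

    def add(self, guid, name):
        self._db[guid] = FakeElement(guid, name)
        return self._db[guid]

    def GetElementByGUID(self, guid):
        self.round_trips += 1
        return self._db[guid]

repo = FakeRepository()
cache = {}  # GUID -> element handle, filled as objects are created

for i in range(5):
    guid = "{GUID-%d}" % i
    cache[guid] = repo.add(guid, "Class%d" % i)  # keep the handle we got back

# Later lookups hit the local dict instead of the repository:
element = cache["{GUID-3}"]
assert repo.round_trips == 0
```

In the real script the cached value would be the EA.Element returned by Collection.AddNew(), kept for the lifetime of the load instead of being re-fetched by GUID.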

We also tried another approach: generating Native XML and importing it with the ImportUsingXMI command.  This took only 15 minutes to run.
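A sketch of generating such an import file programmatically. The element names below are placeholders, not EA's actual Native XML schema; the reliable way to get the real structure is to export a small sample package from EA and mirror that as a template.

```python
# Placeholder schema only (assumption): export a sample package from EA to
# see the real Native XML structure, then mirror it here.
import xml.etree.ElementTree as ET

def build_package_xml(class_names):
    root = ET.Element("Model")
    pkg = ET.SubElement(root, "Package", name="Imports")
    for name in class_names:
        ET.SubElement(pkg, "Class", name=name)
    return ET.tostring(root, encoding="unicode")

xml_text = build_package_xml(["Order", "Customer", "Invoice"])
# The generated file is then imported in a single call (via the Project
# interface from Repository.GetProjectInterface()) rather than one API
# call per object; that batching is where the 15-minute figure comes from.
```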

We then compared the performance of these techniques on a simple task: loading a number of classes into a package.

The table below shows the results of this comparison.
The values in the right-hand three columns are the average times to load one class, in milliseconds.  As you can see, with the performance flags off the per-class time grows linearly with the number of classes; for the other two approaches it stays roughly constant.

Number of Classes    Flags Off (ms)    Flags On (ms)    NativeXML (ms)
  1,000                   12                 5                 8
 10,000                   21                 5                 4
 20,000                   34                 5                 9
 30,000                   47                 6                 6
 40,000                   62                 5                 6
 50,000                   78                 5                 7
 60,000                   91                 5                11
 70,000                  106                 6                10
 80,000                  119                 6                10
 90,000                  135                 5                11
100,000                  152                 6                12
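A quick numeric check of the trend, using the figures from the table above: with the flags off the per-class time itself grows with the class count, which makes the total load time roughly quadratic; with the flags on it stays flat.

```python
# Per-class load times (ms) copied from the table above.
n         = [1000, 10000, 20000, 30000, 40000, 50000,
             60000, 70000, 80000, 90000, 100000]
flags_off = [12, 21, 34, 47, 62, 78, 91, 106, 119, 135, 152]
flags_on  = [5, 5, 5, 6, 5, 5, 5, 6, 6, 5, 6]

# With flags off, the per-class time grows ~1.4 ms per extra 1,000 classes:
slope_per_1000 = (flags_off[-1] - flags_off[0]) / (n[-1] - n[0]) * 1000
assert 1.3 < slope_per_1000 < 1.5

# Linear per-class time means roughly quadratic total load time, while
# with the flags on the per-class time barely moves at all:
assert max(flags_on) - min(flags_on) <= 1
```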


It looks like the 'Flags On' approach is the best on this simple test, but for a more complicated model than just classes in a package (see the full-model load times above) the 'NativeXML' approach wins.

« Last Edit: December 04, 2020, 05:28:42 am by Andrew Mitchell »

qwerty

  • EA Guru
  • *****
  • Posts: 12446
  • Karma: +348/-291
  • I'm no guru at all
Re: Improving performance of loading large data into an EA Model
« Reply #1 on: December 04, 2020, 08:05:21 am »
I'm using SQL to retrieve data more often than the API operations; the latter only for some quick, temporary things. But when it comes to performance and daily use it needs to be SQL. Creating things is a bit trickier and needs attention when Sparx changes the basics (for a new version). But that happens very rarely. Especially the basic things have hardly changed since the very beginning.
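For retrieval that might look like the following. sqlite3 stands in for the real project database, and the t_object column names used here (Object_ID, Object_Type, Name, Package_ID, ea_guid) match EA's schema as far as I know; check them against your own repository.

```python
# One SQL query replaces one API call per element.  sqlite3 is a stand-in
# for the project database; the t_object column subset is an assumption
# to verify against your EA schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t_object (
    Object_ID   INTEGER PRIMARY KEY,
    Object_Type TEXT, Name TEXT, Package_ID INTEGER, ea_guid TEXT)""")
con.executemany(
    "INSERT INTO t_object (Object_Type, Name, Package_ID, ea_guid) "
    "VALUES (?, ?, ?, ?)",
    [("Class", "Order", 5, "{AAA}"),
     ("Class", "Customer", 5, "{BBB}"),
     ("Requirement", "REQ-1", 7, "{CCC}")])

# Fetch every class in a package in a single round trip:
rows = con.execute(
    "SELECT Name, ea_guid FROM t_object "
    "WHERE Object_Type = 'Class' AND Package_ID = ?", (5,)).fetchall()
assert [r[0] for r in rows] == ["Order", "Customer"]
```

Inside EA the same SELECT can be run via Repository.SQLQuery(), which returns the result set as XML.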

q.

Uffe

  • EA Practitioner
  • ***
  • Posts: 1859
  • Karma: +133/-14
  • Flutes: 1; Clarinets: 1; Saxes: 5 and counting
Re: Improving performance of loading large data into an EA Model
« Reply #2 on: December 06, 2020, 09:56:18 pm »
Hello Andrew,


The EA API is notoriously chatty over the network. It appears to have no concept of a data cache; almost every API call seems to generate one or several SQL queries, issued one by one, and there are no supporting stored procedures in the database schema. The properties you've found are pretty much the only ones that affect the operation of the API itself, so it's a pretty primitive affair.

As Qwerty writes, the only way to get acceptable performance is to work out your own SQL queries. However, this only works for data retrieval. While you can issue create and update queries using the undocumented Repository.Execute() method, the schema itself is also undocumented, so to achieve something like creating a new element you have to work out for yourself which tables to modify and in what order. (Minor updates are generally safe, but creation is complex.)
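A sketch of the Execute() pattern for the "generally safe" single-table update case. RecordingRepo is a stand-in for a live EA.Repository COM object, and the Alias column on t_object is an assumption to verify against your schema version.

```python
class RecordingRepo:
    """Stand-in for a live EA.Repository; just records the SQL it is given."""
    def __init__(self):
        self.executed = []

    def Execute(self, sql):  # EA's real Execute() returns nothing useful
        self.executed.append(sql)

def set_alias(repo, element_guid, alias):
    # Single-table updates like this are the "generally safe" case;
    # creating elements touches several tables and is better left to the API.
    sql = ("UPDATE t_object SET Alias = '%s' WHERE ea_guid = '%s'"
           % (alias.replace("'", "''"), element_guid))
    repo.Execute(sql)

repo = RecordingRepo()
set_alias(repo, "{AAA}", "O'Brien")   # note the doubled quote in the SQL
assert "O''Brien" in repo.executed[0]
```

Because Execute() gives no feedback, it is only suited to fire-and-forget statements; anything you need to read back should go through a SELECT afterwards.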

I would certainly urge you to switch to SQL Server as the top priority. Access is very limited in and of itself, and over a network it's simply untenable. I don't know for certain, but I'm pretty sure that using an .EAP project on a file server means all network traffic is routed through NTFS and not ODBC, likely causing further performance issues.

The choice between opening a project in the EA client (whether by automation or the GUI) and loading model data from XMI seems like a false one to me. In the first case you're opening the database which already contains the data, while in the other you are adding data to the (local?) database. Maybe I'm not grasping what you're trying to do, but I will note that this is essentially how working with version-controlled models works. This approach is not one I recommend, but it is there and it is supported.

HTH,


/Uffe
My theories are always correct, just apply them to the right reality.

Andrew Mitchell

  • EA Novice
  • *
  • Posts: 11
  • Karma: +1/-0
Re: Improving performance of loading large data into an EA Model
« Reply #3 on: December 08, 2020, 01:13:16 am »
Uffe and qwerty,
Many thanks.
As you have both mentioned, retrieving data from a model is best done (for performance reasons) using SQL queries, and this is what we do.
The performance issues arise when we want to create objects.  And both your comments agree with our findings.
For clarity, Uffe: in both approaches we start with a dataset of objects we want to migrate into EA.  Approach one creates them in EA through EA Automation.  Approach two programmatically generates a Native XML file from the objects we want to create and imports that file into a model.
We definitely take onboard your suggestion of switching to SQL Server.
Andy

qwerty

  • EA Guru
  • *****
  • Posts: 12446
  • Karma: +348/-291
  • I'm no guru at all
Re: Improving performance of loading large data into an EA Model
« Reply #4 on: December 08, 2020, 01:35:20 am »
If you are just creating plain elements it's pretty much straightforward to do with SQL. Attributes, operations and connectors are not a big deal either. It only gets tricky when you need stereotypes (though that can be done as well if you know how).
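A sketch of that "plain elements via SQL" approach, again with sqlite3 standing in for the project database. The t_object and t_attribute column subsets below match EA's schema as far as I know; verify them against your repository version before relying on them.

```python
# Creating a class plus an attribute with plain INSERTs.  sqlite3 is a
# stand-in for the project database; column names are assumptions to
# check against your EA schema.
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t_object (
    Object_ID INTEGER PRIMARY KEY, Object_Type TEXT,
    Name TEXT, Package_ID INTEGER, ea_guid TEXT)""")
con.execute("""CREATE TABLE t_attribute (
    ID INTEGER PRIMARY KEY, Object_ID INTEGER, Name TEXT, Type TEXT)""")

def create_class(con, name, package_id):
    guid = "{%s}" % str(uuid.uuid4()).upper()
    cur = con.execute(
        "INSERT INTO t_object (Object_Type, Name, Package_ID, ea_guid) "
        "VALUES ('Class', ?, ?, ?)", (name, package_id, guid))
    return cur.lastrowid  # parent Object_ID for attributes/operations

def create_attribute(con, object_id, name, attr_type):
    con.execute(
        "INSERT INTO t_attribute (Object_ID, Name, Type) VALUES (?, ?, ?)",
        (object_id, name, attr_type))

oid = create_class(con, "Order", package_id=5)
create_attribute(con, oid, "total", "Decimal")
```

Connectors follow the same shape against t_connector (with Start_Object_ID and End_Object_ID foreign keys); stereotypes are the exception, since they also involve t_xref.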

I'm not sure what you are basically doing (where you get the data from to create your model).

q.

Andrew Mitchell

  • EA Novice
  • *
  • Posts: 11
  • Karma: +1/-0
Re: Improving performance of loading large data into an EA Model
« Reply #5 on: December 08, 2020, 08:52:50 pm »
Hi qwerty,
We have a process that creates tables of objects representing our model; these are then migrated to EA UML.
The migration creates classes, stereotypes, connectors, attributes, operations, proxy connectors, diagrams (including objects and links).  It may be extended to other object types.
We are a little nervous about manipulating EA's underlying tables directly, as we feel the EA Automation Interface is more stable: EA could change the structure of the underlying tables without warning.
Andy

qwerty

  • EA Guru
  • *****
  • Posts: 12446
  • Karma: +348/-291
  • I'm no guru at all
Re: Improving performance of loading large data into an EA Model
« Reply #6 on: December 08, 2020, 09:27:07 pm »
Yes, sure, to be on the safe side. I've been looking into EA's guts for almost two decades and the number of essential changes has been near zero. If you're not hopping onto the next beta as soon as it's announced and have a decent life cycle, it's quite safe to go to the DB directly. Just those stereotypes in the dreaded t_xref are a bit of a challenge, especially tagged values, which use coded GUIDs (I started cracking those but have had no time to finish that task so far).
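For reference, the t_xref rows in question (Name = 'Stereotypes') carry the stereotype in a delimited Description string. The "@STEREO;...;@ENDSTEREO;" format below is what I've observed in repositories, not a documented contract; treat it as an assumption and compare against your own data.

```python
# Build and parse the t_xref Description convention for stereotypes.
# Format is observed, not documented (assumption): verify against your DB.

def stereo_description(stereo, profile):
    return ("@STEREO;Name=%s;FQName=%s::%s;@ENDSTEREO;"
            % (stereo, profile, stereo))

def parse_stereos(description):
    """Pull the stereotype names back out of a Description string."""
    names = []
    for chunk in description.split("@STEREO;")[1:]:
        body = chunk.split("@ENDSTEREO;")[0].rstrip(";")
        fields = dict(f.split("=", 1) for f in body.split(";") if "=" in f)
        names.append(fields["Name"])
    return names

desc = stereo_description("block", "SysML1.4")
assert parse_stereos(desc) == ["block"]
```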

q.