View Issue Details

IDProjectCategoryView StatusLast Update
0002078SkyChart1-Softwarepublic19-02-21 15:59
ReporterSasaAssigned ToPatrick Chevalley 
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
PlatformLinuxOSUbuntu 64-bitOS Version18.10
Product Version4.1 SVN 
Target Version4.2Fixed in Version4.1 SVN 
Summary0002078: Slow picture initialization when DSO is present - Ubuntu rise "Skychart is not resoinding:
DescriptionInstalled:
sudo apt-get install skychart-data-stars skychart-data-dso skychart-data-pictures

When ./skychart dir is deleted and start CDC, initialization process begins, however when start to process pictures it appears that is freeze. It actually working and update progress bar, however extremely slowly and in large intervals (comparing to default catalog). Ubuntu also complains that CDC is not responding...

Since this is tested on brand new and fast PC, I would suggest to update progress bar and execute Application.ProcessMessages not based on percent, but time interval. Or perhaps to make it threaded, which is an overkill IMO, as it happens only once.
TagsNo tags attached.

Activities

Sasa

19-02-07 14:18

reporter  

Patrick Chevalley

19-02-07 18:17

administrator   ~0005355

This not need a fast processor or multi-thread because the limitation is disk access to read the header from every FITS file.
For example on my computer it take about 30 seconds when run from a 5400rpm disk, about 5 seconds from a SSD, and less than 3 seconds when run a second time with all the files in memory cache.

But you are right there was not enough ProcessMessages, only 25 for the whole process.
I increase that to 250 and this must be enough to not get this message.
https://github.com/pchev/skychart/commit/c39e2840a61482617722917e6402690643334d50

Sasa

19-02-07 23:20

reporter   ~0005358

Last edited: 19-02-08 17:33

View 3 revisions

Sorry, I have to reopen this.

It is something really wrong here, since I have standard 5400rpm HDD and very fast PC, but this process last 15 minutes. There is 9701 items to process.

It may be SQLite version issue, issues with internal cache or any other. I have no idea, since this is really fast PC.

Is it possible with SQLite to make temp database in memory and simply flush all when done in the file? That would be really fast.

Patrick Chevalley

19-02-07 23:57

administrator   ~0005359

OK, I see the problem. For all the testing my sqlite db stay in my $home on the SSD.
If I move the db to the 5400rpm disk it is very slow as you describe. I not think sqlite liked so much the SSD.
There is already a "begin transaction" before the insertion but this not help.
This need more testing.

Patrick Chevalley

19-02-08 00:15

administrator   ~0005360

Interesting, the following change make it finish in a few second on the HDD:

In cu_database.pas insert this two lines between lines 1517 and 1518:
              n := 0;
              db.Query('PRAGMA journal_mode = MEMORY'); //NEW
              db.Query('PRAGMA synchronous = OFF'); //NEW
              DB.starttransaction;

Sasa

19-02-08 01:36

reporter   ~0005361

Yes, but real time is about 1 min. If deleted ./skychart dir and try again, then it is in few seconds because of internal cache. Real speedup is 15x and that may be used for another issue I submitted while ago.

However, I'm not sure when journal file is introduced, maybe 3.4.x, I do not really remember. And I'm not sure did you supply specific SQLite library for specific platform or rely on OS repository package dependence due installation.

Second may be consideration about used memory on some specific platforms.

In any way, another speed up is to avoid plain SQL commands and use BINDing methods. In this case, that is irrelevant.

In general, for linux and PC, this solution is more than suitable with 15x speed up..

Patrick Chevalley

19-02-08 09:37

administrator   ~0005363

Yes, I will move this instruction just after opening the database so this also increase the performance for the asteroid and other operation.
But I still want to search a bit more because normally the StartTransaction function send a BEGIN to sqlite that must prevent it to do the sync I/O for every insert.

Memory journaling was added in 3.6.5 and the version I include for Windows is 3.7.2. On Ubuntu I am at 3.24.0, I will take the occasion to refresh the Windows version anyway.

Sasa

19-02-08 09:59

reporter   ~0005364

Last edited: 19-02-08 10:21

View 2 revisions

I'm referring specifically about journal file and when it was introduced, since I'm not certain about behavior of the SQLite wrapper, if that PRAGMA is not supported for specific library...

And also another thing should be taken in consideration is cache size for SQLite. But I do not know details, as I do not use latest SQLite versions. Some older specific bug free version of the library was radically faster for some applications than current ones.

Patrick Chevalley

19-02-08 23:43

administrator   ~0005366

I think I understand how this sqlite transaction work, this is not the same that with mysql or oracle, because each query is unitary it not take account for the "BEGIN TRANSACTION" in a previous query.
So the complete transaction must be put in a single string starting by BEGIN; and ending by COMMIT; with all the INSERT between them.

Doing this way it take only one second to insert all the rows at the end when the progress bar is at 100%. There is no need for the PRAGMA and I remove them.
Now all the slow part with the progress bar is taken to read the fits files and compute the coordinates without any sqlite operation.

https://github.com/pchev/skychart/commit/6664fc70a5f2896e002a4ce13b924dfa3f34798f

I have also updated the sqlite Windows dll to the latest version 3.27.

Sasa

19-02-09 08:08

reporter   ~0005367

Last edited: 19-02-09 08:29

View 3 revisions

Performance is the same as previous method. However, I think I have found what was the main problem...

While I read the code carefully, I have noticed that is used Query method. This is incorrect, it should be used Execute method. Then transaction mechanism will work fine, adding each line between START and COMMIT, sending each line separately. I have returned back old transaction methods.

And once again, real execution time is about 1 minute. If delete ./skychart dir and test again, it is few seconds due previous internal cache.

As well I have added each second Application.ProcessMessages, which may be set even under 1s if needed. Old code may be removed or corrected accordingly .

I have attached the whole method for testing.



Update.txt (4,793 bytes)
procedure TCDCdb.ScanImagesDirectory(ImagePath: string; ProgressCat: Tlabel;
  ProgressBar: TProgressBar);
var
  c, f: tsearchrec;
  i, j, n, p: integer;
  catdir, objn, fname, cmd: string;
  dummyfile: boolean;
  ra, de, w, h, r: double;
  ms : QWord; //NEW
begin
  try
    if DirectoryExists(ImagePath) then
    begin
      try
        if DB.Active then
        begin
          ProgressCat.Caption := '';
          ProgressBar.position := 0;
          ms :=GetTickCount64; //NEW
          j := findfirst(slash(ImagePath) + '*', faDirectory, c);
          while j = 0 do
          begin

            if ((c.attr and faDirectory) <> 0) and (c.Name <> '.') and (c.Name <> '..') then
            begin
              catdir := slash(ImagePath) + c.Name;
              ProgressCat.Caption := c.Name;
              ProgressBar.position := 0;
              Application.ProcessMessages;
              DB.UnLockTables;
              DB.starttransaction;
              cmd := 'delete from cdc_fits where catalogname="' + uppercase(c.Name) + '"';
              DB.query(cmd);
              DB.commit;
              i := findfirst(slash(catdir) + '*.*', 0, f);
              n := 1;
              while i = 0 do
              begin
                Inc(n);
                i := findnext(f);
              end;
              ProgressBar.min := 0;
              ProgressBar.max := n;

              if (ProgressBar.Max > 250) then
                ProgressBar.Step := ProgressBar.Max div 250
              else
                ProgressBar.Step := 1;

              i := findfirst(slash(catdir) + '*.*', 0, f);
              n := 0;

              db.StartTransaction; // USE THIS

              while i = 0 do
              begin
                if f.Name = 'README.TXT' then
                begin
                  i := findnext(f);
                  continue;
                end;
                Inc(n);
                if (n mod ProgressBar.step) = 0 then
                begin
                  ProgressBar.stepit;
                  Application.ProcessMessages;
                end;

                // NEW statement
                if GetTickCount64-ms > 1000 then
                begin
                  Application.ProcessMessages;
                  ms := GetTickCount64
                end;

                dummyfile := uppercase((extractfileext(f.Name))) = '.NIL';
                if dummyfile then
                begin
                  ra := 99 + random(999999999999999);
                  de := 99 + random(999999999999999);
                  w := 0;
                  h := 0;
                  r := 0;
                  fname := slash(catdir) + f.Name;
                end
                else
                begin
                  // this is much faster than cdcwcs_getinfo
                  FFits.FileName := slash(catdir) + f.Name;
                  ra := FFits.Center_RA;
                  de := FFits.Center_DE;
                  w := FFits.Img_Width;
                  h := FFits.img_Height;
                  r := FFits.Rotation;
                  fname := FFits.FileName;
                end;

                if FFits.header.valid or dummyfile then
                begin

                  objn := extractfilename(f.Name);
                  p := pos(extractfileext(objn), objn);
                  objn := copy(objn, 1, p - 1);
                  objn := uppercase(stringreplace(objn, blank, '', [rfReplaceAll]));
                  cmd := 'INSERT INTO cdc_fits (filename,catalogname,objectname,ra,de,width,height,rotation) VALUES ('
                    + '"' + stringreplace(fname, '\', '\\', [rfReplaceAll]) + '"' +
                    ',"' + uppercase(c.Name) + '"' +
                    ',"' + uppercase(objn) + '"' +
                    ',"' + formatfloat(f5, ra) + '"' +
                    ',"' + formatfloat(f5, de) + '"' +
                    ',"' + formatfloat(f5, w) + '"' +
                    ',"' + formatfloat(f5, h) + '"' +
                    ',"' + formatfloat(f5, r) + '"' +
                    ')';
                    // Separator is SQL command is only needed
                    // if send group of commands at once

                    DB.Execute(cmd); // CORRECT WAY, Query method was the problem
                    //writetrace(Format(rsDBInsertFail, [f.Name, DB.ErrorMessage]));

                end
                else
                  writetrace(Format(rsInvalidFITSF, [f.Name]));

                i := findnext(f);
              end;
              DB.commit;

              findclose(f);
            end;
            j := findnext(c);
          end;
          DB.flush('tables');
        end;
      finally
        findclose(c);
      end;
    end
    else
    begin
      ProgressCat.Caption := 'Directory not found!';
    end;
  except
  end;
end;

Update.txt (4,793 bytes)

Sasa

19-02-09 08:20

reporter   ~0005368

Last edited: 19-02-09 08:27

View 3 revisions

And BTW, this paslib do not provide support for SQLite binding ability, in order to avoid constant SQL command parsing for each line. For instance:

cmd := 'INSERT INTO cdc_fits (filename,catalogname,objectname,ra,de,width,height,rotation) '+
            ' VALUES (?,?,?,?,?,?,?,?);

This would be sent by sqlite3_prepare, then in the loop sqlite3_bind*, sqlite3_step and sqlite3_reset. That would be the fastest possible way, suitable for very slow computers as well.

Patrick Chevalley

19-02-09 09:48

administrator   ~0005371

Your method with Execute do not work, no data is actually inserted in the database.

The 1 minute execution time is not related to sqlite but to the time it need to read the header of the 9700 fits files. Try to comment all the sql instruction, you will see it take almost the same time.
You can purge the disk cache between test with the command (as root) : echo 3 > /proc/sys/vm/drop_caches

If you print the time before and after the DB.query(cmdl) in my last version you see it take only 1 or 2 seconds.

The passql library is old and not maintained, probably it as issue and not use correctly the feature that where introduced in sqlite since 15 years, I use it at the time there was no other alternative in Lazarus.
Now the Lazarus native DB component are good and it is worth to change for them.
I suggest we keep it as is for version 4.2 and after the release take the occasion we remove Mysql in 4.4 to change for a better sql component.

Sasa

19-02-09 12:44

reporter   ~0005377

Yes, you are right, Execute method is not implemented correctly in the library. Let me some time to look and correct it.

BTW, I have made long time ago my own wrapper work fine even today and this transaction mode should work fine as implemented and very fast. I do not have 10K records, but more than 100K with much more columns...

Sasa

19-02-10 08:40

reporter   ~0005379

Last edited: 19-02-10 08:42

View 2 revisions

I have implemented Executed method with plain SQLite3_Exec, however there is very high latency as it use journal file on HDD, and again we have to use PRAGMA to create it in memory. The library itself use by default quite a bit of LOCK/UNLOCK mechanism and that also may trigger creation of journal file...

With older fast version of SQLite (I have looked at old code) transaction worked fast up to some 10K inserts, then slow down progressively, thus if there is more records to insert I have separated in blocks of 10K per transaction, but all was executed line by line and it was extremely fast.

The best performance is to keep as you created already (it is under 10K anyway, which may not be an issue nowadays). Perhaps only to use TStringlist in order to avoid constant memory relocation when concat strings.

However there is another way...

From 3.7.11, it is possible to insert multiple rows with one line only:
INSERT INTO 'name' ('c1', 'c2') VALUES
    ('d11', 'd12'),
    ('d21', 'd22'),
...

Which I tested and it works fine. However, I'm not certain does MySQL support it and which version.

At end, since this code should work with MySQL until 4.4, perhaps BEGIN and COMMIT are not recognized as start and end of transaction.

Patrick Chevalley

19-02-10 14:13

administrator   ~0005380

Yes, I not tested my change with mysql and it is not sure it work.
I will test and if this not work I revert this change and let as is for 4.2.

Then for 4.4 we can work on a clean new implementation for sqlite only.
In this case all I need is binding for the sqlite function with some utility function to not mess with pchar everywhere.
I really prefer to work with Prepare, Execute, Fetch, ... if possible.

Sasa

19-02-20 07:21

reporter   ~0005424

Last edited: 19-02-20 07:40

View 3 revisions

I can see you revert the changes. However, here may be used global variable which direct which databases is used then may be used specific code/procedure. Benefit of this speed up for SQLite is so massive that it worth to be kept and used, until are used both engines.

Patrick Chevalley

19-02-21 15:59

administrator   ~0005430

OK, I make the change again, but only for sqlite:
https://github.com/pchev/skychart/commit/f1a42675e2a2cbc65456300754aaf97cf4c91213

A big problem with this method is if a single insert fail all the changes is rollback.
This is why I not want to generalize it for other insert. But for this one the data are the same for all the users and is well tested, so this must not be a problem.

 

Issue History

Date Modified Username Field Change
19-02-07 06:50 Sasa New Issue
19-02-07 14:18 Sasa File Added: Screenshot from 2019-02-07 06-12-40.png
19-02-07 18:17 Patrick Chevalley Assigned To => Patrick Chevalley
19-02-07 18:17 Patrick Chevalley Status new => resolved
19-02-07 18:17 Patrick Chevalley Resolution open => fixed
19-02-07 18:17 Patrick Chevalley Fixed in Version => 4.1 SVN
19-02-07 18:17 Patrick Chevalley Target Version => 4.2
19-02-07 18:17 Patrick Chevalley Note Added: 0005355
19-02-07 23:20 Sasa Status resolved => new
19-02-07 23:20 Sasa Resolution fixed => reopened
19-02-07 23:20 Sasa Note Added: 0005358
19-02-07 23:22 Sasa Note Edited: 0005358 View Revisions
19-02-07 23:57 Patrick Chevalley Note Added: 0005359
19-02-07 23:57 Patrick Chevalley Status new => assigned
19-02-08 00:15 Patrick Chevalley Note Added: 0005360
19-02-08 01:36 Sasa Note Added: 0005361
19-02-08 09:37 Patrick Chevalley Note Added: 0005363
19-02-08 09:59 Sasa Note Added: 0005364
19-02-08 10:21 Sasa Note Edited: 0005364 View Revisions
19-02-08 17:33 Sasa Note Edited: 0005358 View Revisions
19-02-08 23:43 Patrick Chevalley Note Added: 0005366
19-02-09 08:08 Sasa File Added: Update.txt
19-02-09 08:08 Sasa Note Added: 0005367
19-02-09 08:20 Sasa Note Added: 0005368
19-02-09 08:25 Sasa Note Edited: 0005367 View Revisions
19-02-09 08:26 Sasa Note Edited: 0005368 View Revisions
19-02-09 08:27 Sasa Note Edited: 0005368 View Revisions
19-02-09 08:29 Sasa Note Edited: 0005367 View Revisions
19-02-09 09:48 Patrick Chevalley Note Added: 0005371
19-02-09 12:44 Sasa Note Added: 0005377
19-02-10 08:40 Sasa Note Added: 0005379
19-02-10 08:42 Sasa Note Edited: 0005379 View Revisions
19-02-10 14:13 Patrick Chevalley Note Added: 0005380
19-02-20 07:21 Sasa Note Added: 0005424
19-02-20 07:40 Sasa Note Edited: 0005424 View Revisions
19-02-20 07:40 Sasa Note Edited: 0005424 View Revisions
19-02-21 15:59 Patrick Chevalley Status assigned => resolved
19-02-21 15:59 Patrick Chevalley Resolution reopened => fixed
19-02-21 15:59 Patrick Chevalley Note Added: 0005430