

SOTDS v2.0alpha update & info
in Software | Friday, September 05, 2014 | 03:39


For the first time, I am proud to present screenshots, along with some commentary, of a fully working search engine with proof of drastic improvements all over!


Since the release of v1.3beta I have been working hard on improving the entire concept. For now, I'll leave you with some comparisons showing the progress made during the last 2 years!

Below are results from a test database with 81 million rows and 2 columns (that's 163 million words that can be dug out). The current search speed is quite impressive!

You should note that the search times given are measured from the moment you enter the search until you can actually see something in the window. That time includes several "slow" processing steps: file seek, reading the databank from disk, uncompressing, matching up and finally producing human readable results.

Keep in mind that a search involves reading from disk into memory and then additionally UNCOMPRESSING the data, and still SOTDS produces some extremely fast results. If the data was already in memory and didn't have to be uncompressed, we could probably expect a drastic decrease in search time, yeah, even down to practically instant (0.01ms or 0.00ms).
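The steps listed above (seek, read, uncompress, match, produce readable output) can be illustrated with a minimal Python sketch. This is not the SOTDS code: the block layout, the (offset, length) index and the function names build_package and search_block are assumptions made only for this example, and zlib stands in for whatever compression the real package uses.

```python
import io
import time
import zlib

def build_package(blocks):
    """Compress each block of rows and remember its (offset, length)."""
    buf = io.BytesIO()
    index = []
    for rows in blocks:
        payload = zlib.compress("\n".join(rows).encode("utf-8"))
        index.append((buf.tell(), len(payload)))
        buf.write(payload)
    return buf.getvalue(), index

def search_block(package, index, block_no, word):
    """Run the five steps for a single block and return readable lines."""
    f = io.BytesIO(package)                       # stands in for a file on disk
    offset, length = index[block_no]
    f.seek(offset)                                # 1. file seek
    raw = f.read(length)                          # 2. read from disk
    text = zlib.decompress(raw).decode("utf-8")   # 3. uncompress
    hits = [row for row in text.split("\n") if word in row.split()]  # 4. match
    return [f"{i + 1}: {row}" for i, row in enumerate(hits)]         # 5. format

blocks = [["the robot walks", "a quiet day"], ["robot wars", "the end"]]
package, index = build_package(blocks)

start = time.perf_counter()
results = search_block(package, index, 1, "robot")
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.4f}s")
```

Timing the whole call, as done here, corresponds to measuring from the start of the search until readable results exist, so all five steps are included in the number.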


News Update 17 Oct 2014

Managed once again to improve the search speed for my current test database. Not only is it faster, it also beats the results from my old 176 million word database even though, after an update, that database has now expanded to 188 million words. Check out the 17 Oct 2014 screenshots below.

I have only 2-3 minor bugs to take care of, plus the last 2-3 remaining features, which brings us closer to a final v2.0 release than ever. An impressive coding journey can soon be put to rest.



News Update 28 Sep 2014

Massive search speed improvements are now showing after more coding and switching over to a "standardized" compression format for SOTDS. See the 3 pictures below marked with 28 Sep 2014. Everything is smashing my old records. As far as I'm concerned, there isn't much else that can be done to improve the search speed: running with no compression at all is only slightly faster, but requires much more space on disk for the database. Since one of my goals was to make the searchable database SMALLER than the original while keeping pretty fair search speeds, this is it!

During the upcoming weeks I will continue to work on GUI optimizations, generate more real databases to test with, and bring in the last planned features of the Search Engine. I'm aiming to release a fully working version at the end of 2014.



News Update 27 Sep 2014

Several improvements were made over the last 2 weeks to gain even better search speeds and get more functionality working. During my testing and research, I wanted to see how much compression/uncompression of the search data impacted the search speed. It should be noted that during the last year, "lzma, 7z" was the compression used, and I'm amazed that "zip" came in 2nd best, right behind no compression at all, with a fair compression ratio vs speed.

Here are results for a test database used:

Original .DAT file: 61mb - Rows = 7092984 (7.09mill) - Total Words = 8325224 (8.33mill)

no compression - 101mb:
50 shown / 56101 hits for: the in 0.03s (1.39s)

zip - 36mb:
50 shown / 56101 hits for: the in 0.03s (1.47s)

lzma,7z - 30mb:
50 shown / 56101 hits for: the in 0.04s (1.79s)

brieflz - 50mb:
50 shown / 56101 hits for: the in 0.08s (1.81s)


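It seems that "ZIP" compression yields the best search speed of the compressed formats, nearly matching no compression, while keeping the final database filesize down to a fair level. In this test the ZIP package (36mb) is roughly 40% smaller than the original .DAT file (61mb) and about 64% smaller than the uncompressed database (101mb), so a reduction of more or less 50% seems a fair expectation. Based on this small study, I will choose "ZIP" as the standard compression for the SOTDS suite.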
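A comparison of this kind can be reproduced in miniature with Python's standard library. The sketch below is only an illustration under assumptions: the sample data is synthetic, zlib (deflate, the method commonly used inside zip files) stands in for "zip", the lzma module stands in for "lzma, 7z", and brieflz is left out because it is not in the standard library. The numbers it prints say nothing about SOTDS itself.

```python
import lzma
import time
import zlib

def measure(name, compress, decompress, data, rounds=20):
    """Return packed size and average uncompress time for one codec."""
    packed = compress(data)
    start = time.perf_counter()
    for _ in range(rounds):
        unpacked = decompress(packed)
    seconds = (time.perf_counter() - start) / rounds
    assert unpacked == data          # round trip must be lossless
    return {"name": name, "size": len(packed), "seconds": seconds}

# Synthetic rows, made up for this example only.
data = "\n".join(
    f"row {i} the robot file{i % 97}.dat" for i in range(5000)
).encode("utf-8")

report = [
    measure("none", lambda b: b, lambda b: b, data),
    measure("zlib", zlib.compress, zlib.decompress, data),
    measure("lzma", lzma.compress, lzma.decompress, data),
]

for entry in report:
    percent = 100 * entry["size"] / len(data)
    print(f'{entry["name"]}: {entry["size"]} bytes '
          f'({percent:.0f}% of raw), {entry["seconds"] * 1000:.3f} ms')
```

The trade-off to look for is the same one described above: how much smaller the packed data gets versus how much time each uncompress costs per search.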

It should be noted that ZIP has a limit of 2GB, but that is not a problem for my database structure, as everything has been split into pieces, making it virtually impossible to reach 2GB for a single package. It is possible in theory, but then we are talking about databases with billions of rows, or even up to trillions, meaning SOTDS could likely handle that as well :-)
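The idea of splitting everything into pieces so that no single package gets near a size limit can be sketched as follows. How SOTDS actually splits its data is not described here, so the row-based grouping, the split_into_packages name and the tiny max_bytes value are assumptions chosen purely for illustration.

```python
import zlib

def split_into_packages(rows, max_bytes):
    """Group rows so each package stays at or under max_bytes uncompressed."""
    packages, current, size = [], [], 0
    for row in rows:
        row_size = len(row.encode("utf-8")) + 1   # +1 for the newline
        if current and size + row_size > max_bytes:
            packages.append(current)              # close the full package
            current, size = [], 0
        current.append(row)
        size += row_size
    if current:
        packages.append(current)
    return packages

rows = [f"entry {i:04d}" for i in range(1000)]
packages = split_into_packages(rows, max_bytes=2048)

# Each piece is compressed on its own, so no single blob grows with the
# total database size.
compressed = [zlib.compress("\n".join(p).encode("utf-8")) for p in packages]
print(len(packages), "packages, largest compressed piece:",
      max(len(c) for c in compressed), "bytes")
```

With a cap per package, the total database can keep growing by adding packages while each individual package stays far below any per-file limit.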



News Update 18 Sep 2014

After a massive headache of debugging, tracing, re-coding and extensive bug-fixing during the last week, I once again managed to make the search engine more memory friendly, get more features working and, on top of that, make searching even faster (both direct hit and partial hit). New screenshots for the 81 million row database will appear if any further improvements are seen. For now, it seems I have managed to make the search engine more stable, so that it does not crash anymore, in addition to improving the search speed slightly for direct hits, as seen in the "18 Sep 2014" pictures below.






Current Plan

The SOTDS Suite is planned to be released as a 3-part software solution consisting of:

1: SOTDS Generator Tools (let you create your own databases in .DAT format). These will first arrive as dedicated tools for different main purposes, such as a file+directory indexer, a PDF word indexer and a disk/cd image indexer (.adf, .d64, .iso etc.)

2: SOTDS Constructor Tool (loads your finalized .DAT file, then converts and compresses it into a suitable package for use with the search engine)

3: SOTDS Searcher (the main tool as illustrated on this page, which loads your optimized database packages and gives you almost instant results and outputs).


I have no planned release date for the mentioned tools, but as of September 2014 all three tools have been programmed and are currently under heavy testing and tweaking. You may see some screenshots of those below (all preliminary, but still).




Screenshots of progress 2014-2012


17 Oct 2014:
Improved overall search speed again for direct hit!

Now 50 shown / 304177 hits for: the in 0.19s (2.92s)
out of a 188million word database!



17 Oct 2014:
Improved overall search speed again for direct hit!

Now 50 shown / 7676 hits for: robot in 0.08s (0.14s)
out of a 188million word database!



28 Sep 2014:
Improved overall search speed again for direct hit!

Now 50 shown / 268659 hits for: the in 0.16s (2.42s)
beating my old record of
50 shown / 268659 hits for: the in 0.25s (2.12s)



28 Sep 2014:
Improved overall search speed again for direct hit!

Now 50 shown / 6883 hits for: robot in 0.13s (0.18s)
beating my old record of
50 shown / 6883 hits for: robot in 0.17s (0.23s)



28 Sep 2014:
Improved overall search speed again for partial search!

Now 50 shown / 16233 hits for: robot in 0.07s (4.02s)
beating my old record of
50 shown / 16233 hits for: robot in 0.19s (13.33s)



18 Sep 2014:
Direct hit search time decreased a little again as a result of 1 week of heavy coding and tweaking.

50 shown / 268659 hits for: the in 0.25s (2.12s)
beating my old record of
50 shown / 268659 hits for: the in 0.26s (2.51s)



18 Sep 2014:
A major decrease in the time for the first 50 hits now, but generating the full list in memory is slower. I'll have a look at my code and try to improve this during the next week or so.

"Partial" search for "robot", came out as "50 shown / 16233 hits" in just 0.19s (13.33s).

beating my old record of "50 shown / 16233 hits" in just 0.56s (7.18s).



A direct word hit search for "robot" came out as "50 shown / 6883 hits" in just 0.17s (0.23s).

0.17s is the time from when you start the search until the first 50 hits are shown fully completed in the listview.

0.23s is the time from when you start the search until ALL (6883) items have been fully completed and constructed as human readable text in memory, all ready to be pageflipped 50 per page, instantly, as you please. That is pretty amazing.

Take note that the complete database features 81 million rows (163 million words, since I have 2 columns) that can be searched, so it's pretty amazing.
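The two timings described above (first 50 hits shown, then the complete list built in memory for page flipping) can be illustrated with a small Python sketch. The lazy generator, the timed_search and page functions and the made-up rows are assumptions for this example only and do not reflect how SOTDS is implemented.

```python
import time
from itertools import islice

PAGE_SIZE = 50

def format_hits(rows, word):
    """Lazily yield a human readable line for every row containing word."""
    for number, row in enumerate(rows, start=1):
        if word in row.split():
            yield f"{number}: {row}"

def timed_search(rows, word):
    start = time.perf_counter()
    hits = format_hits(rows, word)
    first_page = list(islice(hits, PAGE_SIZE))   # what gets shown first
    first_time = time.perf_counter() - start
    all_hits = first_page + list(hits)           # the rest, built afterwards
    total_time = time.perf_counter() - start
    return first_page, all_hits, first_time, total_time

def page(all_hits, page_no):
    """Flip to any page instantly once the full list is in memory."""
    begin = page_no * PAGE_SIZE
    return all_hits[begin:begin + PAGE_SIZE]

rows = [("robot" if i % 3 == 0 else "toaster") + f" model {i}"
        for i in range(3000)]
first_page, all_hits, first_time, total_time = timed_search(rows, "robot")
print(f"{len(first_page)} shown / {len(all_hits)} hits for: robot "
      f"in {first_time:.2f}s ({total_time:.2f}s)")
```

The first number only depends on how quickly 50 matches turn up, while the number in parentheses grows with the total hit count, which fits the pattern seen in the results on this page.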



A slightly slower "Partial" search for "robot" came out as "50 shown / 16233 hits" in just 0.56s (7.18s).

Just days ago, before some code optimization, the search time was 0.70s for the first 50.

0.56s is the time from when you start the search until the first 50 hits are shown fully completed in the listview.

7.18s is the time from when you start the search until ALL (16233) items have been fully completed and constructed as human readable text in memory, all ready to be pageflipped 50 per page, instantly, as you please. That is pretty amazing too.

It might be improved in the future, but for now it's a proof of concept that even large databases can be handled.
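One plausible reason a "Partial" search finds more hits and takes longer than a "Direct Hit" search can be shown with a toy word index in Python. This is an assumption about the general technique, not a description of SOTDS internals: here a direct hit is a single dictionary lookup, while a partial hit has to scan every distinct word for the fragment. The names build_index, direct_hit and partial_hit are made up for the example.

```python
from collections import defaultdict

def build_index(rows):
    """Map each lowercased word to the row numbers that contain it."""
    index = defaultdict(list)
    for row_no, row in enumerate(rows):
        for word in set(row.lower().split()):
            index[word].append(row_no)
    return index

def direct_hit(index, word):
    """Exact word match: one lookup, no scanning."""
    return index.get(word.lower(), [])

def partial_hit(index, fragment):
    """Substring match: every distinct word has to be checked."""
    fragment = fragment.lower()
    found = set()
    for word, row_nos in index.items():
        if fragment in word:
            found.update(row_nos)
    return sorted(found)

rows = ["Robot wars", "the robotic arm", "microbot and robot", "the end"]
index = build_index(rows)

print(direct_hit(index, "robot"))    # rows holding the exact word
print(partial_hit(index, "robot"))   # also catches "robotic" and "microbot"
```

In this toy data the partial search returns one more row than the direct search, the same kind of difference as 16233 versus 6883 hits for "robot" above.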



Another test just to show the power of "Direct Hit" search.

50 shown / 268659 hits for: the in 0.26s (2.51s)

Take a look at the items: all 268659 are constructed and ready to be pageflipped or exported, all in 2.51s.

An HTML export of this ended up as approx 45mb of data tables.

Now that's pretty amazing too.



From v2.0alpha (September 2014), showing 154 wildcard hits in just 0.04 seconds out of a database with 1.75 million registered entries!



First version of the "Generator" tools. This one will simply index your drives storing full paths, filenames and filesizes.
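A drive indexer of the kind described above, storing full paths, filenames and filesizes, can be sketched in a few lines of Python. The real .DAT layout is not documented on this page, so the tab-separated line format and the index_tree function are assumptions for illustration, and the example indexes a temporary folder instead of a real drive.

```python
import os
import tempfile

def index_tree(root):
    """Walk root and collect (full path, filename, size in bytes)."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                continue                  # skip files that vanish or deny access
            entries.append((full_path, name, size))
    return entries

# Build a tiny throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "music"))
    with open(os.path.join(root, "music", "tune.mod"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(root, "readme.txt"), "wb") as f:
        f.write(b"hello")
    demo_entries = index_tree(root)
    # Hypothetical one-row-per-file text format (not the real .DAT format).
    lines = [f"{path}\t{name}\t{size}" for path, name, size in demo_entries]

print(len(lines), "rows indexed")
```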



V2.0alpha Constructor GUI version (the GUI will slow things down somewhat, but it has a very low priority on updating while the Constructor is working; in addition, it's nice to see what's going on).



V2.0alpha Constructor Command Line version (this is much faster naturally since there is no GUI to update).

However, due to very complex routines involving buffering to disk, merge sort, external sort, minimal memory usage and other strange spaghetti-code, the Constructor is still slow. It can take hours to process 50 million entries, depending on what the actual human readable data looks like.
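For readers unfamiliar with the terms, the combination of buffering to disk, external sort and merge sort with low memory usage generally looks like the Python sketch below. It is a textbook version of the technique, not the Constructor's code: the external_sort function, the chunk file names and the tiny chunk_size are assumptions made for the example.

```python
import heapq
import os
import tempfile

def external_sort(lines, chunk_size, workdir):
    """Sort lines using at most chunk_size lines of memory at a time."""
    chunk_paths = []
    chunk = []

    def flush():
        if not chunk:
            return
        chunk.sort()                                   # in-memory sort of one chunk
        path = os.path.join(workdir, f"chunk{len(chunk_paths)}.txt")
        with open(path, "w", encoding="utf-8") as f:   # buffer the sorted run to disk
            f.writelines(line + "\n" for line in chunk)
        chunk_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    flush()

    files = [open(p, encoding="utf-8") for p in chunk_paths]
    try:
        # k-way merge: only one pending line per chunk file is held in memory.
        streams = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()

words = ["robot", "the", "amiga", "zip", "commodore", "disk", "lzma"]
with tempfile.TemporaryDirectory() as workdir:
    sorted_words = list(external_sort(words, chunk_size=3, workdir=workdir))
print(sorted_words)
```

Every entry is written to disk once and read back once during the merge, which keeps memory usage small but makes disk traffic a large part of the running time on big inputs.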



From v1.2 from 2012, showing 533 wildcard hits in 0.87seconds out of a database with 2.0 million registered entries.



From v1.3beta (probably during late 2012), showing 154 wildcard hits in 0.19seconds out of a database with 414533 registered entries.






------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Posted by: Programmer, Stone Oakvalley | Publisher: Website Designer, Stone Oakvalley
Last revised: December 07, 2022 - 17:31 | Page views: 1306


Website Design by post@stone-oakvalley-studios.com - Copyright © 2024 www.stone-oakvalley-studios.com