Since the release of v1.3beta I have been working hard on improving the entire concept. For now, I'll leave you with some comparisons of the progress as it has emerged over the last 2 years!
Below is an 81 million row, 2-column test database (that's 163 million words that can be dug out). The current search speed is quite impressive!
You should note that the search times given are measured from the moment you enter the search until you can actually see something in the window. Each search actually performs several "slow" steps: file seek, reading the data bank from disk, uncompressing, matching and finally producing human-readable results.
Bearing in mind that each search involves reading from disk into memory and then additionally UNCOMPRESSING the data, SOTDS still produces some extremely fast results. If the data were already in memory and didn't have to be uncompressed, we could probably expect a drastic decrease in search time, perhaps even down to practically instant (0.01ms or 0.00ms).
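To make the steps above concrete, here is a minimal Python sketch of a seek / read / uncompress / match loop. It is not the actual SOTDS code: the block layout, the function names (build_store, search) and the use of zlib are all assumptions made for illustration, and an in-memory buffer stands in for the file on disk.

```python
import io
import time
import zlib

def build_store(blocks):
    """Compress each block of rows and record (offset, length) so it can be seeked to."""
    buf = io.BytesIO()  # stands in for a file on disk (assumption)
    index = []
    for rows in blocks:
        payload = zlib.compress("\n".join(rows).encode("utf-8"))
        index.append((buf.tell(), len(payload)))
        buf.write(payload)
    buf.seek(0)
    return buf, index

def search(store, index, term, limit=50):
    """Run the slow steps for every block; return (shown rows, total hits, seconds)."""
    start = time.perf_counter()
    shown, hits = [], 0
    for offset, length in index:
        store.seek(offset)                                # file seek
        payload = store.read(length)                      # read data from "disk"
        text = zlib.decompress(payload).decode("utf-8")   # uncompress
        for row in text.split("\n"):                      # match up
            if term in row:
                hits += 1
                if len(shown) < limit:
                    shown.append(row)                     # rows to show the user
    return shown, hits, time.perf_counter() - start

blocks = [["the cat", "a dog"], ["the end", "over there"]]
store, index = build_store(blocks)
shown, hits, elapsed = search(store, index, "the")
print(f"{len(shown)} shown / {hits} hits for: the in {elapsed:.2f}s")
```

The timer here only covers the loop itself. The times quoted on this page also include getting the results onto the screen, which this sketch does not attempt.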
News Update 17 Oct 2014
I have again managed to improve the search speed for my current test database. Not only is it faster, but even with more data to search it beats the results from my old 176 million word database, which after an update has now expanded to 188 million words. Check out the 17 Oct 2014 screenshots below.
I have only 2-3 minor bugs left to take care of, plus the last 2-3 remaining features, which brings us closer to a final v2.0 release than ever. An impressive coding journey can soon be put to rest.
News Update 28 Sep 2014
Massive search speed improvements are now showing after more coding and switching over to a "standardized" compression format for SOTDS. See the 3 pictures below marked 28 Sep 2014. Everything is smashing my old records. As far as I'm concerned, there isn't much else that can be done to improve the search speed: even with no compression in place the speed is only slightly better, and that requires much more disk space for the database. Since one of my goals was to make the searchable database SMALLER than the original while keeping pretty fair search speeds, this is it!
During the upcoming weeks I will continue to work on GUI optimizations and generate more real databases to test with, along with bringing in the last planned features of the Search Engine. I'm aiming to release a fully working version by the end of 2014.
News Update 27 Sep 2014
Several improvements have been made over the last 2 weeks to gain even better search speeds and get more functionality working. During my testing and research, I wanted to see how much compression/uncompression of the search data impacted the search speed. It should be noted that during the last year "lzma, 7z" was the compression used, and I'm amazed that "zip" came in 2nd best, right behind no compression at all, with a fair compression ratio vs speed.
Here are the results for the test database used:
Original .DAT file: 61MB - Rows = 7092984 (7.09 mill) - Total Words = 8325224 (8.33 mill)
no compression - 101MB:
50 shown / 56101 hits for: the in 0.03s (1.39s)
zip - 36MB:
50 shown / 56101 hits for: the in 0.03s (1.47s)
lzma,7z - 30MB:
50 shown / 56101 hits for: the in 0.04s (1.79s)
brieflz - 50MB:
50 shown / 56101 hits for: the in 0.08s (1.81s)
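For anyone who wants the sizes above side by side, this small Python snippet works out how much smaller (or larger) each variant is compared to the original .DAT file and to the uncompressed database. The only inputs are the sizes listed above; the function name reduction_percent is just an illustrative helper.

```python
# Sizes in MB, taken from the results listed above.
ORIGINAL_DAT_MB = 61
sizes_mb = {"no compression": 101, "zip": 36, "lzma,7z": 30, "brieflz": 50}

def reduction_percent(size_mb, baseline_mb):
    """Percentage by which size_mb is smaller than baseline_mb (negative means larger)."""
    return (1 - size_mb / baseline_mb) * 100

for name, size in sizes_mb.items():
    vs_dat = reduction_percent(size, ORIGINAL_DAT_MB)
    vs_raw = reduction_percent(size, sizes_mb["no compression"])
    print(f"{name:15s} {size:4d}MB  vs .DAT: {vs_dat:6.1f}%  vs uncompressed: {vs_raw:6.1f}%")
```

A negative number in the ".DAT" column means the searchable database is bigger than the original file, which is the case when no compression is used.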
It seems that "ZIP" compression yields the best search speed of the compressed formats, while keeping the final database file size down to a fair level. Looking at the database size with ZIP (36MB against the 61MB original .DAT file and the 101MB uncompressed database), we can expect a reduction of more or less 50%. Based on this small study, I will choose "ZIP" as the standard compression for the SOTDS suite.
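A comparison like this can be repeated on any data with a few lines of Python. The sketch below is only a rough stand-in for my own test: it uses zlib (DEFLATE, the compression method normally used inside zip files) and lzma from the Python standard library, leaves out brieflz because it has no standard library module, and uses a made-up repetitive sample that compresses far better than a real database would. The names CODECS and benchmark are illustrative.

```python
import lzma
import time
import zlib

# Each entry maps a label to a (compress, decompress) pair.
CODECS = {
    "no compression": (lambda b: b, lambda b: b),
    "zip (DEFLATE via zlib)": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def benchmark(data, term, repeats=5):
    """Return {codec: (packed size in bytes, hits, best uncompress+match time in seconds)}."""
    results = {}
    for name, (pack, unpack) in CODECS.items():
        packed = pack(data)
        best = float("inf")
        hits = 0
        for _ in range(repeats):
            start = time.perf_counter()
            hits = unpack(packed).count(term)  # uncompress, then count matches
            best = min(best, time.perf_counter() - start)
        results[name] = (len(packed), hits, best)
    return results

# Made-up sample data (assumption); replace with real rows for meaningful numbers.
sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
results = benchmark(sample, b"the")
for name, (size, hits, secs) in results.items():
    print(f"{name:24s} {size:8d} bytes  {hits} hits  {secs * 1000:.2f} ms")
```

This only times uncompressing and matching in memory, so it says nothing about disk reads. Those would need to be measured separately on real database files.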
It should be noted that ZIP has a limit of 2GB, but that is not a problem for my database structure, as everything has been split into pieces, making it virtually impossible to reach 2GB in a single package. It is possible in theory, but we are talking about databases of most likely billions of rows, or even up to trillions, meaning that SOTDS should be able to handle those as well :-)
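The idea of splitting everything into pieces can be sketched in Python as follows. This is not how SOTDS actually lays out its packages: the piece naming, the row format and the functions zip_piece, read_piece and split_into_packages are assumptions for illustration, and the size limit is tiny here so the example runs instantly.

```python
import io
import zipfile

def zip_piece(name, rows):
    """Zip one piece of rows into its own small package (returned as bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, "\n".join(rows))
    return buf.getvalue()

def read_piece(package):
    """Open one package and return its rows."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(zf.namelist()[0]).decode("utf-8").split("\n")

def split_into_packages(rows, max_piece_bytes):
    """Group rows into pieces of at most max_piece_bytes (uncompressed) and zip each one."""
    packages, piece, piece_size = [], [], 0
    for row in rows:
        row_size = len(row.encode("utf-8")) + 1  # +1 for the newline separator
        if piece and piece_size + row_size > max_piece_bytes:
            packages.append(zip_piece(f"piece_{len(packages):05d}.dat", piece))
            piece, piece_size = [], 0
        piece.append(row)  # a single oversized row still gets a piece of its own
        piece_size += row_size
    if piece:
        packages.append(zip_piece(f"piece_{len(packages):05d}.dat", piece))
    return packages

rows = [f"row {i}" for i in range(100)]
packages = split_into_packages(rows, max_piece_bytes=64)
print(f"{len(rows)} rows split into {len(packages)} packages")
```

Because every piece is capped before it is zipped, no single package can grow anywhere near the format limit, no matter how large the whole database gets.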
News Update 18 Sep 2014
After a massive headache of debugging, tracing, re-coding and extensive bug-fixing during the last week, I once again managed to make the search engine more memory friendly, get more features working and, on top of that, make searching even faster (both direct hit and partial hit). New screenshots for the 81 million row database will appear if any improvements are seen. For now, it seems I have managed to make the search engine more stable, so that it does not crash anymore, in addition to improving the search speed slightly for direct hits, as seen in the "18 Sep 2014" pictures below.
Current Plan
The SOTDS Suite is planned to be released as a 3-part software solution:
1: SOTDS Generator Tools (let you create your own databases in .DAT format). These will first arrive as dedicated tools for different main purposes, such as a file+directory indexer, a PDF word indexer and a disk/CD image indexer (.adf, .d64, .iso etc.)
2: SOTDS Constructor Tool (loads your finalized .DAT file, then converts and compresses it into a suitable package for use with the search engine)
3: SOTDS Searcher (the main tool, as illustrated on this page, which loads your optimized database packages and gives you almost instant results and outputs).
I have no planned release date for the mentioned tools, but as of September 2014 all three tools have been programmed and are currently under heavy testing and tweaking. You may see some screenshots of them below (all preliminary, but still).
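To give a feel for how the three parts hand over to each other, here is a toy Python version of the generator, constructor and searcher chain. Everything in it is an assumption for illustration only: the real .DAT and package formats are not described on this page, so the sketch uses a tab-separated text file as the ".DAT" and a single zlib-compressed file as the "package", and the function names are made up.

```python
import os
import tempfile
import zlib

def generate_dat(root, dat_path):
    """Part 1 (generator): index the files under root as 2-column rows: name, full path."""
    with open(dat_path, "w", encoding="utf-8") as out:
        for folder, _dirs, files in os.walk(root):
            for name in sorted(files):
                out.write(f"{name}\t{os.path.join(folder, name)}\n")

def construct_package(dat_path, package_path):
    """Part 2 (constructor): compress the finished .DAT file into a package."""
    with open(dat_path, "rb") as src, open(package_path, "wb") as dst:
        dst.write(zlib.compress(src.read()))

def search_package(package_path, term, limit=50):
    """Part 3 (searcher): load the package, match on the first column, return (shown, hits)."""
    with open(package_path, "rb") as src:
        rows = zlib.decompress(src.read()).decode("utf-8").splitlines()
    matches = [row for row in rows if term.lower() in row.split("\t")[0].lower()]
    return matches[:limit], len(matches)

with tempfile.TemporaryDirectory() as work:
    root = os.path.join(work, "files")
    os.makedirs(root)
    for name in ("readme.txt", "notes.txt", "game.adf"):
        open(os.path.join(root, name), "w").close()  # empty example files
    dat = os.path.join(work, "index.dat")
    pkg = os.path.join(work, "index.pkg")
    generate_dat(root, dat)
    construct_package(dat, pkg)
    shown, hits = search_package(pkg, "txt")
    print(f"{len(shown)} shown / {hits} hits for: txt")
```

Keeping the three parts separate means a database only has to be generated and packaged once, after which the searcher just loads the finished package.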
Screenshots of progress 2014-2012