

SOTDS vs SQL (ranting)
in Software | Saturday, September 20, 2014 | 23:03


Just a natural observation and some ranting about how useless the SQL DB format really is!


Since I always planned to make the databases small on disk, use as little memory as possible and even work on slower systems, the results are pretty darn impressive (compare that to a bloated, overkill SQL setup where we're talking gigabytes of memory and disk space, plus a server CPU or even several of them).

Both the SQL and SOTDS tests shown here were run on an Intel Core2 Quad Q6600 CPU at 2.40GHz with 6GB of memory and 3.5" HDD drives (probably WD Green type).

For example, my test database is around 933MB in my own raw format (human readable, with some key characters in place). The resulting compressed database is only 388MB on disk, all finalized and ready to be searched.

Now, the exact same file exported as CSV becomes 6GB (6 gigabytes!) and a ready-made SQLite DB ended up at 12GB.

What the hell? Are they insane?!

Not to mention, searching with the SQLite command-line tool caused an immense disk grinding the likes of which I have never seen. My SOTDS Search Engine does not do that at all :-) !


The SOTDS Constructor will actually pre-calculate every possible search and remove duplicate words, leaving only numbered indexes. This is the reason why SOTDS is superior to the SQL format: it naturally also prevents disk grinding and most likely gives better search speed than the commercialized and industrialized SQL does.
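To give a rough idea of what "remove duplicate words and keep only numbered indexes" can look like, here is a minimal sketch in Python. It is not the actual SOTDS Constructor code; the function names, the sample paths and the in-memory layout are all made up for illustration. Each file name and each word is stored once, and everything else is just small numbers pointing at them.

```python
from collections import defaultdict

def build_index(documents):
    """Build a deduplicated word table with numbered postings.

    documents: dict mapping a file name to a list of (word, page) pairs.
    Returns (doc_names, word_ids, postings). All three structures are
    hypothetical and only illustrate the idea, not the SOTDS format.
    """
    doc_names = []               # doc id -> file name, stored once
    word_ids = {}                # word -> word id, each word stored once
    postings = defaultdict(set)  # word id -> set of (doc id, page)
    for name, pairs in documents.items():
        doc_id = len(doc_names)
        doc_names.append(name)
        for word, page in pairs:
            wid = word_ids.setdefault(word.lower(), len(word_ids))
            postings[wid].add((doc_id, page))  # the set drops duplicates
    return doc_names, word_ids, {w: sorted(p) for w, p in postings.items()}

def search(index, term):
    """Look a word up by its id and translate doc ids back to names."""
    doc_names, word_ids, postings = index
    wid = word_ids.get(term.lower())
    if wid is None:
        return []
    return [(doc_names[doc_id], page) for doc_id, page in postings[wid]]

# Hypothetical sample data: two PDF files with a duplicated entry.
docs = {
    "mags/a.pdf": [("robot", 3), ("robot", 3), ("laser", 7)],
    "mags/b.pdf": [("Robot", 1)],
}
index = build_index(docs)
hits = search(index, "robot")
```

A search is then a dictionary lookup followed by a short list read, with no scanning of rows that carry the same path over and over.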

How can SQL (which was invented in 1974, 40 years ago, with countless improvements and the involvement of thousands of people since then) be worse than my SOTDS Project (1 man + 3 years)?!

Oh well.



First up is my SOTDS dat format database in its human readable format.

- The first line (ACD) is just an internal code line.

- The second line we could call a "container" for all the words that existed in that pdf file. You can see path/filename, filesize and a magazine name.

- The third line and the following ones are simply word, page number pairs found across the entire PDF file (dupes removed already).

This file is 933MB.



Now, the same 933MB file was converted to a standard tab-delimited CSV file that contains the same stuff. The only problem is that we have to duplicate the path/filename for every row. Oh well.

The file ended up as a whopping 6 gigabytes.
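The effect of repeating the path/filename on every row is easy to reproduce on a small scale. The sketch below is only an illustration: the "container" layout is a simplified guess (one header line, then word,page lines), and the path and magazine name are invented.

```python
import csv
import io

def container_size(path, magazine, pairs):
    """Bytes used when path and magazine name are written once per file."""
    header = f"{path}\t{magazine}\n"
    body = "".join(f"{word},{page}\n" for word, page in pairs)
    return len((header + body).encode("utf-8"))

def flat_csv_size(path, magazine, pairs):
    """Bytes used when every row repeats the path and magazine name."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for word, page in pairs:
        writer.writerow([path, magazine, word, page])
    return len(buf.getvalue().encode("utf-8"))

# Hypothetical file with 200 word hits.
path = "mags/1985/issue_01.pdf"
magazine = "Example Magazine"
pairs = [("robot", n) for n in range(1, 201)]

compact = container_size(path, magazine, pairs)
flat = flat_csv_size(path, magazine, pairs)
blowup = flat / compact
```

The longer the path and the shorter the words, the worse the ratio gets, since the repeated columns dominate every row.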



Then finally, after some hours, the CSV file was converted to SQLite3 DB format... and ended up as 12.7GB of ridiculous and redundant data. Why do they duplicate everything, seriously?!!

No wonder you need fast drives, CPUs and a whole hardware park to deal with shit like this.

What an utter waste of disk space.




After constructing my original SOTDS dat formatted file as shown in picture 1, we end up with something like this. My human readable 933MB file is now only 388MB of data.

Go figure :-)

That compressed data and special formatting ensure you can search at the same speed as, or even faster than, any SPHINX/SQL setup (I presume), instead of dragging around that 12GB monster file.

They all contain the same data!
- SOTDS DAT Human readable: 933MB
- CSV: 6GB
- SQL DB: 12GB
- SOTDS DAT Compressed: 388MB
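Put as ratios against the compressed SOTDS file, the numbers above work out as in this small calculation. It uses the rounded sizes from the list and assumes 1GB = 1024MB, so treat the results as ballpark figures.

```python
# Rounded sizes from the list above, in MB (assuming 1GB = 1024MB).
sizes_mb = {
    "SOTDS DAT Human readable": 933,
    "CSV": 6 * 1024,
    "SQL DB": 12 * 1024,
    "SOTDS DAT Compressed": 388,
}

baseline = sizes_mb["SOTDS DAT Compressed"]

# How many times bigger each format is than the compressed SOTDS file.
ratios = {name: round(size / baseline, 1) for name, size in sizes_mb.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

That is roughly 2.4x for the human readable file, about 16x for the CSV and about 32x for the SQL DB.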

The SOTDS DAT Compressed format is the one used by the SOTDS Search Engine to show the same results with almost no disk grinding (unlike the SQL search).



Now for a real-world test. With the massive 12GB database file that contains 81 million rows with 5 columns, searching for "robot" with a human readable export to a CSV file:

CPU Time: user 0.140401 sys 0.234002 (in seconds).

The export of the CSV file with 6883 rows took 43(!) seconds to generate, about 900KB, along with massive disk grinding!

Now, check what SOTDS can do:

Proof enough?




Same search for "robot" in my SOTDS search engine, where 6883 rows of the SAME data were exported into memory in just 0.23 seconds and with *NO* disk grinding!

Don't forget that SOTDS is currently only being tested in a GUI environment, which naturally causes some extra delays. One day I might have a CMD version ready, and by then SQL search performance will look even worse next to SOTDS. HAH!

Looking at SQL, user and sys combined come to 0.14+0.23=0.37 seconds, BUT to actually see the results you have to wait for the output file to be generated, which took 43 seconds.
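The gap between 0.37 seconds and 43 seconds is the difference between CPU time and wall-clock time: CPU time only counts when the processor is actually working, not the time spent waiting for the disk. Here is a minimal sketch of measuring both around a query with Python's built-in sqlite3 module. The table layout and sample rows are invented, and with a tiny in-memory database both numbers will be small; the point is only how the two are measured.

```python
import sqlite3
import time

def timed_query(conn, sql, params=()):
    """Run a query and return (rows, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()  # wall clock, includes waiting on I/O
    cpu_start = time.process_time()   # user + sys CPU time of this process
    rows = conn.execute(sql, params).fetchall()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return rows, cpu, wall

# Hypothetical flat table, similar in spirit to the CSV import.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (path TEXT, word TEXT, page INTEGER)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [("mags/a.pdf", "robot", n) for n in range(1000)]
    + [("mags/a.pdf", "laser", n) for n in range(1000)],
)

rows, cpu, wall = timed_query(
    conn, "SELECT path, page FROM words WHERE word = ?", ("robot",)
)
print(f"{len(rows)} rows, cpu {cpu:.6f}s, wall {wall:.6f}s")
```

On a 12GB file sitting on a slow HDD, the wall-clock number is the one you actually feel, and that is where the 43 seconds went.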

SOTDS beats SQL!

Remember: SOTDS uncompresses small 7zip packages, which also adds search time, and it's quite impressive that even with decompression in place, SOTDS is today just as fast as SQL.
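The idea of unpacking only a small package per search, instead of the whole database, can be sketched like this. It uses Python's lzma module as a stand-in for 7zip, and the one-package-per-word layout is purely an assumption for the example; how SOTDS really splits its packages is not shown here.

```python
import lzma

def pack_blocks(postings):
    """Compress each word's page list into its own small block.

    postings: dict mapping a word to a list of page numbers.
    The per-word block layout is a hypothetical choice for this sketch.
    """
    return {
        word: lzma.compress(",".join(map(str, pages)).encode("ascii"))
        for word, pages in postings.items()
    }

def lookup(blocks, word):
    """Decompress only the block that belongs to the searched word."""
    blob = blocks.get(word)
    if blob is None:
        return []
    text = lzma.decompress(blob).decode("ascii")
    return [int(p) for p in text.split(",") if p]

# Hypothetical sample data.
blocks = pack_blocks({"robot": [1, 5, 9], "laser": [2]})
pages = lookup(blocks, "robot")
```

Only the one block that matters gets decompressed, so the extra cost per search stays small no matter how big the total database is.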

If my calculations are correct, my 388MB compressed database would be around 2GB in uncompressed form. That is still not the 12GB that SQL ends up with, and the search time for SOTDS would be even shorter, for sure beating SQL into the dust forever.







------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Posted by: Programmer, Stone Oakvalley | Publisher: Website Designer, Stone Oakvalley
Last revised: December 07, 2022 - 17:31

