The Bible Written by Hitler!

Authors of books invariably have their own writing styles – wouldn’t it be interesting to see how different texts would read if re-written by different authors?

Combining textual data and language patterns from disparate sources is extremely difficult, because the end result needs to be grammatically and semantically accurate for the aggregated sentences to be meaningful and fully understood.

The phrase “the rhinoceros is feeding on grass in the veld” is both semantically and grammatically correct. By contrast, “feeding veld in grass on is the rhinoceros” is semantically correct (it is physically and logically possible for a rhinoceros in our world to perform this action) but obviously grammatically faulty. And “the rhinoceros is flying through the air and catching insects” is grammatically correct, but semantically impossible in the physically constrained world we currently live in.

Finding ways to rewrite the Bible by combining texts is an unusual pursuit, but it is also scientifically relevant, because a very real technological application can flow from it.

At present, keyword-based extraction from datasets is typically the entry point of human interaction with electronic data. The past decade has seen not only phenomenal growth in data worldwide; datasets have also grown so diffuse that keyword searches are becoming ineffective at returning adequate and meaningful results.

What if the same technology that enables the Bible to be written by Hitler could also be applied to increase the retrieval power of electronic search?

When processing a search input query, computers are currently unaware of the “invisible” words or concepts that are related to the query and similar to it in meaning and in natural-language grammatical structure. If computers could “see” an accurate “image” of our world and understand it in human terms, any input query could be rephrased with groups of similar words, of any size, that fit both grammatically and semantically.
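The rephrasing idea can be illustrated with a deliberately tiny sketch. It assumes a hand-written, hypothetical phrase table (`PHRASE_TABLE`) in which each multiword group is paired with other groups assumed to fit the same slot grammatically and semantically; the function name `rewrite_query` is likewise illustrative. Building such a table automatically and at scale is exactly the hard part, and this is not how Gatfol or any particular engine works.

```python
# Hypothetical phrase table: each multiword group maps to groups ASSUMED to be
# interchangeable with it in both meaning and grammatical role. A real system
# would have to learn these pairs from data; here they are typed in by hand.
PHRASE_TABLE: dict[tuple[str, ...], list[tuple[str, ...]]] = {
    ("spaghetti", "bolognaise"): [("italian", "food"), ("pasta",)],
    ("cheap", "flights"): [("low", "cost", "airfares"), ("budget", "air", "travel")],
}


def rewrite_query(query: str, table=PHRASE_TABLE) -> list[str]:
    """Return the query plus every variant made by swapping one known phrase."""
    words = query.lower().split()
    variants = [" ".join(words)]
    for phrase, alternatives in table.items():
        n = len(phrase)
        # Slide a window of the phrase's length across the query
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == phrase:
                # Replace the matched group with each alternative group,
                # which may be longer or shorter than the original
                for alt in alternatives:
                    variants.append(" ".join(words[:i] + list(alt) + words[i + n:]))
    return variants
```

Calling `rewrite_query("cheap flights to Rome")` returns `["cheap flights to rome", "low cost airfares to rome", "budget air travel to rome"]`: a two-word group is swapped for three-word groups while the rest of the query stays intact, which is the multiword-to-multiword substitution the article argues for, only with a lookup table standing in for real understanding.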

For example, national security agencies could massively broaden the number of dataset “hits” for given target words by also investigating similar concepts around those words at the same time. Danger words such as “suicide attack” could be triggered in searches even if only inoffensive words such as “sacrifice”, “impact”, “cause” and “final” are present in the dataset under analysis.
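One crude way to approximate this is a co-occurrence threshold: a target concept is flagged when enough of its associated words appear together in a text. The sketch assumes a hypothetical, hand-written concept map built only from the words in the example above, and an arbitrary threshold `min_hits=3`; a real system would need to learn such associations and weigh context, which this does not attempt.

```python
import re

# Hypothetical concept map: target concept -> words ASSUMED to signal it.
# The word list is taken from the example in the text, not learned from data.
CONCEPTS = {
    "suicide attack": {"sacrifice", "impact", "cause", "final"},
}


def triggered_concepts(text: str, concepts=CONCEPTS, min_hits: int = 3) -> dict[str, set[str]]:
    """Return each concept that appears literally, or whose related words
    appear (as distinct words) at least min_hits times, with the words found."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z]+", lowered))
    hits = {}
    for concept, related in concepts.items():
        found = related & tokens
        if concept in lowered or len(found) >= min_hits:
            hits[concept] = found
    return hits
```

With this map, `triggered_concepts("The final sacrifice will have an impact for the cause.")` reports `"suicide attack"` even though the phrase never appears, while a text containing only “final” and “impact” stays below the threshold and returns an empty result.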

The implications are massive for targeted advertising, for online retail searches, and even for the simple act of matching Dave, who likes “spaghetti bolognaise”, with Mary, who loves “Italian restaurants”, on a dating site.

Unfortunately, very few currently available semantic technologies can provide the required multiword-to-multiword functionality.

What is needed is an elegant “engine” that can create a computer-“readable”, realistic world image straight from random web crawling, with both semantic and grammatical accuracy. Getting computers to understand grammar without grammar rules or classifications that require human interpretation is surprisingly difficult. It is, however, very attractive, because no grammar ontologies are needed, so such a technology could work just as easily in Arabic, Mandarin Chinese, Russian or French as it does in English. A small, compact engine that could map multiword groups to synonymous multiword groups quickly and at massive scale, while allowing for the trillions of concept permutations needed to represent data across the full cybersphere, would be a major step forward in the development of semantic technologies.
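To make “grammar without grammar rules” a little more concrete, the sketch below swaps in a standard and much simpler technique: a bigram (word-pair) model counted from raw sentences, with no part-of-speech tags or hand-written rules. The three-sentence corpus and the `order_score` measure (the fraction of a sentence's adjacent word pairs seen in training) are illustrative assumptions, not the engine described here.

```python
from collections import Counter


def train_bigrams(corpus: list[str]) -> Counter:
    """Count adjacent word pairs in raw sentences; no grammar rules or tags are used."""
    counts: Counter = Counter()
    for sentence in corpus:
        # <s> and </s> mark sentence boundaries so first/last words are modelled too
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


def order_score(sentence: str, bigrams: Counter) -> float:
    """Fraction of the sentence's adjacent word pairs that occurred in training."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    pairs = list(zip(words, words[1:]))
    return sum(1 for pair in pairs if bigrams[pair] > 0) / len(pairs)


# Hypothetical toy corpus standing in for crawled web text
CORPUS = [
    "the rhinoceros is feeding in the veld",
    "the buffalo is feeding on grass",
    "the lion is resting in the shade",
]
model = train_bigrams(CORPUS)
print(order_score("the rhinoceros is feeding on grass in the veld", model))
print(order_score("feeding veld in grass on is the rhinoceros", model))
```

On this toy corpus the well-ordered rhinoceros sentence scores 0.9 (9 of its 10 word pairs were seen), while the scrambled version scores about 0.11 (1 of 9). Word-pair statistics capture only order, so they would not notice that a rhinoceros “flying through the air” is semantically impossible; that requires world knowledge as well.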

The reward would be a gradual evolution of current technologies from keyword search to eventual human-machine conversational interaction. The seemingly unrelated process of rewriting the Bible therefore subtly leads us toward a technological future that could give us the ultimate HAL 9000.

To see what the Bible would look like in Hitler’s hand, have a look at…

http://www.gatfol.com/gatfolpower/strangerthanfiction.html