Monthly Archives: January 2012

Gatfol and the Complexity of Real World Search Queries…


Examples of the length and semantic diversity of what we would like to ask of current search engines:

Real-world queries… we have a problem…

-What is a spade flat head with a worm like body that has black stripes down center of body and moves slow about 2 inches long and 1 centermetre wide the body was all the same length?
-Is there a movie or video that has footage of the deep dark corners of the sea that doesnt have a whole lot going on just some cool looking fish doing their thing?
-Why does an organism that lives on land have more complex mechanisms to maintain a stable internal environment than organisms that live in water?
-There is this roomer going around that says that 7 people have already been eaten at ratanga by the crocidiles at crocidile gorge is it true?
-What are the three terms that are indicative of what kind of food source an animal from which it derives it’s primary of energy supply?
-Which of the following are adaptations to remaining afloat and controlling bouyancy found in many pelagic organisms except?
-What is the asexual reproduction method conducted by starfish in which new organisms can grow from its sections?
-What is the maze of passageways that carries proteins and other materials from one part of the cell to another?
-What are some of the conditions or requirements that distributed database would be better for an organisation?
-Explain what Welty means by the habit of love and tell why this habit might be compared to a worn path?
-Why humans have a higher density of receptors for touch in some areas of skin than in other areas?
-How do put together your bio-ball aquaium filter system i bought used use everything in a a box?
-How does the body’s arrangement of nerve receptors for touch make the Braille system possible?
-Why do the survival of the whole organism depends on the functioning of its individual cells?
-What is the relationship between the carbon in the ocean and the carbon in the soil?
-If the dna sequence is tacgttaccgagctagact then what is the sequence of the messenger rna?
-The effect on ancient cultures and traditions of people of cutting the old-growth forest?
-Why there is more species diversity in southern Florida than there is in northern Alaska?

Actual questions submitted to wiki.answers.com

Gatfol provides massive stealth power, in a fraction of a second, to enable this level of complicated search input into any data set.

Gatfol Evolution in the World of Data


TIMELINE PROJECTION TO GATFOL-ENVISIONED “REASONED SEARCH”


Humans will “speak to data” semantically in 5-10 years. Gatfol will be exceptionally well placed to provide an integral part of this process, as Gatfol Search Technology provides “translation” between human natural language and data. Gatfol enables humans to reason with information.

VOLUME-BASED DIAGRAMMATIC PROJECTION TO GATFOL NATURAL LANGUAGE INTELLIGENCE

Unstructured datasets worldwide are growing ever larger and more diffuse, and keyword searches have become ineffective at returning adequate results. Large search engines currently experience a search failure rate of 20%–25% for longer queries, or approximately 900 million online searches per day. Gatfol aims to remedy this.

Gatfol Startup Announces Showcased Technology, Core Algorithm Patent Filing and Seed Funding

Radical new search advance rewrites Shakespeare and Einstein to validate a quantum leap in human-to-machine conversation.


Centurion, South Africa (PRWEB) January 12, 2012

Gatfol Technology represents a compact, elegant and massively scalable technique for changing keyword search into man-to-machine questioning communication. The technology provides a universal tool, invisible to the user, that acts as interpreter between unsophisticated user input and what application-based information “requires” for efficient retrieval.

Founder and CEO Carl Greyling says he believes the Gatfol technology is now so universally powerful that he challenged the development team to apply it beyond search and re-write some of the most iconic literature in the history of mankind. They obliged by taking on an almost impossible test: using Gatfol power to re-write sections of Shakespeare’s, Hitler’s and Einstein’s main works. “Just imagine what we are going to do for search in 2012 if you take a look at this,” Carl proudly proclaims.

As an additional achievement, Carl today announced that seed funding has been obtained from private New Zealand-based investors to fund pre-Beta testing of Gatfol’s free browser technology, scheduled for release towards the end of the first quarter of 2012. Carl says, “As we are getting more exposure and interest worldwide as an innovative technological start-up, the right investors have stepped in at the right time to take the Gatfol project forward successfully.”

Carl acknowledges that conceptual thinking around the Gatfol idea has been long in coming, but points out that human-to-machine communication and the resultant commercial search avenue derivatives are the quintessential funding “minefields”, leading to many start-up corpses and investment disasters over the last 40 years.

Carl added: “I see current basic keyword-driven search technology as an extreme disappointment and failure compared to how we have wanted to question computers through natural language since the 1960s. Contrary to common agreement, my view is that after decades computers still cannot see our human world – and we cannot talk to them about it. Current search is still at caveman-speak, three-keyword level. The difference that our technology makes is that it can analyse everyday word relationships at power levels several magnitudes more subtle than what is presently available.”

As to the timing of the seed funding, Carl explains that the algorithm technology was finally patented in the United States in 2011 and, together with the advanced state of prototype development, now provides a firm base to present to investors. This led to Carl delivering a talk showcasing the Gatfol technology at SemTech 2011, the world’s largest semantic intelligence convention, held annually in San Francisco. “Funding interest after SemTech 2011 has been fairly robust,” Carl acknowledges thankfully.

He goes on to say that the eventual two investors provide a strong match, given the decision-making freedom they allow the Gatfol management. “This is essential in the natural language semantic field, as research projections around the ‘messiness’ and ‘greyness’ of language analysis present a multitude of possible development paths at any one time – much more so than in the exact sciences. Funding providers that can understand and accept this are few and far between.”

The investors were also drawn in by the fact that whilst in San Francisco, Carl had discussions with representatives of a large search engine and an agreement was reached whereby Gatfol would present a working prototype for web search enhancement in the first half of 2012, upon which a possible commercial liaison with the search engine could be negotiated.

Those interested in learning more about Gatfol Search Technology can visit http://www.gatfol.com/individuals.html, or, to see Shakespeare, Hitler and Einstein re-written using Gatfol power, visit http://www.gatfol.com/gatfolpower/strangerthanfiction.html. The Gatfol blog is at http://www.gatfol.com/blog. For more information contact Carl Greyling at Gatfol on +27 82 5902993.

Can a Machine Rewrite Shakespeare?

Even with mankind firmly in the 21st century, and more than four decades after HAL from 2001: A Space Odyssey showed humans the way to go, it seems unfathomable that the everyday ability to converse with machine information in natural language is still not a reality after all these years.

Wouldn’t it be amazing if we could get computers, for instance, to read Shakespeare’s plays and perhaps re-write them using different words – and not just by applying synonym replacements or related concepts? Imagine even a machine giving us a completely “new” play in Shakespeare’s writing style. Just think of the impact that reaching this technological level would have on a rapidly exploding data world that hungers for increased semantic intelligence filtering capabilities.

Why are we not there yet?

What hasn’t been done, and what represents the “holy grail” in semantic analysis on a technical level, is an efficient multi-word to multi-word grammar and semantic transformation in text input-output flow.

Currently available semantic (meaning-in-language) technologies are crippled by a serious flaw: it is difficult to generate a semantically and grammatically equivalent body of text from an existing repository of language patterns and word combinations. Additionally, sentence structure and logical meaning flow have to fit in with the physical and rational make-up of the world we live in.

The flaw comes in when we literally have to “show computers our world”. By attempting to “categorise” words or concepts beyond the English left-right, Arabic right-left, or Chinese up-down reading and writing order, most modern semantic intelligence technologies deliver a level of complexity that is unsustainable in terms of permutations.

By laying down logical concept rules, such as “a dog is alive” and “things that are alive replicate” giving us “a dog replicates”, current technologies hope to be able to create systems that generate and perpetuate rules of logic – and eventually represent some type of “machine intelligence” on a level with human thinking.
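
As a rough illustration of how such rule-laying works, the short Python sketch below chains the two example statements into the derived fact. It is a generic forward-chaining toy with made-up fact and rule structures, not a description of Gatfol’s own method.

# Toy forward chaining over (subject, verb, object) facts.
# Hypothetical structures for illustration only.
facts = {("dog", "is", "alive")}                    # "a dog is alive"
rules = [(("is", "alive"), ("does", "replicate"))]  # "things that are alive replicate"

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            for subject, verb, obj in list(derived):
                if (verb, obj) == condition:
                    new_fact = (subject,) + conclusion
                    if new_fact not in derived:
                        derived.add(new_fact)   # e.g. ("dog", "does", "replicate")
                        changed = True
    return derived

print(forward_chain(facts, rules))
# {('dog', 'is', 'alive'), ('dog', 'does', 'replicate')}  ->  "a dog replicates"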

Categorisation systems very quickly run into the “permutation problem”: imagine any sentence of about 8-10 words, e.g. “I really appreciate my mother in the morning”. What would happen if we replaced, say, each word with 10 equivalent words that fit both grammatically and semantically, e.g. “I definitely/positively/demonstratively…” “like/admire/love my mother…”? Taking the original word phrase and randomly inserting the replacement words in all possible groupings that still make sense, we get 100 million phrases that are ALL grammatically and semantically equivalent – and we are still only saying that we feel positive about our mother some time early in the day!
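
The arithmetic behind that figure is simple, as this small Python sketch shows: ten acceptable replacements for each of the eight words multiply out to 10^8 equivalent phrases.

# Back-of-the-envelope count for the "permutation problem":
# 10 acceptable replacements per word, 8 words -> 10**8 phrases.
words = ["I", "really", "appreciate", "my", "mother", "in", "the", "morning"]
alternatives_per_word = 10

equivalent_phrases = alternatives_per_word ** len(words)
print(f"{equivalent_phrases:,} grammar-semantic equivalents")   # 100,000,000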

Even the smallest body of text of minimal complexity obviously has trillions upon trillions upon trillions of grammar-semantic equivalents. Using these logical categorisation systems, we simply do not have the concept-combination multiplication power to cover the permutation problem. World-wide efforts since the 1980s around ontological classifications, hierarchical categorisation, entity collections and logic rule-based systems have therefore not succeeded quite as envisaged. We can think of CYC, OpenCyc, Mindpixel and WordNet, amongst many others.

“Permutations” is the villain that everyone hopes will disappear with “just a few more categorisations…”

Alas, it will not.

What is needed is a small compact “semantic engine” that can “see” our world and that will enable trillions of concept permutations to adequately represent the resulting image.

With an abundance of data in a complex and highly unstructured web, and without a powerful enough “engine”, we really don’t have much chance of ordering and classifying this data such that every concept inside it relates to everything else in a manner that holistically resembles our real human world.

The search is therefore on for a technology that could take a quantum leap into the future. If we can start by enabling machines to “rewrite Shakespeare”, we should be able to develop an innovative, ontology-free, massively scalable algorithmic technology that requires no human intervention and that could act as librarian between humans and data.

The day when humans are able to easily talk to, reason with and “casually converse” with unstructured data will bring a giant leap in the human-machine symbiosis and – after far too long a wait – in our lifetime we can perhaps still experience a true Turing “awakening”.

 To see a version of Shakespeare’s Hamlet re-written by a machine, have a look at…

http://gatfol.com/gatfolpower/strangerthanfiction.html

The Bible Written by Hitler!

Authors of books invariably have their own writing styles – wouldn’t it be interesting to see how different texts would read if re-written by different authors?

Combining textual data and language patterns from disparate sources is extremely difficult, because the end result needs to be grammatically and semantically accurate for the aggregated sentences to be meaningful and fully understood.

The phrase “the rhinoceros is feeding on grass in the veld” is semantically and grammatically correct, whereas “feeding veld in grass on is the rhinoceros” is semantically correct (it is physically and logically possible for a rhinoceros in our world to perform this action) but obviously grammatically faulty. If we say “the rhinoceros is flying through the air and catching insects”, it makes grammatical sense, but it is not semantically possible in the physically constrained world that we currently live in.

Being able to find interesting ways to re-write the Bible by combining texts is unusual, but also scientifically relevant, as it can be shown that a very real technological application can flow from this.

At present, keyword-based extraction from datasets is typically the entry point of human interaction with electronic data. The past decade has not only seen a phenomenal growth in data worldwide, but datasets have grown so diffuse that keyword searches are becoming ineffective in returning adequate and meaningful results.

What if the same technology that enables the Bible to be written by Hitler could also be applied to increase the retrieval power of electronic search?

When looking at a search input query, computers are currently unaware of any unstated words or concepts related to that query which are similar in meaning and natural language grammar structure. If computers were able to “see” an accurate “image” of our world and understand our real world in human terms, then any input query could be rephrased with similar words, in groups of any size, that fit both grammatically and semantically.

For example: national security agencies could massively broaden the number of dataset “hits” for given target words if similar concepts around the target words are also simultaneously investigated. Danger phrases such as “suicide attack” can be triggered in searches even if only inoffensive words such as “sacrifice, impact, cause, final” are present in the dataset under analysis.
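
The sketch below illustrates the idea in miniature: a target phrase is broadened into related word groups, so a document can be flagged even though the literal target words never appear. The expansion table and matching rule are invented for illustration; they are not Gatfol’s actual output or method.

# Hypothetical expansion table: each target phrase maps to groups of
# related words; a document triggers a hit if any full group is present.
expansions = {
    "suicide attack": [
        {"sacrifice", "impact"},
        {"cause", "final"},
    ],
}

def flags_document(target, document_text, table):
    doc_words = set(document_text.lower().split())
    return any(group <= doc_words for group in table.get(target, []))

document = "their final act was framed as a sacrifice with maximum impact"
print(flags_document("suicide attack", document, expansions))   # True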

The implications for targeted advertising, online retail searches and even the simple act of matching Dave, who likes “spaghetti bolognaise”, with Mary, who loves “Italian restaurants”, on a dating site are massive.

Unfortunately, very few currently available semantic technologies can render the required multiword-to-multiword functionality.

What is needed is an elegant “engine” that can create a computer-“readable”, realistic world image straight from random web crawling, with both semantic and grammar accuracy. Getting computers to understand grammar without using grammar rules or classifications that require human interpretation is surprisingly difficult. It is, however, very attractive, because no grammar ontologies are needed: such a technology could work just as easily in Arabic, Mandarin Chinese, Russian or French as it does in English. A small, compact engine that could map multiword groups to synonymous multiword groups quickly and on a massive scale, while allowing for trillions of concept permutations to adequately represent data across the full cybersphere, would represent a major step forward in the development of semantic technologies.

The reward would be for current technologies to evolve gradually from keyword search to eventual human-machine conversational interaction – and the seemingly unrelated process of rewriting the Bible therefore subtly leads us into a technological future that could give us the ultimate HAL 9000.

To see what the Bible would look like in Hitler’s hand, have a look at…

http://www.gatfol.com/gatfolpower/strangerthanfiction.html

George W Bush: Another Einstein?

Einstein’s “relativity theory” is mentally challenging.

An interesting experiment would be to mix George Bush’s language patterns into this complexity. This process will be shown – by analogy – to mirror human-to-machine communication, where it is possible for juxtaposed datasets to be integrated and information to be “smoothed” for interpretation at many entry points.

Why is it still not possible for machines to “see” our human world and thus allow computers to converse and communicate with humans in natural language in an “everyday” manner?

With an abundance of unstructured data, and with traffic flowing over the internet growing faster than the current network will be able to carry by 2013, we almost compulsively expect to extract useful information continuously to enhance personal and business decisions. Additionally, as the accumulation of data in proprietary databases and data repositories increases, it is essential to find more efficient ways of making information retrieval and data usage super-accessible.

A senior manager at a bank wants to obtain information about clients’ aggregated personal circumstances and financial needs from a large repository of “unstructured” data: how does he or she know what to ask in order to identify the most relevant information? A new or extended bond could be on offer if the manager knew of a planned home move by the client, for instance. Study loans could be on offer for children requiring further education, or a larger insurance package suggested if the manager knew that the client’s existing insurance was currently inadequate. The unstructured repository causes difficulties because the same question can be phrased in many different ways, with different grammar analysis and semantic word combinations. Frustratingly, the retrieval results always differ according to the options selected.

Current semantic technologies use extensive ontologies and categorisation systems, the mere design of which leads to severe “permutation” problems. Even the smallest, simply structured body of text can generate trillions upon trillions of grammar-semantic equivalents. Given the mathematics, it is not surprising that we still do not have the concept-combination multiplication power to adequately address the permutation problem.

What is needed is a compact and powerful representational matrix that can act as interpreter between human language and data, generating the trillions of concept permutations that adequately represent our real world. A machine that is in this way able to “see” could ensure that all of our human world concept combinations and language patterns relate to everything else logically and realistically in a stored electronic format.

Coming back to George Bush and Einstein’s relativity theory: a semantic “engine” that can easily integrate language patterns between totally disparate textual sources could, in a corresponding manner, enable us to create multiple equivalent searches across the whole universe of search engine and proprietary database queries. Searching for a needle in a haystack using a thousand magnifying glasses is a handy analogy for the trillions of concept permutations that can in this way be provided to meaningfully and adequately represent any mass of unstructured data.

The application is definitely underpinned by a need, as most search engines are still three-word, caveman-speak search phrase solutions limited by keywords, at best only interchangeable with synonyms or related words. Effective multi-word to multi-word exchange technology is almost non-existent.

If successful, we might soon be able to expand beyond – and finally say goodbye to – classic keyword search and information retrieval. The immediate goal is the “humanising” of cyber dataspace through seamless application of a semantic technology that enables mankind to converse and communicate effectively with data across a wide information spectrum.

 To see how George Bush subtly explains Einstein’s relativity, have a look at…

http://www.gatfol.com/gatfolpower/strangerthanfiction.html

When Will We Start Talking To HAL?

Who hasn’t felt frustrated by failed long search queries that bring up millions of irrelevant results because the keywords being used are not quite hitting the nail on the head and can’t retrieve the vital information one is looking for? Search engines are supposed to make life easier for us by doing all the dirty work and finding us the best results, but with more and more unstructured data to sift through daily, how can we even hope to find the exact bits of information inside a chaotic and unstructured web? What if we could find a way to make the web more intelligent and use a semantic solution to help us reason with and talk to unstructured data to find what we are looking for more easily? If there were a way to change keyword search into man-to-machine questioning communication, it could dramatically improve page returns on long, complicated search queries.

Say for instance the following search query is input into current search engines:

“What is the name of the small coffee shop with the red shutters behind the train station in the centre of Copenhagen that serves five types of strawberry cheesecake”

If the only relevant reference words on the web are “Maurice’s Deli Copernicus Square”, a word-by-word hit count gives a zero result. What hasn’t been done, and what truly represents the “holy grail” in semantic analysis at this level, is an efficient multi-word to multi-word grammar and semantic transformation. This would make the limited keyword search of today yesterday’s technology.
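
To make the failure concrete, the toy hit count below scores the long query against that single relevant reference and comes back with zero matches. It is an illustrative sketch only; the stop-word list and scoring are made up and do not reflect how any particular engine works.

# Naive word-by-word overlap between the long query and the only
# relevant reference; illustrative stop-word handling only.
query = ("what is the name of the small coffee shop with the red shutters "
         "behind the train station in the centre of copenhagen that serves "
         "five types of strawberry cheesecake").split()
reference = set("maurice's deli copernicus square".split())

stop_words = {"what", "is", "the", "of", "in", "that", "with", "a"}
hits = [word for word in query if word not in stop_words and word in reference]

print(len(hits))   # 0 -> keyword matching returns nothing relevant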

It is highly likely that humans will “converse with data” semantically in the next 5-10 years. Research is currently being done that would help to facilitate “translation” between search language and data. This is a developing technology that endeavours to enhance the ability to deal with the informality and messiness of everyday verbal communication in its attempt to make more sense of unstructured data.

What would be hugely beneficial to society would be to make this semantic intelligence technology available to the masses. The current development of “semantic tools” could make a huge difference to how the public communicates with the web in future, facilitate communication on the social web and help to improve the extraction of useful data from a very chaotic and complex world of data.

In “Data, Data Everywhere”, an article written for The Economist (Feb 27th 2011), it is noted that while exposure to abundant data brings huge benefits to society, it also brings a lot of frustrating problems and big headaches. Manipulating and using data for the greater good of society over time is most definitely a worthy cause. An even greater challenge is not only to find a way to enable humans to talk to data on the web, but, in accomplishing that, to open the doors to exciting new ways of applying that technology across other data collections more broadly in future.