unstructured data Archives

Who hasn’t felt frustrated when a long search query returns millions of irrelevant results because the keywords don’t quite hit the nail on the head and fail to retrieve the vital information one is looking for? Search engines are supposed to make life easier by doing the dirty work and finding us the best results. But with more and more unstructured data to sift through every day, how can we hope to find the exact bits of information we need inside a chaotic, unstructured web? What if we could make the web more intelligent and use a semantic solution to help us reason with and talk to unstructured data, so that we find what we are looking for more easily? If there were a way to turn keyword search into human-to-machine questioning, it could dramatically improve the results returned for long, complicated search queries.

Say, for instance, the following search query is entered into a current search engine:

“What is the name of the small coffee shop with the red shutters behind the train station in the center of Copenhagen that serves five types of strawberry cheesecake?”

If the only relevant reference words on the web are “Maurice’s Deli Copernicus Square”, a word-by-word hit count gives a zero result, because not a single word in the query appears in the reference. What hasn’t been done, and what truly represents the “holy grail” of semantic analysis at this level, is an efficient multi-word to multi-word grammatical and semantic transformation. This would make today’s limited keyword search yesterday’s technology.
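To make the zero-hit problem concrete, here is a minimal Python sketch of a naive word-by-word hit count. It is an illustration only, not how any particular search engine works: the function names `tokenize` and `keyword_hits` and the simple lowercase, letters-only tokenisation are assumptions made for this example.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of ASCII letters (assumed tokenisation)."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_hits(query: str, document: str) -> int:
    """Count how many query words also occur, word for word, in the document."""
    document_words = set(tokenize(document))
    return sum(1 for word in tokenize(query) if word in document_words)

QUERY = (
    "What is the name of the small coffee shop with the red shutters "
    "behind the train station in the center of Copenhagen that serves "
    "five types of strawberry cheesecake"
)
REFERENCE = "Maurice's Deli Copernicus Square"

# No query word appears in the reference, so the hit count is zero.
print(keyword_hits(QUERY, REFERENCE))
```

However relevant the reference is to a human reader, pure keyword overlap cannot see the connection. Bridging the two phrases needs the kind of multi-word semantic transformation described above, which this sketch does not attempt.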

It is highly likely that humans will “converse with data” semantically within the next 5-10 years. Research is under way to facilitate “translation” between search language and data. This developing technology aims to cope with the informality and messiness of everyday verbal communication in order to make more sense of unstructured data.

It would be hugely beneficial to society to make this semantic intelligence technology available to the masses. The “semantic tools” now in development could make a huge difference to how the public communicates with the web in future, ease communication on the social web, and improve the extraction of useful data from a chaotic and complex world of data.

According to an article in The Economist, “Data, Data Everywhere” (Feb 27th 2011), exposure to abundant data brings huge benefits to society but also frustrating problems and big headaches. Manipulating and using data for the greater good of society over time is most definitely a worthy cause. An even greater challenge is not only to enable humans to talk to data on the web but, in accomplishing that, to open the door to exciting new ways of applying the technology across other data collections.

Gatfol Blog

When Will We Start Talking To HAL?