Did you know that when you search the Web with Google, you are searching only about 0.2 percent of the Web? The remainder, more than 66,800 terabytes, is part of the Deep Web, the part of the Web that search engines haven't indexed. (For reference, 1 terabyte is the equivalent of 50,000 trees made into paper and printed.)
So what does the Deep Web contain? According to Wikipedia, it includes:
- Dynamic content: pages that are created on the fly.
- Unlinked content: pages that aren't linked to and don't link to any other content.
- Private Web: password-protected pages.
- Contextual Web: pages that display different content depending on who or what you are.
- Limited access content: pages protected by CAPTCHAs or other technical methods.
- Non-HTML/text content: content in file formats not handled by search engines.
A lot of researchers are examining how to access this invisible content. Last week, DeepDyve, one potential contender in the race to expose the Deep Web, launched. This search engine borrows techniques from the field of genomics, which differs significantly from Google's approach. The company behind it is marketing it as a research engine. So while it works for searches that bring up movie times, hockey game scores, and so on, DeepDyve aims to help researchers do better research.
Unfortunately, I don't do a lot of scholarly research, so I tried a search on the first academic I thought of--my Dad--and found that he is cited in Wikipedia. But that doesn't really tell me whether DeepDyve is better for research than Google. So I'd love to hear from those of you who do more research than I do. What do you think of DeepDyve? - K