By Prashant Mali
The surface web is the entire Internet for most users, but it represents a fraction of available content. The surface web is that part of the Internet that is accessible by standard search engines, either by indexing, or through use of the site’s IP address. By contrast, the deep web is unfamiliar to most of the public and is larger by orders of magnitude.
Often characterised as the submerged part of an iceberg, the deep web has been sized by researchers in various and conflicting ways: as over 96 percent of content on the World Wide Web, as unguessable, as 7,500 terabytes, as infinite, and as 500 times the size of the surface web. Although imprecise, these estimates indicate that the deep web contains much more content than the surface web. Generally speaking, the deep web is the content not indexed by standard search engines such as Google.
The only U.S. court that has attempted to define the deep web described it as follows:
"The portion of the Web that is not theoretically indexable through the use of “spidering” technology, because other Web pages do not link to it, is called the “Deep Web.” Such sites or pages can still be made publically accessible without being publically indexable by, for example, using individual or mass emailings (also known as “spam”) to distribute the URL to potential readers or customers, or by using types of Web links that cannot be found by spiders but can be seen and used by readers."
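The court's definition turns on how a spider discovers pages: it starts from pages it already knows and follows their links, so a page that nothing links to is never reached. The following is a minimal sketch of that idea, not a description of any real search engine. The page names and the `LINKS` graph are hypothetical, and the graph is held in memory rather than fetched from the network.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "contact"],
    "contact": [],
    "unlinked-offer": ["home"],  # links out, but no page links to it
}

def spider(seed, links):
    """Return every page reachable from seed by following links breadth-first."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

indexed = spider("home", LINKS)   # what a link-following spider can find
hidden = set(LINKS) - indexed     # pages that exist but are never discovered

print("indexed:", sorted(indexed))
print("not indexed:", sorted(hidden))
```

In this sketch the page `unlinked-offer` exists and could be opened by anyone given its address, for example from an email, yet the spider never finds it because no indexed page points to it.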
The deep web contains all manner of content including text, photographs, videos, and music. Large academic, library, and proprietary databases are stored on the deep web, including core content from the U.S. Patent and Trademark Office, Thomson Reuters Westlaw, and NASA.
The distinctions between the deep web and the surface web are sometimes imprecise because content on the deep web can be “surfaced” in several ways. Similarly, the deep web can be searched even though it is not indexed like the surface web. While research in the deep web requires considerable technical facility, specialized deep web browsers, like Tor, allow visitors to browse the deep web without having to rely entirely on pre-identified URLs.
The dark web has been characterized as a subset of the deep web. Controversial and illicit transactions reputedly transpire on the dark web, including human trafficking, narcotic sales, and contracts for killings. The dark web relies on anonymity tools to conceal both the seeker and the provider of such services. It is not accessible through surface web browsers like Internet Explorer or Firefox, but is accessible via specialized and anonymized browsers such as Tor or I2P.
Tor facilitates browsing of dark web services without disclosing the user’s IP address, which would otherwise reveal the user’s network identity and location.
The Tor protocol leverages pseudo-domains like .onion as well as anonymous introduction points and relays between users, making de-anonymization difficult.
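Because .onion is a special-use suffix rather than an ordinary domain, software can tell from the address alone that a link points to a Tor service and not to a conventionally resolved site. The following is a minimal sketch of such a check, using only Python's standard library. The function name `is_onion_url` and the example addresses are hypothetical, and the check looks only at the suffix; it does not verify that the address is valid or reachable.

```python
from urllib.parse import urlparse

def is_onion_url(url):
    """Return True if the URL's hostname ends in the .onion suffix."""
    # hostname is None when the URL has no scheme or network location.
    host = urlparse(url).hostname or ""
    return host.lower().endswith(".onion")

# Illustrative, made-up addresses.
print(is_onion_url("http://exampleexample.onion/index.html"))
print(is_onion_url("https://www.example.com/"))
```

A hostname such as `onion.example.com` is not matched, because only the final label of the name is the suffix.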
While the dark web and deep web contain criminal elements, both are routinely used for less nefarious purposes by those seeking anonymity. The U.S. Navy uses Tor for intelligence gathering. Journalists pursue controversial leads in the deep web to avoid government monitoring. An array of law enforcement agencies search for illicit conduct using Tor because Tor hides government IP addresses, ensuring covert surveillance. Whistleblowers reveal corporate and governmental malfeasance on the deep web to avoid retribution.
But increasingly, normal Internet users opt for deep web browsing simply for additional privacy. Tor’s website states that Tor “prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location.” Invasive commercial browsers and search engines cannot monitor, collect, aggregate, and sell user information, like browsing history, if the user is effectively hidden while searching the web. Similarly, governmental surveillance is rendered substantially more difficult.