If you ask people, "Do you know how to use the Internet?" most today will instantly respond with something along these lines: "Of course I do! I just open up my browser, enter the name of what I want to find in the search bar, click the little button, and that's it." Indeed, this is perhaps all the average, non-tech-savvy Joe knows about using the Internet. That is sufficient most of the time and may well lead you to the results you want, especially with a powerful search engine, but there are ways to optimize your queries. This paper is all about that optimization, whether for casual surfing or for doing recon. We will be dealing specifically with every hacker's favorite research tool: Google.

The browser's plug-ins and add-ons aside, Google consists of four fundamental entities:

The Google bots: programs running on Google's own servers that constantly surf the Web and collect data. Some of you might have noticed the Google bot here on HaxMe along with the other active users.

The Google index: a massive store of Web pages (tens of billions) that grows daily. When you submit a search query to Google, the index is what you search. PageRank is the name of the original ranking algorithm created by Google's founders, Sergey Brin and Lawrence Page. The details of the current version are very hush-hush for obvious reasons. For those of you interested, the original algorithm is described in this white paper.

The Google cache: as the bots scour the Net, they copy up to 101k of the text they find on each page (including .html, .doc(x), .pdf and .ppt(x) files, but no images). If you once posted something on your MySpace or Facebook profile and then deleted it, it could very well still be online.

The Google API: a tool created by Google for computer programs to perform searches and retrieve results. This one is a little advanced for the needs of basic users and is beyond the scope of this paper.
There is something else you may or may not know about Google: you can retrieve at most 1,000 results from the index for a single search. Yes, searching "Family Guy" will return over 48,000,000 hits, but all that means is that there are that many matching pages in the index; you're only allowed to view the first thousand. Unless you're into data mining, you probably won't even bother after the first 30 or so.

Below, you will find a few helpful directives, along with their purpose and examples of their use, that will optimize your searches and return better results. Remember the following when using Google, with or without directives:

Avoid putting a space between the directive and the search term: use site:haxme.org, not site: haxme.org

Google searches are always case insensitive: kevin mitnick and KeViN mItNiCk will return the same results.

Google allows a maximum of ten search terms, counting each directive: site:haxme.org tutorials contains two search terms.

site:[domain]
Google responds with results associated with the given domain. The domain can be very specific, referring to a given Website such as www.haxme.org, or less specific, like .gov to search all government institutions with that suffix. To search for all occurrences of the word "tutorials" on Wade's domain, try

    site:haxme.org tutorials

Literal matches (" ")
You should know this one already. Quotation marks tell Google to search for a literal match of the given search terms in that order. Otherwise, Google searches for the given terms in any order. To find all references to malicious code on HaxMe, while avoiding results that might say "this code is not malicious" or refer to other things like malicious people with codenames, try

    site:haxme.org "malicious code"

link:[page]
The directive shows all sites that link to the given Web page, possibly identifying a target's business relationships and customers.
To see everyone that links to HTS, try

    link:www.hackthissite.org

intitle:[term(s)]
The directive looks for pages whose titles contain the given term(s). It's useful for finding sites that are configured to show an index of various file system directories. Use your imagination on what you could do with that. To see if HaxMe has any directories that are indexed and available via Wade's server, try

    site:www.haxme.org intitle:"index of"

related:[page]
The directive returns pages that are similar to the given page, based on Google's indexing algorithm. Sometimes it returns useless junk. Other times you find a crucial piece of information, like a business relationship. To find pages similar to HaxMe, try

    related:www.haxme.org

cache:[page]
The directive returns the contents of a page from Google's cache. Note that only the text of the page is retrieved from Google. Any images might come from the original site, and any links you click on in the cached page will take you to their actual location, not to another cached page. Because of this, Google's cache doesn't really enable anonymous surfing, but it is immensely useful for finding recently removed or currently unavailable pages. To find the most recent view of HaxMe grabbed by the Google bots, try

    cache:www.haxme.org

filetype:[suffix]
The directive searches only for files of the given type. To find all PDFs on HaxMe, try

    filetype:pdf site:www.haxme.org

To find e-book versions of The Art of Deception, try

    filetype:pdf "the art of deception"

phonebook:[name and city or state]
The directive searches Google's residential and business phone books for the given terms. To search for all people or businesses named Simpson in Springfield, try

    phonebook:simpson springfield

To search residential phone books only, use rphonebook, e.g.,

    rphonebook:john smith california

To search business phone books only, use bphonebook, e.g.,

    bphonebook:goldman sachs new york

Not ( - )
The directive filters out pages that include a given term.
Along with the site: directive, this is one of the most useful capabilities in performing recon. If you want Google to return search results about panthers in Florida that have nothing to do with hockey, try

    florida panthers -hockey

Plus ( + )
Google filters out certain words by default, like "a," "and," "where," "the," and "how." Use this directive if you deliberately wish to include such terms in your search. Note that this directive is not the opposite of the Not ( - ) directive. Putting a plus sign in front of a search term does not tell Google that all pages must contain that term. It just means that Google is not to filter it out. To search for the terms "where" and "how" on HaxMe, try

    site:www.haxme.org +where +how

As you gain experience with Google's search directives, you will discover how to optimize your searches even further by combining them. For example, suppose you want to target a university that you somehow learned stores student and staff passwords in unencrypted Excel spreadsheets. You could perform a search like this:

    site:dumbass-u.edu filetype:xls passwords

Now, suppose you want to know the version of the university's database and maybe the dialect of SQL used to program it, but without pulling up a bunch of lecture slides about SQL from the Computer Science department. You could try this:

    site:dumbass-u.edu sql database -filetype:pdf

Now that you know a thing or two about optimizing your Google searches, the next time someone claims she knows how to use the Internet, ask her if she can pull up a search about Floridian panthers unrelated to the hockey team!

Source: Counter Hack Reloaded, 2nd Edition by Ed Skoudis, Chapter 5, Phase 1: Reconnaissance

Sources recommended by the above source:
1. Google Hacking for Penetration Testers by Johnny Long and Ed Skoudis
2. Google Hacks by Tara Calishain and Rael Dornfest
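
As a postscript, the formatting rules from earlier in this paper (no space between a directive and its value, quotation marks for literal matches, a minus sign to filter terms out, and a maximum of ten search terms) can be sketched as a small Python helper. This is only an illustration under stated assumptions: build_query and MAX_TERMS are hypothetical names, the helper just assembles a string and never contacts Google, and counting every whitespace-separated word as one search term is my own reading of the ten-term limit.

```python
from typing import Mapping, Optional, Sequence

MAX_TERMS = 10  # limit quoted in this paper; each directive counts as a term


def build_query(
    terms: Sequence[str] = (),
    directives: Optional[Mapping[str, str]] = None,
    phrases: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> str:
    """Assemble a search string (hypothetical helper, not a Google API)."""
    parts = []
    for name, value in (directives or {}).items():
        # Multi-word values are quoted, e.g. intitle:"index of".
        if " " in value:
            value = f'"{value}"'
        # No space between the directive and its value.
        parts.append(f"{name}:{value}")
    # Literal matches go inside quotation marks.
    parts.extend(f'"{phrase}"' for phrase in phrases)
    parts.extend(terms)
    # The Not ( - ) directive: prefix each filtered term with a minus sign.
    parts.extend(f"-{term}" for term in exclude)
    query = " ".join(parts)
    # Assumption: every whitespace-separated word counts as one search term.
    if len(query.split()) > MAX_TERMS:
        raise ValueError(f"more than {MAX_TERMS} search terms: {query!r}")
    return query


if __name__ == "__main__":
    print(build_query(["tutorials"], {"site": "haxme.org"}))
    # site:haxme.org tutorials
    print(build_query(directives={"site": "www.haxme.org", "intitle": "index of"}))
    # site:www.haxme.org intitle:"index of"
    print(build_query(["sql", "database"], {"site": "dumbass-u.edu"},
                      exclude=["filetype:pdf"]))
    # site:dumbass-u.edu sql database -filetype:pdf
```

Directives are emitted in the order they are given, so the output matches the hand-written examples above. Anything over the ten-term limit raises a ValueError instead of being silently truncated.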