Email Web Crawler

This simple Python script lets you scan websites for email addresses.

To begin, set a seed URL that serves as the starting point for the scan:

from collections import deque

urlsToProcess = deque([''])  # put the seed URL between the quotes
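
The deque acts as a first-in, first-out queue of pages still to visit. As a minimal sketch of that idea, the example below walks a hypothetical in-memory link graph (FAKE_LINKS) from a hypothetical seed, https://example.com/. Neither is part of the original script; they only stand in for real pages so the visiting order is easy to follow.

```python
from collections import deque

# Hypothetical link graph standing in for real pages (assumption for illustration).
FAKE_LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
}

def crawl_order(seed):
    """Return URLs in the order a FIFO queue would visit them."""
    urls_to_process = deque([seed])
    processed_urls = set()
    order = []
    while urls_to_process:
        url = urls_to_process.popleft()
        if url in processed_urls:
            continue
        processed_urls.add(url)
        order.append(url)
        for link in FAKE_LINKS.get(url, []):
            # queue only links that are neither waiting nor already visited
            if link not in urls_to_process and link not in processed_urls:
                urls_to_process.append(link)
    return order

print(crawl_order("https://example.com/"))
```

The loop ends once the queue is empty, so every reachable page is visited once even though the pages link back to each other.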

The main loop, which runs until there are no more URLs, processes each URL and looks for more URLs to feed the deque urlsToProcess:

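while urlsToProcess:
    # take the next URL from the queue and log it
    url = urlsToProcess.popleft()
    urlsFile.write(url + "\n")
    # split the URL to get the base and the path used to resolve relative links
    parts = urlparse(url)
    base_url = "{0.scheme}://{0.netloc}".format(parts)
    path = url[:url.rfind('/')+1] if '/' in parts.path else url
    print("Crawling site: %s" % url)
    response = requests.get(url)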

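    # parse the page (soup is assumed to be built from the response with BeautifulSoup)
    soup = BeautifulSoup(response.text, "html.parser")
    for anchor in soup.find_all("a"):
        # extract the link target, if any
        link = anchor.attrs["href"] if "href" in anchor.attrs else ''
        # make relative links absolute
        if link.startswith('/'):
            link = base_url + link
        elif not link.startswith('http'):
            link = path + link
        # add the URL to the queue if it is new
        if link not in urlsToProcess and link not in processed_urls:
            urlsToProcess.append(link)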
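
The link handling above builds absolute URLs by string concatenation. As an alternative sketch (not what the script does), the standard library's urljoin can resolve a link against the page it was found on, including forms such as "../contact". The helper name resolve_link and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin, urldefrag

def resolve_link(page_url, href):
    """Resolve href against the page it was found on and drop any #fragment."""
    absolute = urljoin(page_url, href)
    return urldefrag(absolute).url

page = "https://example.com/docs/index.html"
print(resolve_link(page, "/about"))        # root-relative link
print(resolve_link(page, "faq.html#top"))  # relative link with a fragment
print(resolve_link(page, "../contact"))    # parent-directory link
```

Dropping the fragment keeps the same page from being queued twice under different anchors.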

Then scan the response for email addresses and save them to a file:

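    # collect every address on the page, ignoring case
    new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", response.text, re.I))
    for email in new_emails:
        # write only addresses not seen on earlier pages
        if email not in emails:
            emailsFile.write(email + "\n")
    # remember them so they are not written again (assumes emails is a set)
    emails.update(new_emails)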
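
To see the extraction step in isolation, here is a small sketch that applies the same pattern to a plain string instead of a downloaded page. The function name extract_new_emails and the sample text are hypothetical; only the regular expression comes from the script.

```python
import re

# Same pattern as the snippet above, compiled once; re.I makes it case-insensitive.
EMAIL_RE = re.compile(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", re.I)

def extract_new_emails(text, seen):
    """Return addresses found in text that are not in seen, and record them."""
    found = set(EMAIL_RE.findall(text))
    new = found - seen
    seen.update(new)
    return new

seen = set()
sample = "Contact alice@example.com or bob.smith@mail.example.org; alice@example.com again."
print(sorted(extract_new_emails(sample, seen)))  # two distinct addresses
print(sorted(extract_new_emails(sample, seen)))  # nothing new the second time
```

Keeping the seen set across pages is what prevents the same address from being written to the file more than once.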

To download the full script, clone or fork it from the zxcoders repository on Bitbucket.