The approach

We will set up Tor, create a proxy service to tunnel our requests through tor, and use the STEM library to rotate the IP periodically.

Downloads

  1. Proxy Server: http://www.privoxy.org/
  2. Tor: https://www.torproject.org/download/download-easy.html.en
  3. Python: https://www.python.org/downloads/
  4. Pip: https://pip.pypa.io/en/stable/installing/

You probably have python and pip already installed. If you don’t, please follow the instructions in the link.

Both Privoxy and Tor has excellent installation guides for all major OSes, please follow that in the link to set it up correctly.

Configuration

Open privoxy config sudo vi /usr/local/etc/privoxy/config (The path might be different for Linux/Windows)

Append the following line. Notice the . in the end.

forward-socks5   /               127.0.0.1:9150 .

Restart privoxy

sudo /path/to/Privoxy/stopPrivoxy.sh
sudo /path/to/Privoxy/startPrivoxy.sh

Install stem library

Stem library will be used for managing Tor using Python.

pip install stem

Now, find the torrc file of your Tor installation for your respective operating system.

Current location in Mac OSX is:

~/Library/Application Support/TorBrowser-Data/Tor/torrc

Once you find it, modify it to add a control port by appending the following lines to the file:

ControlPort 9051
## If you enable the controlport, be sure to enable one of these
## authentication methods, to prevent attackers from accessing it.
HashedControlPassword 16:872860B76453A77D60CA2BB8C1A7042072093276A3D701AD684053EC4C

Save the file and restart Tor.

Test the setup

To test your tor setup, use the following script:

import requests
from stem import Signal
from stem.control import Controller

with Controller.from_port(port = 9051) as controller:
  controller.authenticate()
  controller.signal(Signal.NEWNYM)

proxies = {
  "http": "http://127.0.0.1:8118"
}
headers = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11'
}

r = requests.get("http://icanhazip.com", proxies=proxies, headers=headers)
print (r.text)

Everytime you run this, you should see a new IP.

Periodic IP rotation

The following script autorotates the IP periodically. Let’s call this rotate.py

import time
from stem import Signal
from stem.control import Controller

def main():
    while True:
        time.sleep(20)
        print ("Rotating IP")
        with Controller.from_port(port = 9051) as controller:
          controller.authenticate()
          controller.signal(Signal.NEWNYM)


if __name__ == '__main__':
    main()

Before you start up your scraper, you can run rotate.py so that it keeps rotating your Tor IP. In your crawler, use the following proxies and headers to tunnel your requests through the tor proxy.

proxies = {
  "http": "http://127.0.0.1:8118"
}

headers = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11'
}

r = requests.get("<request-url>", proxies=proxies, headers=headers)

That’s it! You have successfully set up periodic IP rotation. Thanks for reading!