One of the more interesting extensions to the BitTorrent protocol has been the introduction of a distributed hash table implementation. As mentioned in my previous article on the basics of the BitTorrent protocol, traditionally BitTorrent relies upon a centralised "tracker" application - which runs over standard HTTP - in order to facilitate contacting peers and so on. The requirement for a centralised tracker is obviously a major barrier to fully decentralised operation, and a problem in terms of BitTorrent's resistance to tracker outage (perhaps even caused by legal actions). In part one of this article I'm going to look a bit at the network side of BitTorrent's DHT. The official BitTorrent DHT specification states that the protocol is based on Kademilia. In BitTorrent, DHT is mostly separated from the original protocol. Peers listen on an additional port, using a UDP protocol, to issue network searches and so forth. The DHT protocol is known as KRPC and consists of three message types - query ("q"), response ("r") and error ("e"). There are four queries:

  • PING
  • FIND_NODE
  • GET_PEERS
  • ANNOUNCE_PEER
Each KRPC message is formatted in B-Encode, with various key-value pairs encoded in a dictionary structure. Each node participating in the DHT has its own routing table containing contact information for nodes "near" to it (according to a mathematical 'distance function'). This routing table is used to produce responses to queries. We will go more into details of how this works in the next article. The TCP BitTorrent peer protocol itself has only been very slightly extended in order to take advantage of this new DHT functionality. During the handshake, peers may indicate that they support DHT. If the receiver also supports DHT, it should send a new message called PORT (id is 0x09) with the port number of its UDP DHT service, encoded as a 16-bit value. The receiver of this message will then try to send a PING message to the peer on this port, and if it gets a reply, routing table adjustments etc should take place. Those are the very basics of BitTorrent DHT from a network perspective. I will write more about the details of the algorithms used for routing and how the consistent hashing is employed in part two.

Niall O'Higgins is an author and software developer. He wrote the O'Reilly book MongoDB and Python. He also develops Strider Open Source Continuous Deployment and offers full-stack consulting services at FrozenRidge.co.

Read and Post Comments

Monte Carlo simulation in Python #1

July 05, 2007 at 11:15 AM | categories: Technical, Maths, Python | View Comments |

I became interested in Monte Carlo simulation after reading Fooled By Randomness, the author of which makes numerous references to the power of these simulators. One of the first things I learned was that "Monte Carlo methods" is a term covering pretty much any use of pseudo-randomness to help solve any kind of problem. Apparently, Monte Carlo is an old name for what is now commonly known as a roulette wheel, hence the relation to randomness.

One of the fascinating examples of a Monte Carlo simulator described in Fooled by Randomness is the use of pseudo-random numbers to calculate an approximation of the value of Pi. Imagine a dart board inside of a square. If you throw darts, in a random fashion, at the square, counting the number of darts which land on the dart board versus the number which land on the square, you can approximate Pi with some simple arithmetic.

After reading about this, I simply had to write a program to work it out. I figured Python would be a good language in which to hack it out. Indeed, it turns out that Python comes with a good pseudo-random number module in its standard library. Here is the code for my simple Pi approximator, which throws 1,000,000 virtual darts:

from random import random
from math import pow, sqrt

DARTS=1000000
hits = 0
throws = 0
for i in range (1, DARTS):
	throws += 1
	x = random()
	y = random()
	dist = sqrt(pow(x, 2) + pow(y, 2))
	if dist <= 1.0:
		hits = hits + 1.0

# hits / throws = 1/4 Pi
pi = 4 * (hits / throws)

print "pi = %s" %(pi)
And a sample run (timed on a 2.0Ghz Pentium M):
$ time python monte-carlo-pi.py
pi = 3.1422991423
    0m3.89s real 0m3.78s user 0m0.03s system

I have done some other hacking using Monte Carlo methods, specifically exploring methods of stock price prediction, which I hope to write about in the future.

Niall O'Higgins is an author and software developer. He wrote the O'Reilly book MongoDB and Python. He also develops Strider Open Source Continuous Deployment and offers full-stack consulting services at FrozenRidge.co.

Read and Post Comments