Internet Census 2012
By Götz Kaufmann, 2015-07-26
From March to December 2012, an interesting research project was conducted that aimed to 'count' the access points to the internet on a global scale. As the method of 'counting' was not entirely legal (even though it did not harm the system infrastructure at all!), its publication on the internet (click here to find it) was done under the cover nickname Carna Botnet. The intention of creating a global Internet Census in 2012 stemmed from the fact that "[w]ith a growing number of IPv6 hosts on the Internet, 2012 may have been the last time a census like this was possible." The transition from IPv4 to IPv6 was (simply speaking) necessary due to the ever-growing number of devices and websites that each require an IP address. "Since version 4 of the internet protocol (IPv4) was only able to distinguish 4,294,967,296 individual addresses - a number that is already exceeded today - IPv6 provides plenty of new addresses: roughly 3*10^38, and is hence expected to overcome this problem for the next couple of years," states ILNumerics founder Haymo Kutschbach. Once this change took place, the sheer number of IP addresses would make counting with the existing methods much more difficult - or even impossible. The project was therefore a once-in-a-lifetime chance to reveal something unique.
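The address-space figures above are easy to verify: IPv4 addresses are 32-bit numbers and IPv6 addresses are 128-bit numbers. A quick sketch of the arithmetic:

```python
# Quick check of the address-space figures quoted above.
# IPv4 addresses are 32-bit integers, IPv6 addresses are 128-bit integers.

ipv4_space = 2 ** 32    # number of distinct IPv4 addresses
ipv6_space = 2 ** 128   # number of distinct IPv6 addresses

print(ipv4_space)            # 4294967296
print(f"{ipv6_space:.1e}")   # 3.4e+38, the "roughly 3*10^38" quoted above
```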
In their conclusion, Carna Botnet expressed the hope "that this publication will help raise some awareness that, while everybody is talking about high class exploits and cyberwar, four simple stupid default telnet passwords can give you access to hundreds of thousands of consumer as well as tens of thousands of industrial devices all over the world."
The findings constitute the largest and most comprehensive IPv4 census ever created. The research modelled the results in the form of the following interactive world map, showing internet use by day and night:
To get a geographic overview, the research determined the geolocation of all IP addresses that responded or had open ports, using MaxMind's freely available GeoLite database for geolocation mapping.
Environmental Justice (EJ) research does not deal with IT security issues, so the rightful question is how this matters for environmental justice. Access to the internet is certainly among the most famous and life-changing developments of recent decades. You can book flights, buy almost any commodity from ecommerce websites and find plenty of information for free. Thus, access to the internet is one of the most important environmental goods in the world, in terms of access to information and, in some countries (mainly in the first world), access to procedural justice. It can hardly be denied that even the most deprived people benefit from the existence of the internet.
From an EJ perspective, the question is not whether people are using sufficiently secure passwords (or passwords at all) for their devices, but rather who has (geographically) access to the internet. Two questions are naturally involved in the debate: What kind of development are we experiencing? And: development for whom? The former - as said - cannot be put into question; the chances and benefits of the internet are undeniably high. The question is whether the world is benefiting even somewhat equally from this development, so that we can truly speak of a 'worldwide web'.
As can clearly be seen, the internet is an environmental good provided only to a minority of people in the world. The whole continent of Africa, most parts of the Brazilian Amazon, the Northwest Territories in Canada, where indigenous people are living, plus many countries in the Asia-Pacific region, are barely connected to the internet. And even where there is a connection, people participate at most half as much as people in Western Europe, the south of Brazil and the USA - in short, the industrialized part of the world.
Taking the findings of Carna Botnet seriously, the notion of an internet that benefits the whole world is, at the very least, an illusion, since the majority of the people in the world are simply not connected to it. Incorporating the findings of this research into environmental injustice mapping would consequently be extremely important and might mark another milestone towards a better understanding of the globally unequal distribution of environmental goods.
About the creation of the Internet Census 2012
The research was conducted from 2010 to 2012. It started when, "while spending some time with the Nmap Scripting Engine (NSE), someone mentioned that we should try the classic telnet login root:root on random IP addresses. This was meant as a joke, but was given a try. We started scanning and quickly realized that there should be several thousand unprotected devices on the Internet." At the end stood the discovery of a "high number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address usage.
After completing the scan of roughly one hundred thousand IP addresses, we realized the number of insecure devices must be at least one hundred thousand. Starting with one device and assuming a scan speed of ten IP addresses per second, it should find the next open device within one hour. The scan rate would be doubled if we deployed a scanner to the newly found device. After doubling the scan rate in this way about 16.5 times, all unprotected devices would be found; this would take only 16.5 hours. Additionally, with one hundred thousand devices scanning at ten probes per second we would have a distributed port scanner to port scan the entire IPv4 Internet within one hour."
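The back-of-the-envelope reasoning in the quote can be reproduced in a few lines. The figures (10 probes per second, one hundred thousand devices) are taken from the text; the code is merely a sketch of the arithmetic:

```python
import math

scan_rate = 10               # probes per second per device (from the text)
insecure_devices = 100_000   # estimated number of unprotected devices

# Starting with one scanner and doubling the fleet each time newly
# found devices are enlisted, reaching 100,000 devices takes:
doublings = math.log2(insecure_devices)
print(round(doublings, 1))   # 16.6, the "about 16.5 times" in the quote

# With the full fleet, scanning all 2^32 IPv4 addresses at 10 probes
# per second per device takes roughly:
seconds = 2 ** 32 / (insecure_devices * scan_rate)
print(round(seconds / 3600, 1))   # 1.2 hours, i.e. roughly "within one hour"
```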
In order to prove the concept and to "further verify our sample data, we developed a small binary that could be uploaded to insecure devices. To minimize interference with normal system operation, our binary was set to run with a watchdog and on the lowest possible system priority. Furthermore, it was not permanently installed and stopped itself after a few days. We also deployed a readme file containing a description of the project as well as a contact email address.
The binary consists of two parts. The first one is a telnet scanner which tries a few different login combinations, e.g. root:root, admin:admin and both without passwords. The second part manages the scanner, gives it IP ranges to scan and uploads scan results to a specified IP address. We deployed our binary on IP addresses we had gathered from our sample data and started scanning on port 23 (Telnet) on every IPv4 address. Our telnet scanner was also started on every newly found device, so the complete scan took only roughly one night. We stopped the automatic deployment after our binary was started on approximately thirty thousand devices.
The completed scan proved our assumption was true. There were in fact several hundred thousand unprotected devices on the Internet making it possible to build a super fast distributed port scanner."
As the research reports, it "had no interest to interfere with default device operation so we did not change passwords and did not make any permanent changes. After a reboot the device was back in its original state including weak or no password with none of our binaries or data stored on the device anymore. Our binaries were running with the lowest possible priority and included a watchdog that would stop the executable in case anything went wrong. Our scanner was limited to 128 simultaneous connections and had a connection timeout of 12 seconds. This limits the effective scanning speed to ~10 IPs per second per client. We also uploaded a readme file containing a short explanation of the project as well as a contact email address to provide feedback for security researchers, ISPs and law enforcement who may notice the project.
The vast majority of all unprotected devices are consumer routers or set-top boxes which can be found in groups of thousands of devices. A group consists of machines that have the same CPU and the same amount of RAM. However, there are many small groups of machines that are only available a few to a few hundred times. We took a closer look at some of those devices to see what their purpose might be and quickly found IPSec routers, BGP routers, x86 equipment with crypto accelerator cards, industrial control systems, physical door security systems, big Cisco/Juniper equipment and so on. We decided to completely ignore all traffic going through the devices and everything behind the routers. This implies no arp, dhcp statistics, no monitoring or counting of traffic, no port scanning of LAN devices and no playing around with all the fun things that might be waiting in the local networks.
We used the devices as a tool to work at the Internet scale. We did this in the least invasive way possible and with the maximum respect to the privacy of the regular device users."
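The "~10 IPs per second per client" mentioned above follows directly from the two limits the researchers describe: 128 simultaneous connections and a 12-second timeout. A minimal sketch of the worst-case arithmetic:

```python
max_connections = 128   # simultaneous connections per client (from the text)
timeout = 12            # connection timeout in seconds (from the text)

# In the worst case every probe waits out the full timeout, so the
# sustained rate per client is bounded by:
rate = max_connections / timeout
print(round(rate, 1))   # 10.7 IPs per second, the "~10 IPs per second" above
```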
The analysis then showed that insecure devices are located basically everywhere on the Internet and are not specific to one ISP or country. This led to the conclusion that the problem of default or empty passwords is an Internet-wide phenomenon.
"We used a strict set of rules to identify the target devices' CPU and RAM to ensure our binary was only deployed to systems where it was known to work. We also excluded all smaller groups of devices since we did not want to interfere with industrial controls or mission critical hardware in any way. Our binary ran on approximately 420 thousand devices. These are only about 25 percent of all unprotected devices found. There are hundreds of thousands of devices that do not have a real shell so we could not upload or run a binary, a hundred thousand mips4kce machines that are mostly too small and not capable enough for our purposes as well as many unidentifiable configuration interfaces for random hardware. We were able to use ifconfig to get the MAC address on most devices. We collected these MAC addresses for some time and identified about 1.2 million unique unprotected devices. This number does not include devices that do not have ifconfig."
As for the collection of "the scan results, approximately one thousand of the devices with most RAM and CPU power were turned into middle nodes. Middle Nodes accept data from the clients and keep it for download by the master server. The IP addresses of the middle nodes were distributed to the clients by the master server when deploying a command. The middle nodes were frequently changed to prevent too much bandwidth usage on a single node. Overall roughly nine thousand devices are needed for constant background scans to update client IP addresses, find restarted devices and act as middle nodes. So this kind of infrastructure only makes sense if you have way more than nine thousand clients. To coordinate the scans without deploying large IP address lists and to keep track of what has to be scanned, we used an interleaving method. Scan jobs were split up into 240k sub-jobs or parts, each responsible for scanning approximately 15 thousand IP addresses. Each part was described in terms of a part id, a starting IP address, stepwidth and an end IP address. In this way we only had to deploy a few numbers to every client and they could generate the necessary IP addresses themselves. Individual parts were assigned randomly to clients. Finished scan jobs returned by the clients still contained the part id so the master server could keep track of finished and timed out parts."
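The interleaving scheme described above can be sketched as a small generator. The four numbers per part (id, start, stepwidth, end) are exactly what the quote names; the toy parameters below (4 parts over a /28 instead of 240k parts over all of IPv4) are assumptions chosen purely for illustration:

```python
import ipaddress

def part_addresses(part_id, start, stepwidth, end):
    """Yield the IP addresses covered by one interleaved scan part.

    A part is described by just four numbers, as in the census, so
    clients can generate their own target lists instead of receiving
    large IP address lists from the master server.
    """
    addr = start + part_id      # each part begins at its own offset
    while addr <= end:
        yield ipaddress.IPv4Address(addr)
        addr += stepwidth       # skip over the other parts' targets

# Toy example: 4 interleaved parts over a tiny documentation range
# (the real census used 240k parts over the whole IPv4 space).
base = int(ipaddress.IPv4Address("198.51.100.0"))
targets = list(part_addresses(1, base, 4, base + 15))
print([str(t) for t in targets])
# ['198.51.100.1', '198.51.100.5', '198.51.100.9', '198.51.100.13']
```

Because neighbouring parts cover interleaved addresses rather than contiguous blocks, each part spreads its probes thinly across the whole range, which keeps the load on any single network low.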
The full research, including downloadable maps, IT-related details and the conclusion, can be found here.
Image: © Carna Botnet