Honeynet

Faiz Ahmad Shuja
Pakistan Honeynet Project
5th March, 2004


This paper introduces the Honeynet, the technology used by the Pakistan Honeynet Project to gather information about the motives and tactics of the Black Hat community targeting Pakistan’s networks. By the end of the paper, I hope you will understand what a Honeynet is, what it can do for you, its benefits, the risks and issues involved, the types of Honeynet technologies available, and how you can deploy them in your network.

Overview

Normally, because of the volume of traffic and activity on a production network, we cannot log the level of detail that a security practitioner often needs. Honeynets are a way to get much more detailed logging of certain malicious activity than would be possible with normal logging. Suppose you have a firewall that is properly configured to stop attacks on port 111. That is good, but you won’t be able to learn anything about the attack, which can be bad. There are situations in which you want to see the content of the traffic: when you want to know the intentions of the attackers and how much they know about your network, when a particular system is receiving many probes, or when you suspect that a new attack or technique has been used to exploit your network.

A Honeynet is a high-interaction network of honeypots, that is, a network of actual systems running real operating systems and services. Honeynets let us learn more about attacks and attackers precisely because they run real operating systems with real services that an attacker can compromise. Unlike low-interaction honeypots, which emulate services, they run everything that comes with the operating system. Honeynets resemble real networks made up of different systems, but they have no production value, and all activity on them is logged and analyzed. Because they run no production services, they have no production activity or interaction. As a result, any activity that happens on the Honeynet is presumed to come from an attacker.

As we have seen, a Honeynet is a highly controlled network of systems designed to capture attackers’ activity. It can be built to mirror the existing network infrastructure, so that the attacker feels she is interacting with a real network. The design depends on the services you want to provide: it can be anything from a clustered Exchange server on Windows 2003 to an ISO Linux environment. It all depends on you!

A Honeynet is not a single product but an architecture composed of multiple technologies and products. The goal is to create a highly controlled environment in which everything is monitored and logged. Once the architecture is in place, you put your target systems inside it. Normally the target systems are default installations of widespread operating systems, placed on external networks.

Mechanism

A Honeynet is not a product that we install and that is then ready to go; it is an architecture composed of multiple technologies and products. The architecture is up to you, but deployment can be very complex, and an improper deployment can get you into trouble. A proper Honeynet deployment has three requirements: Data Control, Data Capture, and Data Collection.

Data Control is the management and tracking of activity to and from the Honeynet. You would not like to receive emails about unauthorized scans from your network, or to have attackers use the Honeynet to harm other systems. Data Control exists to prevent attackers from attacking other systems; it has been observed that attackers use compromised systems to discover other vulnerable systems on the Internet. There are different approaches to implementing data control. We do not want attackers to know that they are in a controlled environment, but we also do not want to give them full freedom. Normally three techniques are used: connection control, bandwidth control, and intrusion prevention. Together they make up a powerful data control system.

Connection control limits the outbound connections from the Honeynet. Inbound connections to the Honeynet are usually not controlled; we just don’t want an attacker to harm other systems through the Honeynet. A limit is set on outbound connections, and once the limit is reached, all further outbound connections are blocked. This minimizes the risk of network attacks launched from the Honeynet.

Bandwidth control manages the inbound and outbound network bandwidth of the Honeynet. You don’t want an attacker to choke your network pipe with a DoS attack, so bandwidth control lets you set a limit on the amount of network bandwidth your Honeynet can consume.

Intrusion prevention blocks known attacks. Each packet is inspected at the gateway, and if it matches an IDS rule, the packet is dropped or modified and an alert is raised. None of these techniques can completely eliminate the possibility of an attacker harming other systems; they can only help minimize the risk.
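The connection control idea described above can be illustrated with a small sketch. It is not the Honeynet Project's data control script; the class name, the limit of three connections, and the one-hour window are assumptions chosen only for illustration. The time is passed in explicitly so that the behavior is easy to follow.

```python
from collections import deque


class OutboundConnectionLimiter:
    """Allow at most `limit` outbound connections per `window_seconds`.

    Hypothetical helper for illustration; a real gateway would enforce
    this in the firewall, not in application code.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of connections that were allowed

    def allow(self, now):
        """Return True if a new outbound connection at time `now` may pass."""
        # Forget connections that have aged out of the counting window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # limit reached: block further outbound connections
        self._timestamps.append(now)
        return True


# Assumed policy: three outbound connections per hour.
limiter = OutboundConnectionLimiter(limit=3, window_seconds=3600)
decisions = [limiter.allow(t) for t in (0, 10, 20, 30, 40)]
print(decisions)
```

With this assumed policy, the first three attempts pass and the next two are blocked until the oldest connection falls out of the window. A limit that is too low may reveal the control to the attacker, while one that is too high gives more room for harm, which is the trade-off the text describes.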

Data Capture is the logging of all of the attacker’s activity in the Honeynet. The purpose of the Honeynet is to learn from and analyze the attacker’s activity, so a Honeynet without logging has no value. There are different techniques and approaches for data capture, but the aim is always to log as much information as we can without the attackers knowing it. Logging is done on multiple layers to avoid a single point of failure. Normally there are three layers of data capture: firewall activity, network activity, and system activity.

Firewall activity is logged by the data control script, which logs all inbound and outbound connections in /var/log/messages. Firewall logs give an overview of the activity and provide the first indication that the Honeynet has been compromised. Network activity is logged by a network sniffer such as snort or tcpdump. The purpose is to capture every packet, with its full payload, that crosses the Honeynet.

System activity logging is the most complex and critical task. It yields a rich and ample amount of information, because the activity is captured on the honeypot itself. The advantage of capturing activity on the honeypot itself is that it makes encryption ineffective: because we log at the system level, we capture everything unencrypted. Every mechanism carries risks, however, so we also have to minimize the risk of attackers discovering that they are being logged. Different techniques are used to reduce the chances of attackers detecting the data capture mechanism. Normally we modify the honeypots by installing customized data capture patches and kernel modules. Sebek, which is installed as a hidden kernel module, is one of the tools used for logging the attacker’s activity on the honeypot. It is also recommended to store the captured data on a secured remote system rather than locally, which reduces the chances of attackers detecting the captured data and deleting or modifying it. These techniques and mechanisms can never completely eliminate the risks; they can only reduce them.

Data Collection is the collection of data from multiple Honeynets at a central location. It is not a requirement for a standalone Honeynet deployment. Its purpose is to centrally capture and combine the information collected from multiple Honeynet deployments.
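As a rough illustration of combining data from several deployments, the sketch below merges per-site event lists into one timeline. The event layout (timestamp, site name, description), the site names, and the function name are all assumptions made for this example; the paper does not specify a collection format.

```python
import heapq


def merge_honeynet_logs(*streams):
    """Merge per-Honeynet event streams into one timeline.

    Each stream is assumed to be already sorted by timestamp, which is
    the first element of every event tuple.
    """
    return list(heapq.merge(*streams, key=lambda event: event[0]))


# Made-up events from two hypothetical deployments.
site_a = [
    (100, "site-a", "inbound probe on port 111"),
    (250, "site-a", "outbound connection limit reached"),
]
site_b = [
    (120, "site-b", "inbound connection on port 22"),
]

timeline = merge_honeynet_logs(site_a, site_b)
for event in timeline:
    print(event)
```

Viewing the events of all sites in a single ordered timeline makes it easier to notice the same source probing several Honeynets in sequence.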

Architectures

We have discussed the three requirements of a Honeynet architecture. There are different ways to implement them, but we will discuss two architectures developed by the Honeynet Project, known as GenI (first generation) and GenII (second generation). GenI was the first Honeynet architecture, deployed by the project in 1999. GenII was developed in 2002, after the lessons, problems, and issues of the GenI architecture had been identified.

GenI

GenI Honeynets were developed in 1999 by the Honeynet Project. Their purpose was to capture the maximum amount of attacker activity while giving attackers the feel of a real network. The architecture of a GenI Honeynet is uncomplicated. Its approach to data capture and data control is simple, which sometimes makes it detectable by attackers. However, it can capture a great deal of information and can even help in capturing unknown attacks. Its ability to control and capture attacks makes it very effective against known, automated, and beginner-level attacks. A GenI Honeynet is not effective in capturing advanced attacks, because advanced attackers can easily detect it.

Recall that the purpose of data control is to keep track of the attacker’s activity to and from the Honeynet; we don’t want her to harm non-Honeypot systems on the network. In GenI, Data Control is done by putting a layer three firewall in front of the Honeypots. The firewall works as a gateway in NAT (Network Address Translation) mode and controls all inbound and outbound connections. It allows all inbound connections but limits outbound connections. The firewall keeps track of all the outbound connections an attacker makes, and when a certain limit is reached, it blocks all outgoing connections from the Honeynet. The firewall we use for the data control mechanism is IPTables, an Open Source firewall.

As noted earlier, the purpose of data capture is to log as much information as we can without attackers knowing it. In a GenI Honeynet, data logging is done on multiple layers to avoid a single point of failure. The first layer is the firewall. Firewall logs do not give detailed information, but any information is helpful in a Honeynet. They record the source and destination IP addresses, source and destination ports, protocol, and date and time.

The second layer is the network. An Intrusion Detection System (IDS) is deployed on the gateway and logs every packet, with its payload, traveling on the wire. IDS logs provide the most useful information, as every packet traveling to and from the Honeynet is captured. The IDS also raises an alert when it catches suspicious activity. Most IDSs have a signature database of known attacks, and if a packet on the wire matches a signature, the IDS generates an alert. The IDS used for the data capture mechanism is Snort, an Open Source IDS.

The third layer is the system: we log the attacker’s activity on the Honeypot itself. Keystrokes and screenshots are captured by installing a modified version of bash or a kernel module. The logs are securely forwarded to a remote server over the network. The disadvantage of transferring the logs over the network is that an advanced attacker can easily detect it.
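To show how the firewall layer's fields can be pulled out for analysis, here is a minimal sketch that parses one log line. The sample line is made up; it follows the general KEY=value style that the Linux netfilter LOG target writes to the kernel log (SRC, DST, PROTO, SPT, DPT), and the host name, addresses, and function name are assumptions for illustration.

```python
import re

# Matches KEY=value tokens such as SRC=203.0.113.7 or DPT=111.
FIELD = re.compile(r"(\w+)=(\S*)")


def parse_firewall_line(line):
    """Extract date/time, addresses, ports, and protocol from one log line."""
    fields = dict(FIELD.findall(line))
    return {
        "time": line[:15],  # syslog timestamp, e.g. "Mar  5 10:15:02"
        "src": fields.get("SRC"),
        "dst": fields.get("DST"),
        "proto": fields.get("PROTO"),
        "sport": int(fields["SPT"]) if "SPT" in fields else None,
        "dport": int(fields["DPT"]) if "DPT" in fields else None,
    }


# Illustrative line only; not taken from a real Honeynet.
sample = (
    "Mar  5 10:15:02 gateway kernel: IN=eth0 OUT=eth1 "
    "SRC=203.0.113.7 DST=192.0.2.10 PROTO=TCP SPT=40312 DPT=111"
)
record = parse_firewall_line(sample)
print(record)
```

Records in this form can be counted per source address or per destination port, which gives the overview of activity that the firewall layer is meant to provide.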

GenII

GenII Honeynets were developed in 2002 by the Honeynet Project after the problems and issues in the GenI architecture had been identified. These problems were solved by changing the architecture. In the GenI architecture, the firewall works on layer three, which is easily detectable. GenII addresses this by making the gateway a layer two device, which is harder to detect. The firewall works in BRIDGE mode and, as in the GenI architecture, controls all inbound and outbound connections.

The new capability added to the gateway in the GenII architecture is the IPS (Intrusion Prevention System). An IPS works like an IDS but can also block and modify attacks. As noted, most IDSs have a signature database of known attacks, so if a packet on the wire matches a signature, the IPS can block or even modify that packet. This capability helps in distinguishing between legitimate and malicious activity. If an attacker tries to run an exploit against a non-Honeypot system, the IPS can block or modify the attack even if the connection limit has not yet been reached. The IPS mechanism works with known attacks only, so unknown attacks can bypass it. That is why it is combined with the connection control mechanism, so that an attack that matches no signature is still blocked once the limit is reached. This combination makes the Honeynet harder to detect.
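The way signature matching and connection control back each other up can be sketched as follows. This is a simplified model, not the GenII gateway's implementation: the two byte-pattern signatures are invented for the example, each payload is treated as one new outbound connection, and the class and method names are hypothetical.

```python
# Hypothetical signatures, for illustration only.
SIGNATURES = {
    "example-nop-sled": b"\x90" * 8,
    "example-passwd-read": b"/etc/passwd",
}


class Gateway:
    """Toy model of a gateway combining an IPS with connection control."""

    def __init__(self, connection_limit):
        self.connection_limit = connection_limit
        self.outbound_count = 0

    def handle_outbound(self, payload):
        """Return (action, reason) for one outbound connection attempt."""
        # Known attacks are dropped regardless of the connection count.
        for name, pattern in SIGNATURES.items():
            if pattern in payload:
                return ("drop", name)
        # Traffic matching no signature is still bounded by the limit.
        if self.outbound_count >= self.connection_limit:
            return ("block", "connection limit reached")
        self.outbound_count += 1
        return ("forward", None)


gateway = Gateway(connection_limit=2)
results = [
    gateway.handle_outbound(b"GET /etc/passwd HTTP/1.0"),  # known pattern
    gateway.handle_outbound(b"unknown exploit attempt 1"),
    gateway.handle_outbound(b"unknown exploit attempt 2"),
    gateway.handle_outbound(b"unknown exploit attempt 3"),  # over the limit
]
for result in results:
    print(result)
```

In this model the known pattern is dropped on its first appearance, while the unknown traffic passes only until the assumed limit of two connections is used up.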

The data capture mechanism in the GenII architecture is much the same as in GenI. Data logging is done on three layers: the firewall layer, the network layer, and the system layer. The most difficult part is capturing the attacker’s activity on the Honeypot itself. The most important data capture development of the GenII period is Sebek, a client-server tool designed to capture the attacker’s activity on the Honeypot. The client is a hidden kernel module capable of tracking the attacker’s activity. Once the Sebek client is installed on the Honeypot, it starts transmitting data to its server over UDP, while hiding its own activity from the attacker. The Sebek server receives the activity from the client and logs it.
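The client-server idea can be illustrated with a sketch of how a captured record might be packed into a datagram and unpacked on the server side. The record layout below (process ID, timestamp, payload length, payload) is an assumption made for this example and is not Sebek's actual wire format; the function names are hypothetical as well.

```python
import struct

# Hypothetical header: pid (4 bytes), unix time (4 bytes), payload length (2 bytes),
# all in network byte order. This is NOT the real Sebek format.
HEADER = struct.Struct("!IIH")


def encode_record(pid, timestamp, data):
    """Client side: pack one captured chunk of activity into a datagram."""
    return HEADER.pack(pid, timestamp, len(data)) + data


def decode_record(datagram):
    """Server side: unpack a datagram back into (pid, timestamp, data)."""
    pid, timestamp, length = HEADER.unpack_from(datagram)
    data = datagram[HEADER.size:HEADER.size + length]
    return pid, timestamp, data


# Made-up keystroke data captured from a hypothetical process 4242.
datagram = encode_record(4242, 1078444800, b"cat /etc/shadow\n")
decoded = decode_record(datagram)
print(decoded)
```

In a real deployment, the client would send such a datagram to the server over UDP, for example with a datagram socket's sendto call, and the server would write each decoded record to its log.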

Data Collection and Alerting capabilities are also introduced in GenII Honeynets. The Data Collection mechanism lets you collect and analyze data from distributed Honeynet deployments. Alerting notifies you when someone breaks into the Honeynet, which helps in keeping track of the activity.

Virtual Honeynet

A Virtual Honeynet lets you run everything on a single computer. It is deployed using virtualization software, which allows you to create multiple virtual machines and run a separate operating system on each. This technology is very effective when resources are limited. A Virtual Honeynet is also easier to manage than a traditional Honeynet, since everything runs on a single machine. There are limits on the types of architecture and operating system you can use in a Virtual Honeynet, and there are risks involved in deploying one. First, if an attacker is able to compromise the operating system on which the virtualization software is running, he will be able to control the whole system. Second, if an attacker compromises a system in your Virtual Honeynet, he may be able to detect that it is running in a virtual environment. The solutions that the Pakistan Honeynet Project has used and tested are VMware Workstation, VMware GSX Server, Microsoft Virtual PC, and User Mode Linux. The advantage of User Mode Linux is that it is open source and free. All of these products have useful features and capabilities.

Future

The future plan is to make Honeynet deployment and management easy. In the next phase, the Honeynet Project will release a bootable CDROM that boots into a Honeynet gateway, or Honeywall. The bootable gateway will include all the Data Control and Data Capture mechanisms described above. Once you have booted the CDROM properly, all you have to do is place your Honeypots behind it. This will make Honeynet deployment easy and standardized.

Conclusion

The purpose of this paper was to help you understand what Honeynets are and why they are important. We discussed the mechanisms of a Honeynet: Data Control, Data Capture, and Data Collection. We then discussed the two Honeynet architectures, GenI and GenII. Finally, we discussed the Virtual Honeynet and future plans. Honeynets are truly high-interaction Honeypots that help you capture and analyze complex attacks.