Computer Security Incident Response

NRIC FG1B Best Practice Appendices 06-03-2003

APPENDIX X. COMPUTER SECURITY INCIDENT RESPONSE PROCESS

BACKGROUND

Computer security events happen on a regular basis, and organizations must be prepared to respond in a timely and appropriate fashion. The intent of this document is not to prescribe a specific set of responses but rather to outline the process associated with responding. While a number of the steps are presented in a linear fashion, there may be opportunities to conduct steps in parallel. There are also a number of administrative and managerial issues that need to be considered separate and apart from the actual technical response. Wherever possible, organizations should plan for and consider their response to significant computer security events as part of the Disaster Recovery Plan. Organizations with critical networked resources should establish a detailed computer security incident response plan as part of the Business Continuity Planning process. Organizations with a significant investment in or reliance upon their network(s) should consider the creation of a Computer Security Incident Response Team.

1. INITIAL RESPONSE
   a. Despite the views of individual security engineers, the principal objective of an incident response plan is to ensure business continuity and to support disaster recovery efforts. The scope and nature of any response must be consistent with the fundamental objectives of the business.
   b. As with any crisis, the initial response to a computer security event involves a rapid assessment of the situation and the execution of a number of “immediate action” steps designed to contain the problem and limit further damage.
      i. Upon detection of a suspected security event, notifications should be made in accordance with the organization’s security response plan. At a minimum, IT and the affected operational units should be notified immediately. An incident handler should be identified to assess the situation and direct the initial response. If the organization has a Computer Security Incident Response Team, it should be notified of the event and, if appropriate, assume control of the investigation and response.
      ii. Two issues need to be addressed immediately. First, is the compromised system an immediate threat to external resources or critical internal resources? Second, are malicious processes running that could result in a substantial loss of data on the compromised system? As a general rule, systems that pose an immediate threat to external entities or critical business functions should be isolated from the network. Depending on the network architecture and available resources, this may mean physical isolation (e.g., removal of an Ethernet cable or phone line) or logical isolation through the use of firewalls and routers. If a malicious process that could result in substantial loss of data appears to be running on the compromised system, the system should be immediately disconnected from the power source. In routine situations, the decisions to isolate and power down the potentially compromised system should be made as part of the investigative process. Systems supporting life-support or mission-critical functions should be disconnected only after careful consideration of the risks and under the direction of the proper authorities. Once the initial decision to isolate and/or unplug the device has been made, a more calculated analysis of the problem can take place.
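
The two questions in step 1.b.ii can be read as a small decision rule. The following Python sketch is one possible encoding, offered as an illustration rather than a prescribed procedure. The names `Action`, `TriageInput`, and `immediate_actions` are hypothetical, and returning only an escalation step for life-support or mission-critical systems is an assumption based on the caution stated above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ISOLATE_FROM_NETWORK = "isolate from the network (physically or logically)"
    DISCONNECT_POWER = "disconnect from the power source"
    ESCALATE_BEFORE_DISCONNECT = "weigh risks with the proper authorities first"
    DEFER_TO_INVESTIGATION = "decide during the investigative process"


@dataclass(frozen=True)
class TriageInput:
    # First question: immediate threat to external or critical internal resources?
    threatens_external_or_critical: bool
    # Second question: malicious process that could cause substantial data loss?
    destructive_process_running: bool
    # Life-support or mission-critical systems need extra care (assumed flag).
    life_support_or_mission_critical: bool


def immediate_actions(t: TriageInput) -> list[Action]:
    """Return the immediate actions suggested by the two triage questions."""
    urgent = t.threatens_external_or_critical or t.destructive_process_running
    if not urgent:
        # Routine case: isolation and power-down are investigative decisions.
        return [Action.DEFER_TO_INVESTIGATION]
    if t.life_support_or_mission_critical:
        # Never disconnect such systems automatically.
        return [Action.ESCALATE_BEFORE_DISCONNECT]
    actions = []
    if t.threatens_external_or_critical:
        actions.append(Action.ISOLATE_FROM_NETWORK)
    if t.destructive_process_running:
        actions.append(Action.DISCONNECT_POWER)
    return actions
```

In practice the inputs are judgment calls made by the incident handler; the sketch only shows how the two questions combine.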

2. INVESTIGATIVE PROCESS
   a. One of the first issues to resolve is the nature of the event. Where possible, non-malicious causes (e.g., software configuration errors and hardware failures) should be investigated and ruled out. Events determined to be non-malicious in nature should be documented and resolved in accordance with organizational policies. Where an obvious non-malicious cause cannot be identified, the incident should be responded to as a hostile event.
   b. Once a decision has been made to respond to the issue as a hostile event, the nature and intent of that response need to be defined. If there is no desire to collect data for its intelligence or law enforcement value, the incident can be responded to in much the same way as a non-malicious event. Since, in that case, the extent and mechanics of the compromise will never be fully understood, any system returned to service must be appropriately rebuilt, patched, and hardened before being connected to the network.
   c. Regardless of the organizational objectives (immediate return to service vs. investigation), some amount of initial data collection/preservation should be undertaken. Because the extent of the compromise is not known, this phase should be as non-alerting as possible. Logical steps to consider include:
      i. Review and analysis of the initial indicators of compromise
      ii. Inventory of operating system and applications/services (version and patch level)
      iii. Preservation and review of system/application logs (copy to secure offline media)
      iv. Preservation and review of security device logs (copy to secure offline media)
      v. Non-confrontational interviews of system administrators and users (as indicated)
      vi. Examination of other hosts on the network segment or hosts that share a trust relationship
      vii. Organizations not engaging in a full investigation may be able to infer the factors that led to a compromise from the limited data collected during this phase of the response. Organizations engaging in an investigation will use this data along with data collected during subsequent steps to develop an understanding of the vulnerability, exploit, and actions of the intruder.
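
Steps iii and iv above call for copying logs to secure media. A minimal Python sketch of that preservation step follows, assuming the destination directory sits on the secure media and that the logs are ordinary files. The function names and the `MANIFEST.sha256` file name are hypothetical, and the use of SHA-256 is an assumption; any hash the organization has standardized on can be substituted.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large logs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_logs(sources: list[Path], dest_dir: Path) -> dict[str, str]:
    """Copy each log into dest_dir and return {copied file name: digest}.

    The digest is taken from the preserved copy, so later changes to the
    copy can be detected even if the live log keeps growing.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, str] = {}
    for src in sources:
        dst = dest_dir / src.name
        if dst.exists():
            # Refuse to overwrite something that was already preserved.
            raise FileExistsError(f"{dst} already exists")
        shutil.copy2(src, dst)  # copy2 also carries over timestamps where it can
        manifest[dst.name] = sha256_of(dst)
    # Write the digests in the "digest  name" layout used by common checksum tools.
    lines = [f"{digest}  {name}\n" for name, digest in sorted(manifest.items())]
    (dest_dir / "MANIFEST.sha256").write_text("".join(lines))
    return manifest
```

Reading a file changes its access time on many file systems, so the copy itself should be noted in the documentation of the collection process.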

   d. Once the decision has been made to investigate an event, an organization must address a series of questions that will influence both the nature and the cost of the investigation.
      i. The fundamental issues that need to be addressed include whether to refer the matter to law enforcement (this issue should be considered at the outset and periodically during the course of an internal investigation), conduct the investigation with in-house resources, contract the task out, or work collaboratively.
      ii. The issue of responding immediately vs. monitoring the situation to develop additional information about the intruders, their methodologies, and their objectives must also be resolved. Before making a decision regarding any of these issues, investigators should consult with management, the affected stakeholders, and their advisors (to include legal and PR).
      iii. If criminal activity is suspected, organizations should consider a referral to law enforcement agencies. Typically, events that result in significant financial loss (as measured by both opportunity costs and recovery costs), loss of life or potential loss of life, or attacks on critical infrastructure, or that have the potential to cause widespread loss, should be presented to law enforcement. As with any criminal matter, the threshold for law enforcement involvement will vary by jurisdiction. Organizations approaching law enforcement should be prepared to provide as much information as possible on the costs and impact of the event. While law enforcement is frequently better equipped to investigate a computer security event than an organization with limited technical or financial resources, there are some operational and PR issues to consider. Once law enforcement joins the investigation, they have the discretion to dictate both the pace and objectives of the investigation; however, law enforcement is typically sensitive to the business operations of the victim. Establishing good working relationships with local law enforcement in advance and fostering an attitude of trust and cooperation can mitigate this risk.
      iv. If a matter is being investigated internally, a decision needs to be made on whether to use in-house or contract investigative resources. Intrusion investigations can be technically complex and very time consuming. Organizations intending to pursue legal remedies should evaluate the technical skills, tools, and methodologies available in-house to ensure their legal options will be preserved. Organizations with a dedicated computer security team or those that do not intend to pursue legal remedies may find that their in-house technical resources are sufficient to conduct the investigation.
   e. Determination to Restore or Monitor
      i. A key issue that needs to be considered at the outset of an investigation is whether to immediately restore the system to a secure and operational state or monitor the system in an attempt to collect additional information on the scope and nature of the compromise.
      For most organizations, the initial reaction is to restore the system to a secure state and return to normal operations as soon as possible. Situations where organizations may want to consider monitoring before overtly responding include suspected involvement of an insider, suspected cases of corporate espionage, or cases of extortion.
      ii. Once the decision has been made to monitor a system, safeguards must be implemented that allow for rapid response should the compromised system begin attacking external or critical internal resources or should a malicious process be activated that attempts to destroy valuable data on the system.
         1. Monitoring tools should be tuned to alarm on suspicious outbound traffic, and someone should be tasked with immediately disconnecting the system from the network and/or power source if instructed to do so by the investigating team.
         2. The actual mechanics of monitoring will vary by network but will invariably involve the use of a network sniffer and possibly an intrusion detection device.
         3. The typical objectives of monitoring a compromised system include identifying the source(s) of the intrusion, determining the mechanics of the compromise, identifying the goals/objectives of the intruder, and defining the true scope of the problem.
         4. In the course of monitoring hostile activity, additional compromised systems, including systems external to the organization, may be identified. Management will need to decide how and when to apprise those external organizations of the potential compromise. External notifications should be made only after coordination with organizational advisors, to include legal and PR.
      iii. At some point in the investigation, an assessment of the compromised system will have to be conducted. The specific tools and techniques will vary by operating system and event, but the basic intent is constant: collect and analyze both volatile and non-volatile information from the system. Volatile data must be collected from the system prior to powering the device down. The volatile information of greatest interest includes a memory dump, a listing of active processes/applications and their associated network ports, active connections, and current users. The processes used to collect the data should be adequately documented and the data itself written to secure removable media (e.g., a floppy disk) or to an off-host (networked) resource. There are applications and system utilities available on most operating systems to collect this data; however, an investigator should assume that all applications on the system being examined have been compromised and cannot be trusted to return accurate information. Examiners should provide their own trusted tools that can be run either locally (statically compiled binaries run from removable media) or over the network. There are a number of limitations associated with examining data from a “live,” potentially compromised system that are beyond the scope of this document to address.
   f. Handling Digital Evidence
      i. A basic tenet of evidence handling is to maintain the item of evidence in its original state and to thoroughly document access to the item as well as the reason and process associated with any changes. With physical evidence, this dictates the order and type of examinations that can be conducted. The unique properties of digital evidence allow an examiner to avoid this issue. Using the proper tools, an unlimited number of identical copies of an item of digital evidence can be created for use by the examiner.
      ii. The process of creating an evidentiary copy involves “bit level duplication,” and there are a number of commercial and open source products available that can accomplish this task. The resources, experiences, and preferences of the examiner will dictate which tools are utilized. At a minimum, an examiner familiar with Unix-type operating systems can use the “dd” system utility to make forensically sound copies for subsequent examination.
      iii. Critical to the process of creating an identical copy or “image” of a drive is ensuring that the original is not altered by the procedure and that each bit has been accurately recorded on the copy. Mounting the drive to be imaged as a “read-only” device can satisfy the first requirement, while hashing algorithms such as MD5, which create a “fingerprint” that is effectively unique to the input source, can be used to validate the copy process. The characteristics of the MD5 hashing algorithm are such that the alteration of a single bit in a file of any size will result in a different fingerprint. MD5 can be used to verify that the item of original digital evidence and any instances of Duplicate Digital Evidence (DDE) are identical.
      iv. Whenever possible, the original item of evidence should be retained and used to generate a first-generation DDE copy, which is in turn used to generate all subsequent DDE copies. If the original evidence (e.g., a production hard drive) cannot be retained as evidence, a first-generation copy should be made and treated in the same manner as an item of original evidence would be. Forensic examinations should be conducted on subsequent generations of DDE.
      v. Once the volatile information has been collected, a decision must be made whether to shut down the system and “image” the drives or to attempt to image the “live” system. For mission-critical systems that cannot be taken offline, the system will have to be imaged while in operation, potentially over a networked connection. In situations where the system can be taken offline, the original drives should be retained as evidence whenever possible. If the original drives cannot be retained, the reasons should be documented. The actual process of creating a forensically sound copy will vary by tool and situation. Examiners unfamiliar with their chosen application should consult the documentation prior to attempting to image a drive or live file system. Once taken as evidence, access to the original drives, or the first-generation evidentiary copy, should be restricted. Any access to or transfer of custody over the physical article should be documented on a chain of custody form.
   g. Data Analysis
      i. Once a suitable copy of DDE is available for examination, the analyst can use any number of commercial or open source tools to conduct the analysis. The analytical process should be thoroughly documented to ensure defensible/repeatable results. The specifics of an examination will vary by incident, but in general an analyst will look for evidence of contraband files, unauthorized access to intellectual property, logs/indicators of hostile acts directed at or originating from the compromised host, and indicators of specific compromised resources (files, user accounts, and other systems). Investigators not employing a commercial forensic analysis tool will want to consider open source resources such as “ftimes” and “The @stake Sleuth Kit” (TASK) to support their analysis. Additionally, a number of vendors and security researchers (to include Sun and NIST) make hashes available for known good files. These resources can significantly enhance the quality and efficiency of a forensic examination by allowing an examiner to quickly categorize a significant number of files as “known good.”
      ii. If, during the course of the examination, evidence surfaces that indicates trust relationships were exploited, the scope of the investigation may have to be expanded. If it becomes apparent that the security of other organizations was compromised, management should decide on the timing and nature of any notification. Depending on the circumstances, legal and PR should be consulted prior to the notification.
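
The hash-based checks described in 2.f.iii and 2.g.i can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for a forensic tool: the function names are hypothetical, image files are assumed to be readable as ordinary files, and the known-good set is assumed to be a collection of hex digests obtained from a trusted source. MD5 is the default because it is the algorithm named above; `hashlib.new` accepts other algorithm names, such as `"sha256"`, if a stronger hash is preferred.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copies_match(original: Path, *copies: Path, algorithm: str = "md5") -> bool:
    """True only if every copy has the same digest as the original."""
    reference = file_digest(original, algorithm)
    return all(file_digest(c, algorithm) == reference for c in copies)


def split_known_good(
    files: Iterable[Path], known_good: set[str], algorithm: str = "md5"
) -> tuple[list[Path], list[Path]]:
    """Partition files into (known good, still to be examined) by digest."""
    known: list[Path] = []
    unknown: list[Path] = []
    for path in files:
        if file_digest(path, algorithm) in known_good:
            known.append(path)
        else:
            unknown.append(path)
    return known, unknown
```

A matching digest supports the claim that a DDE copy is identical to its source; it says nothing about whether the source was altered before it was imaged, which is why the read-only mount and the chain of custody record remain necessary.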

3. RECOVERY
   a. Once the volatile data has been captured and a forensically sound copy of the compromised device secured, work can begin on returning the system to service. Because the true scope of a compromise often remains in doubt, the most prudent course of action is usually to rebuild the system from trusted media. Data should be restored from a trusted source and validated before being relied upon. The operating system and all applications should be updated and patched wherever possible, and all unnecessary services disabled. Organizations lacking security skills should consult any of the reputable and widely available resources dedicated to “hardening” servers and workstations. For purpose-built devices (e.g., routers, switches, and security appliances), consult the vendor for information on security-conscious configurations. All system passwords should be changed, and hosts with which the compromised system shared a trust relationship should be examined for possible signs of compromise. If the root cause of the compromise has been determined, appropriate steps should be implemented to mitigate the risks.
   b. Network surveillance should be increased following an intrusion. Post-compromise monitoring of a network will often reveal additional probes and may help identify additional compromised resources.
   c. Lessons learned from the investigation should be presented to management and, as appropriate, shared within the organization. Network and security policies should be reviewed and, if necessary, adjusted based on the findings of the investigation. To the extent resources permit, other resources on the network should be examined and hardened as necessary. Depending on the root cause, the network security architecture may need to be revised. If Incident Response procedures and/or an IR team did not previously exist, consideration should be given to their establishment. Legal counsel should be briefed on the scope of the compromise and should provide an opinion on any obligation to report the event to customers, regulators, or partners. Refer to the Postmortem process, Appendix Z.

APPENDIX Y. RESPONDING TO NEW OR UNRECOGNIZED ANOMALOUS EVENTS

1. Investigate event, determine if malicious or non-malicious in origin (e.g., worm vs. configuration error or HW failure).
   a. If non-malicious, resolve issue and document as appropriate
   b. If malicious or unknown, attempt to classify
      i. Determine if internal or external in origin
      ii. Determine if attempting to propagate
   c. Notify Security Response Team (SRT) and convene if appropriate
      i. Periodically review need to convene SRT
2. Analyze available data sources
   a. INTERNAL
      i. Security device logs
      ii. Bandwidth utilization reports
      iii. Netflow data
      iv. Application and system logs
   b. EXTERNAL
      i. Security discussion sites
      ii. NIPC/CERT
      iii. Service provider’s security team
   c. PROACTIVELY COLLECTED
      i. If feasible, examine hostile code. Collect from:
         1. “Honeypot”
         2. Compromised system
         3. Trusted external sources
         4. Consider making collected code and analysis available to the security community
            a. Government clearing houses
               i. CERT
               ii. NIPC
               iii. NCC/NCS
            b. Security/networking forums
               i. NANOG
               ii. SecurityFocus discussion lists
3. Respond
   a. Isolate compromised host(s)
   b. Where possible, block malicious traffic with existing security devices
   c. Where available, apply expedient mitigation techniques (based on analysis of code)
   d. When possible, patch/harden to address the specific issue being exploited
   e. Monitor network for signs of additional compromise
   f. Vendors: where appropriate, advise customers of mitigation/recovery options
   g. Providers: where appropriate, advise customers of mitigation/recovery options
   h. Reporting: if criminal acts are suspected, report to law enforcement
4. Recover
   a. Recover compromised hosts in accordance with DR/BCRS plans
   b. Consider need to collect data for forensic analysis
   c. Survey infrastructure for other vulnerable hosts; patch/harden as appropriate
   d. Quantify loss if seeking legal remedies
   e. Monitor host and network for signs of subsequent compromise or exploitation
   f. Conduct post-mortem analysis
   g. Revise procedures and training based on post-mortem analysis (see Appendix Z)
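
Step 1.b.ii (determining whether an event is attempting to propagate) can sometimes be informed by the Netflow data listed in step 2.a.iii. The Python sketch below shows one simple heuristic: flag any source that contacts an unusually large number of distinct destinations on the same port. The `Flow` record layout, the function name, and the default threshold of 50 are all assumptions made for illustration; real flow exports carry more fields, and a sensible threshold depends on the network.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """A simplified flow record (hypothetical layout)."""
    src: str
    dst: str
    dst_port: int


def likely_propagators(
    flows: Iterable[Flow], min_targets: int = 50
) -> dict[tuple[str, int], int]:
    """Map (source, destination port) to its count of distinct destinations,
    keeping only pairs that reach at least min_targets destinations."""
    targets: dict[tuple[str, int], set[str]] = defaultdict(set)
    for flow in flows:
        targets[(flow.src, flow.dst_port)].add(flow.dst)
    return {
        key: len(dsts) for key, dsts in targets.items() if len(dsts) >= min_targets
    }
```

A high fan-out is only an indicator. Legitimate scanners, mail relays, and monitoring systems produce the same pattern, so flagged sources still need to be examined before they are treated as compromised.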

APPENDIX Z. INCIDENT RESPONSE POST MORTEM CHECKLIST

1. PREPARATION AND INFORMATION GATHERING
   a. Determine the purpose of the investigation in order to ensure proper evidence steps are taken (e.g., attorney-client privilege, prosecution, etc.)
   b. Determine if law enforcement involvement is appropriate
   c. Determine the corporate groups that must be involved (Public Relations, Legal, Investigations, etc.)
   d. Document how the incident occurred, starting at the discovery points of the incident (roadmaps and flowcharts as necessary)
   e. Develop an inventory of all affected components and elements (hardware and software), business processes, and people
   f. Identify data sources that will provide pertinent information for analysis
   g. Collect data from identified sources and maintain per chain of custody requirements (if necessary)
   h. Develop a timeline of incident events and IR activities
   i. Collect notes, interviews, and conversations from the individuals involved in the IR
   j. Interview individuals involved in IR activities to determine the events that occurred
   k. Enlist expertise based on technical needs and resource limitations
   l. Identify potential compromise of employee or customer personal data, and determine whether laws or regulations related to that data have been breached
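
Item h above (developing a timeline of incident events and IR activities) usually means merging entries from several sources, such as device logs and responder notes, into one ordered list. The Python sketch below shows one way to do that. The `TimelineEntry` fields and the `build_timeline` name are hypothetical, and requiring time zone-aware timestamps is an assumption made so that entries recorded in different zones sort correctly.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineEntry:
    when: datetime      # must carry a time zone
    source: str         # e.g. a log name or "IR notes"
    description: str


def build_timeline(*sources: list[TimelineEntry]) -> list[TimelineEntry]:
    """Merge entries from any number of sources into chronological order."""
    merged = [entry for entries in sources for entry in entries]
    for entry in merged:
        if entry.when.tzinfo is None:
            # Naive timestamps cannot be ordered reliably across sources.
            raise ValueError(f"timestamp without time zone: {entry}")
    # sorted() is stable, so entries with equal times keep their input order.
    return sorted(merged, key=lambda entry: entry.when)
```

Keeping the `source` field on every entry preserves the link back to the original record, which helps when conflicting information has to be resolved later in the post mortem.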

2. DETERMINE THE CAUSE (WHY) AND EFFECTS
   a. Develop data sources as necessary, such as filtering logs, IDS alerts, etc.
   b. Analyze and review data collected during IR activities
   c. Examine existing policies, processes, and technologies
   d. Consult best practices and alert information
   e. Determine human errors and identify shortcuts
   f. Identify employee misconduct and criminal misconduct
   g. Identify gaps and areas of non-compliance
   h. Involve necessary groups within the company, including investigations, corporate compliance, and HR
   i. Identify and resolve conflicting information
   j. Identify management issues resulting in acceptance of risk and bad management decisions
   k. Identify contributing factors and effects of the incident
   l. Determine if the incident was intentional or accidental
   m. Identify if the incident affected confidentiality, availability, or integrity of key data and systems
   n. Perform business impact analysis to quantify the effect on customers, systems, and data; the financial impact to the company (including investigation and recovery costs); and legal ramifications, in order to provide effective and efficient recommendations

3. MAKE RECOMMENDATIONS AND FIX ISSUES
   a. Based on gaps, make recommendations for improvements to:
      i. Policies, standards and guidelines
      ii. Processes
      iii. People
      iv. Technology components
   b. Design and implement solutions as necessary
   c. Compile summary report to document the following:
      i. Post incident analysis
      ii. Summary of incident
      iii. Cause and effects
      iv. Actions performed
      v. Cost associated with response activities
      vi. Business impact of the incident
      vii. Lessons learned
      viii. Remediation actions required (recommendations)
      ix. Post mortem activities
   d. Report to all external entities as necessary
