Gathering raw information from many different sources.
In this stage, also known as data gathering, intelligence is acquired through activities such as interviews, technical and physical surveillance, human source operations, searches, and liaison relationships. Information can be gathered from open, covert, electronic, and satellite sources.
There are six basic types of intelligence collection.
- Signals Intelligence (SIGINT): The interception of signals, whether between people, between machines, or a combination of both
- Imagery Intelligence (IMINT): Representations of objects reproduced electronically or by optical means on film, electronic display devices, or other media
- Measurement and Signature Intelligence (MASINT): Scientific and technical intelligence information used to locate, identify or describe distinctive characteristics of specific targets
- Human-Source Intelligence (HUMINT): Intelligence derived from human sources, the oldest method for collecting information
- Open-Source Intelligence (OSINT): Publicly available information appearing in print or electronic form, including radio, television, newspapers, journals, the Internet, commercial databases, videos, graphics and drawings
- Geospatial Intelligence (GEOINT): Imagery and geospatial data produced through an integration of imagery, imagery intelligence, and geographic information
In the first stage, collection as it pertains to open source includes the identification of potentially useful information and the retention of that material. This stage requires guidance, either explicit or general, so that open-source collectors can identify the kinds of information that should be collected and prioritize collection efforts to reflect the requirements of the Intelligence Community (IC).
Acquisition is the physical or electronic collection of this information. Retention is the continued holding of acquired open-source information (OSIF).
Of the four types of OSIF considered here, news media content is the easiest to collect.
For first-generation OSINT, the physical acquisition of transmitted news media data presented logistical challenges that required the Foreign Broadcast Information Service (FBIS) to disperse to multiple geographic locations to intercept broadcasts. The collection of print material depended on the presence of a diplomatic officer or clandestine collector to physically acquire published material. Today, however, with most news media information available online, logistical challenges have shifted from processing to information management. The retention of news media information is fairly simple: the volume of such information is manageable, and the information generally comes in a standardized, text-based format.
Gray literature, like news media content, is becoming easier to collect, for similar reasons. Gray literature creators have been slower than news media in transitioning to online content, so there are still cases in which a collector is required to physically acquire information in hard copy, particularly in the developing world, where Internet usage by institutions may not be widespread. As is the case for news media content, retention of gray literature is not very difficult.
Social media information, in contrast, presents many unique challenges in the collection phase, for both short-form and long-form content. First, a complete picture of the raw data can be difficult to acquire. In the startup phase of social media content analysis, social media analytics were easily accessible and sometimes even free to use. One company, Topsy, for example, provided public access to a complete index of Twitter material since Twitter’s start in 2006. However, as social media analytics has become an established industry, platforms like Topsy have been purchased and shuttered by larger companies looking to monetize these markets. Social media aggregation companies that market social media data often provide only a fraction of the data from a social media platform, or data from only a specific window of time. Furthermore, these providers tend to focus on social media data from U.S.-based platforms, primarily Twitter and Facebook, although platforms native to other countries are more relevant for some of the IC’s key interests. In addition, even if the IC can acquire a complete set of social media data rather than a subset, the data do not constitute a representative sample of a population. Demographic groups do not use social media evenly, and in many locations of interest to the IC, usage varies sharply with socioeconomic class.
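The point about representativeness can be illustrated with a small worked example. The sketch below uses entirely hypothetical figures (the group names, population sizes, user counts, and rates are invented for illustration) and assumes, for simplicity, that users and non-users within a group hold a given view at the same rate. Under those assumptions, it compares the share of platform users holding the view with the share of the whole population holding it.

```python
def raw_share(groups):
    """Share holding the view when every platform user counts equally."""
    users = sum(g["users"] for g in groups)
    holders = sum(g["users"] * g["view_rate"] for g in groups)
    return holders / users


def population_share(groups):
    """Share holding the view when each group counts by its population size."""
    people = sum(g["population"] for g in groups)
    holders = sum(g["population"] * g["view_rate"] for g in groups)
    return holders / people


# Hypothetical figures, chosen only to illustrate the effect.
groups = [
    {"name": "urban, higher income", "population": 200, "users": 160, "view_rate": 0.75},
    {"name": "rural, lower income", "population": 800, "users": 40, "view_rate": 0.25},
]

platform_estimate = raw_share(groups)           # dominated by the heavy-use group
population_value = population_share(groups)     # what the whole population holds
print(f"platform: {platform_estimate:.2f}, population: {population_value:.2f}")
```

With these invented numbers, the platform data suggest that about two-thirds of people hold the view, while only about one-third of the population does, even though every post was collected.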
The collection of social media data also raises legal issues related to the protection of U.S. persons, which are particularly relevant to retention. Such issues are less present with gray literature and mostly nonexistent with news media. Because social media data can easily include data related to U.S. persons, the IC must follow stringent procedures for the collection and retention of information. Those procedures are detailed in a variety of regulations, including Executive Order 12333 and DoD Directive 5240.01. In addition, both long-form and short-form social media content are more dynamic than news media content or gray literature. A news article (with the exception of corrections) is generally not a living document: if a story has changed, a separate, new article will be generated. In contrast, a discussion trend may garner interest and updates for a few days or weeks, or it could continue for years. Acquisition and retention of social media content, in particular, must be real-time and constant, because significant content may be posted and then removed within a short period if it incites controversy or reveals sensitive information, and such cases could be of particular interest to the IC. Finally, both long-form and short-form social media content are increasingly presented in formats other than text. YouTube videos are an example of long-form social media content in a different format, and short-form social media data in a nontext format include images on platforms such as Flickr and “live” videos on platforms such as Facebook and Twitter.
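The need for continuous acquisition, given that posts may be edited or removed, can be sketched in a few lines. The example below is a minimal illustration, not a description of any real collection system: it assumes a hypothetical poller that hands each cycle's visible posts to a store as a mapping of post ID to text, and the names `SnapshotStore`, `PostHistory`, and `ingest` are invented for this sketch. The store retains every version it has seen and flags posts that disappear between polls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PostHistory:
    """Every retained version of one post, plus when it disappeared (if ever)."""
    versions: list = field(default_factory=list)  # (poll_time, text) tuples
    removed_at: Optional[float] = None


class SnapshotStore:
    """Hypothetical retention store fed by a periodic poller."""

    def __init__(self):
        self.posts = {}  # post ID -> PostHistory

    def ingest(self, poll_time, visible):
        """Record one polling cycle.

        `visible` maps post ID -> current text for posts seen in this cycle.
        Returns the IDs of posts that were visible before but are missing now.
        """
        for post_id, text in visible.items():
            history = self.posts.setdefault(post_id, PostHistory())
            history.removed_at = None  # the post is (again) visible
            # Keep a new version only when the text actually changed.
            if not history.versions or history.versions[-1][1] != text:
                history.versions.append((poll_time, text))

        newly_removed = []
        for post_id, history in self.posts.items():
            if post_id not in visible and history.removed_at is None:
                history.removed_at = poll_time
                newly_removed.append(post_id)
        return newly_removed


# Usage: three polling cycles, 60 seconds apart (illustrative data).
store = SnapshotStore()
store.ingest(0, {"a": "first draft", "b": "hello"})
store.ingest(60, {"a": "edited text", "b": "hello"})
removed = store.ingest(120, {"b": "hello"})  # post "a" has been taken down
```

After the third cycle, both versions of post `a` remain in the store even though the post is no longer visible on the platform. The sketch also shows the limitation: anything posted and removed between two polls is never seen, which is why the polling interval matters.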