The IC generally uses commercial off-the-shelf (COTS) tools for OSINT analysis, particularly the study of social media data. This chapter focuses primarily on existing social media analysis tools. A few important caveats must be kept in mind when considering the utility of these tools for intelligence professionals.
First and most importantly, most COTS tools are developed for commercial purposes: advertising, brand management, and consumer analytics. Companies want to understand and predict customers' buying behavior, to position a product so that it is available when a customer is most susceptible to influence, and to shape the customer's opinion of the product or the company itself. These tools can often serve the interests of the IC, but they are rarely a perfect match, and many have extremely limited utility for the IC because they were not designed for its purposes.
Second, the market for these tools is so dynamic that it creates problems for the IC. Both COTS tools and the companies that produce them are constantly changing, and this instability manifests itself in several ways. The company that owns the content can limit or eliminate a data feed for a variety of reasons. Companies may want to protect user data or, conversely, may start selling user data that were previously available for free. A company that has acquired or developed its own capability for analyzing social media content may also cut off competitors' access to its data to undermine their capabilities.
For example, Topsy was a social media analytics service that indexed all published tweets on Twitter and offered free search functions. After eight years of operation, the service unexpectedly went offline on December 15, 2015, two years after Apple acquired it. The case illustrates the risk facing analytics services and IC operations that rely on other services for the early phases of the data acquisition and analytic cycle. At the time of the acquisition, Apple and Topsy provided little information about whether the data feed would remain available, and they gave no warning before the Topsy platform went offline.

The IC is accustomed to data sources becoming unexpectedly unavailable. SIGINT collectors may lose access for a variety of reasons, including system reconfigurations and new encryption. HUMINT collectors grapple with the possibility of a source being compromised or of losing access to sub-sources or sensitive programs. Satellite malfunctions can leave IMINT collectors in the dark while repairs are arranged. Intelligence consumers may be frustrated by losing a stream of information collected by covert methods, but the loss can be explained as an inevitable consequence of those methods: the data source is simply no longer accessible.
One advantage of OSINT is that it is more dependable than covert collection methods. For that very reason, a sudden loss of an open-source data feed, while the raw data remain accessible online, may be unfairly held against OSINT by intelligence consumers who are neither aware of nor interested in the process of transforming raw data into an intelligence deliverable. When Twitter is still online and people are still tweeting, it can be hard to explain to an intelligence consumer why an OSINT product is suddenly no longer available.
The dynamic nature of the social media analytics market is at odds with the IC's timelines for vetting tools and providers. Figure 3.1 shows possible options available to the IC for using COTS tools. Ideally, the IC would transfer both a data source and an analytic platform to its classified system. The IC, understandably, wants to fully understand an institution and its platform before introducing it to a classified system.
By relying on COTS tools, the IC risks always lagging behind in social media analytics because of the time needed to complete this vetting. The predominance of startups in this space also makes it harder for the IC to build trusted relationships with established providers, which could otherwise streamline the vetting process. The IC could, of course, develop its own tools, but this is a costly alternative. It could also use a tool on an unclassified system and avoid the complication of collocating it with the more sensitive capabilities and information on classified networks.

Social media analytics is also a dynamic market because of rapid improvements in computing power and data-processing capabilities. Tools are becoming more capable of handling large amounts of data, and machine learning is making impressive strides. Rather than relying on humans to teach computers how to perform complex tasks, developers are building systems that enable networks to learn how to conduct these tasks themselves.