Software applications are required to analyze text data from any source and automatically extract many different entity types, such as people, dates, locations, modes of transportation, facilities, measurements, currency figures, weapons, email addresses, and organizations. The extraction capability extends to detecting and extracting activities, events, and relationships among these entities. Automatic extraction means that analysts do not have to read extensive amounts of text to pull out this information manually; they can focus sooner on what is relevant. Automated event and relationship extraction helps analysts more quickly discover associations, transactions, and action sequences that can be used to develop link, event, and activity analyses. Therefore, assuming that extraction is done effectively, analysis can begin with information that has already been extracted and organized from the much larger volume of material available to the analyst.
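As a rough illustration of the simplest form of entity extraction, the sketch below uses regular expressions to pull three of the pattern-like entity types named above (email addresses, dates, and currency figures) out of free text. The patterns, the `extract_entities` function, and the sample sentence are illustrative assumptions, not a description of any particular product. Entity types such as people, organizations, and relationships generally need statistical or linguistic models and are not covered here.

```python
import re

# Hypothetical patterns for three pattern-like entity types.
# Real systems cover many more formats (e.g. "4 March 2021", "EUR 12.500,00").
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),          # ISO dates only
    "currency": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),      # US dollar amounts only
}

def extract_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples in document order."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group(), match.start()))
    # Sort by position so the analyst sees entities in reading order.
    return sorted(found, key=lambda item: item[2])

sample = "On 2021-03-04 the buyer (j.doe@example.org) wired $12,500.00 to the seller."
entities = extract_entities(sample)
for entity_type, value, offset in entities:
    print(f"{offset:>3}  {entity_type:<8}  {value}")
```

Keeping the character offset with each match lets a downstream tool link an extracted entity back to its place in the source text, which is useful when an analyst wants to check the surrounding context.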
Information relevant to global operations might be in languages other than English, such as Arabic, Chinese, Farsi, and many others. Technology is available to support and augment the efforts of the limited number of translators typically available to exploit foreign-language documents. Language processing software can help translators analyze documents in the language in which they were written and select the most relevant documents or sections of documents for translation. Available software might contain a suite of natural-language processing components that enable language and character encoding identification, paragraph and sentence analysis, stemming and decompounding, part-of-speech tagging, and noun phrase extraction. With such a system, analyst training can assume that the capabilities exist to provide the analyst with information that has been extracted and translated relatively effectively, through a combination of automated and human processing, from many different languages.
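The first two stages of such a pipeline can be sketched with the Python standard library alone. The example below is a deliberately coarse stand-in: it guesses the dominant writing script from Unicode character names (which cannot separate Arabic from Farsi, since both use the Arabic script) and splits text into sentences on terminal punctuation. The function names and the `triage` routing step are assumptions made for illustration. A production system would use trained language-identification models and would also handle abbreviations, decimal numbers, stemming, and part-of-speech tagging.

```python
import re
import unicodedata

def dominant_script(text):
    """Guess the dominant script from Unicode character names (a coarse proxy for language)."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            # Names look like "LATIN SMALL LETTER A" or "ARABIC LETTER MEEM".
            script = name.split(" ")[0] if name else "UNKNOWN"
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "UNKNOWN"

# Split after Latin, CJK (U+3002), or Arabic (U+061F) sentence-final punctuation.
# Naive: this also splits inside abbreviations and decimal numbers.
SENTENCE_END = re.compile(r"(?<=[.!?\u3002\u061F])\s*")

def split_sentences(text):
    """Return the non-empty sentence-like chunks of the text."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]

def triage(documents):
    """Group sentence-split documents by script so each queue can go to a suitable translator."""
    queues = {}
    for doc in documents:
        queues.setdefault(dominant_script(doc), []).append(split_sentences(doc))
    return queues

docs = [
    "Hello world. Second sentence!",
    "\u0645\u0631\u062d\u0628\u0627",   # an Arabic-script greeting
]
queues = triage(docs)
print(sorted(queues))
```

Routing by script before any deeper analysis reflects the point made above: the scarce resource is the translator, so cheap automated steps are used first to decide what reaches a human.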
Software can also provide user-guided text extraction from unstructured data sources, transforming user-identified text into structured graphic formats for further analysis. The user can highlight important information in text documents (entities and associations among entities, for example) and place it directly in chart form, improving visualization without retyping. This type of conversion can be used with a variety of text formats and applications.
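One way to picture this conversion is as a mapping from highlighted text spans to the nodes and edges of a link chart. The sketch below is a minimal illustration under assumed names: `Highlight`, `mark`, and `to_chart` are hypothetical, and `mark` simulates a user selection by searching for a phrase rather than reading a real selection from an editor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Highlight:
    start: int          # character offset where the selection begins
    end: int            # character offset just past the selection
    entity_type: str    # label chosen by the user, e.g. "person"

def mark(text, phrase, entity_type):
    """Simulate a user highlighting the first occurrence of a phrase."""
    start = text.index(phrase)
    return Highlight(start, start + len(phrase), entity_type)

def to_chart(text, highlights, associations):
    """Turn highlights and user-drawn associations into chart nodes and edges.

    associations holds (source_index, target_index, label) tuples that
    refer to positions in the highlights list.
    """
    nodes = [
        {"id": i, "label": text[h.start:h.end], "type": h.entity_type}
        for i, h in enumerate(highlights)
    ]
    edges = [
        {"source": src, "target": dst, "label": label}
        for src, dst, label in associations
    ]
    return {"nodes": nodes, "edges": edges}

text = "Alice Smith met Bob Jones in Paris."
highlights = [
    mark(text, "Alice Smith", "person"),
    mark(text, "Bob Jones", "person"),
    mark(text, "Paris", "location"),
]
associations = [(0, 1, "met"), (0, 2, "located in"), (1, 2, "located in")]
chart = to_chart(text, highlights, associations)
print(chart)
```

Because node labels are sliced from the source text by offset, nothing is retyped, and the resulting dictionary can be handed to whatever charting or graph tool the analyst already uses.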