Intelligence Community Markup Language (ICML) Release Notes

by IC Metadata Sub-Working Group

ICML v0.5 Release Notes

Prepared for the IC

Prepared by
IC Metadata Sub-Working Group

4 October 2001

Table of Contents
1 INTRODUCTION
2 WHAT’S IN ICML
2.1 DOCUMENT/PRODUCT MODELS
2.2 DOCUMENT/PRODUCT METADATA MODEL
2.3 GENERIC CONTENT MODELS
2.4 FORMATTING MODELS
2.5 INLINE MODELS
2.6 DESCRIPTIVE TAGGING MODELS
3 EVALUATION AND ACCEPTANCE

1 Introduction
The IC Metadata Sub-Working Group (MSWG) has developed the Intelligence Community Markup Language (ICML) as part of the ICCIO Executive Council and Working Group commitment to IC inter-organization interoperability. ICML draws on a number of data modeling activities carried out in the IC over the last ten years, each of which contributed in some way to its development.
The first focus of ICML is to aid finished intelligence production. Because most of the intelligence content produced within the IC takes the form of documents, the MSWG judged that limiting the scope of the initial ICML release to this type of content would yield the most benefit in the shortest time. The ICML standard as written incorporates the key writing styles, metadata, and structural requirements of typical IC products.
ICML is defined as an XML Document Type Definition (DTD). The ICML DTD defines tags, much like HTML, that communicate important descriptive and structural information about intelligence content residing within a document, product, or information module. ICML introduces: 1) various document/product structures, such as reports, articles, and analytical packets; 2) a new, expanded collection of document/product metadata broken into administrative and descriptive categories; 3) the most commonly used generic document components, such as paragraphs, lists, tables, and media; 4) CAPCO-compliant security models; and 5) descriptive content tags for more clearly indicating the subject matter of the information.
This release of ICML is version 0.5. It is targeted for use by all intelligence production components of the nine agencies, the four military services, the J2, the nine unified commands, and the three national centers (counterterrorism, counterintelligence, and crime and narcotics). We expect any and all of these components to comment on this release and subsequent releases as part of a continuing effort to develop and deploy standards applicable to the widest possible IC audience.
2 What’s in ICML
Figure 1 depicts the different DTD components and subordinate structures found within ICML. The ICML XML application uses a standard modularization technique to create separately manageable and reusable modules of DTD fragments. The highest-level file in the application is “icml-vX.dtd”. This top-level DTD calls in three modules: a list of XML notations, a list of XML character sets, and the core information pool module. The information pool module in turn calls in two industry-available DTDs, one for math/equations and one for tables, as well as an IC-developed DTD for the CAPCO security markings. Each of these is described in more detail below. This modularization technique also provides a facility to override specific portions of the models for local implementations; the override technique is described more fully in the ICML Technical Addendum.

Figure 1: ICML DTD Components
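The override facility described above usually rests on a general DTD rule rather than anything ICML-specific: under XML 1.0, when an entity is declared more than once, the first declaration the parser reads is binding. A local file declared ahead of the shared modules can therefore replace a content model without editing those modules. The Python sketch below models only that rule. The override file name and the entity names (para.content, list.content, fielddata) are hypothetical, not taken from the ICML DTD; the ICML Technical Addendum remains the authority on how ICML actually applies overrides.

```python
from typing import Dict, List, Tuple

# A module is (file name, [(parameter entity name, replacement text), ...]),
# listed in the order a parser would read the declarations.
Module = Tuple[str, List[Tuple[str, str]]]


def resolve_parameter_entities(modules: List[Module]) -> Dict[str, Tuple[str, str]]:
    """Map each entity name to (module whose declaration won, replacement text)."""
    bindings: Dict[str, Tuple[str, str]] = {}
    for module_name, declarations in modules:
        for entity, text in declarations:
            # XML 1.0: the first declaration is binding; later ones are ignored.
            bindings.setdefault(entity, (module_name, text))
    return bindings


# Hypothetical names throughout: a local override read before the shared pool.
local_override: Module = ("local-override.ent",
                          [("para.content", "(portionmark?, text, fielddata?)")])
info_pool: Module = ("infopoolx-vX.mod",
                     [("para.content", "(portionmark?, text)"),
                      ("list.content", "(portionmark?, item+)")])

resolved = resolve_parameter_entities([local_override, info_pool])
print(resolved["para.content"][0])  # local-override.ent
print(resolved["list.content"][0])  # infopoolx-vX.mod
```

Reversing the module order would let the shared definition win, which is why an override file has to be declared before the module it overrides is pulled in.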
2.1 Document/Product Models
The ICML DTD defines six key document models that can be used in a generic sense or applied to a specific product line. Along with the six document models, the DTD defines a set of reusable containment structures, such as “summary”, “section”, and “appendix”, which are used within specific document models where appropriate.
The document models are: Report, Article, Analytical Packet (AP), Briefing, Correspondence, and Basic. Each model uses the newly designed metadata block, which identifies both administrative and descriptive metadata for the document or product (see Section 2.2 for a detailed discussion of the metadata block). The document models are defined as follows:
· Report: The report will likely be the most used model. It contains the metadata block and one or more sections. It can optionally contain a front cover, title page, table of contents, preface, scope notes, summary, key findings, appendix, glossary, bibliography, index, distribution list, and back cover. Because the report is the most complex document model, it can be used generically for intelligence assessments, information papers, background papers, and similar products of varying length and complexity.
· Article: The article will most commonly be used for shorter documents resembling news or magazine articles, which are most often found in a “current” intelligence environment. The article was defined with the same structures as the report so that shorter-lived current intelligence content can interoperate with, and possibly migrate into, more established assessments or reports. The article model simply requires some metadata and some form of text to represent the body of the article.
· Analytical Packet: The AP is a construct originally defined by JICPAC that modularizes intelligence production into shorter, topic-based responses. Modularization is being examined throughout the IC as a way of delivering more focused responses without having to provide a 50- to 100-page document that the consumer must search through. The fundamental pieces of the AP are the metadata block, summary, analysis, and amplification, with the summary, analysis, and amplification providing increasingly detailed levels of intelligence.
· Briefing: The briefing is currently a demonstrator document model, developed to address the interoperability of PowerPoint briefing material. The briefing consists of the metadata block and one or more slides, each of which has content. The content on a slide is typically just bulleted and graphical forms of the same content that goes into a normal document. It should therefore be possible to move paragraphs, lists, and media between reports and briefings and to store some of that information in a reusable form. A briefing can easily be presented in a web browser, although without animation. The briefing model is best suited to organizations that restrict briefings to very simple formatting with no animation and that want to create briefing material with the same tool they use to create documents.
· Correspondence: The correspondence model is for products that take the form of a memorandum or letter and have multiple attachments. It consists of the metadata block, a body, and optionally a distribution list and one or more attachments. This type of finished product is rare; it is usually found where the consumer is a higher-level decision maker who has requested this format. Because the information is equally valid and reusable in more traditionally formatted products, a model for this special type was worthwhile. The correspondence model is not intended for all official correspondence, only for finished products that take this form.
· Basic: The basic model is for other intelligence products that need significant flexibility or whose content is produced at a more granular level than a document or product. The metadata block is required, but a sectioning title is optional, and the content can be of any generic type and any length.
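Evaluators mapping their products onto these models could start with a quick structural check. The sketch below encodes only the required parts stated in the bullets above. It assumes that all four fundamental pieces of the AP are mandatory, and every element name (metadata, section, body, slide, and so on) is a hypothetical stand-in for the real names in “icml-vX.dtd”. Validating against the DTD itself remains the authoritative test.

```python
import xml.etree.ElementTree as ET

# Required children per document model, as (child element, minimum count).
# Element names are hypothetical; treating all four AP pieces as required
# is an assumption.
REQUIRED_PARTS = {
    "report": [("metadata", 1), ("section", 1)],
    "article": [("metadata", 1), ("body", 1)],
    "ap": [("metadata", 1), ("summary", 1), ("analysis", 1), ("amplification", 1)],
    "briefing": [("metadata", 1), ("slide", 1)],
    "correspondence": [("metadata", 1), ("body", 1)],
    "basic": [("metadata", 1)],
}


def missing_parts(doc: ET.Element) -> list:
    """Return the required direct children that a document instance lacks."""
    rules = REQUIRED_PARTS.get(doc.tag)
    if rules is None:
        raise ValueError(f"unknown document model: {doc.tag}")
    return [child for child, minimum in rules if len(doc.findall(child)) < minimum]


report = ET.fromstring("<report><metadata/><summary/></report>")
print(missing_parts(report))  # ['section']
```

Optional parts such as the report's front cover or glossary are deliberately not checked, and element ordering is left to the DTD.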
One or more instances of a document model can be collected and titled in the wrapping containers called “set” and “setsect”. In organizations with more modular production of intelligence content, for example using APs, there is a need to collect and group the modules for presentation as a complete document or product, even though the modules themselves are created and stored individually. A set could be used with multiple reports to create a multi-volume collection, or with multiple shorter articles to create an intelligence digest.
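The digest case can be sketched with Python's standard ElementTree: stand-alone articles are grouped, unchanged, under titled “setsect” containers inside one “set”. Only “set” and “setsect” come from these release notes; the “title”, “article”, and “body” elements and the digest and section titles are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_digest(title: str, sections: dict) -> ET.Element:
    """Wrap stand-alone articles in a titled "set" with one "setsect" per group.

    `sections` maps a section title to a list of article elements. Only
    "set" and "setsect" come from the release notes; "title", "article",
    and "body" are hypothetical.
    """
    digest = ET.Element("set")
    ET.SubElement(digest, "title").text = title
    for section_title, articles in sections.items():
        setsect = ET.SubElement(digest, "setsect")
        ET.SubElement(setsect, "title").text = section_title
        setsect.extend(articles)  # articles are reused as-is; only the grouping is new
    return digest


a1 = ET.fromstring("<article><metadata/><body>First item</body></article>")
a2 = ET.fromstring("<article><metadata/><body>Second item</body></article>")
digest = build_digest("Daily Digest", {"Region A": [a1], "Region B": [a2]})
print([s.find("title").text for s in digest.findall("setsect")])  # ['Region A', 'Region B']
```

Because the modules are only wrapped, the same articles remain individually stored and reusable outside the digest.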
2.2 Document/Product Metadata Model
Document/product metadata provides greater insight into the content and structure of information and facilitates content management and discovery by query and retrieval tools. The document/product metadata block incorporated in ICML introduces a new organization of metadata not previously implemented in the IC. To date, it is the most sophisticated metadata model built for the IC and the most comprehensive XML metadata model, merging the best of the XML metadata models tried in earlier projects. The Intelligence Community Guidelines for Intelink Metadata, both the September 1999 edition and the newest release of April 2001, are also incorporated within the ICML metadata block.
The metadata block is divided into two categories: administrative and descriptive. These categories were chosen after evaluating new industry metadata models, such as PRISM and NewsML. Some components of these industry models were applicable to the IC, but significant portions were not, because of the syndication and rights-related business processes that pervade the commercial publishing and news industries.
The administrative metadata covers information such as document identifiers, publisher and contributor information, key dates captured along the product’s life cycle, indications of the presence of Privacy Act or copyrighted information, records management information, and production requirement tracking details. These models are all very specific and contain multiple levels of granularity.
The descriptive metadata covers the subject matter of the content. The descriptive block contains information such as the overall document’s security marking; title and subtitle; content description; a variety of locations, including the country, region, or place of interest relevant to the content; subject code and category assignment; intelligence discipline; and product line identifier. Of particular interest within the IC at this time is subject identification. Numerous subject coding specifications and taxonomies are currently being developed and deployed around the IC. The subject code metadata in ICML is very flexible, allowing multiple codes to be defined within multiple coding specifications. An initial list of coding specifications to choose from is available, but no subject codes are provided. Selecting and inserting codes is left to the content creation system or to some downstream categorization or classification system.
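A downstream tool reading the metadata block might group subject codes by the coding specification they belong to. The sketch below assumes a hypothetical layout (administrative and descriptive children, subjectcode elements with a spec attribute); the real ICML element names differ, and because the release ships no subject codes, the specification names and code values shown are placeholders.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata block: element and attribute names are illustrative,
# and the code values are placeholders (the release ships no subject codes).
METADATA = """
<metadata>
  <administrative>
    <identifier>PROD-0001</identifier>
    <publisher>Example Agency</publisher>
  </administrative>
  <descriptive>
    <title>Example Assessment</title>
    <country>XX</country>
    <subjectcode spec="SPEC-A">A-100</subjectcode>
    <subjectcode spec="SPEC-A">A-205</subjectcode>
    <subjectcode spec="SPEC-B">B-7</subjectcode>
  </descriptive>
</metadata>
"""


def subject_codes_by_spec(metadata: ET.Element) -> dict:
    """Group subject codes under the coding specification each belongs to."""
    grouped = defaultdict(list)
    for code in metadata.iterfind("descriptive/subjectcode"):
        grouped[code.get("spec")].append(code.text.strip())
    return dict(grouped)


block = ET.fromstring(METADATA.strip())
print(subject_codes_by_spec(block))
# {'SPEC-A': ['A-100', 'A-205'], 'SPEC-B': ['B-7']}
```

Keeping the specification on each code, rather than assuming one taxonomy, is what lets several coding schemes coexist in a single block.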
2.3 Generic Content Models
Below the document/product models and the document containers are the generic content models. These models cover the structural components most commonly used in written English to communicate information. Whether the information is presented on a web page or in a document, it is visually presented using the same structural components, regardless of their actual format or style. The ICML components include: paragraphs, media, numbered and bullet lists, tables, notes, quotes, equations, and verbatim (used for pre-formatted data such as code samples). Each of the generic content models includes either a required or an optional portionmark, as specified by CAPCO security marking rules.
Two idiosyncrasies in the current ICML release involve the media and equation models. The media model has not been fully vetted by the MSWG XML Team; its least complete parts are the non-graphic media types and their related metadata. These are among the next items for evaluation, and we plan to involve media creation and management staffs throughout the IC to assist.
The equation model has two components: a graphic component and a rich equation markup model called MathML. The current version of MathML from the W3C has some notable parsing errors that need further exploration, so MathML has been excluded from this release. The MSWG XML team has agreed that if a rich equation model is warranted, an industry standard will be used; MathML v2.0 is the newest industry standard available.
A significant change from the JIVA KOM/GOM models is the separation of the portionmark element from the text container within almost all of the generic content models. The JIVA models were flawed in that they used a “mixed” content model that allowed multiple portionmarks anywhere within a generic content model. Introducing a new container element called “text” makes the occurrence of the portionmark easy to control. However, this solution poses implementation issues in certain tools that split content. This issue will be addressed when the security and portionmark models are reevaluated.
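The control gained by the separate “text” container can be illustrated with a simple checker that flags elements still using the old mixed pattern. In this sketch the generic content element names (para, note, quote) are hypothetical, the portionmark values are placeholders, and the rule it enforces (at most one direct portionmark, none inside “text”) is an assumed reading of the model rather than the DTD's exact content model.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for generic content models that hold a
# separate "portionmark" beside a "text" container.
GENERIC = {"para", "note", "quote"}


def portionmark_problems(root: ET.Element) -> list:
    """Report generic content elements that break the controlled pattern:
    more than one portionmark, or a portionmark buried inside "text"."""
    problems = []
    for elem in root.iter():
        if elem.tag not in GENERIC:
            continue
        direct = elem.findall("portionmark")
        inside_text = elem.findall("text//portionmark")
        if len(direct) > 1:
            problems.append(f"{elem.tag}: {len(direct)} portionmarks")
        if inside_text:
            problems.append(f"{elem.tag}: portionmark inside text")
    return problems


good = ET.fromstring("<para><portionmark>U</portionmark><text>Plain sentence.</text></para>")
mixed = ET.fromstring("<para><text>First <portionmark>U</portionmark> then "
                      "<portionmark>U</portionmark></text></para>")
print(portionmark_problems(good))   # []
print(portionmark_problems(mixed))  # ['para: portionmark inside text']
```

With the mixed JIVA-style content, the same check would have to scan arbitrary text runs; with a separate container, a fixed path lookup is enough.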
Within most of the generic content models you will find sub-models for formatting and inline components. A substantial list of inline descriptive tags is also provided for clear semantic tagging of keywords and concepts. These subordinate models are discussed below.
2.4 Formatting Models
Formatting models are kept to a minimum. There are only three tags in this category: emphasis, superscript, and subscript. Emphasis is the only true formatting element, as it identifies one of three emphasis types: bold, italic, and underscore; these can also be used in combination. The superscript and subscript tags are descriptive in nature, but imply a specific composition of the content.
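Because the formatting set is so small, rendering it for a given delivery medium is straightforward. The sketch below targets HTML, one possible output among several. It assumes a hypothetical “type” attribute on emphasis that holds one or more space-separated emphasis types so they can be combined; the actual ICML attribute name and values may differ.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical markup: "emphasis" carries a space-separated "type" attribute so
# that bold, italic, and underscore can be combined.
EMPHASIS_TAGS = {"bold": "b", "italic": "i", "underscore": "u"}
SCRIPT_TAGS = {"superscript": "sup", "subscript": "sub"}


def render(elem: ET.Element) -> str:
    """Render a text container's formatting elements as an HTML fragment."""
    inner = escape(elem.text or "") + "".join(
        render(child) + escape(child.tail or "") for child in elem
    )
    if elem.tag == "emphasis":
        for kind in elem.get("type", "").split():
            tag = EMPHASIS_TAGS[kind]  # unknown types raise KeyError
            inner = f"<{tag}>{inner}</{tag}>"
        return inner
    if elem.tag in SCRIPT_TAGS:
        tag = SCRIPT_TAGS[elem.tag]
        return f"<{tag}>{inner}</{tag}>"
    return inner  # any other element contributes only its content


text = ET.fromstring('<text>H<subscript>2</subscript>O is '
                     '<emphasis type="bold italic">critical</emphasis></text>')
print(render(text))  # H<sub>2</sub>O is <i><b>critical</b></i>
```

A print composition system would map the same three tags to its own font and baseline settings; the markup itself stays unchanged.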
2.5 Inline Models
Inline models provide a method for tagging sub-components of a generic content model. Examples of inline models include sources, footnotes, quotes, and equations. Sources and footnotes float in the text stream at the point where they are relevant, and their content may be displayed in a variety of ways depending on the delivery media. Quotes and equations are special inline models that identify the special purpose of a text string for formatting and/or searching purposes.
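One way to display floating footnotes for print-like delivery is to number them in document order and collect their text as endnotes; a web delivery might use pop-ups instead. In this sketch the “footnote” element is assumed to sit inside a text container, and the “ref” marker element it inserts is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def footnotes_to_endnotes(container: ET.Element):
    """Return (copy with [n] markers in place of footnotes, list of note texts).

    Footnotes are numbered in document order; the source tree is not modified.
    """
    result = copy.deepcopy(container)
    parent_of = {child: parent for parent in result.iter() for child in parent}
    notes = []
    for note in list(result.iter("footnote")):
        notes.append(" ".join("".join(note.itertext()).split()))
        marker = ET.Element("ref")  # hypothetical marker element
        marker.text = f"[{len(notes)}]"
        marker.tail = note.tail  # keep the text that followed the footnote
        parent = parent_of[note]
        parent.insert(list(parent).index(note), marker)
        parent.remove(note)
    return result, notes


src = ET.fromstring("<text>Output rose sharply<footnote>Source reporting."
                    "</footnote> last year.</text>")
rendered, notes = footnotes_to_endnotes(src)
print(ET.tostring(rendered, encoding="unicode"))
# <text>Output rose sharply<ref>[1]</ref> last year.</text>
print(notes)  # ['Source reporting.']
```

Working on a copy keeps the single tagged source intact, so other delivery formats can still be generated from it.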
A special inline model for rewriting strings of text for multi-domain outputs is also available. This feature lets the producer create a single multi-domain version of the content. The different wordings for each domain are captured in the rewrite model that identifies the original string and multiple substitute strings. The original string carries the portionmark of the generic content model while the substitute strings can each be marked separately. The business process for processing the rewrite model requires a filtering and reclassification processor that can generate the different domain outputs automatically from the single source.
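The filtering step for the rewrite model can be sketched as follows: for a target domain, each rewrite is replaced by that domain's substitute wording, or by the original wording when no substitute exists. This is a minimal sketch under several assumptions: the “original” and “substitute” child names, the “domain” attribute, and the domain names are hypothetical, substitutes are assumed to hold plain text, and the reclassification of portionmarks that the release notes assign to the processor is not shown.

```python
import copy
import xml.etree.ElementTree as ET


def _replace_with_text(parent: ET.Element, child: ET.Element, text: str) -> None:
    """Remove `child` from `parent`, leaving `text` where the child was."""
    siblings = list(parent)
    i = siblings.index(child)
    if i == 0:
        parent.text = (parent.text or "") + text
    else:
        siblings[i - 1].tail = (siblings[i - 1].tail or "") + text
    parent.remove(child)


def filter_for_domain(root: ET.Element, domain: str) -> ET.Element:
    """Return a single-domain copy: each <rewrite> becomes the substitute
    wording for `domain`, or the original wording if no substitute exists."""
    out = copy.deepcopy(root)
    parent_of = {child: parent for parent in out.iter() for child in parent}
    for rewrite in list(out.iter("rewrite")):
        chosen = next(
            (s for s in rewrite.findall("substitute") if s.get("domain") == domain),
            rewrite.find("original"),
        )
        words = (chosen.text or "") if chosen is not None else ""
        _replace_with_text(parent_of[rewrite], rewrite, words + (rewrite.tail or ""))
    return out


src = ET.fromstring(
    "<para><portionmark>X</portionmark><text>Reporting from "
    "<rewrite><original>a sensitive source</original>"
    '<substitute domain="release">a reliable source</substitute></rewrite>'
    " indicates growth.</text></para>"
)
print(filter_for_domain(src, "release").find("text").text)
# Reporting from a reliable source indicates growth.
```

Running the same function once per domain yields each output from the single multi-domain source without touching the source itself.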
Two further idiosyncrasies in the current release involve the link (cross-reference) model and the inline equation model. The link model is under reevaluation and has therefore been commented out of this release; the commented model does not conform to the W3C XLink Recommendation, which is the current industry standard. The MathML issue affects the inline equation model just as it does the generic content model; refer to Section 2.3 for more information.
2.6 Descriptive Tagging Models
In response to the Intelink Technical Exchange of May 2000, the IC, under the leadership of the IMO/IMD, drafted a Content Tagging Scheme (CTS). Since then, the MSWG XML team working on ICML has renamed these tags “descriptive” tags to clarify their intent. The purpose of the descriptive tags is to mark words or phrases within a document that can later be used for indexing and searching. The tags are to be used to better describe the content using agreed-upon semantics.
The IMO/IMD has published a draft document that lists and describes twelve core tags and the possibilities for extending those tags to further levels of granularity. These XML descriptive tags wrap around text within a generic content model, for example within a paragraph or table. They may be used to easily extract the values of the keyword field of the Intelink Metadata Guidelines.
The IMO CTS document discusses the possibility of expansion and granularity. Tag nesting is used to provide greater levels of specificity and to create containers for layers of abstraction. The main point is that greater detail is nested in (“contained in”) the data item to which it applies.
The JIVA GOM standard’s implementation of the CTS was the first XML application to exercise both the expansion and granularity concepts. ICML provides the same implementation, with minor modifications suggested and used by the National Ground Intelligence Center (NGIC).
Content is allowed at every level of the descriptive tagging models, so use of the expansion and granularity levels is optional. The simplest use is to surround a word or phrase with a descriptive tag. If desired, the words or phrases in that tag can be further tagged with the next layer of descriptive tags, and the core and expansion levels can be further refined using the granularity attributes. These options can be mixed in any combination to suit a particular use. Additionally, expansion-level elements can be used in any order, so that the tagged text and phrases remain readable.
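Because greater detail is nested inside the item it refines, a keyword extractor (for example, one feeding the keyword field of the Intelink Metadata Guidelines) can record each tag's nesting path along with its text. The descriptive tag names below are hypothetical placeholders, since the twelve core tags are defined in the IMO/IMD draft, and granularity attributes are not modeled in this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive tag names; the IMO/IMD draft defines the real ones.
DESCRIPTIVE = {"organization", "person", "location", "facility", "equipment"}


def extract_keywords(root: ET.Element) -> list:
    """Return (nesting path, normalized text) for every descriptive tag,
    in document order. An inner tag refines the tag that contains it."""
    results = []

    def walk(elem: ET.Element, path: list) -> None:
        if elem.tag in DESCRIPTIVE:
            path = path + [elem.tag]
            words = " ".join("".join(elem.itertext()).split())
            results.append(("/".join(path), words))
        for child in elem:
            walk(child, path)

    walk(root, [])
    return results


para = ET.fromstring(
    "<text>Trucks were seen at <facility>the <location>northern</location> "
    "depot</facility> near <location>the river crossing</location>.</text>"
)
for path, words in extract_keywords(para):
    print(path, "=>", words)
# facility => the northern depot
# facility/location => northern
# location => the river crossing
```

Since content is allowed at every level, the outer tag still yields a usable keyword even when no inner tagging is present.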
3 Evaluation and Acceptance
ICML’s success will depend in large part on the participation of IC organizations in expanding and refining ICML over time and on its eventual adoption into all IC business processes. Once ICML is approved, all IC producers will use the standard in their workflows to exploit its capabilities and speed production.
Evaluation of ICML will have to take many forms and involve both technical and functional representatives from every organization. In that respect, ICML is based on an understanding of:
· intelligence content,
· existing production tools and techniques,
· daily demands on producers and consumers,
· fluid nature of production requirements,
· technical infrastructure issues with bandwidth and multiple delivery media, and
· the newest tools and techniques that XML can provide.
Just as the development team took these different perspectives into account, we anticipate that the evaluating organizations of ICML will bring as many different perspectives as are available into the process. Typically, an intelligence functional expert won’t be able to judge the potential of ICML to automate certain business processes and a technical expert proficient in XML tool implementation won’t be able to judge the potential of ICML to fit that organization’s current and future intelligence content requirements. Accordingly, we ask that both functional and technical staffs examine the standard.
We have outlined a multi-step evaluation process below. Please provide feedback on as many issues as possible, keeping in mind that different questions may need to be answered by different types of specialists. Comprehensive ICML documentation has not yet been developed; it is planned for the 1.0 release in early December. Without this documentation, you are left to interpret some parts of the models on your own. Where there is any confusion over the models or definitions, please request further explanation, if possible before the evaluation cutoff date, using the feedback method on the ICML web site, http://www.xml.saic.com/icml. Otherwise, indicate in your evaluation response that more information was necessary to evaluate a particular model, and we will take this into consideration for future releases and evaluations.
How to View ICML:
1. Read the ICML v0.5 Release Notes and the ICML Technical Addendum, as they provide important definitions and describe issues that were addressed during the development of ICML.
2. Use one of the following tools to assist in walking through the ICML DTD:
· Text editor (for the XML expert who knows how to read DTD syntax)
· DTD viewer. Examples of third-party COTS DTD viewers include XML Authority, XML Spy, and Near & Far; the first two are inexpensive and can be downloaded from the Internet as 30-day evaluations.
· HTML presentation. To introduce evaluators unfamiliar with XML DTD syntax to ICML, the document models are available as a special HTML package. This set of files is available as a zipped archive and on the ICML web site. It allows evaluators to view and navigate through the DTD hierarchy within an Internet browser, such as Netscape or Internet Explorer. Using the familiar web interface, this tool lets the novice user examine and understand the document models and all levels of granularity below them. The ICML Viewer can be used or downloaded from the ICML web site under the table of contents item “ICML_FULL Viewer”.

What to Evaluate:
1. Evaluate the six high-level document models and the reusable containers found in the “icml-vX.dtd”. Map each product you produce to one of the document models. Identify any inconsistencies in your products and identify any shortfalls of ICML (missing containment structures, unclear definition or terminology, required versus optional components).
2. Evaluate the new metadata block. For electronically delivered products, this is best done by reviewing the new Intelink Metadata Guidelines and examining how well your organization supplies metadata for all of its Intelink-posted products. For hardcopy products, this is best done by looking at production templates, style guides, and the information contained on the front cover, back cover, or title pages of the product. Identify missing parts of the metadata block, confusing organization or terminology, and required versus optional components.
3. Evaluate the generic content models found in the information pool module “infopoolx-vX.mod”. Look at your products to see if there are other generic objects that ICML might need to have. Currently, the list consists of paragraphs, media, numbered and bullet lists, tables, notes, quotes, equations, and verbatim. Determine what types of media are used in your products and compare with the list provided within the “media” element. Evaluate if any of your data might be better suited for a more semantically meaningful structure, as in a database extraction, that could make use of the fielded data hook and a localized override.
4. Evaluate the formatting characteristics found within the generic content models, namely superscript, subscript, and the three forms of emphasis. Do these meet the needs of your products?
5. Evaluate the inline tagging models that appear inside the generic content models. These include comments, sources, footnotes, quotes, and equations. Are there other types of inline content that you use in your products? Evaluate your organization’s internal production of multiple versions of the same product for posting or release to multiple domains or consumers. Evaluate whether an inline-tagging concept, such as the rewrite model, would better enable your producers to create one multi-domain version of the document that could be automatically separated into the individual domain versions using a filtering script.
6. Evaluate the descriptive tagging models to determine if they are representative of the types of content that might exist in your products. Suggest additions or modifications.