LREC 2002 Workshop

Content Interoperability Standards

Motivation and Aims

The aim of this workshop is to bring together designers, developers and users of content encoding practices in order to promote de facto standards for content creation, management and delivery. More specifically, the workshop is a hands-on meeting meant to generate a manifesto of requirements and recommendations for best practices in content interoperability. This manifesto is intended to form the basis for a business-driven alliance of practitioners and users from industry and research organizations in this area, whose goal is to converge on a specific operational metadata scheme for virtual integration in the creation, management and delivery of content.

Market Situation

In the next five years, Human Language Technologies (HLT) related to content processing will find their steadiest and strongest growth within the market segments of Content Management and Syndication. In the US alone, Content Management software is expected to generate revenues of nearly $5.2 billion by 2005 (Merrill Lynch, 6/01) and up to $7.2 billion by 2006 (Butler Group). Jupiter (5/01) estimates that about 70% of the archived content market ($850 million in 2001, projected to grow to $4.6 billion in 2006) will be realized through syndication. The ensuing opportunities for language-based solutions to content understanding, extraction and generation tasks are therefore substantial.

Virtual Integration

Realizing these opportunities requires content interoperability standards capable of granting Human Language Technologies seamless integration into the content creation, management and delivery supply chain. Most current Content Management environments offer a relatively open software infrastructure in which a variety of HLT components can be integrated. However, the opportunity of operational integration into Content Management infrastructures, whether implemented in OEM or ASP mode, will have a fragmenting effect on the HLT industry as a whole if it continues to rely on proprietary metadata schemes. HLT solutions will continue to be tightly knit into the specific applications they were engineered to service and less adaptable to other systems, with a consequent lack of fair competition for HLT providers and paucity of choice for prospective buyers. Open content interoperability standards are thus necessary to stimulate market growth in the HLT sector and provide a healthy competitive environment for the HLT industry as a whole.

Semantic Web Standards

Emerging Semantic Web standards such as DAML and OIL are now starting to provide a description framework in which complex meaning relationships can be actively encoded and effectively engaged in categorization, search, navigation, retrieval and extraction technologies. However, Semantic Web standards must be tailored to the specific needs of vertical industries in order to promote business viability for the technologies they are intended to facilitate. This is demonstrated by the creation and use of metadata standards such as PRISM, NewsML, NITF and ICE in the publishing industry. In addressing the need for syndicating, aggregating, post-processing and multi-purposing content, these initiatives have built consortia where content providers and content management software vendors work together to create the "right" vocabulary and support in the leading software tools, facilitating widespread adoption throughout the industry.

Workshop Format

The workshop comprises two working sessions. The morning session will consist of three invited talks (35 minutes each including discussion), each followed by two short papers (20 minutes each including discussion). The afternoon session will be devoted to an in-depth discussion of issues raised during the morning session, with the aim of converging on a common approach to the deliberation of content interoperability standards.

08:45-09:00 Opening
09:00-09:35 Invited Talk I
09:35-10:15 Two Short Papers
10:15-10:50 Invited Talk II
10:50-11:30 Two Short Papers
11:30-11:45 Coffee Break
11:45-12:20 Invited Talk III
12:20-13:00 Two Short Papers
13:00-14:00 On-site Lunch
14:00-15:00 Three Breakout Working Groups, tasked to critique the morning sessions
15:00-15:30 WG1 presentation & discussion
15:30-15:45 Coffee Break
15:45-16:15 WG2 presentation & discussion
16:15-16:45 WG3 presentation & discussion
16:45-17:00 Concluding remarks

Admission to the workshop is limited to 40 participants and will be decided on the basis of a one-page statement of interest including:

Participant name, affiliation and contact details;
Motivation for participating;
Position statement.

Those interested in giving a talk should also submit a position paper (~1500 words). Topics to be addressed in the statements of interest and position papers include, but are not limited to:

Encoding: Semantic Web Standards, Language Engineering Standards;
Technology: e.g. categorization, clustering, meta-tagging, retrieval, extraction, summarization, and generation;
Applications: Content creation, management, and delivery.

Statements of interest and position papers will be circulated among participants ahead of time to create a shared background and facilitate discussion before the event. A selection of statements of interest and position papers will be distributed in printed form as workshop proceedings. The results of the workshop will be compiled and published as a collection in a major trade journal.

Important Dates

Deadline for workshop abstract submission	18th February 2002
Notification of acceptance	8th March 2002
Final version of paper for proceedings	5th April 2002
Workshop	1st June 2002

Organizing Committee

JAMES PUSTEJOVSKY (contact person)

Computer Science Department and Volen Center for Complex Systems
Brandeis University
Waltham, MA 02254-9110, USA
Voice: 1-781-736-2709
Fax: 1-781-736-2741
Email: jamesp@cs.brandeis.edu

CTO, LingoMotors, Inc.
585 Mass. Ave.
Cambridge, MA 02139, USA
Voice: 1-617-299-2711
Email: jamesp@lingomotors.com

ANTONIO SANFILIPPO

SRA International
4300 Fair Lakes Court
Fairfax, VA 22033, USA
Tel.: (703) 322-4988
Fax: (703) 803-1793
Cell: (571) 332-9595
Email: Antonio_Sanfilippo@sra.com

Visiting Research Scientist
Computer Science Department and Volen Center for Complex Systems
Brandeis University
Waltham, MA 02254-9110, USA
Email: antonio@cs.brandeis.edu

Programme Committee

David Allen	International Press Telecommunications Council (UK)
Chris Porter	Factiva (UK)
Chinatsu Aone	SRA International (USA)
Anna Bjarnestam	Getty Images (USA)
Nicoletta Calzolari	Istituto di Linguistica Computazionale del CNR (Italy)
Ido Dagan	LingoMotors, Inc.
Ron Daniel Jr.	Interwoven (USA)
Sharon Flank	eMotion (USA)
Chris Green	Time Warner, Inc. (USA)
Nancy Ide	Vassar College (USA)
Roger Medlin	Artesia Technologies (USA)
Eric Miller	W3C World Wide Web Consortium (USA)
Tony Rose	Reuters (UK)
Piek Vossen	Irion (The Netherlands)

Workshop Registration Fees

The registration fees for the workshop are:

If you are not attending LREC: 140 Euro
If you are attending LREC: 90 Euro

These fees include coffee breaks, refreshments, and the Proceedings of the workshop. Participation in the workshop is limited by the venue. Requests for participation will be processed on a first-come, first-served basis. Registration will be handled by the LREC Secretariat. Please send statements of interest and position papers to jamesp@cs.brandeis.edu.
