REAP Cataloging and Searching: Difference between revisions

Revision as of 01:07, 3 February 2009

Summary

The Kepler client can access data served by DAP servers but has no way to find those servers. Data search systems built for DAP servers have a poor track record, often because they fail to address two needs: working with a fluid (rapidly changing) pool of data servers and data sets, and fitting in with the basic requirement of DAP-based systems - that impact on a data provider be absolutely minimal.

In order for impact to be minimal, data served using DAP does not require the data documentation (i.e. metadata) that most search systems depend on. As a result, interfacing DAP servers to such systems is a daunting task involving lots of metadata entry. This effort is frustrated not only by the often baroque nature of metadata standards (e.g., FGDC) but also because the data sources being described move from place to place frequently. That begs for automated discovery and cataloging - exactly the opposite of what hand-written metadata records provide.

Other things that complicate this are:

Search systems tend to be one-off things tailored to a specific client
They often eschew server-side solutions that can be leveraged by other projects

The system described here will use Kepler as an example client. It will build metadata records using EML, where most of the really important metadata is actually a series of XML micro documents described by ISO 19115. The system will use a server-side solution (technology leveraged from other projects and thus more likely to remain in wide use over time) to augment data sets with this information. The EML documents will be built using XSLT; the DAP servers themselves won't use EML, and the same geo-spatial information can be used for other things. The EML will be scavenged by Metacat, which Kepler already knows how to use (with some caveats). Metacat doesn't know how to crawl DAP servers, but it can be fed URLs, and we may employ TPAC's crawler to feed Metacat with URLs or DDX/EML objects. TBD.
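As a rough illustration of the DDX-to-EML step, the sketch below pulls bounding-box attributes out of a DDX-like document and builds an EML-style geographicCoverage fragment. It rests on several assumptions: the sample document is a simplified, un-namespaced stand-in for a real DDX, the geospatial_* attribute names are hypothetical, the function ddx_to_coverage and the example.org URL are invented for the example, and Python's ElementTree is swapped in for XSLT only to keep the sketch self-contained. The output is not validated against the EML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DDX-like document. A real DDX is namespaced and
# carries much more structure; the attribute names here are assumptions.
SAMPLE_DDX = """\
<Dataset name="sst.nc">
  <Attribute name="geospatial_lat_min"><value>-10.0</value></Attribute>
  <Attribute name="geospatial_lat_max"><value>10.0</value></Attribute>
  <Attribute name="geospatial_lon_min"><value>120.0</value></Attribute>
  <Attribute name="geospatial_lon_max"><value>160.0</value></Attribute>
</Dataset>
"""

# EML bounding-coordinate element -> assumed DDX attribute name.
BOUNDS = {
    "westBoundingCoordinate": "geospatial_lon_min",
    "eastBoundingCoordinate": "geospatial_lon_max",
    "northBoundingCoordinate": "geospatial_lat_max",
    "southBoundingCoordinate": "geospatial_lat_min",
}


def ddx_to_coverage(ddx_text, dataset_url):
    """Build an EML-style geographicCoverage element from DDX-like text."""
    root = ET.fromstring(ddx_text)
    # Collect every named attribute and its first <value> child.
    attrs = {a.get("name"): a.findtext("value") for a in root.iter("Attribute")}

    coverage = ET.Element("geographicCoverage")
    # Record the data set URL as the description so a search hit can be
    # traced back to the DAP server it came from.
    ET.SubElement(coverage, "geographicDescription").text = dataset_url
    box = ET.SubElement(coverage, "boundingCoordinates")
    for eml_name, ddx_name in BOUNDS.items():
        # Raises KeyError if the data set lacks one of the assumed attributes.
        ET.SubElement(box, eml_name).text = attrs[ddx_name]
    return coverage


coverage = ddx_to_coverage(SAMPLE_DDX, "http://example.org/opendap/sst.nc")
print(ET.tostring(coverage, encoding="unicode"))
```

A fragment like this would sit inside a full EML document before being handed to Metacat. How that hand-off happens (URLs versus complete DDX/EML objects) is left open here, as it is in the summary.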

@@ Line 3: / Line 3: @@
 The Kepler client can access data served by DAP servers but has no way to find those servers. Data search systems built for DAP servers have a poor track record, often because they fail to address two needs: working with a fluid (rapidly changing) pool of data servers and data sets, and fitting in with the basic requirement of DAP-based systems - that impact on a data provider be absolutely minimal.
 In order for impact to be minimal, data served using DAP does not require the data documentation (i.e. ''metadata'') that most search systems depend on. As a result, interfacing DAP servers to such systems is a daunting task involving lots of metadata entry. This effort is frustrated not only by the often baroque nature of metadata standards (e.g., FGDC) but also because the data sources being described ''move'' from place to place frequently. That begs for automated discovery and cataloging - exactly the opposite of what hand-written metadata records provide.
+Other things that complicate this are:
+* Search systems tend to be one-off things tailored to a specific client
+* They often eschew server-side solutions that can be leveraged by other projects
+The system described here will use Kepler as an example client. It will build metadata records using EML, where most of the really important metadata is actually a series of XML micro documents described by ISO 19115. The system will use a server-side solution (technology leveraged from other projects and thus more likely to remain in wide use over time) to augment data sets with this information. The EML documents will be built using XSLT; the DAP servers themselves won't use EML, and the same geo-spatial information can be used for other things. The EML will be scavenged by Metacat, which Kepler already knows how to use (with some caveats). Metacat doesn't know how to crawl DAP servers, but it can be fed URLs, and we may employ TPAC's crawler to feed Metacat with URLs or DDX/EML objects. TBD.
 == Use Cases ==
-====[[Add information about a data set to the catalog]]===
+====[[Add information about a data set to the catalog]]====
 ====[[Search the catalog]]====
 ====[[Use a data set found using the search system]]====
 == Definitions ==

Contents

Summary

Use Cases

Add information about a data set to the catalog

Search the catalog

Use a data set found using the search system

Definitions

Background

Deliverables

Period of use