Investigation of Patterns in Large Collections of Data
We're still fleshing out this description, so please be patient...
Background
Work to date
We have developed a small set of Python programs that examine collections of URLs referencing data served by various DAP servers.
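Those programs are not reproduced here. To illustrate the kind of grouping the UI idea depends on, here is a minimal Python sketch. It assumes that files in one collection share a directory and differ only in the runs of digits in their names (dates, sequence numbers). `group_urls` and the example.org URLs are hypothetical.

```python
import posixpath
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Runs of digits (dates, sequence numbers) are assumed to be the part of a
# file name that varies between members of the same collection.
_DIGITS = re.compile(r"\d+")


def group_urls(urls):
    """Group data URLs by (host + directory, file-name pattern).

    Returns a dict mapping each (directory, pattern) key to the sorted list
    of URLs sharing it, e.g. sst_20010101.nc and sst_20010102.nc both map
    to the pattern "sst_#.nc".
    """
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        directory = parts.netloc + posixpath.dirname(parts.path)
        pattern = _DIGITS.sub("#", posixpath.basename(parts.path))
        groups[(directory, pattern)].append(url)
    return {key: sorted(members) for key, members in groups.items()}


if __name__ == "__main__":
    sample = [
        "http://example.org/opendap/avhrr/2001/sst_20010101.nc",
        "http://example.org/opendap/avhrr/2001/sst_20010102.nc",
        "http://example.org/opendap/avhrr/2001/readme.txt",
    ]
    for (directory, pattern), members in group_urls(sample).items():
        print(directory, pattern, len(members))
```

The substitution is applied only to the file name, so each directory stays a separate collection. Applying it to the directory path as well would merge, for example, yearly subdirectories into one collection.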
Using the data
A description of the UI idea from an email:
So the idea is that when the data provider installs his/her server, it launches a job that crawls the site and organizes the data based on directory structure, file name and DAS/DDS. It then launches a GUI in which this organization is presented to the data provider. I imagine a sort of hierarchical display starting at the top of the web site and going down to individual files. In most cases it would not be possible to list all files, but there should be a way to represent an archive of similar files. For example, one element of a site might be all AVHRR SST fields for the western North Atlantic. This would show as a block for the archive and then a sample file.
The interface would also show the common metadata characteristics for the elements in a given archive and the differences between the metadata in this archive and another archive at the site. The data provider would be able to combine elements or split them up in a graphical fashion, like Kepler (the scientific workflow system). S/he would also be able to augment, correct or complete metadata objects.
The result would be a THREDDS catalog that could be harvested as well as used by the server. All THREDDS catalogs produced by this GUI for any Hyrax server would be similar in structure and content, and all would be CF-based. This would make writing clients for these servers very easy.
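The email does not say how the "common metadata characteristics" of an archive, or the differences between two archives, would be computed. One minimal Python sketch follows. It assumes each file's DAS has already been reduced to a flat dict of attribute names and string values; parsing the DAS is not shown. `common_metadata`, `compare_archives` and the attribute values in the demo are hypothetical.

```python
def common_metadata(files):
    """Attributes whose name and value are identical in every file.

    `files` is a list of dicts (one per file) mapping attribute name to
    value; this flat representation of a DAS is an assumption.
    """
    if not files:
        return {}
    first, *rest = files
    return {
        name: value
        for name, value in first.items()
        if all(name in f and f[name] == value for f in rest)
    }


def compare_archives(a, b):
    """Compare the common metadata of two archives.

    Returns (shared, only_a, only_b, conflicting), where `conflicting`
    maps an attribute name to its (value_in_a, value_in_b) pair.
    """
    ca, cb = common_metadata(a), common_metadata(b)
    shared = {k: v for k, v in ca.items() if k in cb and cb[k] == v}
    only_a = {k: v for k, v in ca.items() if k not in cb}
    only_b = {k: v for k, v in cb.items() if k not in ca}
    conflicting = {k: (ca[k], cb[k]) for k in ca.keys() & cb.keys() if ca[k] != cb[k]}
    return shared, only_a, only_b, conflicting


if __name__ == "__main__":
    # Hypothetical attribute values, for illustration only.
    avhrr = [
        {"Conventions": "CF-1.0", "source": "AVHRR", "time": "2001-01-01"},
        {"Conventions": "CF-1.0", "source": "AVHRR", "time": "2001-01-02"},
    ]
    modis = [{"Conventions": "CF-1.0", "source": "MODIS", "platform": "Terra"}]
    print(compare_archives(avhrr, modis))
```

Attributes that vary within an archive (such as a per-file time stamp) drop out of its common metadata. That is what lets one sample file plus the shared attributes stand in for the whole block in the display.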
