QuickStart: Difference between revisions

From OPeNDAP Documentation
No edit summary
Line 1: Line 1:
= An OPeNDAP Quick Start Guide\\\DOCversion =
= An OPeNDAP Quick Start Guide =
Tom Sgouros
Tom Sgouros
\rcsInfoDate
\pagenumbering{roman}
\copyrightmatter
\W\pslink{http://www.opendap.org/pdf/quick.pdf}
\listoffigures
\clearemptydoublepage
\pagenumbering{arabic}


=What To Do With An OPeNDAP URL=
=What To Do With An OPeNDAP URL=
Line 42: Line 36:
The simplest thing you can do with this URL is to download the data it
The simplest thing you can do with this URL is to download the data it
points to.  You could feed it to an OPeNDAP-enabled data analysis package
points to.  You could feed it to an OPeNDAP-enabled data analysis package
like Ferret, or you could append <code>#<\code>, and feed the URL to a
like Ferret, or you could append <code>.asc<\code>, and feed the URL to a
regular web browser like Netscape.  This will work, but you don't
regular web browser like Netscape.  This will work, but you don't
really want to do it because in binary form, there are about 28
really want to do it because in binary form, there are about 28
Line 53: Line 47:
   information about the data that will be useful when analyzing data
   information about the data that will be useful when analyzing data
   in ''any''  package.}
   in ''any''  package.}
\subj{You need to sample the data}
 
===You need to sample the data===
 
A better strategy is to find out some information about the data.
A better strategy is to find out some information about the data.
OPeNDAP has sophisticated methods for subsampling data at a remote site,
OPeNDAP has sophisticated methods for subsampling data at a remote site,
Line 59: Line 55:
looking at the data's \new{Dataset Descriptor Structure} (DDS).  This
looking at the data's \new{Dataset Descriptor Structure} (DDS).  This
provides a description of the "shape" of the data, using a vaguely
provides a description of the "shape" of the data, using a vaguely
C-like syntax.  You get a dataset's DDS by appending <code>#<\code> to the
C-like syntax.  You get a dataset's DDS by appending <code>.dds<\code> to the
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.dds URL].
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.dds URL].
\figureplace{An OPeNDAP DDS (<code>#<\code>)}{htb}
\figureplace{An OPeNDAP DDS (\lit{sst.mnmean.nc.dds})}{htb}
{reynolds,dds}{reynolds-dds.ps}{reynolds-dds.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.dds}
{reynolds,dds}{reynolds-dds.ps}{reynolds-dds.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.dds}
From the DDS shown, you can see that the dataset consists of five
From the DDS shown, you can see that the dataset consists of five
pieces:  
pieces:  
\subj{Find out what's in the data}
 
===Find out what's in the data===
 
    
    
*A 180-element vector called "lat",  
*A 180-element vector called "lat",  
*A 360-element vector called "lon",  
*A 360-element vector called "lon",  
*A 226-element vector called "time",  
*A 226-element vector called "time",  
*A "Grid" containing a three-dimensional array of integer  values (<code>#<\code>) called <code>#<\code>, and three "Map" vectors,  which may look familiar, and  
*A "Grid" containing a three-dimensional array of integer  values (\lit{Int16}) called \lit{sst}, and three "Map" vectors,  which may look familiar, and  
*Another Grid called <code>#<\code>. The \new{Grid} is a special OPeNDAP data type that includes a
*Another Grid called \lit{mask}. The \new{Grid} is a special OPeNDAP data type that includes a
multidimensional array, and \new{map vectors} that indicate the
multidimensional array, and \new{map vectors} that indicate the
independent variable values.  That is, you can use a Grid to store an
independent variable values.  That is, you can use a Grid to store an
Line 101: Line 99:
   Click [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das here]
   Click [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das here]
   or on the figure to see the rest of it.}
   or on the figure to see the rest of it.}
\figureplace{An OPeNDAP DAS (<code>#<\code>)}{h}
\figureplace{An OPeNDAP DAS (\lit{sst.mnmean.nc.das})}{h}
{reynolds,das}{reynolds-das.ps}{reynolds-das.gif}
{reynolds,das}{reynolds-das.ps}{reynolds-das.gif}
{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das}
{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das}
Line 109: Line 107:
   COARDS compliant.  Other metadata standards you may encounter with
   COARDS compliant.  Other metadata standards you may encounter with
   OPeNDAP data are HDF-EOS, EPIC, FGDC, or no metadata at all.}
   OPeNDAP data are HDF-EOS, EPIC, FGDC, or no metadata at all.}
\subj{Find out more about the data variables}
 
===Find out more about the data variables===
 
Now we can tell something more about the data.  Apparently the
Now we can tell something more about the data.  Apparently the
<code>#<\code> vector contains latitude, in degrees north, and the range is
\lit{lat} vector contains latitude, in degrees north, and the range is
from 89.5 to -89.5.  Since this is a global grid, the latitude values
from 89.5 to -89.5.  Since this is a global grid, the latitude values
probably go in order.  We can check this by asking for just the
probably go in order.  We can check this by asking for just the
Line 126: Line 126:
\xlinkn{time}
\xlinkn{time}
and [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?lon longitude]
and [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?lon longitude]
\subj{The info service also provides the DAS and DDS information.}
 
===The info service also provides the DAS and DDS information.===
 
vectors to see how this works.
vectors to see how this works.
}
}
Line 133: Line 135:
recorded in the data which, because of your familiarity with the
recorded in the data which, because of your familiarity with the
Julian calendar, you instantly recognize as beginning in November,
Julian calendar, you instantly recognize as beginning in November,
1981.  You might also notice that the <code>#<\code> array is used to
1981.  You might also notice that the \lit{mask} array is used to
indicate land and sea, and has only the values 0 and
indicate land and sea, and has only the values 0 and
1.  
1.  
Line 143: Line 145:
documentation here, as well.  Some will find this the easiest way to
documentation here, as well.  Some will find this the easiest way to
read the attribute and structure information.  You can see what
read the attribute and structure information.  You can see what
information is available by appending <code>#<\code> to a URL, like
information is available by appending <code>.info<\code> to a URL, like
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.info this]:
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.info this]:
\begin{vcode}[.]{sib}
\begin{vcode}[.]{sib}
Line 153: Line 155:
Now that we know a little about the shape of the data, and the data
Now that we know a little about the shape of the data, and the data
attributes, let's look at some of the data.
attributes, let's look at some of the data.
\subj{Use subscripts to sample a Grid.}
 
===Use subscripts to sample a Grid.===
 
You can request a piece of an array with subscripts,
You can request a piece of an array with subscripts,
just like in a C program or in Matlab or many other computer
just like in a C program or in Matlab or many other computer
Line 168: Line 172:
...sst/mnmean.nc.asc?mask[28:30][206:209]
...sst/mnmean.nc.asc?mask[28:30][206:209]
</pre>
</pre>
\subj{Sampling a Grid produces part of the Grid, including the map vectors.}
 
===Sampling a Grid produces part of the Grid, including the map vectors.===
 
Which produces a portion of the land mask somewhere near Alaska's
Which produces a portion of the land mask somewhere near Alaska's
Kenai peninsula\texorhtml{, shown in figure~\ref{reynolds,mask}}{:}
Kenai peninsula\texorhtml{, shown in figure~\ref{reynolds,mask}}{:}
Line 177: Line 183:
interested in the sea surface temperature data than the land mask.
interested in the sea surface temperature data than the land mask.
The temperature data is a three-dimensional grid.  To sample the  
The temperature data is a three-dimensional grid.  To sample the  
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209] <code>#<\code>]
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209] \lit{sst}]
Grid, you
Grid, you
just add a dimension for time:
just add a dimension for time:
Line 187: Line 193:
\figureplace{Part of the Reynolds SST data}{h}{reynolds,sst}{sst.ps}{sst.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209]}
\figureplace{Part of the Reynolds SST data}{h}{reynolds,sst}{sst.ps}{sst.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209]}
Note that the sst values are in degrees Celsius multiplied by 100, as
Note that the sst values are in degrees Celsius multiplied by 100, as
indicated by the <code>#<\code> attribute of the [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das DAS].
indicated by the \lit{scale_factor} attribute of the [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das DAS].
Further, it's important to remember with this dataset, that the data
Further, it's important to remember with this dataset, that the data
were obtained by calculating spatial and temporal means.
were obtained by calculating spatial and temporal means.
Consequently, the data points in the <code>#<\code> array should be ignored
Consequently, the data points in the \lit{sst} array should be ignored
when the corresponding entry in the <code>#<\code> array indicates they
when the corresponding entry in the \lit{mask} array indicates they
are over land.
are over land.
%
%
Line 205: Line 211:
stored in that form.  OPeNDAP provides a data type called a \new{Sequence}
stored in that form.  OPeNDAP provides a data type called a \new{Sequence}
to store this kind of data.
to store this kind of data.
\subj{A Sequence is a relational table.}
 
===A Sequence is a relational table.===
 
A Sequence can be thought of as a relational data table, with each
A Sequence can be thought of as a relational data table, with each
column representing a different data value, and each row representing
column representing a different data value, and each row representing
Line 219: Line 227:
</pre>
</pre>
The [http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.das DAS]
The [http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.das DAS]
(append <code>#<\code> to the URL) for this data is pretty uninformative,
(append <code>.das<\code> to the URL) for this data is pretty uninformative,
telling us only that all the data are stored as strings\texorhtml{.
telling us only that all the data are stored as strings\texorhtml{.
   You can see this in figure~\ref{rlctd,das}.}{:}
   You can see this in figure~\ref{rlctd,das}.}{:}
\figureplace{A DAS for Sequence data.}{h}{rlctd,das}{rlctd-das.ps}
\figureplace{A DAS for Sequence data.}{h}{rlctd,das}{rlctd-das.ps}
{rlctd-das.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.das}
{rlctd-das.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.das}
\subj{You can sometimes find data attributes among the data.}
 
===You can sometimes find data attributes among the data.===
 
On the other hand, a lot of the information we would get from the DAS
On the other hand, a lot of the information we would get from the DAS
is actually encoded in the data itself, which you can see by looking
is actually encoded in the data itself, which you can see by looking
at the data's
at the data's
[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.dds DDS] (append
[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.dds DDS] (append
<code>#<\code> to the URL)\texorhtml{, shown in figure~\ref{rlctd,dds}.}{:}
<code>.dds<\code> to the URL)\texorhtml{, shown in figure~\ref{rlctd,dds}.}{:}
\figureplace{A DDS for Sequence data.}{h}{rlctd,dds}{rlctd-dds.ps}
\figureplace{A DDS for Sequence data.}{h}{rlctd,dds}{rlctd-dds.ps}
{rlctd-dds.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.dds}
{rlctd-dds.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.dds}
Line 239: Line 249:
This produces a response shown \texorhtml{in
This produces a response shown \texorhtml{in
   figure~\ref{rlctd,coverage}.}{[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s here].}
   figure~\ref{rlctd,coverage}.}{[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s here].}
\figureplace{The <code>#<\code> dates and locations}{h}{rlctd,coverage}
\figureplace{The \lit{rlctd} dates and locations}{h}{rlctd,coverage}
{rlctd-cov.ps}{rlctd-cov.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s}
{rlctd-cov.ps}{rlctd-cov.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s}
\subj{Use a selection clause to select Sequence rows.}
 
===Use a selection clause to select Sequence rows.===
 
After reviewing the data in the last request, perhaps we decide we
After reviewing the data in the last request, perhaps we decide we
only want to see data from one of the cruises listed, or maybe only
only want to see data from one of the cruises listed, or maybe only
Line 267: Line 279:
wish to have returned, subject to the constraint of the selection
wish to have returned, subject to the constraint of the selection
clause.  In the previous example, the projection clause consisted only
clause.  In the previous example, the projection clause consisted only
of the <code>#<\code> variable.  In the one before that, the list was
of the \lit{o2} variable.  In the one before that, the list was
longer, containing 7 variables.\indc{constraint expression!projection
longer, containing 7 variables.\indc{constraint expression!projection
   clause}  
   clause}  
Line 276: Line 288:
==An Easier Way==
==An Easier Way==


\subj{The OPeNDAP query form is an easier way to sample data.}
 
===The OPeNDAP query form is an easier way to sample data.===
 
OPeNDAP also includes a way to sample data that makes writing a
OPeNDAP also includes a way to sample data that makes writing a
constraint expression somewhat easier.  Append <code>#<\code> to the URL,
constraint expression somewhat easier.  Append <code>.html<\code> to the URL,
and you get a form that directs you to add information to sample the
and you get a form that directs you to add information to sample the
data at a [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html URL]:
data at a [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html URL]:
Line 284: Line 298:
...sst.mnmean.nc.html
...sst.mnmean.nc.html
</pre>
</pre>
Sending a URL ending in <code>#<\code> returns a form like this:
Sending a URL ending in <code>.html<\code> returns a form like this:
\figureplace{The OPeNDAP Dataset Access Form}{h}{reynolds,ifh}{ifh.ps}{ifh.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html}
\figureplace{The OPeNDAP Dataset Access Form}{h}{reynolds,ifh}{ifh.ps}{ifh.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html}
It's useful to have a browser window open with one of these query forms in it
It's useful to have a browser window open with one of these query forms in it
Line 297: Line 311:
is really just for your perusal.  At this point, there's not much to
is really just for your perusal.  At this point, there's not much to
be done with this, but it is often helpful information.
be done with this, but it is often helpful information.
\subj{Select variables by clicking on a checkbox.}  The important part
 
===Select variables by clicking on a checkbox.===
  The important part
of the page is the "Variables" section.  For each variable in the
of the page is the "Variables" section.  For each variable in the
dataset, you'll see the data description (e.g. ``Array of 32 bit Reals
dataset, you'll see the data description (e.g. ``Array of 32 bit Reals
Line 311: Line 327:
instructions about how to proceed.
instructions about how to proceed.
\note{You'll see a "stride" mentioned.  This is another way to
\note{You'll see a "stride" mentioned.  This is another way to
   subsample an OPeNDAP array or Grid.  Asking for <code>#<\code> gets you the first
   subsample an OPeNDAP array or Grid.  Asking for \lit{lat[0:4]} gets you the first
   five members of the <code>#<\code> array.  Adding a stride value allows
   five members of the \lit{lat} array.  Adding a stride value allows
   you to skip array values.  Asking for <code>#<\code> gets you
   you to skip array values.  Asking for \lit{lat[0:2:10]} gets you
   every second array value between 0 and
   every second array value between 0 and
   10: 0, 2, 4, 6, 8, 10.}  
   10: 0, 2, 4, 6, 8, 10.}  
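The stride rule in the note above can be checked with a few lines of Python. This is only a sketch of the index arithmetic, not a client: `selected_indices` is a hypothetical helper name, and it assumes (as the note's examples suggest) that both ends of the range are inclusive.

```python
def selected_indices(start, stride, stop):
    """Indices picked by a subscript of the form [start:stride:stop].

    Assumption: 'stop' is inclusive, matching the note's lat[0:4] example,
    which returns five members.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(range(start, stop + 1, stride))


# lat[0:4] has no stride, which behaves like a stride of 1.
print(selected_indices(0, 1, 4))
# lat[0:2:10] takes every second value between 0 and 10.
print(selected_indices(0, 2, 10))
```

Note that the stride sits in the middle of the subscript, between the start and stop indices.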
Line 330: Line 346:
for your browser.  There are instructions for doing this at the OPeNDAP
for your browser.  There are instructions for doing this at the OPeNDAP
home page.)
home page.)
\subj{The web interface works for Sequence data, too.}
 
===The web interface works for Sequence data, too.===
 
The OPeNDAP Data Access Form interface works for Sequence data as well as
The OPeNDAP Data Access Form interface works for Sequence data as well as
Grids.  However, since Sequence constraint expressions look different
Grids.  However, since Sequence constraint expressions look different
Line 359: Line 377:
==GCMD==
==GCMD==


\subj{The GCMD now catalogs OPeNDAP URLs!}  The \xlink{Global Change
 
===The GCMD now catalogs OPeNDAP URLs!===
  The \xlink{Global Change
   Master Directory}{http://gcmd.gsfc.nasa.gov} is a source of a huge
   Master Directory}{http://gcmd.gsfc.nasa.gov} is a source of a huge
amount of earth science data.  They now catalog OPeNDAP URLs for the
amount of earth science data.  They now catalog OPeNDAP URLs for the
Line 375: Line 395:
\xlink{\but{Datasets}}
\xlink{\but{Datasets}}
{http://www.unidata.ucar.edu/cgi-bin/dods/datasets/datasets.cgi?xmlfilename=datasets.xml}
{http://www.unidata.ucar.edu/cgi-bin/dods/datasets/datasets.cgi?xmlfilename=datasets.xml}
\subj{The OPeNDAP project supports an ad hoc list of data URLs.}
 
===The OPeNDAP project supports an ad hoc list of data URLs.===
 
in the table of contents\texorhtml{}{ or right here}.  You can find a URL and a
in the table of contents\texorhtml{}{ or right here}.  You can find a URL and a
brief description for several hundred different datasets from that
brief description for several hundred different datasets from that
Line 387: Line 409:
datasets.  For example, let's look at the [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html Reynolds data]
datasets.  For example, let's look at the [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html Reynolds data]
we saw in chapter~1:
we saw in chapter~1:
\subj{The web interface allows browsing data directories.}
 
===The web interface allows browsing data directories.===
 
\begin{vcode}[.]{sib}
\begin{vcode}[.]{sib}
http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html
http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html
Line 418: Line 442:
can request the entire dataset, or subsample it just like any other
can request the entire dataset, or subsample it just like any other
OPeNDAP dataset.
OPeNDAP dataset.
\subj{A file server is a list of other datasets, but it's a dataset, too.}
 
===A file server is a list of other datasets, but it's a dataset, too.===
 
There is a file server for GSO/URI's archive of AVHRR sea surface
There is a file server for GSO/URI's archive of AVHRR sea surface
temperature data:
temperature data:
Line 436: Line 462:
The OPeNDAP Matlab GUI browser contains its own frequently updated list of
The OPeNDAP Matlab GUI browser contains its own frequently updated list of
available datasets.  Using that software, you can select datasets with
available datasets.  Using that software, you can select datasets with
\subj{The Matlab GUI has its own list of available data.}
 
===The Matlab GUI has its own list of available data.===
 
a mouse from a large selection of the available URLs.  For more
a mouse from a large selection of the available URLs.  For more
information, please refer to the \OPDmgui\ manual.
information, please refer to the \OPDmgui\ manual.
Line 447: Line 475:
\tbd{Add links to the following list.}
\tbd{Add links to the following list.}
    
    
*Use a generic web client like <code>#<\code> (a standard part of  the OPeNDAP package), the free programs  [http://www.gnu.org/manual/wget-1.5.3/html_mono/wget.html <code>#<\code>]  or [http://lynx.browser.org <code>#<\code>], or even a browser  like <code>#<\code> or <code>#<\code> to download  data into a local data file.  To be able to use the data further,  you will probably have to download the ASCII version by using the        <code>#<\code> suffix on the URL, as in the examples shown. \subj{Use a generic web client or an OPeNDAP client to get the data you've chosen.}  
*Use a generic web client like \lit{geturl} (a standard part of  the OPeNDAP package), the free programs  [http://www.gnu.org/manual/wget-1.5.3/html_mono/wget.html \lit{wget}]  or [http://lynx.browser.org \lit{lynx}], or even a browser  like \lit{Netscape Navigator} or \lit{Internet Explorer} to download  data into a local data file.  To be able to use the data further,  you will probably have to download the ASCII version by using the        <code>.asc<\code> suffix on the URL, as in the examples shown.  
*There are pre-packaged OPeNDAP clients available that can download  binary OPeNDAP data from the web into a useful form.  As of \today ,  command line clients (<code>#<\code>) are available for the Matlab  and IDL data analysis environments, with which you can download OPeNDAP  data directly into IDL or Matlab objects.  \indc{loaddods!Matlab or    IDL client}
===Use a generic web client or an OPeNDAP client to get the data you've chosen.=== 
*The [http://ferret.wrc.noaa.gov/Ferret Ferret] and GrADS  free data analysis packages both support OPeNDAP.  You can use these  for downloading OPeNDAP data, and for examining it afterwards.  (There  are limitations.  As of \today , Ferret cannot read datasets served  as Sequence data.)
*There are pre-packaged OPeNDAP clients available that can download  binary OPeNDAP data from the web into a useful form.  As of \today ,  command line clients (\lit{loaddods}) are available for the Matlab  and IDL data analysis environments, with which you can download OPeNDAP  data directly into IDL or Matlab objects.  \indc{loaddods!Matlab or    IDL client}  
*The Matlab analysis package also supports an OPeNDAP client attached  to a graphical user interface.  You can use the GUI to create a  constrained OPeNDAP URL, and download the data directly into Matlab.  The \OPDmgui\ contains more information about the Matlab GUI  client.
*If you have a data analysis program or package that you like,  you can look into the possibility of linking that package to the  OPeNDAP toolkit library, in effect making your program into a  web-capable OPeNDAP client. \indc{DODS!linking to your software} OPeNDAP libraries exist to mimic the behavior of the  \netcdf and \jgofs\ data access APIs.  If your program already uses  one of these APIs, getting it to run with OPeNDAP may be as simple as  changing the libraries to which you link it.  The \OPDuser\ describes how to do this, and the \OPDapi\ describes how you can  use the OPeNDAP toolkit directly to create a new application that  doesn't use one of the established data access  APIs.

The use of these clients, like the ways in which you can analyze the
data you find, is beyond the scope of this (or any) book.  Enjoy.
\printindex
 
=Peeking at Data=
 
Now that we know a little about the shape of the data, and the data
attributes, let's look at some of the data.
\subj{Use subscripts to sample a Grid.}
You can request a piece of an array with subscripts,
just like in a C program or in Matlab or many other computer
languages.  Use a colon to indicate a subscript range.
\begin{vcode}[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?time[0:6]]{sib}
...sst/mnmean.nc.asc?time[0:6]
</pre>
This [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?time[0:6] URL] will produce \texorhtml{figure~\ref{reynolds,timevec}}{the following:}
\figureplace{Part of a vector.}{h}{reynolds,timevec}{timevec.ps}{timevec.gif}{}
You can do the
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?mask[28:30][206:209] same]
for one of the grids:
\begin{vcode}[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?mask[28:30][206:209]]{sib}
...sst/mnmean.nc.asc?mask[28:30][206:209]
</pre>
\subj{Sampling a Grid produces part of the Grid, including the map vectors.}
Which produces a portion of the land mask somewhere near Alaska's
Kenai peninsula\texorhtml{, shown in figure~\ref{reynolds,mask}}{:}
\figureplace{Part of an OPeNDAP Grid.}{h}{reynolds,mask}{mask.ps}{mask.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?mask[28:30][206:209]}
Notice that when you ask for part of an OPeNDAP Grid, you get the array
part along with the corresponding parts of the map vectors.
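If you build these subscripted URLs often, a small script saves typing. The following is a minimal sketch that only assembles the URL text and does not contact any server; `hyperslab` and `ascii_url` are hypothetical helper names invented for this example, and the index ranges are treated as inclusive at both ends.

```python
# Dataset URL used throughout this guide.
BASE = "http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc"


def hyperslab(variable, *ranges):
    """Return a subscripted variable such as 'mask[28:30][206:209]'.

    Each range is a (start, stop) pair of array indices, one per dimension.
    """
    parts = []
    for start, stop in ranges:
        if start < 0 or stop < start:
            raise ValueError(f"bad subscript range: {start}:{stop}")
        parts.append(f"[{start}:{stop}]")
    return variable + "".join(parts)


def ascii_url(base, constraint):
    """Ask for the ASCII version of the data, limited by a constraint."""
    return f"{base}.asc?{constraint}"


print(ascii_url(BASE, hyperslab("time", (0, 6))))
print(ascii_url(BASE, hyperslab("mask", (28, 30), (206, 209))))
```

Each extra dimension is just another `(start, stop)` pair, so the same helper covers one-dimensional vectors and multidimensional Grids.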
If you are interested in the Reynolds dataset, you are probably more
interested in the sea surface temperature data than the land mask.
The temperature data is a three-dimensional grid.  To sample the
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209] \lit{sst}]
Grid, you
just add a dimension for time:
\begin{vcode}[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209]]{sib}
...sst/mnmean.nc.asc?sst[12:13][28:30][206:209]
</pre>
This produces something like\texorhtml{ the figure shown in
  figure~\ref{reynolds,sst}}{this:}
\figureplace{Part of the Reynolds SST data}{h}{reynolds,sst}{sst.ps}{sst.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209]}
Note that the sst values are in degrees Celsius multiplied by 100, as
indicated by the \lit{scale_factor} attribute of the [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das DAS].
Further, it's important to remember with this dataset, that the data
were obtained by calculating spatial and temporal means.
Consequently, the data points in the \lit{sst} array should be ignored
when the corresponding entry in the \lit{mask} array indicates they
are over land.
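Once the numbers are on your machine, the two caveats above amount to a scaling step and a masking step. Here is a rough sketch, assuming a scale factor of 0.01 (the inverse of "multiplied by 100") and assuming that a mask value of 0 marks land. Check the DAS of the dataset you are using for the real meaning of each. `decode_sst` is a hypothetical name, and the sample numbers are made up.

```python
def decode_sst(raw, mask, scale_factor=0.01, land_value=0):
    """Convert raw integer SST values to degrees Celsius.

    raw and mask are equally shaped lists of rows. Points whose mask entry
    equals land_value come back as None so they cannot be mistaken for data.
    Both default values are assumptions, not facts read from the server.
    """
    decoded = []
    for raw_row, mask_row in zip(raw, mask):
        decoded.append([
            None if flag == land_value else value * scale_factor
            for value, flag in zip(raw_row, mask_row)
        ])
    return decoded


# Made-up numbers for illustration only, not values from the Reynolds data.
raw = [[1250, 1300], [-180, 975]]
mask = [[1, 0], [1, 1]]

for row in decode_sst(raw, mask):
    print(row)
```

Returning `None` for land points, rather than zero, keeps them from being averaged in by accident later.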
==Sampling Grids by Value==
 
 
=Sequence Data=
 
Gridded data works well for satellite images, model data, and data
compilations such as the Reynolds data we've just looked at.  Other
data, such as data measured at a specific site, is not so readily
stored in that form.  OPeNDAP provides a data type called a \new{Sequence}
to store this kind of data.
\subj{A Sequence is a relational table.}
A Sequence can be thought of as a relational data table, with each
column representing a different data value, and each row representing
a different data "instance."  For example, an ocean \ind{temperature
profile} can be stored as a Sequence of pressure and temperature pairs,
and a weather station's data can be stored as a Sequence with time in
one column, and each weather variable occupying another column.
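To make the table picture concrete, here is a small sketch of a temperature profile held as rows and columns in Python. It does not use any OPeNDAP library. The `Reading` type, the `column` helper, and the numbers are all invented for illustration.

```python
from collections import namedtuple

# One row ("instance") of the Sequence: a pressure and temperature pair.
Reading = namedtuple("Reading", ["pressure", "temperature"])

# Made-up values, only meant to show the shape of the table.
profile = [
    Reading(pressure=0, temperature=18.4),
    Reading(pressure=10, temperature=18.1),
    Reading(pressure=20, temperature=16.9),
]


def column(rows, name):
    """Pull one column out of the table, top to bottom."""
    return [getattr(row, name) for row in rows]


print(Reading._fields)            # the column names
print(column(profile, "pressure"))
```

Every row has the same columns, but nothing fixes the number of rows in advance, which is the main difference from an array or Grid.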
Let's look at a couple of Sequences.  The first one is a collection of
CTD data (hydrographic data, including temperature, pressure,
salinity, and so on):
<pre>
http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd
</pre>
The [http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.das DAS]
(append <code>.das</code> to the URL) for this data is pretty uninformative,
telling us only that all the data are stored as strings\texorhtml{.
  You can see this in figure~\ref{rlctd,das}.}{:}
\figureplace{A DAS for Sequence data.}{h}{rlctd,das}{rlctd-das.ps}
{rlctd-das.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.das}
\subj{You can sometimes find data attributes among the data.}
On the other hand, a lot of the information we would get from the DAS
is actually encoded in the data itself, which you can see by looking
at the data's
[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.dds DDS] (append
<code>.dds</code> to the URL)\texorhtml{, shown in figure~\ref{rlctd,dds}.}{:}
\figureplace{A DDS for Sequence data.}{h}{rlctd,dds}{rlctd-dds.ps}
{rlctd-dds.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.dds}
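The DAS and DDS requests differ from the plain dataset URL only in the suffix appended to it, so they are easy to generate. The sketch below just builds the strings and makes no network request. `service_url` and the short service names in `SERVICES` are hypothetical labels chosen for this example, while the suffixes themselves are the ones used in this guide.

```python
# Suffixes mentioned in this guide, keyed by a short label of our own choosing.
SERVICES = {
    "dds": ".dds",     # structure ("shape") of the data
    "das": ".das",     # attributes
    "info": ".info",   # combined information page
    "ascii": ".asc",   # data as text
    "form": ".html",   # query form
}


def service_url(dataset_url, service):
    """Append the suffix for the requested service to a dataset URL."""
    try:
        suffix = SERVICES[service]
    except KeyError:
        raise ValueError(f"unknown service: {service!r}") from None
    return dataset_url + suffix


DATASET = "http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd"

for name in ("das", "dds"):
    print(service_url(DATASET, name))
```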
We can get some idea of the data coverage by asking for some of the
time and location data, with a URL like this:
\begin{vcode}[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s]{sib}
...rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s
</pre>
This produces a response shown \texorhtml{in
  figure~\ref{rlctd,coverage}.}{[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s here].}
\figureplace{The \lit{rlctd} dates and locations}{h}{rlctd,coverage}
{rlctd-cov.ps}{rlctd-cov.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s}
\subj{Use a selection clause to select Sequence rows.} 
After reviewing the data in the last request, perhaps we decide we
only want to see data from one of the cruises listed, or maybe only
data from the month of May.  We can add a \new{selection clause} to
the constraint expression to select only that data.  For example:
\begin{vcode}[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s&month_s=5]{sib}
...rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s&month_s=5
</pre>
This produces a table containing all the rows from
\texorhtml{figure~\ref{rlctd,coverage}}{the last example} where the
month datum is May.  \texorhtml{Try entering the new URL in your browser
  and see what you get.}{Click \xlinkn{here}
to see that table.}
Selection clauses can be stacked endlessly against a URL, allowing all
the flexibility most people need to sample data files.  Here's an
example of a
[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?o2&month_s=5&pres>50&pres<100 URL]
that requests all the oxygen data in the file taken in May at a
specific depth range:
\begin{vcode}[http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?o2&month_s=5&pres>50&pres<100]{sib}
...rlctd.asc?o2&month_s=5&pres>50&pres<100
</pre>
The first clause in a constraint expression has a name, too.  It is
the \new{projection clause}.  This is the list of variables that you
wish to have returned, subject to the constraint of the selection
clause.  In the previous example, the projection clause consisted only
of the \lit{o2} variable.  In the one before that, the list was
longer, containing 7 variables.\indc{constraint expression!projection
  clause}
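The two kinds of clause fit together mechanically: a comma-separated projection first, then each selection clause introduced by an ampersand. A minimal sketch of that assembly follows. The function names are hypothetical, and the code only produces the URL text shown in the examples above without sending it anywhere.

```python
def constraint_expression(projection, selections=()):
    """Join a projection clause and any number of selection clauses.

    projection: list of variable names to return.
    selections: list of strings such as 'month_s=5' or 'pres>50'.
    """
    expression = ",".join(projection)
    for clause in selections:
        expression += "&" + clause
    return expression


def ascii_url(dataset_url, projection, selections=()):
    """Build an ASCII data request with the given constraint expression."""
    return f"{dataset_url}.asc?{constraint_expression(projection, selections)}"


DATASET = "http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd"

# Oxygen data taken in May between two pressure levels.
print(ascii_url(DATASET, ["o2"], ["month_s=5", "pres>50", "pres<100"]))

# Coverage columns for May only.
coverage = ["cruiseid", "station", "year_s", "month_s", "day_s", "lat_s", "lon_s"]
print(ascii_url(DATASET, coverage, ["month_s=5"]))
```

If you paste such a URL into a shell command rather than a browser, remember to quote it, since `&`, `<` and `>` all mean something to most shells.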
\tbd{There is a get_row() method for Sequences now, so that you can
  select a sequence row by its ordinal number.  When this makes it
into the server releases, document it.}
 
=An Easier Way=
 
\subj{The OPeNDAP query form is an easier way to sample data.}
OPeNDAP also includes a way to sample data that makes writing a
constraint expression somewhat easier.  Append <code>.html</code> to the URL,
and you get a form that helps you build a request to sample the
data at a [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html URL]:
<pre>
http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html
</pre>
Sending a URL ending in <code>.html</code> returns a form like this:
\figureplace{The OPeNDAP Dataset Access Form}{h}{reynolds,ifh}{ifh.ps}{ifh.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html}
It's useful to have a browser window open with one of these query forms in it
while you read this section.  Click
[http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html here]
to bring up a copy of the form to use while you read.
Near the top of the page, you'll see a box entitled "Data URL".  At
this point, if you've been following along, it should look pretty
familiar.  If you're just jumping in, it's the OPeNDAP URL connected to
the data we're interested in, but unsampled.
Moving down the page, there is a list of "Global Attributes", which
is really just for your perusal.  At this point, there's not much to
be done with this, but it is often helpful information.
===Select variables by clicking on a checkbox.===
The important part
of the page is the "Variables" section.  For each variable in the
dataset, you'll see the data description (e.g. "Array of 32 bit Reals
[lat = 0..179]"), a checkbox, a text input box, and a list of the
variable's attributes.  If you click on the checkbox, you'll see the
variable's array bounds appear in the text box, and you'll see that
variable appear in a constraint expression appended to the Data URL at
the top of the page.  If you edit the array bounds in the text box,
hitting "enter" will place your edits in the Data URL box.
In the oh-so-unlikely event you dare try all this without your
documentation \new{vade mecum} along, there's a \but{Show Help}
button up near the top of the page.  Clicking there will show you
instructions about how to proceed.
\note{You'll see a "stride" mentioned.  This is another way to
  subsample an OPeNDAP array or Grid.  Asking for <code>lat[0:4]</code> gets you the first
  five members of the <code>lat</code> array.  Adding a stride value allows
  you to skip array values.  Asking for <code>lat[0:2:10]</code> gets you
  every second array value between 0 and
  10: 0, 2, 4, 6, 8, 10.}
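
The stride arithmetic is easy to check with a short Python sketch. The function names <code>subscript_indices</code> and <code>hyperslab</code> are hypothetical helpers written for this illustration. The sketch assumes, as the examples in the note do, that the last index is included and that the stride sits between the start and stop values.

```python
from typing import List


def subscript_indices(start: int, stop: int, stride: int = 1) -> List[int]:
    """Return the array indices selected by [start:stride:stop].

    Unlike a Python slice, the stop index is part of the result.
    """
    return list(range(start, stop + 1, stride))


def hyperslab(name: str, start: int, stop: int, stride: int = 1) -> str:
    """Format the subscript text to append to a variable name."""
    if stride == 1:
        return f"{name}[{start}:{stop}]"
    return f"{name}[{start}:{stride}:{stop}]"


print(hyperslab("lat", 0, 4), subscript_indices(0, 4))
print(hyperslab("lat", 0, 10, 2), subscript_indices(0, 10, 2))
```

The second call lists the indices 0, 2, 4, 6, 8, 10, matching the note above.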
Move on down the variable list, editing your request, and experiment
with adding and changing variable requests.
When you have a request you'd like to make, look at the buttons at the
top of the page. 
\figureplace{Dataset Access Form Detail}{h}{reynolds,ifh-buttons}
{ifh-buttons.ps}{ifh-buttons.gif}{}
You can click on \but{Get ASCII}, and the data
request will appear in a browser window, in comma-separated form.  The
\but{Get Binary} button will save a binary data file on your local
disk.  (The \but{Send to Program} will send the URL directly to an OPeNDAP
client.  However, it requires a suitable OPeNDAP client to be running on
your computer, and also requires you to install a helper application
for your browser.  There are instructions for doing this at the OPeNDAP
home page.)
===The web interface works for Sequence data, too.===
The OPeNDAP Data Access Form interface works for Sequence data as well as
Grids.  However, since Sequence constraint expressions look different
from Grid expressions, the form looks slightly different, too.  You
can see below that the
variable selection boxes allow you to enter relational expressions for
each variable.  Besides that, however, the function is exactly the same.
\indc{html
  interface!Sequence data}
\figureplace{Dataset Access Form for Sequence Data (detail)}{h}
{rlctd,ifh-seq}{ifh-seq.ps}{ifh-seq.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.html}
Click [http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.html here] to see a copy of
a Sequence form.
\note{Not all OPeNDAP servers support all the OPeNDAP functionality.  There
  are a few non-standard OPeNDAP servers out there in the world that only
   support the bare minimum required.  That minimum is to respond to
  queries for the DDS, DAS, and (binary) data.  The ASCII data and the
  web access form are optional add-ons that are not required for the
  basic OPeNDAP function.}
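
All of the responses described in this guide are requested the same way, by adding a suffix to the dataset URL. The sketch below collects the suffixes used in this guide in one place. The names <code>SERVICES</code> and <code>service_url</code> are made up for this example, and, as the note above says, a given server may not answer every one of them.

```python
# Suffixes that appear in this guide; a server is only required to
# answer the DDS, DAS, and binary data requests.
SERVICES = ("dds", "das", "asc", "html", "info")

REYNOLDS = ("http://www.cdc.noaa.gov/cgi-bin/nph-nc/"
            "Datasets/reynolds_sst/sst.mnmean.nc")


def service_url(dataset_url: str, service: str) -> str:
    """Return the dataset URL with the suffix for one service appended."""
    if service not in SERVICES:
        raise ValueError(f"unknown service suffix: {service!r}")
    return f"{dataset_url}.{service}"


for name in SERVICES:
    print(service_url(REYNOLDS, name))
```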
=Finding More OPeNDAP URLs=
The OPeNDAP package was developed to improve ways to share data among
scientists.  Many times, data comes in the form of a URL enclosed in
an email message.  But there are several other ways to find data served
by OPeNDAP servers.
 
=GCMD=
 
===The GCMD now catalogs OPeNDAP URLs!===
The [http://gcmd.gsfc.nasa.gov Global Change Master Directory] is a source of a huge
amount of earth science data.  They now catalog OPeNDAP URLs for the
datasets that have them.  You can search on "OPeNDAP" right from the
main page to find many of these datasets.  Try that search, then click
on one of the data set names that is returned, and look at the bottom of
the resulting "Set Description" page, under the heading "Related
URL."
If you make that search, check the list for the Reynolds data from
chapter 1; it should be there.
 
=OPeNDAP Dataset List=
 
===The OPeNDAP project supports an ad hoc list of data URLs.===
The \OPDhome\ has a list of available OPeNDAP datasets.  Click on
[http://www.unidata.ucar.edu/cgi-bin/dods/datasets/datasets.cgi?xmlfilename=datasets.xml Datasets]
in the table of contents.  You can find a URL and a
brief description for several hundred different datasets from that
[http://unidata.ucar.edu/packages/dods/home/data.shtml list].
 
=Web Interface=
 
===The web interface allows browsing data directories.===
This is a little bit sneaky.  Many sites that serve one OPeNDAP dataset
serve several others as well.  The OPeNDAP web interface (if it's enabled
by the site) allows you to check the directory structure for other
datasets.  For example, let's look at the [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html Reynolds data]
we saw in chapter 1:
<pre>
http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html
</pre>
If we use the same URL, but without the file at the end, we can browse
the directory of data:
<pre>
http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/
</pre>
The OPeNDAP server checks to see whether the URL is a directory, and if
so, it generates a directory listing, like [http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/ this]:
\figureplace{Web Interface Index Listing}{h}{reynolds,ifh-dir}
{ifh-dir.ps}{ifh-dir.gif}
{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/}
You can see from the directory listing that the monthly mean dataset
we've been looking at is accompanied by a weekly mean set, and a daily
set.  You can click on those datasets for more information about them,
and proceed to examine and use them just as we've done with the other
examples in chapter 1.
\note{This list is produced by an OPeNDAP server.  It only really understands
OPeNDAP data files.  If the directory you're looking at has other files
in it, clicking on them will probably produce an error.}
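
Deriving the directory URL from a dataset URL is a matter of dropping the last path component. Here is a minimal Python sketch of that step. The helper name <code>directory_url</code> is hypothetical, and whether the resulting URL returns a listing depends on the server having the web interface enabled.

```python
def directory_url(dataset_url: str) -> str:
    """Drop the file name from a dataset URL, keeping the trailing slash."""
    # rsplit cuts at the last '/', so everything after it (the file
    # name plus any suffix such as .html) is discarded.
    return dataset_url.rsplit("/", 1)[0] + "/"


REYNOLDS_FORM = ("http://www.cdc.noaa.gov/cgi-bin/nph-nc/"
                 "Datasets/reynolds_sst/sst.mnmean.nc.html")

print(directory_url(REYNOLDS_FORM))
```

A URL that already ends in a slash comes back unchanged, so the helper can be applied to either form.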
 
=File Servers=
 
Some datasets you'll find are actually lists of other datasets.  There
are a few of these ''file servers''  in the [http://unidata.ucar.edu/packages/dods/home/data.shtml OPeNDAP Dataset  List] on the
\OPDhome .  A file server is itself an OPeNDAP dataset, organized as a
Sequence, containing URLs with some other identifying data (often time).  You
can request the entire dataset, or subsample it just like any other
OPeNDAP dataset.
===A file server is a list of other datasets, but it's a dataset, too.===
There is a file server for GSO/URI's archive of AVHRR sea surface
temperature data:
<pre>
http://maewest.gso.uri.edu/cgi-bin/nph-ff/catalog/avhrr.catalog
</pre>
Look at this server's [http://maewest.gso.uri.edu/cgi-bin/nph-ff/catalog/avhrr.catalog.dds DDS], and the [http://maewest.gso.uri.edu/cgi-bin/nph-ff/catalog/avhrr.catalog.html web interface], and then try asking
for some data like [http://maewest.gso.uri.edu/cgi-bin/nph-ff/catalog/avhrr.catalog.asc?DODS_URL&year=2000&month=1 this]:
<pre>
http://maewest.gso.uri.edu/cgi-bin/nph-ff/catalog/avhrr.catalog.asc?DODS_URL&year=2000&month=1
</pre>
This produces a list of all the data URLs corresponding to
measurements taken in the month of January, 2000.
 
=Matlab GUI=
 
===The Matlab GUI has its own list of available data.===
The OPeNDAP Matlab GUI browser contains its own frequently updated list of
available datasets.  Using that software, you can select datasets with
a mouse from a large selection of the available URLs.  For more
information, please refer to the \OPDmgui\ manual.
=Further analysis=
This guide is about forming an OPeNDAP URL.  After you have figured out
how to request the data, there are a variety of things you can do with
it.  (OPeNDAP software mentioned here is available from the \OPDhome .)
\tbd{Add links to the following list.}
 
===Use a generic web client or an OPeNDAP client to get the data you've chosen.===
*Use a generic web client like <code>geturl</code> (a standard part of the OPeNDAP package), the free programs [http://www.gnu.org/manual/wget-1.5.3/html_mono/wget.html <code>wget</code>] or [http://lynx.browser.org <code>lynx</code>], or even a browser like Netscape Navigator or Internet Explorer to download data into a local data file.  To be able to use the data further, you will probably have to download the ASCII version by using the <code>.asc</code> suffix on the URL, as in the examples shown.
*There are pre-packaged OPeNDAP clients available that can download binary OPeNDAP data from the web into a useful form.  As of this writing, command line clients (<code>loaddods</code>) are available for the Matlab and IDL data analysis environments, with which you can download OPeNDAP data directly into IDL or Matlab objects.  \indc{loaddods!Matlab or IDL client}
*The [http://ferret.wrc.noaa.gov/Ferret Ferret] and GrADS free data analysis packages both support OPeNDAP.  You can use these for downloading OPeNDAP data, and for examining it afterwards.  (There are limitations.  As of this writing, Ferret cannot read datasets served as Sequence data.)
*The Matlab analysis package also supports an OPeNDAP client attached  to a graphical user interface.  You can use the GUI to create a  constrained OPeNDAP URL, and download the data directly into Matlab.  The \OPDmgui\ contains more information about the Matlab GUI  client.  

Revision as of 21:55, 13 March 2007

An OPeNDAP Quick Start Guide

Tom Sgouros


What To Do With An OPeNDAP URL

The \OPD\ is a system that allows you to access data over the internet, from programs that weren't originally designed for that purpose, as well as some that were. With OPeNDAP, you access data using a URL, just like a URL you would use to access a web page. However, before you request any data, you need to know how to request it in a form your browser can handle. OPeNDAP data is stored in binary form, and by default, it is transmitted that way, too. The other problem with an OPeNDAP URL is that a single URL might point to an archive containing 50 megabytes of data. You rarely want to request the whole thing without knowing a little about it. OPeNDAP provides sophisticated sub-sampling capabilities, but you need to know a little bit about the data in order to use them. \texorhtml{}{So here's what to do if someone gives you a raw URL, and

 says there's some OPeNDAP data on the other end.

\htmlmenu{4}

What To Do With An OPeNDAP URL

} (1) Suppose someone gives you a hot tip that there's a lot of good data at:

http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc

This URL points to monthly means of sea surface temperature, worldwide, compiled by Richard Reynolds at the Climate Modeling branch of NOAA, but pretend you don't know that yet.\indc{Climate

 Modeling!NOAA} 

The simplest thing you can do with this URL is to download the data it points to. You could feed it to an OPeNDAP-enabled data analysis package like Ferret, or you could append .asc and feed the URL to a regular web browser like Netscape. This will work, but you don't really want to do it because, in binary form, there are about 28 megabytes of data at that URL. \note{An OPeNDAP server will work with many different clients, some of

 which are supported by the OPeNDAP team, and some of which are
 supported by others.  The operation of any individual package is
 beyond the scope of this manual.  This guide explains how to use a
 typical web browser such as Netscape Navigator to discover
 information about the data that will be useful when analyzing data
 in any  package.}

You need to sample the data

A better strategy is to find out some information about the data. OPeNDAP has sophisticated methods for subsampling data at a remote site, but you need some information about the data before you can use them. To start, we'll try looking at the data's \new{Dataset Descriptor Structure} (DDS). This provides a description of the "shape" of the data, using a vaguely C-like syntax. You get a dataset's DDS by appending .dds to the URL. \figureplace{An OPeNDAP DDS (\lit{sst.mnmean.nc.dds})}{htb} {reynolds,dds}{reynolds-dds.ps}{reynolds-dds.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.dds} From the DDS shown, you can see that the dataset consists of five pieces:

Find out what's in the data

  • A 180-element vector called "lat",
  • A 360-element vector called "lon",
  • A 226-element vector called "time",
  • A "Grid" containing a three-dimensional array of integer values (\lit{Int16}) called \lit{sst}, and three "Map" vectors, which may look familiar, and
  • Another Grid called \lit{mask}. The \new{Grid} is a special OPeNDAP data type that includes a

multidimensional array, and \new{map vectors} that indicate the independent variable values. That is, you can use a Grid to store an array where the rows are not at regular intervals. \texorhtml{There's

 a simple grid in figure~\ref{grid,diagram}.}{Here's a simple grid:}

} \figureplace{A Grid}{h}{grid,diagram}{gridpts.ps}{gridpts.gif}{} The array part of the grid would contain the data points measured at each one of the squares, the X map vector would contain the positions of the columns, and the Y map vector would contain the positions of the rows. Of course you can also use a Grid to store arrays where the columns and rows are at regular intervals, and you'll often see OPeNDAP data that way. (The other special OPeNDAP data type worth worrying about is the Sequence . You'll see more about them in section~2. There are also \new{Structures} and \new{Lists}, but they exist largely for internal uses, and you don't often see these used in real datasets.) You can see from the DDS that the Reynolds data is in a 180x360x226 element grid, and the dimensions of the Grid are called "lat", "lon", and "time". This is suggestive, but not as helpful as one could wish. To find out more about what the data is , you can look at the other important OPeNDAP structure: the DAS, or \new{Data Attribute Structure}. This is somewhat similar to the DDS, but contains information about the data, such as units and the name of the variable. Part of the DAS for the Reynolds data we saw above is \texorhtml{shown in ~\ref{reynolds,das}.}{shown in the figure below.

 Click here
 or on the figure to see the rest of it.}

\figureplace{An OPeNDAP DAS (\lit{sst.mnmean.nc.das})}{h} {reynolds,das}{reynolds-das.ps}{reynolds-das.gif} {http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.das} \note{The DAS is populated at the data provider's discretion. Because

 of this, the quality of the data in it (the \new{metadata}) varies
 widely.  The data in the Reynolds dataset used in this example are
 COARDS compliant.  Other metadata standards you may encounter with
 OPeNDAP data are HDF-EOS, EPIC, FGDC, or no metadata at all.}

Find out more about the data variables

Now we can tell something more about the data. Apparently the \lit{lat} vector contains latitude, in degrees north, and the range is from 89.5 to -89.5. Since this is a global grid, the latitude values probably go in order. We can check this by asking for just the latitude vector, like this:

http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?lat

What we've done here is to append a \new{constraint expression}

to the OPeNDAP URL, to indicate how to

constrain our request for data. Constraint expressions can take many forms. This guide will only describe a few of them. (You can refer to the \OPDuser\ for more complete information about constraint expressions.) Try requesting the \xlinkn{time} and longitude

The info service also provides the DAS and DDS information.

vectors to see how this works. } According to the DAS, time is kept in "days since 1-1-1 00:00:00" in this dataset. You can also learn from the DAS the actual time period recorded in the data which, because of your familiarity with the Julian calendar, you instantly recognize as beginning in November, 1981. You might also notice that the \lit{mask} array is used to indicate land and sea, and has only the values 0 and 1.

OPeNDAP provides an \new{info service} that returns all the information we've seen so far in a single request. The returned information is also formatted differently (some would say "nicer"), and you can occasionally find server-specific documentation here, as well. Some will find this the easiest way to read the attribute and structure information. You can see what information is available by appending .info to a URL, like this:

http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.info

Peeking at Data

Now that we know a little about the shape of the data, and the data attributes, let's look at some of the data.

Use subscripts to sample a Grid.

You can request a piece of an array with subscripts, just like in a C program or in Matlab or many other computer languages. Use a colon to indicate a subscript range.

...sst.mnmean.nc.asc?time[0:6]

This URL will produce the following: \figureplace{Part of a vector.}{h}{reynolds,timevec}{timevec.ps}{timevec.gif}{} You can do the same for one of the grids:

...sst.mnmean.nc.asc?mask[28:30][206:209]

Sampling a Grid produces part of the Grid, including the map vectors.

This produces a portion of the land mask somewhere near Alaska's Kenai peninsula: \figureplace{Part of an OPeNDAP Grid.}{h}{reynolds,mask}{mask.ps}{mask.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?mask[28:30][206:209]} Notice that when you ask for part of an OPeNDAP Grid, you get the array part along with the corresponding parts of the map vectors. If you are interested in the Reynolds dataset, you are probably more interested in the sea surface temperature data than the land mask. The temperature data is a three-dimensional grid. To sample the \lit{sst} Grid, you just add a dimension for time:

...sst.mnmean.nc.asc?sst[12:13][28:30][206:209]

This produces something like this:

\figureplace{Part of the Reynolds SST data}{h}{reynolds,sst}{sst.ps}{sst.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.asc?sst[12:13][28:30][206:209]} Note that the sst values are in celsius degrees multiplied by 100, as indicated by the \lit{scale_factor} attribute of the DAS. Further, it's important to remember with this dataset, that the data were obtained by calculating spatial and temporal means. Consequently, the data points in the \lit{sst} array should be ignored when the corresponding entry in the \lit{mask} array indicates they are over land. %

Sampling Grids by Value

Sequence Data

(2) Gridded data works well for satellite images, model data, and data compilations such as the Reynolds data we've just looked at. Other data, such as data measured at a specific site, is not so readily stored in that form. OPeNDAP provides a data type called a \new{Sequence} to store this kind of data.

A Sequence is a relational table.

A Sequence can be thought of as a relational data table, with each column representing a different data value, and each row representing a different data "instance." For example, an ocean \ind{temperature profile} can be stored as a Sequence of pressure and temperature pairs, and a weather station's data can be stored as a Sequence with time in one column, and each weather variable occupying another column. Let's look at a couple of Sequences. The first one is a collection of CTD data (hydrographic data, including temperature, pressure, salinity, and so on):

http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd

The DAS (append .das to the URL) for this data is pretty uninformative, telling us only that all the data are stored as strings:

\figureplace{A DAS for Sequence data.}{h}{rlctd,das}{rlctd-das.ps} {rlctd-das.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.das}

You can sometimes find data attributes among the data.

On the other hand, a lot of the information we would get from the DAS is actually encoded in the data itself, which you can see by looking at the data's DDS (append .dds to the URL): \figureplace{A DDS for Sequence data.}{h}{rlctd,dds}{rlctd-dds.ps} {rlctd-dds.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.dds} We can get some idea of the data coverage by asking for some of the time and location data, with a URL like this:

...rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s

This produces the response shown here.

\figureplace{The \lit{rlctd} dates and locations}{h}{rlctd,coverage} {rlctd-cov.ps}{rlctd-cov.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s}

Use a selection clause to select Sequence rows.

After reviewing the data in the last request, perhaps we decide we only want to see data from one of the cruises listed, or maybe only data from the month of May. We can add a \new{selection clause} to the constraint expression to select only that data. For example: \begin{vcode}[2]{sib} ...rlctd.asc?cruiseid,station,year_s,month_s,day_s,lat_s,lon_s&month_s=5

This produces a table containing all the rows from \texorhtml{figure~\ref{rlctd,coverage}}{the last example} where the month datum is May. \texorhtml{Try entering the new URL in your browser

 and see what you get.}{Click \xlinkn{here}

to see that table.} You can append any number of selection clauses to a URL, which gives most people all the flexibility they need to sample data files. Here's an example of a URL that requests all the oxygen data in the file taken in May within a specific depth range:

...rlctd.asc?o2&month_s=5&pres>50&pres<100

The first clause in a constraint expression has a name, too. It is the \new{projection clause}. This is the list of variables that you wish to have returned, subject to the constraint of the selection clause. In the previous example, the projection clause consists only of the \lit{o2} variable. In the one before that, the list was longer, containing 7 variables.\indc{constraint expression!projection

 clause} 

\tbd{There is a get_row() method for Sequences now, so that you can

 select a sequence row by its ordinal number.  When this makes it

into the server releases, document it.}

An Easier Way

The OPeNDAP query form is an easier way to sample data.

OPeNDAP also includes a way to sample data that makes writing a constraint expression somewhat easier. Append .html to the URL, and you get a form that helps you build a request to sample the data at a URL:

...sst.mnmean.nc.html

Sending a URL ending in .html returns a form like this: \figureplace{The OPeNDAP Dataset Access Form}{h}{reynolds,ifh}{ifh.ps}{ifh.gif}{http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html} It's useful to have a browser window open with one of these query forms in it while you read this section. \texorhtml{}{Click

 \xlinkn{here}  {http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html}
 to bring up a copy of the form to use while you read.}

Near the top of the page, you'll see a box entitled "Data URL". At this point, if you've been following along, it should look pretty familiar. If you're just jumping in, it's the OPeNDAP URL connected to the data we're interested in, but unsampled. Moving down the page, there is a list of "Global Attributes", which is really just for your perusal. At this point, there's not much to be done with this, but it is often helpful information.

Select variables by clicking on a checkbox.

 The important part

of the page is the "Variables" section. For each variable in the dataset, you'll see the data description (e.g. ``Array of 32 bit Reals [lat = 0..179]), a checkbox, a text input box, and a list of the variable's attributes. If you click on the checkbox, you'll see the variable's array bounds appear in the text box, and you'll see that variable appear in a constraint expression appended to the Data URL at the top of the page. If you edit the array bounds in the text box, hitting "enter" will place your edits in the Data URL box. In the oh-so-unlikely event you dare try all this without your documentation \new{vade mecum} along, there's a \but{Show Help} button up near the top of the page. Clicking there will show you instructions about how to proceed. \note{You'll see a "stride" mentioned. This is another way to

 subsample an OPeNDAP array or Grid.  Asking for \lit{lat[0:4]} gets you the first
 five members of the \lit{lat} array.  Adding a stride value allows
 you to skip array values.  Asking for \lit{lat[0:2:10]} gets you
 every second array value between 0 and
 10: 0, 2, 4, 6, 8, 10.} 

Move on down the variable list, editing your request, and experiment with adding and changing variable requests. When you have a request you'd like to make, look at the buttons at the top of the page. \figureplace{Dataset Access Form Detail}{h}{reynolds,ifh-buttons} {ifh-buttons.ps}{ifh-buttons.gif}{} You can click on \but{Get ASCII}, and the data request will appear in a browser window, in comma-separated form. The \but{Get Binary} button will save a binary data file on your local disk. (The \but{Send to Program} will send the URL directly to an OPeNDAP client. However, it requires a suitable OPeNDAP client to be running on your computer, and also requires you to install a helper application for your browser. There are instructions for doing this at the OPeNDAP home page.)

The web interface works for Sequence data, too.

The OPeNDAP Data Access Form interface works for Sequence data as well as Grids. However, since Sequence constraint expressions look different than Grid expressions, the form looks slightly different, too. You can see \texorhtml{from figure~\ref{rlctd,ifh-seq}}{below} that the variable selection boxes allow you to enter relational expressions for each variable. Beside that, however, the function is exactly the same. \indc{html

 interface!Sequence data}

\figureplace{Dataset Access Form for Sequence Data (detail)}{h} {rlctd,ifh-seq}{ifh-seq.ps}{ifh-seq.gif}{http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.html} \texorhtml{}{Click \xlinkn{here} {http://dods.gso.uri.edu/cgi-bin/nph-jg/rlctd.html} to see a copy of

 a Sequence form.}

\note{Not all OPeNDAP servers support all the OPeNDAP functionality. There

 are a few non-standard OPeNDAP servers out there in the world that only
 support the bare minimum required.  That minimum is to respond to
 queries for the DDS, DAS, and (binary) data.  The ASCII data and the
 web access form are optional add-ons that are not required for the
 basic OPeNDAP function.}

Finding More OPeNDAP URLs

The OPeNDAP package was developed to improve ways to share data among scientists. Many times, data comes in the form of a URL enclosed in an email message. But there are several other ways to find data served by OPeNDAP servers.

GCMD

The GCMD now catalogs OPeNDAP URLs!

 The \xlink{Global Change
 Master Directory}{http://gcmd.gsfc.nasa.gov} is a source of a huge

amount of earth science data. They now catalog OPeNDAP URLs for the datasets that have them. You can search on "OPeNDAP" right from the main page to find many of these datasets. Try that search, then click on one of the data set names that is returned, and look at the bottom of the resulting "Set Description" page, under the heading "Related URL." If you make that search, check the list for the Reynolds data from chapter 1; it should be there.

OPeNDAP Dataset List

The \OPDhome\ has a list of available OPeNDAP datasets. Click on \xlink{\but{Datasets}} {http://www.unidata.ucar.edu/cgi-bin/dods/datasets/datasets.cgi?xmlfilename=datasets.xml}

The OPeNDAP project supports an ad hoc list of data URLs.

in the table of contents\texorhtml{}{ or right here}. You can find a URL and a brief description for several hundred different datasets from that list.

Web Interface

This is a little bit sneaky. Many sites that serve one OPeNDAP dataset serve several others as well. The OPeNDAP web interface (if it's enabled by the site) allows you to check the directory structure for other datasets. For example, let's look at the Reynolds data we saw in chapter~1:

The web interface allows browsing data directories.

\begin{vcode}[.]{sib} http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/sst.mnmean.nc.html

If we use the same URL, but without the file at the end, we can browse the directory of data: \begin{vcode}[.]{sib} http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/

The OPeNDAP server checks to see whether the URL is a directory, and if so, it generates a directory listing, like \texorhtml{in figure~\ref{reynolds,ifh-dir}.}{this:} \figureplace{Web Interface Index Listing}{h}{reynolds,ifh-dir} {ifh-dir.ps}{ifh-dir.gif} {http://www.cdc.noaa.gov/cgi-bin/nph-nc/Datasets/reynolds_sst/} You can see from the directory listing that the monthly mean dataset we've been looking at is accompanied by a weekly mean set, and a daily set. You can click on those datasets for more information about them, and proceed to examine and use them just as we've done with the other examples in chapter~1. \note{This list is produced by an OPeNDAP server. It only really understands OPeNDAP data files. If the directory you're looking at has other files in it, clicking on them will probably produce an error.}

File Servers

Some datasets you'll find are actually lists of other datasets. There are a few of these file servers in the OPeNDAP Dataset List on the \OPDhome . A file server is itself an OPeNDAP dataset, organized as a Sequence, containing URLs with some other identifying data (often time). You can request the entire dataset, or subsample it just like any other OPeNDAP dataset.

A file server is a list of other datasets, but it's a dataset, too.

There is a file server for GSO/URI's archive of AVHRR sea surface temperature data:

http://maewest.gso.uri.edu/cgi-bin/nph-ff/catalog/avhrr.catalog

Look at this server's DDS, and the web interface, and then try asking for some data like this: \begin{vcode}[4]{sib}

 .../catalog/avhrr.catalog.asc?DODS_URL&year=2000&month=1

This produces a list of all the data URLs corresponding to measurements taken in the month of January, 2000.
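A request like the one above can also be assembled in a short script. The following is a minimal sketch in Python, assuming the file server URL given earlier; the function name `constrained_url` is hypothetical, and the script only builds the URL string rather than fetching it, since the server may not be reachable.

```python
def constrained_url(base_url, projection, selections, suffix=".asc"):
    """Build an OPeNDAP URL with a constraint expression.

    Hypothetical helper for illustration. The result has the form
    base_url + suffix + "?" + projection + "&" + selection clauses.
    """
    # Projection: comma-separated list of the variables to return.
    query = ",".join(projection)
    # Selection: each clause is appended with a leading "&".
    for clause in selections:
        query += "&" + clause
    return base_url + suffix + "?" + query


base = "http://maewest.gso.uri.edu/cgi-bin/nph-ff/catalog/avhrr.catalog"
january_2000 = constrained_url(base, ["DODS_URL"], ["year=2000", "month=1"])
print(january_2000)
```

Changing the selection clauses (for example, a different `year` or `month`) asks the file server for a different list of data URLs.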

Matlab GUI

The OPeNDAP Matlab GUI browser contains its own frequently updated list of available datasets. Using that software, you can select datasets with a mouse from a large selection of the available URLs. For more information, please refer to the \OPDmgui\ manual.

The Matlab GUI has its own list of available data.

Further analysis

This guide is about forming an OPeNDAP URL. After you have figured out how to request the data, there are a variety of things you can do with it. (OPeNDAP software mentioned here is available from the \OPDhome .) \tbd{Add links to the following list.}

  • Use a generic web client like \lit{geturl} (a standard part of the OPeNDAP package), the free programs \lit{wget} or \lit{lynx}, or even a browser like \lit{Netscape Navigator} or \lit{Internet Explorer} to download data into a local data file. To use the data further, you will probably have to download the ASCII version by adding the <code>.asc</code> suffix to the URL, as in the examples shown.

Use a generic web client or an OPeNDAP client to get the data you've chosen.

  • There are pre-packaged OPeNDAP clients available that can download binary OPeNDAP data from the web into a useful form. As of \today , command line clients (\lit{loaddods}) are available for the Matlab and IDL data analysis environments, with which you can download OPeNDAP data directly into IDL or Matlab objects. \indc{loaddods!Matlab or IDL client}
  • The Ferret and GrADS free data analysis packages both support OPeNDAP. You can use these for downloading OPeNDAP data, and for examining it afterwards. (There are limitations. As of \today , Ferret cannot read datasets served as Sequence data.)
  • The Matlab analysis package also supports an OPeNDAP client attached to a graphical user interface. You can use the GUI to create a constrained OPeNDAP URL, and download the data directly into Matlab. The \OPDmgui\ contains more information about the Matlab GUI client.
  • If you have a data analysis program or package that you like, you can look into the possibility of linking that package to the OPeNDAP toolkit library, in effect making your program into a web-capable OPeNDAP client. \indc{DODS!linking to your software} OPeNDAP libraries exist to mimic the behavior of the \netcdf\ and \jgofs\ data access APIs. If your program already uses one of these APIs, getting it to run with OPeNDAP may be as simple as changing the libraries to which you link it. The \OPDuser\ describes how to do this, and the \OPDapi\ describes how you can use the OPeNDAP toolkit directly to create a new application that doesn't use one of the established data access APIs.

The use of these clients, like the ways in which you can analyze the data you find, is beyond the scope of this (or any) book. Enjoy.

\printindex