<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<head>
<TITLE>Be Careful with file URLs</TITLE>
<meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
</head>
<body>
<TABLE WIDTH="100%" BORDER="0" CELLSPACING="0" CELLPADDING="4">
<TR>
<TD BGCOLOR="#666699">
<H1 ALIGN="CENTER" STYLE="margin-top: 0in; text-decoration: none"><A HREF="http://www.openoffice.org"><IMG SRC="../images/open_office_org_logo.gif" ALT="OpenOffice.org" ALIGN="RIGHT" BORDER="0"></A><FONT COLOR="White">Be Careful with file URLs</FONT></H1>
</TD>
</TR>
</TABLE>
<HR NOSHADE SIZE="3">
<H2>Different Ways to Name Files</H2>
<P>There are (at least) five ways to name files:</P>
<OL>
<LI>
<P>The platform-specific notation, called <B>pathnames</B> here
(e.g., <CODE>/abc/def/ghi.txt</CODE> on Unix,
<CODE>a:\bcd\efg\hij.txt</CODE> on DOS and Windows, and
<CODE>abc:def:ghi.txt</CODE> on Macintosh).</P>
</LI>
<LI>
<P>A UNC-like notation, called <B>UNC names</B> here (e.g.,
<CODE>//./abc/def/ghi.txt</CODE> or
<CODE>//./a:/bcd/efg/hij.txt</CODE>). The osl layer used to make
heavy use of these as a platform-independent notation, but since osl
has shifted to file URLs as the platform-independent notation (see
below), UNC names have been deprecated and have become pretty much
useless (they are mentioned here only for completeness).</P>
</LI>
<LI>
<P>The file URLs used by the osl layer as a platform-independent
notation, called <B>osl URLs</B> here (e.g.,
<CODE>file:///abc/def/ghi.txt</CODE> or
<CODE>file:///a:/bcd/efg/hij.txt</CODE>). Read on to learn why it is
important to explicitly label these file URLs as <EM>osl</EM>
URLs.</P>
</LI>
<LI>
<P>The file URLs used by the File Content Provider (FCP) within the
Universal Content Broker (UCB), called <B>FCP URLs</B> (e.g.,
<CODE>file:///home/usr123/work/abc.txt</CODE> or
<CODE>file:///user/work/abc.txt</CODE>). Normally, osl URLs and FCP
URLs are the same (after all, the FCP uses osl to access the files).
But the FCP has a feature called <EM>mount points</EM> that allows it
to restrict access to only certain files (those that lie below a given
set of mount points in the file system hierarchy), and to give names
to these files that hide their real locations.</P>
<P>For example, if you have a mount point named <code>user</code> at
the osl URL <code>file:///home/usr123</code>, the osl URL
<code>file:///home/usr123/work/abc.txt</code> corresponds to the FCP
URL <code>file:///user/work/abc.txt</code>. If you only have that
single mount point, the osl URL
<code>file:///home/usr567/work/def.txt</code> has no corresponding FCP
URL (and cannot be accessed via the FCP).</P>
</LI>
<LI>
<P>The URLs used by the UCB, called <B>UCB URLs</B> (e.g.,
<CODE>file:///a:/bcd/efg/hij.txt</CODE> or
<CODE>vnd.sun.star.wfs:///user/work/abc.txt</CODE>). Normally, FCP
URLs and UCB URLs are the same, because the UCB hands file URLs
directly to the FCP. But there is a special content provider, the
Remote Access Content Provider (RAP), that can rewrite URLs
before passing them on to other content providers. This is used, for
example, in the Sun ONE Webtop (S1W), where there are typically two
file systems: a client file system accessed via normal (FCP) file URLs
(i.e., there is no rewriting RAP between the UCB and the client FCP),
and a server file system accessed via (FCP) URLs where the
<CODE>file</CODE> scheme has been replaced with
<CODE>vnd.sun.star.wfs</CODE> (i.e., there is a rewriting RAP between
the UCB and the server FCP).</P>
</LI>
</OL>
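<P>The mount point mapping described for FCP URLs can be sketched as follows.
This is a minimal, self-contained C++ illustration and not the FCP's actual
implementation: the <code>MountPoint</code> struct and the function
<code>oslUrlToFcpUrl()</code> are hypothetical names introduced here, and the
sketch assumes that mount roots are given as osl URLs without a trailing
slash. It covers only the case where mount points are configured; without
any, osl URLs and FCP URLs are simply identical.</P>

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one FCP mount point: a public name and the
// osl URL of the directory it exposes (assumed to have no trailing slash).
struct MountPoint {
    std::string name;     // e.g. "user"
    std::string oslRoot;  // e.g. "file:///home/usr123"
};

// Map an osl URL to the corresponding FCP URL, or return std::nullopt if
// the file does not lie below any of the given mount points.
std::optional<std::string> oslUrlToFcpUrl(const std::string& oslUrl,
                                          const std::vector<MountPoint>& mounts) {
    for (const MountPoint& m : mounts) {
        assert(!m.name.empty() && !m.oslRoot.empty());
        const std::string& root = m.oslRoot;
        // The URL must start with the mount root ...
        if (oslUrl.compare(0, root.size(), root) != 0) continue;
        // ... and the match must end on a path-segment boundary, so that
        // "file:///home/usr1234" is not treated as lying below
        // "file:///home/usr123".
        if (oslUrl.size() > root.size() && oslUrl[root.size()] != '/') continue;
        // Replace the hidden real location with the mount point's name.
        return "file:///" + m.name + oslUrl.substr(root.size());
    }
    return std::nullopt;
}
```

<P>With the single mount point <code>user</code> at
<code>file:///home/usr123</code>, this sketch maps
<code>file:///home/usr123/work/abc.txt</code> to
<code>file:///user/work/abc.txt</code> and yields no result for
<code>file:///home/usr567/work/def.txt</code>.</P>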
<P>The last two notations (FCP URLs and UCB URLs) are relatively unknown,
because in a plain OpenOffice installation neither mount points nor the RAP
are used, so that osl URLs, FCP URLs and UCB URLs are all identical. But when
you want to write correct code that also works in unusual deployments (or in
the S1W, which should not be regarded as too unusual), you have to be well aware
of these different notations, all of which are labeled as "URLs."</P>
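<P>The scheme rewriting that distinguishes UCB URLs from FCP URLs in a setup
like the S1W can be illustrated with a small, self-contained C++ sketch. The
function <code>rewriteScheme()</code> is a hypothetical helper introduced
here, not part of the RAP or the UCB; it assumes that the scheme is compared
case-sensitively and that only the scheme differs between the two URLs, as in
the <code>vnd.sun.star.wfs</code> example above.</P>

```cpp
#include <optional>
#include <string>

// Replace the scheme of `url` with `to` if its scheme is exactly `from`;
// otherwise return std::nullopt. Everything after the scheme is kept as is.
std::optional<std::string> rewriteScheme(const std::string& url,
                                         const std::string& from,
                                         const std::string& to) {
    const std::string prefix = from + ":";
    // The URL must start with "<from>:" for the rewrite to apply.
    if (url.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
    // Keep the ':' and the rest of the URL, swap only the scheme name.
    return to + url.substr(from.size());
}
```

<P>In this sketch, rewriting
<code>vnd.sun.star.wfs:///user/work/abc.txt</code> from
<code>vnd.sun.star.wfs</code> to <code>file</code> gives
<code>file:///user/work/abc.txt</code>, while a URL that already uses the
<code>file</code> scheme is not touched by that rewrite.</P>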
<H2>Where Different Notations are Used</H2>
<P>As mentioned before, use of UNC names is deprecated. Also, since most code
accesses the FCP not directly, but via the UCB, FCP URLs are only of interest
to hard core UCB users (who should know what they are doing, anyway). So, in
the following we can concentrate on three different notations: pathnames, osl
URLs, and UCB URLs.</P>
<H3>Where Pathnames are Used</H3>
<P>Pathnames are used in only a few places, because osl (the lowest level of
concern to us) already uses osl URLs as its default notation (and osl URLs are
a level above pathnames). It can be argued that interfaces that use pathnames
should use osl URLs instead, and that pathnames are only of interest when
communicating with the external world (other processes, or the human
user).</P>
<P>One place where pathnames are used is class <code>utl::TempFile</code>.</P>
<H3>Where osl URLs are Used</H3>
<P>The osl file system functions (in <code>osl/file.h</code> and
<code>osl/file.hxx</code>) now generally use osl URLs in their interfaces.</P>
<P>There should be few places above osl where osl URLs instead of UCB URLs are
used (because generally all file access should be done through the UCB, and
not directly via osl). One notable exception is the handling of temporary
files (see above).</P>
<H3>Where UCB URLs are Used</H3>
<P>Generally, all interfaces that are designed to communicate resource names
within the OpenOffice framework should use UCB URLs, and all implementations
that access resources by these names should do so via the UCB. An added
advantage is that, without any extra effort, such code can access not only
files but also other resources such as HTTP and FTP: it only needs to be
handed appropriate URLs, which can remain opaque to the code and are
interpreted solely by the UCB.</P>
<H2>Converting between Different Notations</H2>
<P>Sometimes it may be necessary to convert between different notations.
Routines to do so are readily available:</P>
<UL>
<LI>
<P>The methods <code>osl::FileBase::getFileURLFromSystemPath()</code>
and <code>osl::FileBase::getSystemPathFromFileURL()</code> (and their
plain C counterparts in <code>osl/file.h</code>) convert between
pathnames (called "system paths" here) and osl URLs.</P>
</LI>
<LI>
<P>The methods
<code>utl::LocalFileHelper::ConvertSystemPathToURL()</code> and
<code>utl::LocalFileHelper::ConvertURLToSystemPath()</code> convert
between pathnames (again called "system paths" here) and UCB URLs.</P>
<P>Because there can be scenarios where you have multiple FCPs on
different file systems, it can be ambiguous how to convert from a
pathname (that does not contain any information identifying a specific
file system) to a UCB URL. Therefore,
<code>ConvertSystemPathToURL()</code> requires an additional parameter
<code>BaseURL</code> that identifies the FCP to be used.</P>
</LI>
<LI>
<P>There are convenience methods
<code>utl::LocalFileHelper::ConvertPhysicalNameToURL()</code> and
<code>utl::LocalFileHelper::ConvertURLToPhysicalName()</code> that
choose the <EM>local</EM> FCP as <code>BaseURL</code> and then forward
to the above <code>LocalFileHelper</code> methods.</P>
<P>For this to work, the UCB maintains a notion of <EM>locality</EM>
of content providers. This is a heuristic algorithm based on how the
UCB accesses individual content providers (within the same process,
via a pipe on the same machine, via a socket over a network). The net
effect is that the UCB should always choose as most local the FCP
running on the same machine as the UCB, and using these
<code>LocalFileHelper</code> methods will then always convert between
UCB URLs and pathnames that are valid on this machine.</P>
<P><code>ConvertURLToPhysicalName()</code> also makes sure to do the
conversion only if the given UCB URL corresponds to a local pathname
(and not to a pathname on a non-local file system).</P>
</LI>
</UL>
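<P>The notion of locality used to pick the local FCP can be sketched as a
simple ranking. This is only an illustration under stated assumptions, not
the UCB's actual heuristic: the <code>Access</code> enumeration, the
<code>Provider</code> struct, and <code>mostLocalBaseUrl()</code> are
hypothetical names, and the sketch assumes that the three kinds of access
named above are ranked strictly in the order in-process, pipe, socket.</P>

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// How a content provider is reached, ordered from most to least local.
enum class Access { InProcess = 0, LocalPipe = 1, NetworkSocket = 2 };

// Hypothetical record for one registered file content provider.
struct Provider {
    std::string baseUrl;  // identifies the provider, e.g. "file:///"
    Access access;        // how this provider is reached
};

// Return the base URL of the most local provider, or std::nullopt if no
// provider is registered at all.
std::optional<std::string> mostLocalBaseUrl(const std::vector<Provider>& providers) {
    auto it = std::min_element(
        providers.begin(), providers.end(),
        [](const Provider& a, const Provider& b) { return a.access < b.access; });
    if (it == providers.end()) return std::nullopt;
    return it->baseUrl;
}
```

<P>Given one provider reached over a network socket and one running in the
same process, the sketch picks the in-process one, which matches the intended
net effect that the FCP on the same machine is chosen as most local.</P>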
<P>There is no direct way to convert between osl URLs and UCB URLs. To
convert from an osl URL to a UCB URL, use
<code>osl::FileBase::getSystemPathFromFileURL()</code> followed by
<code>utl::LocalFileHelper::ConvertPhysicalNameToURL()</code>. To convert
from a UCB URL to an osl URL, use
<code>utl::LocalFileHelper::ConvertURLToPhysicalName()</code> followed by
<code>osl::FileBase::getFileURLFromSystemPath()</code>. But be aware that this
only works if the osl URL and the UCB URL denote files within the same
file system.</P>
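<P>The shape of such a two-step conversion, with a pathname as the
intermediate value and either step allowed to fail, can be sketched in
self-contained C++. The <code>Step</code> type and
<code>convertVia()</code> are hypothetical stand-ins introduced here; they do
not reproduce the real signatures of the <code>osl::FileBase</code> or
<code>utl::LocalFileHelper</code> methods, which a real caller would wrap and
plug in as the two steps.</P>

```cpp
#include <functional>
#include <optional>
#include <string>

// Stand-in for one conversion routine that may fail, for example
// "osl URL to pathname" or "pathname to UCB URL".
using Step = std::function<std::optional<std::string>(const std::string&)>;

// Convert `input` by running `first` and feeding its result (the
// intermediate pathname) to `second`. The conversion fails as a whole
// if either step fails.
std::optional<std::string> convertVia(const std::string& input,
                                      const Step& first,
                                      const Step& second) {
    std::optional<std::string> pathname = first(input);
    if (!pathname) return std::nullopt;  // first step failed, stop here
    return second(*pathname);
}
```

<P>The point of the sketch is that the intermediate pathname carries no
information about which file system it belongs to, so the result is only
meaningful when both steps refer to the same file system.</P>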
<TABLE WIDTH="100%" BORDER="0" CELLSPACING="0" CELLPADDING="4">
<TR>
<TD BGCOLOR="#666699">
<P><FONT COLOR="White">Author: <A HREF="mailto:stephan.bergmann@sun.com"><FONT COLOR="White">Stephan Bergmann</FONT></A> (Last modification $Date: 2003/12/06 22:37:31 $). Copyright 2001 <A HREF="http://www.openoffice.org"><FONT COLOR="White">OpenOffice.org</FONT></A> Foundation. All Rights Reserved.</FONT></P>
</TD>
</TR>
</TABLE>
</body>
</HTML>