Composite Nodes, Contextual Links and Graphical Structural Views on the WWW

Abstract

Recently, several open hypermedia systems (OHS) have proposed solutions for integrating with the World-Wide Web (WWW). The goal is to overcome WWW limitations using more powerful and sophisticated hypermedia data models while exploiting the Web's wide distribution and standards, which permit interoperability and ease of use. This paper describes the integration of the HyperProp OHS with the WWW in a platform-independent solution based on the Netscape browser. The integration allows: separation of links and presentation information from data content; definition of compositions (in order to structure documents, besides allowing the definition of contextual links and reuse of hypermedia structures); visualization of document structure; user navigation through graphical views; and navigation through guided tours and history trails. In particular, the graphical views use filtering mechanisms based on extended fisheye-view techniques, which avoid user disorientation when navigating through large documents. The integrated system also offers a web-site analysis tool. Based on the file system structure of web servers, the tool automatically generates compositions of HTML documents that, through the use of graphical views, help user navigation and information search on web sites.

Keywords: open hypermedia system, nested context model, structural view, web-site analysis


Rogério Ferreira Rodrigues
Depto. de Informática, PUC-Rio
Rua Marquês de São Vicente 225
22453-900 – Rio de Janeiro, Brasil
rogerio@telemidia.puc-rio.br

Débora C. Muchaluat-Saade
Depto. de Informática, PUC-Rio
Rua Marquês de São Vicente 225
22453-900 – Rio de Janeiro, Brasil
debora@telemidia.puc-rio.br

Luiz Fernando Gomes Soares
Depto. de Informática, PUC-Rio
Rua Marquês de São Vicente 225
22453-900 – Rio de Janeiro, Brasil
lfgs@inf.puc-rio.br

1 Introduction

An Open Hypermedia System (OHS) must be extensible, allowing the inclusion of new features, besides offering hypermedia services to client applications through an open and distributed architecture. More recently, research on OHSs has turned to solutions for integration with other applications. To this end, a group has been formed to standardize work on general hypermedia systems [18]. Following this approach, several systems [1, 2, 4, 8, 9] have presented proposals for interoperating with the World-Wide Web (WWW or simply Web) [3]. Undoubtedly, the main reasons for the popularity of the WWW are its wide distribution, its simplicity and its ease of use. However, the WWW has some limitations as a hypermedia system, such as:

  • its data model defines links embedded in nodes (HTML pages), resulting in some shortcomings:

      – it does not allow separation between nodes and links, which makes link and data maintenance difficult, and data reuse without inheriting relations (links) impossible;

      – it does not permit creating references in read-only pages;

      – it requires a special content format (e.g., HTML, VRML);

      – links can only be followed in one direction, so one cannot know which links reference a given document.

  • one can only define unidirectional point-to-point go to relations (1:1 links), and there is no support for defining temporal and spatial synchronization relations;

  • standard characteristics of hypermedia systems such as guided tours and structural views of documents are not offered;

  • there is no support for version control and cooperative work.

The integration of OHSs with the WWW is motivated by the attempt to overcome these limitations, using more sophisticated and powerful hypermedia models, while exploiting the Web's wide distribution and standards.

Today, some efforts have been made to support spatio-temporal synchronization on the Web, exemplified by the SMIL recommendation [22]. Efforts have also been made to provide version control and cooperative work, exemplified by the WEBDAV group proposal [10]. However, the concern of this paper is how to put all these facilities together in the Web of today, without changing the current environment.

As mentioned before, there are many proposals, with goals similar to ours, for integrating OHSs with the WWW. Among them we can highlight Chimera [1], Hyper-G [2], Microcosm [4, 9] and DHM [8]. They can be classified into compile-time integration [9] and runtime integration [1, 2, 4, 8]. In compile-time integration, as the name suggests, a total translation of documents from one hypermedia model to the other is made before presentation, while in runtime integration this conversion is done step by step, during user navigation. Compile-time integration almost always limits the potential of OHSs. It requires the development of translation tools, but does not require changes in the system architecture. On the other hand, runtime integration requires a new architecture that puts together basic elements from the architectures of both systems (servers, clients, protocols, data formats, etc.). Some possibilities for combining these basic elements are discussed in [1].

The definition of hypermedia documents is based on the usual concepts of nodes and links. All previously mentioned proposals separate links from the content of nodes. Some models (Chimera, Hyper-G and DHM) offer compositions to logically group nodes, which can be nested to any depth. However, links are stored in an independent repository in these models, and sometimes this repository is unique (Hyper-G and DHM).

The HyperProp system is based on an object-oriented model called NCM (Nested Context Model) [21]. Unlike the related work mentioned above, NCM compositions contain not only nodes (which can be nested compositions) but also links, allowing a node to be contained in different nested contexts (see footnote 1) in the same document, and allowing the definition of context-dependent links for a node.

Footnote 1: In this paper, we consider contexts and compositions as synonymous.

This paper discusses the integration of the HyperProp OHS with the Web, where a runtime integration in a platform independent solution is presented. Several ideas are based on DHM related experiences [8]. The main integration purposes are:

(i) to incorporate NCM facilities to WWW documents, allowing the definition of compositions, the navigation in structural-view browsers, the definition of guided tours, the creation of bi-directional m:n links, the definition of spatio-temporal synchronization relations, and the support for version control and cooperative work;

(ii) to use WWW browsers to present NCM documents;

(iii) to allow creating general or contextual links touching WWW nodes independent of having write access to them;

(iv) to allow NCM documents to reference WWW documents and vice-versa.

Although not all NCM facilities are yet incorporated in the first integration prototype, as will be discussed throughout this paper, the current solution already shows how it could be done.

This paper is organized as follows. Section 2 briefly describes the NCM conceptual model, presenting the adaptations made to integrate it with the Web. Section 3 shows the integrated system, discussing its implementation details. Section 4 presents the graphical visualization tools to navigate and edit the document structure. Section 5 describes the web-site analysis tool used to generate composite structures from WWW sites. Section 6 discusses related work. Finally, Section 7 is dedicated to conclusions and future work.

2 HyperProp Nested Context Model

In NCM, a node has as main attributes a content, a list of anchors and a descriptor. Each anchor defines a region in the node content. Every node has a default anchor, which represents the whole content of the node. The descriptor contains information specifying how the node should be presented, in time and space.

NCM distinguishes two basic classes of nodes, called content node and composite node. Intuitively, a content node contains data whose internal structure is application dependent, modeling traditional hypermedia nodes. Content nodes may be specialized into other classes (text, graphic, audio, video, etc.) as required by applications. A specialization of the text node class, called HTML node, was defined. This class has as its content attribute a URL that specifies a file containing an HTML page. Each anchor in its list of anchors has two attributes: name and position. Name corresponds to the text defined by the anchor, and position indicates the number of characters from the beginning of the HTML file to the region (text) defined by the anchor. The attributes defined in the node descriptor are used to dynamically build a cascading style sheet [11] to present the node content.
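As a rough illustration of building a style sheet from descriptor attributes, the following Python sketch renders a descriptor as a single CSS rule. It assumes, purely for illustration, that descriptor attributes are stored as a mapping whose keys are already valid CSS property names; the function name `build_style_sheet` and the `body` selector are hypothetical and are not taken from the HyperProp implementation.

```python
def build_style_sheet(descriptor, selector="body"):
    """Render descriptor attributes as one CSS rule.

    Assumption: keys are CSS property names and values are CSS values.
    """
    declarations = "".join(
        f"  {prop}: {value};\n" for prop, value in descriptor.items()
    )
    return f"{selector} {{\n{declarations}}}\n"


# Hypothetical descriptor for an HTML node.
sheet = build_style_sheet({"font-family": "serif", "color": "navy"})
print(sheet)
```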

Defining the region of an HTML node anchor by a character count might seem to be a problem for anchor maintenance. Indeed, changes in the node content may lead to inconsistent anchors. However, based on the name attribute, the system automatically detects these inconsistencies and disables links touching the affected anchors. The system authoring tool, described in Sections 3 and 4, also offers mechanisms for verifying and automatically correcting anchor inconsistencies.
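The consistency check described above can be sketched in Python as follows. The check itself follows the text (the name attribute is compared with the text found at the stored position); the repair strategy, which relocates an anchor only when its name occurs exactly once in the content, is an assumption made for this sketch and not necessarily what the authoring tool does.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    name: str      # text the anchor is expected to cover
    position: int  # character offset from the start of the HTML file


def is_consistent(anchor, html):
    """True if the text at the stored offset still equals the anchor name."""
    end = anchor.position + len(anchor.name)
    return html[anchor.position:end] == anchor.name


def try_repair(anchor, html):
    """Hypothetical repair: move the anchor if its name occurs exactly once."""
    if is_consistent(anchor, html):
        return True
    if html.count(anchor.name) != 1:
        return False  # missing or ambiguous: links stay disabled
    anchor.position = html.find(anchor.name)
    return True


anchor = Anchor(name="Context", position=10)
original = "<p>Nested Context Model</p>"
edited = "<p>The Nested Context Model</p>"
print(is_consistent(anchor, original), is_consistent(anchor, edited))
```

After the content changes, `try_repair(anchor, edited)` would update the stored position so that links touching the anchor can be enabled again.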

The HTML node class allows building WWW documents using NCM features, through the mapping of an HTML page to an NCM HTML node (see footnote 2). Figure 1 presents an example of an HTML node (all its attributes) and Figure 2 presents the content of the file identified by the URL. It is worth noting that the content of an HTML node may have embedded anchors and links (WWW anchors and links). However, NCM anchors and NCM links are separated from the content.

Footnote 2: To be more precise, an HTML node is an NCM type of content node, as defined previously. On the other hand, we call a WWW node an HTML page of the WWW system.

Figure 1: Example of attributes of an HTML node and also its descriptor attribute.

Figure 2: HTML content identified by the URL (content attribute of the HTML node). For simplicity, there is no embedded WWW anchor and link in the content.

In NCM, a composite node C has, as its content, a set S of links and nodes, which may be content or composite nodes, recursively. We say that an entity E in S is a component of C and that E is contained in C. We also say that A is recursively contained in C if and only if A is contained in C or A is contained in a node recursively contained in C. An important restriction is made: a node cannot be recursively contained in itself.
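The containment definitions and the restriction that a node cannot be recursively contained in itself can be sketched in Python as follows. This is a minimal model under simplifying assumptions: the class names are hypothetical, and only nodes (not links) are kept in a composition.

```python
class Node:
    def __init__(self, name):
        self.name = name


class Composite(Node):
    def __init__(self, name):
        super().__init__(name)
        self.components = []  # nodes only; links are omitted in this sketch

    def recursively_contains(self, node):
        """True if node is a component, or is recursively contained in one."""
        return any(
            c is node
            or (isinstance(c, Composite) and c.recursively_contains(node))
            for c in self.components
        )

    def add(self, node):
        """Insert a component, refusing insertions that would create a cycle."""
        if node is self or (
            isinstance(node, Composite) and node.recursively_contains(self)
        ):
            raise ValueError("a node cannot be recursively contained in itself")
        self.components.append(node)


b1, c1, o1 = Composite("B1"), Composite("C1"), Node("O1")
b1.add(c1)
c1.add(o1)
print(b1.recursively_contains(o1))
```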

Compositions offer modularity, encapsulation and abstraction mechanisms, permitting the structured definition of documents. Figure 3 shows an example of a book (composition B1) composed of chapters C1, C2 and C3. Composition C1 (chapter C1) contains sections S1.1 and S1.2 and a media object O1, and so on. Reference and synchronization relationships among nodes are represented by links. For instance, a reference relationship between the media object O1, a component of chapter C1, and chapter C3 is represented by link l1.

Figure 3: Book B1 given to an advanced reader.

Note that when links are defined in composite nodes we can have a structured reuse of relations implicitly given by the composition as well as of relations explicitly defined inside the composition. Indeed, relation inheritance in composition nesting is very important in order to allow a more general reuse of the structure of a document. As an example, consider the composition representing book chapter C1 of Figure 3. For a book given to a beginner (composition B2), it could be desirable to introduce a relationship between section S1.2 of C1 and section S2.1 of C2, as a hint of related matters. For an advanced reader, the book (composition B1) should be delivered without that relationship, as in Figure 3. Note that B2 could be defined as a composition containing B1 and the introduced relationship, reusing all the structure of B1, as shown in Figure 4.

Figure 4: Book B2 given to a beginner.

As a node can be recursively contained in different compositions (for example, node O1 in compositions C1 and C2 in Figure 4) and as composite nodes can be nested to any depth, it is necessary to introduce the concept of perspective. Intuitively, the perspective of a node identifies through which sequence of nested composite nodes a given node instantiation is being observed. Formally, a perspective of a node N is a sequence P=(Nm,...,N1), with m ≥ 1, such that N1=N, Nm is a node not contained in any composite node, Ni+1 is a composite node and Ni is contained in Ni+1, for i ∈ [1,m). In the example of Figure 4, content node O1 has two different perspectives: P1=(B2,B1,C1,O1) and P2=(B2,B1,C2,O1).
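The formal definition can be turned into a short enumeration routine. The Python sketch below assumes that containment is given as a hypothetical mapping from each node name to the composites that directly contain it; the data reproduces the part of Figure 4 described in the text.

```python
def perspectives(node, parents):
    """List every perspective (Nm, ..., N1) of node, outermost node first.

    parents maps a node name to the composites that directly contain it.
    The recursion terminates because no node is recursively contained
    in itself.
    """
    containers = parents.get(node, [])
    if not containers:
        return [(node,)]  # node is not contained in any composite node
    return [
        outer + (node,)
        for container in containers
        for outer in perspectives(container, parents)
    ]


# Containment described for Figure 4: B2 contains B1, B1 contains C1, C2
# and C3, and O1 is contained in both C1 and C2.
parents = {
    "B1": ["B2"],
    "C1": ["B1"],
    "C2": ["B1"],
    "C3": ["B1"],
    "O1": ["C1", "C2"],
}
print(perspectives("O1", parents))
# [('B2', 'B1', 'C1', 'O1'), ('B2', 'B1', 'C2', 'O1')]
```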

An NCM link is an m:n relation composed of m source end points, n target end points and a meeting point. Since a node may belong to more than one composite node, the source and target end points of a link are identified by pairs of the form <(Nk,...,N1),E>, such that: N1 is a node, called the anchor node of the link, Ni+1 is a composite node and Ni is contained in Ni+1, for all i ∈ [1,k), with k>0, and Nk is a component of the composite node that contains the link, or the composite node itself. E is an event that identifies an anchor of N1. An event is defined by the presentation of an anchor (presentation event), by its selection (selection event) or by the changing of a node attribute. The meeting point defines conditions and actions. Conditions must be satisfied at the source end points (for example, the selection of an anchor by a user click) in order for actions to be applied at the target end points (for example, the presentation of the whole content of a node on the screen).
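One possible in-memory representation of such a link is sketched below in Python. The class and field names are hypothetical, the meeting point is reduced to a single condition function over the source events observed so far, and actions are represented simply by returning the target end points; the actual NCM meeting point is richer than this.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class EndPoint:
    nodes: Tuple[str, ...]  # (Nk, ..., N1); the last entry is the anchor node
    event: str              # event on an anchor of N1, e.g. "selection"


@dataclass(frozen=True)
class Link:
    sources: Tuple[EndPoint, ...]
    targets: Tuple[EndPoint, ...]
    # Simplified meeting point: a condition over the observed source events.
    condition: Callable[[FrozenSet[EndPoint]], bool]


def evaluate(link, occurred):
    """Return the target end points to act on, or () if the condition fails."""
    seen = frozenset(e for e in link.sources if e in occurred)
    return link.targets if link.condition(seen) else ()


# A 1:1 go to link in the spirit of l1: selecting O1 (in C1) presents C3.
src = EndPoint(("C1", "O1"), "selection")
dst = EndPoint(("C3",), "presentation")
l1 = Link((src,), (dst,), lambda seen: src in seen)
print(evaluate(l1, {src}))
```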

Besides different compositions containing the same node, NCM links can be defined in any composite node of a node perspective. Thus, links that anchor on a node depend on the context (nesting of compositions) the node is within. These contextual links are defined as follows: given a node N1 in a perspective P=(Nm,...,N1), with m>0, we say that a link l is a contextual link in P if and only if there is a composite node Ni in P containing l, for i ∈ [1,m], such that:

i) if i>1 then (Ni-1,...,N1) is the node list of one of the end points of l;

ii) else, (N1) is the node list of one of the end points of l.
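The two conditions above can be checked mechanically. The following Python sketch uses a hypothetical representation in which each composite name maps to the links it contains and each link lists its end points as (node list, event) pairs. It assumes that link l1 of Figure 3 is stored in B1, and it names the beginner's relationship of Figure 4 `l2`; both choices are illustrative and not stated in the text.

```python
def contextual_links(perspective, links_in):
    """Names of the links that are contextual in perspective (Nm, ..., N1)."""
    found = []
    m = len(perspective)
    for j, composite in enumerate(perspective):
        # Node list required in an end point of a link stored in this node:
        # (Ni-1, ..., N1) in general, or (N1) when the node is N1 itself.
        node_list = perspective[j + 1:] if j < m - 1 else perspective[-1:]
        for link in links_in.get(composite, []):
            if any(nodes == node_list for nodes, _event in link["endpoints"]):
                found.append(link["name"])
    return found


links_in = {
    "B1": [{"name": "l1",
            "endpoints": [(("C1", "O1"), "selection"),
                          (("C3",), "presentation")]}],
    "B2": [{"name": "l2",
            "endpoints": [(("B1", "C1", "S1.2"), "selection"),
                          (("B1", "C2", "S2.1"), "presentation")]}],
}
print(contextual_links(("B2", "B1", "C1", "O1"), links_in))    # ['l1']
print(contextual_links(("B2", "B1", "C2", "O1"), links_in))    # []
print(contextual_links(("B2", "B1", "C1", "S1.2"), links_in))  # ['l2']
```

The same node O1 sees link l1 in one perspective and no link in the other, which is the context dependence the definition is meant to capture.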

As mentioned before, in NCM, attributes that specify how a node should be presented are stored in a separate object called descriptor. Whenever a node is to be presented, it must be associated with a descriptor, either one explicitly defined on the fly by the end user or one defined during the authoring phase. During authoring, descriptors can be defined as an attribute of a link used to reach the node, as an attribute of the composite node that contains the node to be presented, or as an attribute of the node itself (descriptor attribute in Figure 1). In NCM, a descriptor explicitly defined on the fly by an end user overrides descriptors defined during the authoring phase. These in turn have the following precedence order: first, the one defined in the link used to reach the node; second, the one defined in the composite node that contains the node, if any; and finally, the one defined within the node. If no descriptor is defined, a default descriptor for the node class is used.
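The precedence order reads naturally as a first-match lookup. A minimal Python sketch, with hypothetical parameter names and with `None` standing for "no descriptor defined at this level":

```python
def resolve_descriptor(user=None, link=None, composite=None, node=None,
                       class_default="default"):
    """Pick the descriptor to use, following the precedence order in the text.

    Order: end user (on the fly), link used to reach the node, containing
    composite node, the node itself, then the default for the node class.
    """
    for candidate in (user, link, composite, node):
        if candidate is not None:
            return candidate
    return class_default


print(resolve_descriptor(link="from-link", node="from-node"))  # from-link
print(resolve_descriptor())                                    # default
```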

Two important subclasses of the composite node class are: public hyperbase and private base. A public hyperbase HB is a composite node that contains content nodes and composite nodes, except private bases and public hyperbases. If a composite node C is in HB, all nodes recursively contained in C are also contained in HB. Intuitively, HB has all nodes stored in a public database, that can be distributed in one or more NCM servers, similar to WWW servers. A private base is a composite node that contains content and composite nodes, except public hyperbases. Intuitively, a private base contains all documents used during a user work session.

In NCM, it is not possible to define links that are always attached to a certain node, independent of its perspective. In order to bypass this problem and offer this functionality to the NCM/WWW integrated system, we defined a special composition named WWW context (Cw). By definition, all HTML nodes are contained in Cw (including all WWW nodes). In order to allow links defined in Cw anchoring on an HTML node H to be inherited by all perspectives of H, we define Cw as the first node to contain H in all of its perspectives. Thus, for all links defined in other compositions (different from Cw), all end points having H as an anchor node, are pairs of the type <(Nk,...,Cw,H),E>, maintaining all end point properties. It is important to mention that the inclusion of nodes in Cw is a model solution. In practical implementations, Cw, different from other NCM compositions, works as a link repository.

The definition of nodes and links in compositions allows, besides separating links from node content, different visualizations of the same node, depending on the user navigation context. It also allows the reuse of data as well as of relations, as seen in Figures 3 and 4. Models such as Chimera, Hyper-G and DHM define compositions containing only nodes, which prevents them from obtaining these benefits. On the other hand, the Microcosm DLS system [4] offers contextual links, but it does not have any support for organizing nodes, having only a logical grouping of links, without composition nesting. Being more general, NCM permits exploiting all these benefits. More details about NCM, including its formal specification, can be found in [19, 21].

3 Integrated System Architecture and Implementation

The integrated system adds an NCM client to a WWW client (Netscape browser in the current implementation), using an applet. Figure 5 shows the NCM/WWW system architecture.

Figure 5: Architecture of the NCM/WWW integrated system.

The NCM server is a Java stand-alone program that connects to a database, using the JDBC protocol (step 1 in Figure 5). The database contains all NCM nodes stored in the public hyperbase, including HTML nodes, but without their data content files, which may be stored in any WWW server.

A user, willing to open a work session (private base), has to start the WWW browser and to request the start page (a known URL) stored in a WWW server to become an NCM/WWW client (step 2 in Figure 5). The requested page defines three HTML frames: one that is not visible on the screen, but that starts an applet to control the client private base (Controlling Applet in Figure 5); one to control guided tours and navigation on history trails (Trail Frame in Figure 5); and one to present the content of HTML nodes (Content Frame in Figure 5).

Due to applet security restrictions, the NCM server must run on the same machine as the WWW server from which the HTML page starting the NCM/WWW client is downloaded. This restriction could be overcome by using the Netscape signed applet solution [16]. In this case, once authorized by a Netscape client, an applet could connect to any site. This solution has not yet been adopted in our implementation, since we believe that in the near future a Java standard proposal will be adopted by all browsers. Nevertheless, the NCM database containing the public hyperbase can be distributed, and only a communication element needs to be on the same machine as the WWW server.

The first task performed by the controlling applet is to connect itself with the NCM server through a communication interface element (step 3 in Figure 5) using Java RMI (Remote Method Invocation). Afterwards, the NCM client starts a new window, called the private base browser shown in Figure 6.

Figure 6: Private base browser and editor.

The private base, as already mentioned, represents the user work session, where users can edit and navigate through new documents (nodes and links) or through documents recovered from a previous work session. In the private base browser/editor users can define new compositions, insert/remove nodes in/from compositions, create/delete links in/from compositions, create and edit content nodes, create and edit anchors, create and edit descriptors, and create and edit trails to be used as guided tours.

When a user finishes editing a document, (s)he can make it available to other users by saving it in the document repository, represented by the public hyperbase. During a work session, users can also import documents from the public hyperbase. In order to facilitate document search in the public repository, there is also a public hyperbase browser, similar to the private base browser shown in Figure 6, but without editing operations. The public hyperbase and private bases will give support for cooperative work. Indeed, this will also require a well-defined version control mechanism. In the current implementation, the operations of importing/exporting documents from/to the public hyperbase only create copies of these documents. However, as future work, we intend to implement these operations using the already defined NCM version control mechanism [21].

The graphical browsers give users a global view of how nodes are hierarchically organized (tree view shown on the left side of Figure 6), and also a view of relationships among nodes (graph view shown on the right side of Figure 6). WWW contexts (Cw) are not shown in the browsers in order to avoid overloading the user. In the tree view, only composite nodes are displayed (as folders), showing the nesting organization. Depending on the composite node selected in the tree view, its recursive content is displayed in the graph view. In the latter view, links appear as arrows, connecting related nodes. Expanded compositions appear as rectangles, collapsed compositions appear as folder icons and content nodes appear as icons, which depend on their types (text, image, etc.).

The user can navigate through compositions expanding or collapsing them in both views. Depending on the user focus, filtering algorithms, based on an extension of fisheye view strategies, automatically hide or show nodes, in order to improve user orientation. Another browser with similar characteristics is activated when users navigate to a WWW node, or even by user demand. This window shows the structural view of the web-site (web-site browser) containing the target WWW node. The graphical browser characteristics and the automatic web-site structure generation are explained in Sections 4 and 5, respectively.

In order to present a content node N contained in a private base, the user should select the node (double click), in a desired perspective. This activates a call to the NCM formatter (Figure 5), which is responsible for controlling document navigation. If desirable, at this moment, the user can define a descriptor to be used to present the selected node. If the user does not specify any descriptor, the precedence rules defined in Section 2 are followed.

The selection of a node to be presented can also be done navigating through NCM links or through WWW embedded links that have the node as its target anchor.

The formatter has three elements: compiler, executor and controller (in fact, a set of controllers) [19]. When a node N is to be presented within a perspective P, the executor determines the descriptor D to be used and instantiates a controller, passing as parameters N, D and the list of visible anchors of node N (those that are conditions of contextual links in P; see Section 2), stored in a global structure maintained by the compiler.

The controller can only obtain the data content of a node through the NCM server (since an applet can only establish a connection with the WWW server from which it was downloaded), although the data content of a node (URL) may be a file in any WWW server on the Web. Thus, using the communication interface (see Figure 5), the controller sends the URL to the NCM server. The server, in turn, being a stand-alone Java program with no applet security restrictions, can get the content of the URL from the appropriate WWW server and send it to the controller. The fact that every node content must be delivered by the NCM server creates a bottleneck, which could be reduced by distributing NCM servers in several locations. Another way to solve this problem is to use the previously mentioned Netscape solution for signed applets [16], which permits clients to connect directly to any WWW server. When the controller receives the data content, it checks which visible anchors are consistent and marks them. It also filters and modifies any WWW links (HTML anchors of WWW nodes) embedded in the content, in order to guarantee that any click will be signaled to the executor. From the descriptor, the controller dynamically builds a cascading style sheet to present the node. It is worth mentioning that in NCM/WWW the computation of contextual links (visible anchors) and the building of pages are done entirely in the client, unlike in other integration experiences, as will be discussed in Section 6.

At present, an applet cannot identify a user click in HTML pages (another applet restriction), so JavaScript was used to do that. When inserting NCM anchors (defined in HTML nodes) or dynamically filtering embedded WWW anchors, the controller inserts calls to a JavaScript routine. When an anchor is selected, the JavaScript routine, using the LiveConnect facility, signals the event to the executor. Depending on whether the selection occurs on an NCM anchor or on an embedded WWW anchor, the navigation control will be slightly different.
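The filtering of embedded WWW anchors can be pictured as a rewrite of each `href` so that a click calls a JavaScript routine instead of leaving the page. The Python sketch below is only an illustration of that idea: the routine name `ncmFollow` is hypothetical, the real controller runs as a Java applet, and a regular expression is used here for brevity even though it only handles double-quoted `href` attributes.

```python
import re

# Matches href="..." with a double-quoted value (a simplification).
HREF = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)


def rewrite_links(html, routine="ncmFollow"):
    """Replace each href target with a call to a JavaScript routine."""
    def replace(match):
        # Escape characters that would break the JavaScript string literal.
        url = match.group(1).replace("\\", "\\\\").replace("'", "\\'")
        return 'href="javascript:' + routine + "('" + url + "')\""
    return HREF.sub(replace, html)


page = '<a href="http://example.org/a.html">A</a>'
print(rewrite_links(page))
# <a href="javascript:ncmFollow('http://example.org/a.html')">A</a>
```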

In the case of WWW embedded anchors, the JavaScript routine passes to the executor the target URL, corresponding to a node contained in Cw (remember that Cw stores all WWW nodes by definition). A call is then made to the web-site analyzer routine, explained in Section 5, to build the web-site browser, if it is not built yet. In the case of NCM anchors, only the selected anchor is passed. It is up to the executor to evaluate NCM links where the anchor selection event is a source end point. For those links whose conditions are satisfied, target actions are executed. In both previous cases, the controller is then destroyed and the presentation process restarts, fetching the new node. In the current implementation, NCM links are always of go to type (1:1), making link evaluation trivial. However, synchronization relations will be allowed in the next prototype. NCM m:n links will be evaluated as it is currently done in the HyperProp formatter stand-alone version [19].

The dynamic building of pages also uses JavaScript facilities, since applets are not able to write directly (in HTML) to pages presented in the browser. Calls to JavaScript routines, which interpret HTML tags automatically, must be made to present the data content of a node.

The facility for creating embedded links in WWW documents referencing NCM documents is not yet implemented, but it is foreseen in the system. The idea is to implement a cgi-script (or a servlet) that will be available at the WWW server (which runs in the same machine as the NCM server), and will receive, as parameter, the perspective of an NCM node to be presented. An embedded link will then be able to reference NCM nodes. When called, the script will automatically start the NCM/WWW formatter in the client, loading the NCM document.

The controlling applet cannot intercept all kinds of user interaction for navigation in Netscape (another Java restriction). If a user types a URL directly in the browser or if (s)he navigates through a bookmark or through Netscape trails (back and forward buttons), the browser stops the controlling applet and shows the requested page. It would be easy to bypass this problem if Netscape had a feature similar to the Internet Explorer’s BeforeNavigate event. However, as [8] affirms, this limitation could be desirable for giving users the ability to interrupt the navigation through the OHS/WWW system.

As for trail navigation, our temporary solution offers another trail navigation interface through the trail frame presented in Figure 5, independent of that of the Web browser. A trail allows users to go to the previous, next, first and last nodes of the navigation history, or of a previously created guided tour. However, it is not elegant to offer two distinct user interfaces for trail navigation, and we intend to improve this solution.
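The four trail operations can be sketched as a cursor over an ordered list of nodes. In the Python sketch below the class name is hypothetical, and the behavior at the ends of the trail (staying on the first or last node) is an assumption, since the text does not specify it.

```python
class Trail:
    """Cursor over a navigation history or a guided tour."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a trail needs at least one node")
        self.nodes = list(nodes)
        self.index = 0

    def _go(self, index):
        # Clamp to the trail bounds (assumed behavior at both ends).
        self.index = max(0, min(index, len(self.nodes) - 1))
        return self.nodes[self.index]

    def first(self):
        return self._go(0)

    def last(self):
        return self._go(len(self.nodes) - 1)

    def next(self):
        return self._go(self.index + 1)

    def previous(self):
        return self._go(self.index - 1)


tour = Trail(["C1", "S1.1", "O1"])
print(tour.next(), tour.last(), tour.previous(), tour.first())
```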

4 Fisheye-View Graphical Browsers

The public hyperbase and the private base browsers give users a global view of how nodes are structurally organized and also a view of relationships among nodes, offering filtering mechanisms necessary for displaying complex nested composite structures and to avoid user disorientation.

Usually, large hypermedia documents have a very complex structure, requiring sophisticated algorithms to build the structural view. There are several techniques to display structures of graphs [20]:

  • Presenting all nodes and links of the graph allows users to have a global view of the structure. The drawback is that maps become confusing and do not help user navigation when the number of nodes and links increases.

  • Using scroll and zoom to view portions of the graph makes readable maps, but loses the overall structure.

  • Using two or more views, one with a global view and another with a small zoomed portion of the graph, has the advantage of displaying local details and the overall structure of the information space, but forces the user to mentally integrate the views.

  • Using filtering mechanisms, such as fisheye views, to build a global view, only displays what is more interesting to the user at a certain moment, maintaining map legibility. The difficulty is to find out what is more interesting to the user at that moment.

The last option is the most desirable solution, but the most complex. Filters to hide information, and possibly landmarks (special nodes that always appear), will generally be needed. Maintaining the legibility of maps is our main purpose. The user will only visualize what is interesting to him/her, depending on his/her position (focus) in the hyperdocument structure and what is important regarding the overall structure of the hypermedia document. In order to build those filters, HyperProp uses an extension of the fisheye-view strategy for conceptual models that offer composite nodes [13]. The main motivation of the fisheye strategy is to balance local detail and global context with respect to a node in focus. Local detail is necessary to give information about the navigation possibilities, depending on the focus in the document structure. Global context is important to give information about the focus position within the overall structure.

Another important feature of the fisheye-view strategy is that it allows users to choose the amount of information shown, tailoring the map according to their interests. Thus, users can choose to see more or less detail in the structural view. Another desirable characteristic is support for two focuses at once in the view. This can be needed, for example, because some small structure is being copied from one place to another, or because a relation is being authored between two parts of the document.

The fisheye strategy proposes a degree of interest (DOI) function that assigns each node a value indicating its interest to the user. This function is decomposed into two components. The first one is called the a priori importance and describes the intrinsic importance of a node considering the overall structure of the document. The second one is the distance between a node and the focus. The essence of fisheye views is that the DOI function increases with the a priori importance and decreases with the distance. Only those nodes whose DOI value is greater than or equal to a threshold K will be displayed. This threshold can be configured by users, allowing them to tailor the map according to their interests. Using this filtering mechanism, we try to maintain map legibility by showing or hiding nodes, unless users want to see all nodes and links.
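The filtering rule just described can be sketched in a few lines. The sketch below assumes the additive form of the DOI function (a priori importance minus distance) known from Furnas [7]; the node names, the numeric values and the helper names `NodeInfo`, `doi` and `visible_nodes` are hypothetical and are not taken from the HyperProp implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical per-node inputs to the DOI function."""
    name: str
    api: float       # a priori importance (higher means more important)
    distance: float  # distance from the current focus (0 for the focus itself)


def doi(node: NodeInfo) -> float:
    # Assumed additive form: grows with a priori importance, shrinks with distance.
    return node.api - node.distance


def visible_nodes(nodes: list[NodeInfo], k: float) -> list[str]:
    """Return the names of the nodes whose DOI is at least the threshold k."""
    return [n.name for n in nodes if doi(n) >= k]


# Illustrative values only.
nodes = [
    NodeInfo("Projects", api=0, distance=0),   # the focus
    NodeInfo("HyperProp", api=-1, distance=1),
    NodeInfo("Fisheye", api=-2, distance=2),
]
print(visible_nodes(nodes, k=-2))  # ['Projects', 'HyperProp']
print(visible_nodes(nodes, k=-4))  # ['Projects', 'HyperProp', 'Fisheye']
```

Lowering the threshold lets more nodes through, which corresponds to a user asking for more detail in the structural view.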

Since a node may be contained in different composite nodes, it can be displayed more than once in the same map. On the other hand, showing one occurrence of a node does not mean that its other occurrences will also be displayed. The DOI function has to be computed separately for each perspective of a node.

It is important to mention that the fisheye technique used satisfies some properties that improve user orientation when navigating through a hypermedia document:

  • if an occurrence of node N1 is displayed in the map, all compositions in the perspective to reach this occurrence of N1 are also displayed, so that the overall structure is not lost. Since in NCM the same node can be contained in different compositions, it is important to inform the user of the perspective used to access this node;

  • for each link anchored on the focus node X, the other end point is always displayed, informing the user of all navigation possibilities from the focus node. In the perspective of node Y (the link target node), the most external compositions have priority of presentation over the most internal ones;

  • if a composite node is the focus, its components are always displayed in the view;

  • nodes that do not have links to the node in focus, and that are components of compositions in the focused perspective, will have descending priority of presentation from the most internal to the most external composite nodes. For example, suppose that node X is the focus by perspective P = (C1, C2, ..., Ck, X), and that composition Ci contains node Ni, for i ∈ [1, k]. Nodes Nk, Nk-1, ..., N1 will have descending presentation priority.
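The first property above can be illustrated with a small sketch. It assumes that a perspective is represented as a tuple of node names running from the outermost composition to the node occurrence itself; this representation and the function name `with_enclosing_compositions` are assumptions made for illustration, not the data structures used in HyperProp.

```python
Perspective = tuple[str, ...]  # e.g. ("Doc", "Projects", "HyperProp")


def with_enclosing_compositions(shown: set[Perspective]) -> set[Perspective]:
    """Add every enclosing composition of each displayed node occurrence."""
    closed = set()
    for perspective in shown:
        # Each non-empty prefix of a perspective is the perspective of an
        # enclosing composition (the full tuple is the occurrence itself).
        for end in range(1, len(perspective) + 1):
            closed.add(perspective[:end])
    return closed


shown = {("Doc", "Projects", "HyperProp")}
for p in sorted(with_enclosing_compositions(shown)):
    print("/".join(p))
```

Because occurrences are identified by their whole perspective, two occurrences of the same node in different compositions are kept apart, and displaying one does not force the other to be displayed.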

The graphical browsers take into account the fact that, with composition data models, it is possible to build views that explore the feature of opening or closing composite nodes, showing or hiding their components, depending on the user’s current interest. When a composition is closed, all its components are hidden from the view, limiting the amount of information displayed and enhancing map legibility. As the fisheye technique selects visible nodes according to their DOI value, some compositions open and others close as the user navigates through the browser, which keeps the map from being cluttered with too much information.

If users are not interested in seeing the components of a composition, they can simply close it, hiding its components from the view. When this composition comes into focus later during navigation, its components will be displayed again.

In order to illustrate our fisheye-view techniques, we present some examples of navigation in the private base browser. Figure 7 shows the complete structure of a document that contains information about projects, research groups and partners of the TeleMídia Research Laboratory at PUC-Rio. Figures 8, 9, 10, 11 and 12 show fisheye views of the same document.

Figure 7: Complete structure of a document about the TeleMídia Lab.

Figure 8: Initial view.

Figure 9: Focus on composition Projects.


Figure 10: Focus on composition HyperProp (component of Projects).


Figure 11: Focus on composition RAVel (also component of Projects).


Figure 12: Composition RAVel is still the focus, but the view shows more details.

In this example, there are five major compositions in the document: History, Introduction, Partners, Projects and Staff. Figure 8 shows the initial view, where only the first document components are shown. From the view, the user can navigate through the document structure and learn more about parts of the document by focusing on the desired nodes. If the user focuses on composition Projects, the resulting view is the one in Figure 9. If the user then focuses on composition HyperProp, a component of node Projects, the resulting view is the one in Figure 10. If the focus changes to node RAVel, also a component of composition Projects, the new view is illustrated by Figure 11. Note how composition HyperProp is closed (its components are hidden from the view), and how composition RAVel is opened in Figure 11. Maintaining the focus on node RAVel, but increasing the amount of detail (decreasing the value of threshold K), we have Figure 12.

The original fisheye strategy proposed by Furnas [7] is applied to structures where the DOI function can be easily defined, for example, lists and trees. Note that in our case, although we have a tree composition hierarchy, because of links, we have a graph structure. Other authors have proposed some extensions to the strategy trying to solve the problem for generic graphs [23]. The extension of the fisheye strategy developed in the HyperProp system applies to structures permitting nested compositions and the usual relation modeled by links.

There are other works [6, 17] that extend the fisheye model for nested compositions. The basic difference between these works and our proposal [13] is in the definition of the distance component of the DOI function. In [6, 17], only depth navigation (navigation through compositions) is used to compute distance between nodes. In [13], depth navigation and hyperlinks navigation are considered. For more details about the computation of the DOI function, the reader should refer to [13].
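The difference between the two distance definitions can be illustrated as follows. This is only a sketch under strong assumptions: distance is taken to be the number of hops in a breadth-first search, every containment step and every link counts as one undirected hop, and the small document is invented. The actual definition of the distance component is given in [13].

```python
from collections import deque


def distances(focus, edges):
    """Breadth-first hop counts from focus over undirected edges."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical document: containment edges (composition, component) and one link.
containment = [("Doc", "Projects"), ("Doc", "Staff"),
               ("Projects", "HyperProp"), ("Staff", "Alice")]
links = [("HyperProp", "Alice")]

depth_only = distances("HyperProp", containment)
depth_and_links = distances("HyperProp", containment + links)
print(depth_only["Alice"])       # 4
print(depth_and_links["Alice"])  # 1
```

With containment alone, the linked node sits four hops away and would be filtered out at most thresholds. Once links are taken into account it becomes a direct neighbour of the focus.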

5 Web-site Analysis Tool

The analysis tool implemented in the NCM/WWW integrated system generates a nested structure of NCM composite nodes and links from the hierarchical file structure of a WWW site (see footnote 3). The analysis result appears in a separate graphical browser with the same filtering mechanisms provided by the private base and public hyperbase browsers. This web-site browser is activated on user demand or automatically when a user follows a WWW embedded link, allowing navigation and information search on the Web.

In the structural analysis of a WWW site, a breadth-first search is performed on its directory tree. Each directory is mapped to a composite node and data files are mapped to specific content nodes, depending on their media type. For example, html and htm files are mapped to HTML nodes, gif and jpg files are mapped to image nodes, and so on. The contents of each directory are included as components of the corresponding composition.
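A minimal sketch of this mapping is shown below, assuming the site is available as a local directory tree. Nested dictionaries stand in for NCM nodes, and the function name `map_site`, the `NODE_TYPES` table and the `"other"` fallback type are hypothetical. Only the html/htm and gif/jpg rows of the table come from the text.

```python
from collections import deque
from pathlib import Path

# Assumed extension-to-node-type table; only the html/htm and gif/jpg rows
# come from the text.
NODE_TYPES = {".html": "HTML", ".htm": "HTML", ".gif": "image", ".jpg": "image"}


def map_site(root: Path) -> dict:
    """Map a directory tree to nested dictionaries standing in for NCM nodes."""
    top = {"name": root.name, "type": "composite", "components": []}
    queue = deque([(root, top)])
    while queue:  # breadth-first: directories are visited level by level
        directory, composite = queue.popleft()
        for entry in sorted(directory.iterdir()):
            if entry.is_dir():
                child = {"name": entry.name, "type": "composite", "components": []}
                queue.append((entry, child))
            else:
                kind = NODE_TYPES.get(entry.suffix.lower(), "other")
                child = {"name": entry.name, "type": kind}
            composite["components"].append(child)
    return top
```

Calling `map_site(Path("telemidia"))` on a local copy of a site directory would return one composite per directory, with files as typed content nodes inside it.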

There are two types of mapping, and users configure which one is applied. The first type is the simplest one and considers only directories and HTML text files. As a consequence, the structural view does not show nodes (files) with other formats (for example, gif figures are not considered). This mapping generates a simplified view of the site, giving users information about HTML file relationships, but losing information about part of the site structure. The second type considers all files and directories and creates a complete composite data structure. This mapping is suggested when authors define structured web-sites following the rules proposed in [12]. Unfortunately, the techniques proposed are not able to analyze dynamic sites. This is left as future work.
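The choice between the two mapping types amounts to a file filter, which can be sketched as follows. The function name `files_to_map`, the `html_only` flag and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

HTML_SUFFIXES = {".html", ".htm"}


def files_to_map(paths, html_only):
    """Select the files included by the chosen mapping type."""
    selected = []
    for path in paths:
        is_html = PurePosixPath(path).suffix.lower() in HTML_SUFFIXES
        # First mapping type: HTML files only. Second type: every file.
        if is_html or not html_only:
            selected.append(path)
    return selected


site_files = [
    "telemidia/intro.htm",
    "telemidia/logo.gif",
    "telemidia/projects/index.html",
]
print(files_to_map(site_files, html_only=True))
print(files_to_map(site_files, html_only=False))
```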

Independent of the type of analysis, WWW embedded links are identified by the HREF attributes of anchor tags. If the target of a link is part of the site, a corresponding NCM link is created. In NCM, unlike in HTML, links are components of composite nodes that contain their end points (source and target nodes) directly or recursively, so we have to define which composite node will contain each link in the mapping. For example, in Figure 13, link l1 can be defined in compositions Dir0, Dir1 or Dir2. Our choice was to define the link in the first common ancestor of the end points, that is, the innermost composition that contains both. Thus, link l1 is defined in composition Dir2, having File1 as the source end point and Dir4/File2 as the target end point. Similarly, link l2 is defined in composition Dir1, having Dir2/Dir4/File2 as source and Dir3/File3 as target. This choice favors the graphical browser filtering mechanisms, since it minimizes the distance between interconnected nodes (see Section 4).

Figure 13: Directory hierarchy and its corresponding composite structure.
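The rule for choosing the composition that holds a link can be sketched as a common-prefix computation over the directory paths of the two end points. The full paths below are assumed from the textual description of Figure 13 (Dir0 containing Dir1, which contains Dir2 and Dir3), and `link_container` is a hypothetical name.

```python
from pathlib import PurePosixPath


def link_container(source: str, target: str) -> str:
    """Return the deepest directory that contains both link end points."""
    src_dirs = PurePosixPath(source).parent.parts
    dst_dirs = PurePosixPath(target).parent.parts
    common = []
    for a, b in zip(src_dirs, dst_dirs):
        if a != b:
            break
        common.append(a)
    return "/".join(common)


# Paths assumed from the description of Figure 13.
l1 = link_container("Dir0/Dir1/Dir2/File1", "Dir0/Dir1/Dir2/Dir4/File2")
l2 = link_container("Dir0/Dir1/Dir2/Dir4/File2", "Dir0/Dir1/Dir3/File3")
print(l1)  # Dir0/Dir1/Dir2
print(l2)  # Dir0/Dir1
```

Any shallower ancestor would also be a valid container, but it would lengthen the containment path between the two end points.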

If a link points to a target file located in another site, it will not be displayed in the map. As future work, we intend to show a visual indication of such links in the source node.
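Link identification and the distinction between targets inside and outside the site can be sketched with the Python standard library. This assumes that a site is identified by a URL prefix, as in footnote 3; the class `HrefCollector`, the function `split_links` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def split_links(page_url, html, site_url):
    """Return (internal, external) absolute link targets found in html."""
    parser = HrefCollector()
    parser.feed(html)
    site = urlparse(site_url)
    internal, external = [], []
    for href in parser.hrefs:
        target = urljoin(page_url, href)  # resolve relative references
        parts = urlparse(target)
        inside = parts.netloc == site.netloc and parts.path.startswith(site.path)
        (internal if inside else external).append(target)
    return internal, external


page = "http://imperatriz.telemidia.puc-rio.br/telemidia/intro.htm"
site = "http://imperatriz.telemidia.puc-rio.br/telemidia/"
html = '<A HREF="projects/index.html">P</A> <a href="http://www.w3.org/">W3C</a>'
internal, external = split_links(page, html, site)
print(internal)
print(external)
```

Only the targets in the first list would give rise to NCM links. Those in the second list correspond to the links that are currently not displayed in the map.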

As future work, we also intend to implement a third type of mapping that, like the second one, will consider all files. The difference is that the tool will analyze HTML file content and automatically infer compositions. An HTML page containing a GIF image, as seen in Figure 14, for example, will be mapped to a composite node containing an HTML text node and the GIF image node. The composition will also contain a synchronization link from the HTML text node to the GIF node, representing the temporal relationship between them.


Figure 14: Composition telemidia representing the HTML page telemidia.html.

6 Related Work

Other OHSs have also proposed solutions for integration with the Web. The Microcosm group has developed tools for compile-time integration [9]. However, the solution has a number of restrictions, as discussed in [1]. In this section, we focus our discussion on runtime integration solutions only.

The Microcosm group also proposed the Distributed Link Service (DLS) [4] for WWW users. DLS allows clients to connect to link servers to request a set of links to be applied to data in WWW documents. The system permits users to subscribe to many different linkbases. There is a main link database for the server, which is always used, and additional link databases from which the user may choose. These additional databases allow the server to offer a range of different link sets, known as contexts. However, as there is no notion of nested composition, it is impossible to build a new linkbase that inherits relations defined in another one. In other words, users must know all linkbases that compose the context. Moreover, the lack of compositions does not allow structuring documents.

Some integration experiments were carried out with Chimera [1]. This OHS allows organizing WWW nodes hierarchically in its hyperwebs, but it stores links in separate databases, permitting links between hyperwebs in its current version. Although the notion of compositions is embedded in the hyperwebs, they only group nodes, and the system cannot define contextual links, unlike NCM/WWW and DLS.

Hyper-G [2] is a large-scale distributed hypermedia system and one of the first OHSs to integrate with the Web. It allows nested compositions of nodes, but, like Chimera, it stores links in a separate database, also not allowing contextual links.

Integration experiments are also reported in [8], where the DHM/WWW integrated system is presented. The DHM conceptual model separates links from node content, but it stores links in a single database, having the same shortcomings mentioned for the Chimera solution.

In order to allow link navigation through Web documents, the Chimera user interface opens a Link pane (another window) showing the links of the current document. DLS offers an interface where a region of a WWW node may be selected and navigational operations may be triggered from a client browser menu (e.g., the follow link operation). The main objective of the NCM/WWW, Hyper-G and DHM integration solutions is to maintain the WWW browser navigation user interface, so that users follow OHS links the same way they follow WWW links.

In Chimera, DLS, DHM and Hyper-G, once an anchor is selected, it is passed to the server. The server determines the possible destination nodes and computes the new contextual links. Then it returns the destination URLs and all new visible anchors to the client, in the case of Chimera, DHM and DLS, or returns the HTML page with all new anchors embedded, in the case of Hyper-G. In NCM/WWW, this task is done in the client, since it knows all contextual links of a perspective. The advantage of this approach is that it keeps the server from becoming a bottleneck.

The DLS client interface was designed to support the use of Netscape for communication with DLS servers. The nature of the client interface and the way it communicates with the server depend on the particular platform being used. Hyper-G requires a special browser (and also a special text document format, HTF) in order to achieve all benefits of the system, including the ability to create links and collections. DHM and NCM/WWW present a platform-independent solution, also based on the Netscape browser, using applets, JavaScript and LiveConnect. DHM also presents two other platform-dependent solutions: one using Netscape plugins and another for Internet Explorer.

None of the related systems offers a graphical browser that helps user navigation or information search. In the area of Web structure visualization in particular, some other research groups have been concentrating efforts on the matter. Some of them use site analysis tools for creating composite (cluster) hierarchies of nodes that are presented using graphical browsers [15, 24]. Others use site analysis to identify similarity relations among nodes, generating different structural views for the same information space [5]. Still others propose new visualization techniques for Web structure [14].

The HyPursuit system [24] proposes a technique based on content and link analysis of documents for computing cluster hierarchies in the WWW. The system allows multiple coexisting cluster hierarchies based on different rules for grouping documents. Relationships among documents are given by the number of common terms, ancestors and descendants and by the number of direct links between documents. The navigation interface offered to users is purely textual.

In [5], another proposal for creating node clusters in the Web is presented, analyzing the documents’ content, links and navigation patterns. After the analysis, similarities between documents are extracted, giving rise to new links that can be used to produce different hypermedia versions of the same information space. No comments are made about cluster nesting, but the work states that fisheye techniques could be used to improve the graphical view.

In [15], techniques for viewing WWW nodes, grouping them in cluster hierarchies, are proposed. Two types of analysis are used: structure-based clustering, where the hypermedia structure is analyzed, and content-based clustering, where nodes’ attributes are analyzed. After generating clusters, a graphical browser called Navigational View Builder is used to visualize the structures. Filtering techniques based on nodes’ attributes, link types and structuring formats are also used. The authors report difficulties in generating clusters with content-based clustering in the WWW, since Web documents do not have many useful semantic attributes.

In [14], a technique for creating focus+context views of WWW sites is described. A site analysis tool considers document structure (directory hierarchy and links) and node access frequency in order to determine site landmarks. After this analysis, two types of views are offered: a local view, which displays the focus node, the nodes directly connected to the focus and the shortest path to landmarks; and a global view, which only presents landmark nodes. This proposal uses two distinct structural maps, forcing the user to mentally integrate them. In our proposal, local and global information are integrated in a single view. However, the technique for automatically computing landmarks presented in [14] is very interesting and could be incorporated into our work with advantages.

Instead of analyzing WWW sites to identify similarity relationships among nodes by content analysis, structure analysis or even using navigation patterns, our proposal maps the file structure of web-sites to a composite structure. If web-sites are created using the directory hierarchy to structure documents as suggested in [12], the graphical browsers will generate more readable structural views.

Finally, none of the systems mentions mechanisms for temporal and spatial synchronization relations, probably because their conceptual data models do not handle them. HyperProp has well-defined support for synchronization relationships [19], which will also be incorporated in future versions of NCM/WWW using the present NCM formatter, fully implemented as a Java object.

7 Conclusion

This paper presented the HyperProp OHS solution for integration with the Web. The development stage of the current prototype is similar to that of the related work mentioned, but it presents other solutions and additional facilities. As mentioned before, in the current implementation, not all NCM facilities are incorporated in WWW documents yet. In short, the currently integrated facilities include: separation of links from node content, definition of compositions to structure documents (allowing the addition of contextual links and reuse of structures of nodes and links), visualization of document structure, separation of the presentation characteristics from the node content (allowing different exhibitions for the same object), user navigation through graphical views, and navigation through guided tours and trails that maintain the document navigation history.

Support for version control, cooperative work and the definition of n-ary relations that could also specify spatial and temporal synchronization among nodes are addressed as future work, already in progress.

None of the related work that integrates OHSs and the WWW offers a graphical browser for helping user navigation and information search. The NCM/WWW integrated solution provides graphical browsers for visualization and editing of NCM/WWW documents and also a web-site analysis tool to provide visualization of structured WWW sites. Moreover, all browsers support complex and large hypermedia documents, offering filtering mechanisms based on extended fisheye views to control the amount of information displayed. These techniques show, in the map, both the navigation possibilities from the user's focus and the position of the focus within the global structure, playing an important role in user orientation.

Although HyperProp supports other media viewers, only Netscape is used in the current implementation. Thus, other NCM content nodes, such as images, audio and video, are allowed only if Netscape (or a plugin) can handle them. As future work, we intend to incorporate other viewers and also allow the simultaneous presentation of more than one NCM content node in a single HTML page. To that end, studies concerning the definition of presentation pages from temporal and spatial synchronization of NCM nodes are also in progress. These studies include the use of dynamic HTML to implement the behavior changes that can be defined in descriptors [19].

Unfortunately, it is still not possible to implement a platform- and browser-independent solution for the OHS and WWW integration, but the natural tendency is for WWW browsers to become more open and for the Java language to offer more features, allowing a more general solution in the near future. We hope that this paper has contributed in this direction.

Acknowledgments

This work was partially supported by Embratel, CNPq and Finep.

References

  • [1] K.M. Anderson. Integrating Open Hypermedia Systems with the World Wide Web. In Proceedings of the 8th ACM International Hypertext Conference, Southampton, pages 157-166, 1997.
  • [2] K. Andrews, F. Kappe, H. Maurer. Serving Information to the Web with Hyper-G. Computer Networks and ISDN Systems, 27(6):919-926, 1995.
  • [3] T.J. Berners-Lee, R. Cailliau, A. Luotonen, H.F. Nielsen, A. Secret. The World-Wide Web. Com. of the ACM, 37(8):76-82, 1994.
  • [4] L. Carr, D. Roure, W. Hall, G. Hill. The Distributed Link Service: A Tool for Publishers, Authors and Readers. In Proceedings of the 4th International World-Wide Web Conference, Boston, pages 647-656, 1995.
  • [5] C. Chen. Structuring and Visualizing the WWW by Generalized Similarity Analysis. In Proceedings of the 8th ACM International Hypertext Conference, Southampton, pages 177-186, 1997.
  • [6] K. Fairchild, S. Poltrock, G. Furnas. SemNet: Three-Dimensional Graphic Representation of Large Knowledge Bases. In R. Guindon, editor, Cognitive Sciences and its Applications for Human-Computer Interaction, Lawrence Erlbaum, pages 201-233, 1988.
  • [7] G. Furnas. Generalized Fisheye Views. In Proceedings of the ACM SIGCHI'86 Conference on Human Factors in Computing Systems, Boston, pages 16-23, 1986.
  • [8] K. Grønbaek, N.O. Bouvin, L. Sloth. Designing Dexter-based hypermedia services for the World Wide Web. In Proceedings of the 8th ACM International Hypertext Conference, Southampton, pages 146-156, 1997.
  • [9] W. Hall, H. Davis, G. Hutchings. Rethinking Hypermedia: The Microcosm Approach. Kluwer Academic Publishers, Massachusetts, 1996.
  • [10] IETF WEBDAV Working Group. WWW Distributed Authoring and Versioning (webdav). http://www.ietf.org/html.charters/web-dav.html, July 1998.
  • [11] H.W. Lie, B. Bos. Cascading Style Sheets, level 1. W3C Recommendation, http://www.w3.org/pub/WWW/TR/REC-CSS1, December 1996.
  • [12] D.C. Muchaluat-Saade, R.F. Rodrigues, L.F.G. Soares. WWW Fisheye-View Graphical Browser. In Proceedings of the 5th International Conference on Multimedia Modeling, Lausanne, pages 80-89, 1998.
  • [13] D.C. Muchaluat-Saade, L.F.G. Soares. Fisheye View for Compound Graphs. Technical Report of TeleMídia Lab, PUC - Rio, May 1998.
  • [14] S. Mukherjea, Y. Hara. Focus+Context View of World-Wide Web Nodes. In Proceedings of the 8th ACM International Hypertext Conference, Southampton, pages 187-196, 1997.
  • [15] S. Mukherjea, J. Foley. Visualizing the World-Wide Web with the Navigational View Builder. Computer Networks and ISDN Systems, Special Issue on the 3rd International World-Wide Web Conference, 27(6):1075-1087, 1995.
  • [16] Netscape Communications Corporation. Java Capabilities API. http://developer.netscape.com/docs/manuals/signedobj/capsapi.html, 1998.
  • [17] E.G. Noik. Exploring Large HyperDocuments: Fisheye Views of Nested Networks. In Proceedings of the 5th ACM Conference on Hypertext, Seattle, pages 192-205, 1993.
  • [18] P.J. Nürnberg (editor). Open Hypermedia Systems Working Group. http://www.ohswg.org, November 1997.
  • [19] R.F. Rodrigues, L.F.G. Soares, G.L. Souza. Authoring and Formatting of Documents Based on Event-Driven Hypermedia Models. In Proceedings of the IEEE Conference on Protocols for Multimedia Systems and Multimedia Networking, Santiago, pages 74-83, 1997.
  • [20] M. Sarkar, M. H. Brown. Graphical Fisheye Views. Com. of the ACM, 37(12):73-83, 1994.
  • [21] L.F.G. Soares, M.A. Casanova, N.L.R. Rodriguez. Nested Composite Nodes and Version Control in an Open Hypermedia System. International Journal on Information Systems; Special Issue on Multimedia Information Systems, 20(6):501-519, 1995.
  • [22] Synchronized Multimedia Working Group of the World Wide Web Consortium. Synchronized Multimedia Integration Language (SMIL) 1.0 Specification. W3C Recommendation, http://www.w3.org/TR/REC-smil/, June 1998.
  • [23] K. Tochtermann, G. Dittrich. Fishing for Clarity in Hyperdocuments with Enhanced Fisheye-Views. In Proceedings of the ACM ECHT’92 Conference, Italy, pages 213-221, 1992.
  • [24] R. Weiss, B. Vélez, M. Sheldon, C. Namprempre, P. Szilagyi, A. Duda, D. Gifford. HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering. In Proceedings of the 7th ACM International Hypertext Conference, Washington, 1996.
  • 1 In this paper, we will consider contexts and compositions as synonymous.
  • 2 In order to be more precise, an HTML node is an NCM type of content node, as defined previously. On the other hand, we call a WWW node an HTML page of the WWW system.
  • 3 We consider a site as a directory identified by a URL and all its corresponding sub-tree. For example, when a user navigates to the http://imperatriz.telemidia.puc-rio.br/telemidia/intro.htm page, the tool investigates directory telemidia and all possible sub-directories.
  • Publication Dates

    • Publication in this collection
      04 Feb 1999
    • Date of issue
      Nov 1998
    Sociedade Brasileira de Computação - UFRGS, Av. Bento Gonçalves 9500, B. Agronomia, Caixa Postal 15064, 91501-970 Porto Alegre, RS - Brazil, Tel. / Fax: (55 51) 316.6835 - Campinas - SP - Brazil
    E-mail: jbcs@icmc.sc.usp.br