Functional requirements for bibliographic description in digital environments

In today's digital information environments, various types of resources coexist with heterogeneous metadata formats and standards. To achieve interoperability, allowing multiple metadata standards to be used and metadata records to be reused, strategies have been developed that range from simple mappings among metadata elements to complex structural modelling. Dealing with information resources requires describing their form and content in a machine-readable way, with results that are understandable to humans and that meet the interoperability requirements of information environments. In light of this, this research reflects on the problem and proposes the development of an architecture for semantic bibliographic description that ensures interoperability in digital information environments. Through exploratory and descriptive analysis, we found that the methodologies of Descriptive Cataloguing and an ontology for bibliographic description, made explicit in cataloguing rules and codes and in metadata standards, can guide the design of better structured digital information environments that retrieve information and effectively establish interoperability.


Introduction
Because the Web holds heterogeneous and ephemeral information, it can be characterized as chaotic, and professionals from different areas look for ways to deal more meaningfully with the content stored there and to make it more effectively available for use. The Semantic Web, proposed by Tim Berners-Lee and led by the World Wide Web Consortium (W3C), is a project that aims to minimize information retrieval problems on the Web through automated access to information resources. To this end, the project intends to implement a technological structure and establish wider knowledge representation on the network, making the semantics of the information contained in its units explicit and thereby enabling more effective retrieval techniques (Berners-Lee et al., 2001).
Over the past decades, a large amount of information has been published, stored and made available on the Web. However, only part of this information, in specific environments such as digital libraries, is covered by metadata standards for bibliographic description, which can standardize information resources so that they meet interoperability requirements.
According to Castro and Santos (2010), digital libraries are a segment of the Internet that seeks to develop and create methods and techniques to standardize information resources. However, ensuring interoperability among "islands" of well-structured and standardized information across different bibliographic formats remains a research question and a challenge for the scientific community.
Current technologies based on Semantic Web research attempt to add semantics to bibliographic descriptions in order to handle digital content more effectively. The Semantic Web offers solutions to problems such as standardizing information representation, creating a new outlook on how data are stored and processed. Some of these solutions could be implemented to enhance search results in the context of digital libraries.
The librarian community's concern about new rules for bibliographic description in digital environments has led professionals to rethink their practice in an effort to adapt to the changes brought about by the technological recontextualization of libraries.
Taking this into account, we point out and reflect on such changes by proposing a model of functional requirements for modelling bibliographic catalogs, guided by the description logic methodologies of Descriptive Cataloguing and by ontologies for bibliographic description made explicit in cataloguing rules and codes and in metadata standards, in order to effectively establish interoperability in digital information environments.

Impacts on digital environments from the bibliographic domain perspective
In the present scenario of the Library and Information Science community, there is growing acknowledgement that a successor to the Machine Readable Cataloguing bibliographic format (MARC 21) will be needed, owing to changes in the bibliographic domain driven by the intensive use of Information and Communication Technologies (ICT). According to Coyle (2011), these discussions tend to focus mainly on structural issues, for example: will the new format be eXtensible Markup Language (XML)? Will it use the Resource Description Framework (RDF) and linked data standards? What these questions do not address is the much more difficult task of translating the semantics of library data into a new standard. According to Thomale (2010), only a short investigation of the data coded in MARC 21 is needed to reveal that the tags and subfields themselves are inadequate to define the actual data elements carried in library catalog records. "The first step in the transformation of MARC21 to another format is to identify the data elements that are contained within the MARC21 record, which is not so simple" (Thomale, 2010, p.3).
The new-found enthusiasm for RDF as the basis for library bibliographic data has prompted considerable efforts and applications that convert MARC 21 to RDF, but none of them is official or recommended by the international descriptive cataloguing standards.
Among the officially accepted initiatives for converting data to RDF are the Library of Congress (LC) standards for resource description, such as the Metadata Object Description Schema (MODS). Other proposals to convert library data to RDF can be found in the International Standard Bibliographic Description (ISBD) in RDF; the Functional Requirements for Bibliographic Records (FRBR) in RDF; and Resource Description and Access (RDA) in RDF.
Each of these efforts takes a library standard and uses RDF as its underlying technology, creating a complete metadata schema that defines each element of the standard in RDF. The result is a series of RDF "silos", in which each data element is defined as belonging exclusively to one standard.
The element "place of publication", for example, has four different declarations, in ISBD, RDA, FRBR and MODS, each with its own Uniform Resource Identifier (URI). There are also differences among them (e.g., RDA separates place of publication from place of production, etc., while ISBD does not), yet they should clearly share a common vocabulary to address these issues (Coyle, 2012).
A possible solution would be to treat the different instances of "place of publication" as having a common meaning, such that an FRBR element could be linked to an ISBD element, but this does not occur. The reason is that each of them restricts its elements in a unique (individual) way that defines their relationship with a particular data context (which we usually think of as record structures). The elements are not independent from context, which means that each one can only be used within that particular context. This is the very antithesis of the linked data concept, where data sets from multiple sources share metadata elements. Furthermore, the elements that create the "link" in linked data are reused. To achieve this, metadata elements must not be restricted to a particular context (Coyle, 2012).
The connection can also be achieved through vertical relationships, similar to the broader and narrower terms in a thesaurus. This option is less direct, but it makes it possible to mix data sets that have different levels of granularity. In the case of ISBD's place of publication, the three RDA elements that deal with this separately could be defined more broadly. Coyle (2012) states that, unfortunately, this is not possible because of the way ISBD and RDA have been defined in RDF. Coyle (2012) points to a series of RDF "silos": data expressions in RDF that cannot cross into linked data because they are required to specify data structures, so that little is gained in linked data terms from a bibliographic point of view. Not only are the RDF schemes incompatible with one another, but none will link to bibliographic data from communities outside libraries that publish their data on the Web. Coyle (2012) argues that, given the early stage of development of linked data for library environments, there are two options regarding the use of RDF.
- Define super-elements that float above the record formats and are not bound by the constraints of the RDF-defined records. In this case, there would be a general place of publication acting as a super-element corresponding to all the places of publication in the various records, itself subordinate to a general, widely used concept of place. To implement linking, each record element would be extrapolated to its super-elements.
- Define the data elements outside any particular record format first, and then use them in the record schemes. In this case, there would be only one instance of place of publication, used in the various bibliographic records whenever the element is needed. These records would be interchangeable as linked data through their component elements, and would interact with other bibliographic data on the Web through the RDF-defined elements and their relationships. Coyle (2012) concludes that we need to create data first, and then records for applications according to the needs of each information environment. These records will operate internally in library systems, while the data have the potential to make connections in linked data space. More effort has to be made to discover and define the elements of our data, and to look outward at all of the data we want to link to in the vast information universe.
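The idea of one shared element reused across record formats can be sketched in a few lines of Python. This is an illustrative sketch only: the prefixed property names (`isbd:`, `rda:`, `frbr:`, `mods:`, `bib:`) are invented stand-ins, not the official URIs of those vocabularies.

```python
# Hypothetical sketch: map format-bound "place of publication" properties
# onto one shared element so that triples produced from different schemas
# can be queried together. Property names are illustrative, not real URIs.
SHARED_ELEMENT = {
    "isbd:placeOfPublication": "bib:placeOfPublication",
    "rda:placeOfPublication":  "bib:placeOfPublication",
    "frbr:placeOfPublication": "bib:placeOfPublication",
    "mods:placeOfPublication": "bib:placeOfPublication",
}

def generalize(triples):
    """Rewrite each (subject, property, object) triple so that
    format-specific properties use the shared element."""
    return [(s, SHARED_ELEMENT.get(p, p), o) for (s, p, o) in triples]

records = [
    ("book1", "isbd:placeOfPublication", "London"),
    ("book2", "rda:placeOfPublication", "Paris"),
]
merged = generalize(records)
# Both triples now carry the same property and can be linked across formats.
```

With the shared element in place, a single query for `bib:placeOfPublication` retrieves places from ISBD-derived and RDA-derived records alike, which is exactly the linking behaviour that the RDF "silos" prevent.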
Libraries focus on the bibliographic records that describe the institutional collections, where each record usually stands in for a complex document, such as a book or a music recording. RDF, in this context, says nothing about records; it only says that there are data that represent things (resources) and relationships between those things. What is often confusing is that anything can be a thing in RDF: the book, the author, the page, the word on the page, any or all of these could be things in the universe (Coyle, 2012).
Discussions about the future of digital libraries and their configuration to adopt Semantic Web technologies stem from the need for environments and information systems to create a data structure which takes advantage of the RDF capabilities, allowing the explicit form of relationships and promoting interoperability.
It is worth mentioning that bibliographic relationships have always existed within bibliographic catalogs, expressed in the data of a bibliographic record through cataloguing rules and schemes; however, the use and (re)use of this information in environments handling bibliographic and cataloguing data was never made explicit to those working at the institutions.

Structural data modelling in digital information environments
When thinking about creating and developing a data model, we encounter issues such as the granularity and analysis of the data to be catalogued. This is not new in the bibliographic domain, where an implicit data model underlies both the description rules (AACR2) and the exchange formats for bibliographic data (MARC 21).
Currently, with the development of conceptual models (FRBR and FRAD) and new rules (RDA) for modelling information environments, there is a recognized tendency to structure and define the data to be catalogued, preparing them to move and to be compatible with the Semantic Web.
In the intangible layers of digital information environments (defined in the representation and description of the information resource), there is an increase in the structure and granularity of data. Yee (2009) points out that more structure and granularity enable more sophisticated presentations for system users and increase the possibility of producing interoperable data.
Any change or mapping made to create interoperable data would produce the lowest common denominator (simpler and less granular data), and once interoperable, full retrieval would not be possible because of this loss. It could be easier and cheaper to work with data having less structure and granularity, with the simplest potential for the communities involved (Yee, 2009, p.59).
Take a name as an example. According to the cataloguing rules (AACR2), the surname is separated from the forename: the surname is registered first, followed by a comma and then the first name. This amount of granularity can be a problem for a cataloguer dealing with an unfamiliar culture that does not necessarily use these rules. More granularity may create ambiguous situations for those collecting the data. "Another example is related to the creator's gender, in which the cataloguer would not necessarily know if a given creator was self-defined as a female or a male" (Yee, 2009, p.59).
Yee (2009) comments that if we add birth and death dates, whatever dates we use are placed together in a $d subfield without any separate coding to indicate which is the birth date and which is the death date (although an occasional "b" or "d" will provide this information). We could provide more granularity for dates, but that would make the MARC 21 format much more complex and difficult to learn.
In the representation of field 100 (personal author), for example, the authorized way to describe the metadata content is defined as follows: 100 1#$a Adams, Henry, $d1838-1918.
In this case, the $d subfield (dates associated with the name) holds 1838, the author's birth date, and 1918, the death date.
According to Yee (2009), granularity and structure may also create "tension" with each other. More granularity may lead to less structure (or to additional complexity to hold the structure together with the granularity). In the search for more data granularity than we have now (RDA attempts to support RDF in XML encoding), data are atomized to make them useful to computers, but that does not necessarily make them more understandable to humans. To be useful to humans, it must be possible to group and arrange them in a way that is meaningful for cataloguing, indexing and display. The developers of the Simple Knowledge Organization System (SKOS) address the vast amount of unstructured (i.e., human-readable) information on the Web by labelling bits of data with semantic relationships in a machine-actionable way, but this does not necessarily provide the kind of structure required to make data readable by humans and, therefore, useful to people on the Web (Yee, 2009).
To reinforce this point, Yee (2009, p.59) states that the more granular the data, the less the cataloguer can build order, sequencing and linking into them; the coding must be carefully designed to allow the desired order, sequencing and linking so that cataloguing, indexing and display remain possible, which may call for even more complex coding.
Regarding data structure, Castro (2012) defines it as the intangible layer in which bibliographic data are instantiated and modelled for representation and description, through formats and/or metadata standards, to allow interoperability in digital information environments for human and non-human agents, ensuring more accessible interfaces for the later retrieval, use and (re)use of information resources.
When data structure is mentioned in the bibliographic domain, we think of the conceptual data model established by FRBR. It makes use of an entity-relationship model, which rests on two main concepts: 'things' and relationships. FRBR defines ten categories of 'things', called entities: Work, Expression, Manifestation, Item, Person, Corporate Body, Concept, Object, Event and Place.
The entities may be instantiated, for example, as a work, a text, a book, etc. The attributes correspond to the data features related to the entity and serve to differentiate the intellectual or artistic content. The relationships describe the links between one entity and another, helping users manage information resources in a system (Moreno, 2006). Moreno (2006) presents the main terminology of the FRBR conceptual model: a Work is an abstract entity, a distinct artistic or intellectual creation. The Expression entity is the specific intellectual or artistic realization that a work takes, excluding aspects of changes in physical form. A Manifestation is the materialization of an expression of a work, i.e., its physical support, which could be books, journals, multimedia kits, films, etc., represented by the Item, a single copy of a manifestation. The last two entities reflect physical form and are concrete entities, while the first two reflect intellectual or artistic content (p.35, author's words in bold, my translation).
The context and exemplification of FRBR can be interpreted as follows. Take, for example, a literary novel (Work), which has an original text that may be translated or otherwise changed, such as in an illustrated edition (Expression); the forms in which the work is made available, printed or electronic/digital (Manifestation); and the copies of that work available on the library shelf, denominated Items (the "materialization" of the bibliographic resource). FRBR is a "new" approach to contemporary Descriptive Cataloguing in its conventional manner, intended to provide more effective and intuitive retrieval of bibliographic items, acting like a reference librarian: it lists all the material linked to the search term, showing it all at once on a single interface.
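The Work → Expression → Manifestation → Item chain described above can be sketched as a small object model. This is a hedged illustration: the entity names follow FRBR, but the attributes, the example novel and the traversal function are invented for demonstration and are not part of any published model.

```python
from dataclasses import dataclass, field

# Toy versions of the FRBR Group 1 entities. Real FRBR defines many
# attributes per entity; only enough is modelled here to walk the chain.
@dataclass
class Work:
    title: str
    expressions: list = field(default_factory=list)

@dataclass
class Expression:
    label: str                      # e.g. "original text", "illustrated edition"
    manifestations: list = field(default_factory=list)

@dataclass
class Manifestation:
    carrier: str                    # e.g. "printed book", "e-book"
    items: list = field(default_factory=list)

def items_of(work):
    """Walk Work -> Expression -> Manifestation -> Item and return every
    copy linked to the work, the 'reference librarian' behaviour of FRBR."""
    return [item
            for expr in work.expressions
            for man in expr.manifestations
            for item in man.items]

novel = Work("Dom Casmurro")        # illustrative example work
original = Expression("original text")
printed = Manifestation("printed book", items=["copy-1", "copy-2"])
original.manifestations.append(printed)
novel.expressions.append(original)
```

A search that resolves to the Work can then present every expression, manifestation and item in one result set, instead of scattering them across separate catalog entries.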
For example, if a particular author or book has other manifestations, such as records, CD and DVD, when the user is searching, the system/agent (through the FRBR relationship model) will list all these manifestations and retrieve them at once, presenting them to the user. Riley (2010) has studied RDF terminology comparatively and in more depth in the context of Library and Information Science, specifying:
- Subject: in libraries, it covers an information resource in terms of content; in RDF, it is what a declaration is about (an information resource).
- Vocabulary: in libraries, it means a certain kind of controlled vocabulary (authorized terms, hierarchical structures, related terms, etc.); in RDF, more flexible settings (including formal definitions of classes and properties of an information resource).
TransInformação, Campinas, 28(2):223-231, maio/ago., 2016 http://dx.doi.org/10.1590/2318-08892016000200008
- Class: in libraries, a classification scheme (Dewey Decimal Classification (DDC), Universal Decimal Classification (UDC), etc.) indicating the general topic or area of knowledge covered by the information resource; in RDF, a type or category to which an object or information resource belongs.
- Schema: an XML Schema defines a set of elements designed to be used together; an RDF Schema defines classes and properties intended to be used anywhere, alone or in combination.
The difficulty in any data modelling exercise, especially in the bibliographic domain, is deciding what to treat as an entity or class and what to treat as an attribute or property. FRBR decided to create a class called Expression to deal with any changes in the content of a work. FRBR is in harmony with the RDF data model: the FRBR entities are recorded as classes, while the relationships are recorded as properties.
Functional Requirements for Bibliographic Records in RDF adds only three classes. Two of them (Endeavor and Responsible Entity) are supersets of FRBR classes.
Endeavor is a generalization that can relate to a work, expression or manifestation, i.e., a class whose members are any products of artistic or creative activity. Responsible Entity is a more general term that can relate to a corporate body or a person. These classes allow information about the intellectual content of a resource to be stated more clearly without having to provide additional information. The third class added is Subject, whose members can be any instance in the scheme, since FRBR clearly treats the subject as a relationship (Davis & Newman, 2005; Coyle, 2012).
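The role of these two superclasses can be sketched as a toy class hierarchy. The class names follow the description above, but this is an illustrative hierarchy, not the published FRBR-in-RDF vocabulary.

```python
# Toy hierarchy: Endeavor generalizes the FRBR Group 1 content entities,
# and ResponsibleEntity generalizes the agents. Names illustrative only.
class Endeavor: pass
class Work(Endeavor): pass
class Expression(Endeavor): pass
class Manifestation(Endeavor): pass

class ResponsibleEntity: pass
class Person(ResponsibleEntity): pass
class CorporateBody(ResponsibleEntity): pass

# A query for "any product of artistic or creative activity" can now
# match all three subclasses at once, without naming each one:
things = [Work(), Expression(), Manifestation(), Person()]
endeavors = [t for t in things if isinstance(t, Endeavor)]
agents = [t for t in things if isinstance(t, ResponsibleEntity)]
```

This is the practical payoff of the supersets: a statement about intellectual content can target Endeavor, and a statement of responsibility can target ResponsibleEntity, without committing to a single narrower class.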
Thus, the condition for digital information environments to interoperate their data is conceptual modelling, defined and codified by a number of functional requirements established by metadata architectures, by the rules and schemes of bibliographic description, and by ontologies, which will provide better structured environments and ensure more effective information retrieval for users (human and non-human).

Architecture for semantic bibliographic description: the results
By examining the available scientific literature on the new ways of cataloguing, greatly affected by the extensive use of contemporary digital technologies, we seek to identify conceptual and methodological elements with which to develop a structurally modelled information environment based on bibliographic description schemes, metadata standards, computer languages, metadata architectures, conceptual models for bibliographic data and an ontology for bibliographic description.
Interoperability is the keyword when creating digital information environments, particularly digital libraries. For interoperability to be efficient, we need to take a closer look at the structural layers and designs of digital bibliographic catalogs, i.e., at how data are represented and described, in order to enhance the ways information is searched and retrieved.
As a starting point to ensure interoperability, this research, based on observations and reflections in the field of Descriptive Cataloguing, highlights functional requirements and guidelines that can be used to establish interoperability in digital information environments more effectively. To do so, we describe the intangible structure as a proposal organized in overlapping layers, as shown in Figure 1, since the layers must work in synergy for the consistency and full functioning of the digital environment.
- Data typology: in this initial stage, the designer (cataloguer) defines which data will be used to feed and model the information environment, based on the bibliographic resource being catalogued, for example audio, pictorial or textual data. It is worth mentioning that this study concentrates only on textual data, as covered by the cataloguing codes (AACR2) and the metadata standards (MARC 21).
- Data preparation: once the bibliographic data to be used in the system are defined, data preparation consists of adopting tools to convert the data into RDF, since the data were extracted from other, non-RDF sources. The W3C has recommended converters that help with this, such as RDFizer; RDFizer was chosen because there is no official RDF form for MARC data.
- Treating and storing data: after converting the data into RDF, in the next layer the cataloguer actually applies the rules and/or bibliographic description schemes (AACR2 and RDA), i.e., catalogues the bibliographic resources while standardizing the metadata; the metadata standard (MARC 21) is defined within the RDF metadata architecture for structuring the data, and RDF Schema validates it. At this stage the cataloguer should also adopt the Functional Requirements for Bibliographic Records (FRBR), together with the Functional Requirements for Authority Data (FRAD) and the Functional Requirements for Subject Authority Data (FRSAD). Ontologies appear in this context to define the concepts behind the elements of a bibliographic record, by means of the rules and description schemes used in the methodological preparation of metadata and metadata standards.
- Presentation (display) of data: the final stage is to make the data available (output) and present them to users in the information environment. The data may appear as they were built and stored (input) in layers 1 and 3, in the tangible retrieval layer, and may also be displayed on the Web.
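The four layers above can be sketched as a minimal pipeline. This is a toy sketch under stated assumptions: the record is a plain dictionary rather than a real MARC or RDF structure, and every function name (`select_typology`, `prepare_as_triples`, `store`, `display`) is invented for illustration, not drawn from any standard or tool.

```python
def select_typology(resource):
    """Layer 1 (data typology): keep only textual data, as this study proposes."""
    return {k: v for k, v in resource.items() if isinstance(v, str)}

def prepare_as_triples(record, subject):
    """Layer 2 (data preparation): convert the record into RDF-style
    (subject, property, value) triples."""
    return [(subject, prop, value) for prop, value in record.items()]

def store(triples, graph):
    """Layer 3 (treating and storing): validate trivially (no empty parts)
    and persist the triples in a simple in-memory graph."""
    graph.extend(t for t in triples if all(t))
    return graph

def display(graph, subject):
    """Layer 4 (presentation): assemble the stored data for one resource."""
    return {p: o for s, p, o in graph if s == subject}

graph = []
resource = {"title": "Dom Casmurro", "place": "Rio de Janeiro", "pages": 256}
triples = prepare_as_triples(select_typology(resource), "book1")
view = display(store(triples, graph), "book1")
```

The non-textual field (`pages`) is filtered out in layer 1, the remaining fields flow through conversion and storage, and layer 4 reassembles them for presentation, mirroring the input/output relationship between the intangible and tangible layers described above.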
It is believed that these requirements and recommendations can provide better modelling of structured catalogs for the later retrieval, use and (re)use of information, ensuring interoperability and enhancing semantic bibliographic relationships, an initiative in line with the ideas envisioned by the Semantic Web.

Conclusion
It is believed that the functional requirements for the structural modelling of digital information environments presented in this research can enable sharing among metadata standards, environments and different information systems, in a collaborative philosophy that connects the available information resources and the technologies involved in their construction, establishing interoperability, optimizing bibliographic relationships and expanding the standardized building of resources on the Web. Ontological models of bibliographic relationships have advantages for structuring new catalogs. One idea, with many variations, is to extend MARC 21 format fields with explicit links to other works. MARC 21 organizes its data in very complex files. It is possible to convert the idiosyncratic structure of MARC records, i.e., field codes, subfields and the conditional values of the indicators, to more standard formats; however, the need to do so prevents libraries from taking advantage of advances in mainstream database technologies.
Using the ontologies, metadata and theoretical and methodological frameworks of Descriptive Cataloguing articulated here can open new options for information environments, whether in database (catalog) modelling or in the way information resources are represented, ensuring the possibility of semantic interoperability and helping to represent resources, providing users with a variety of ways of searching, accessing and retrieving relevant and significant information, as well as of using, preserving and (re)using it through a single interface.
Information Science professionals, especially cataloguing librarians, cannot overlook the need to create bibliographic records based on a methodological and theoretical-epistemological approach that upholds and complies with the predetermined record structures established by internationally accepted rules and/or bibliographic description schemes, aiming at universal bibliographic control.
The strategic framework for promoting interoperability in digital information environments takes shape when the intangible layer (representation and data description), where bibliographic data are instantiated and persisted, is linked to the visible layer of presentation to users. The cataloguer designs the information environment in advance, using ontologies for bibliographic description made explicit in the rules and schemes for representing and describing bibliographic resources, conceptually defining the metadata elements as well as the conceptual models and metadata standards, ensuring the uniqueness of resources and displaying them according to users' search and retrieval strategies.
Thus, the architecture for semantic bibliographic description and representation, and the interoperability levels developed in this study, provide a structural modelling of digital information environments that accommodates the heterogeneity of metadata schemas within a single structure that many cataloguers can adopt, expanding its scope to the standardized building of resources on the Web. In addition, adopting this model effectively establishes interoperability, mainly in response to the technological impact on Descriptive Cataloguing.

Figure 1. Functional requirements for structural modelling in the bibliographic domain. Source: Developed by the author (2012/2015). Note: AACR2: Anglo-American Cataloguing Rules, Second Edition; CDD: Dewey Decimal Classification; FRAD: Functional Requirements for Authority Data; FRBR: Functional Requirements for Bibliographic Records; FRSAD: Functional Requirements for Subject Authority Data; ISBD: International Standard Bibliographic Description; LCSH: Library of Congress Subject Headings; MARC 21: Machine Readable Cataloging; OWL: Web Ontology Language; RDA: Resource Description and Access; RDF: Resource Description Framework; RDFS: Resource Description Framework Schema; SKOS: Simple Knowledge Organization System.