• Archives preserve the internal documents of a structure (company or institution) or individual (personal records). In general, archives deal with “unicates”, that is, documents existing in only one copy, organising them into groups arranged by the archivist: files, archival boxes, or sets of bound documents. These groupings are often the only units described in the catalogue, which is why archives always contain as yet unknown documents, treasures for the curious researcher to discover. By extension, some have spoken of archiving the media, a term that treats an audio or audiovisual stream as a single copy, even if its vehicle for distribution (movies, series and music recordings) consists of multiple documents. Before cloud computing, such a stream was impossible to preserve in its entirety. Instead, archives use “sampling”, keeping only randomly selected examples to inform future historians of the archived period’s social practices;
• Museums also preserve unique objects, but accompanied by a tracking file that tells the object’s known history, including its various owners and any restorations made.
But these distinctions, which describe different approaches to documents, are becoming increasingly tenuous in the age of digitisation and the internet.
The digital document’s trademark feature is its duplication at a marginal cost close to nil. The main costs relate to its creation (the cost of the prototype, whether of an original work or of an already existing work that is digitised) and to infrastructure (from datacenters to user terminals, and the communication network that connects them). Yet many of these documents are never duplicated: blog pages, newspaper websites, online shopping catalogues, photographs uploaded to services such as Flickr or Picasa, and so on. The existence of a single, centralised access point allows website publishers to monetise production and infrastructure, either by selling advertising space or through subscriptions. Libraries, by contrast, with their collective and multi-centennial experience, distribute document storage to render their collections permanent, accessible, and in close proximity to readers. This makes the web as a whole operate more like an archive, consisting primarily of unicates despite the low cost of duplication. One even speaks of “web archiving” to describe the process of copying samples extracted from websites at regular intervals onto high-capacity external hard drives, or at least copying their appearance at a given point in time, so that future readers can access not only textual content, but also the “materiality” of reading as it existed at the time of a document’s production.
So it is more than an enigmatic turn of phrase that gives the name “Internet Archive” to the main online copy-registration service, founded in 1995. The service does more than archive the Web itself; it also attempts to create an accessible online audiovisual “archive” (film collections, musical recordings and digitised books). In France, the division of web archiving between two institutions underlines its contradictory nature: the National Library of France (Bibliothèque nationale de France) conserves websites with the .fr extension 2, dominated by “text” or “published” documents, while the National Audiovisual Institute (Institut national de l’Audiovisuel) conserves “broadcast” streams, whose numbers are increasing (online radio, television, and music websites, among others).
The entire stream flowing through the web is considered a document for archiving. In the United States, the Library of Congress has an agreement with Twitter to “archive” all that is exchanged on that social network, that is, messages of up to 140 characters and the conditions surrounding their release (who publishes, to which “followers”, what is “re-tweeted”; in short, the image of the author’s social graph) 3.
This allows us to measure the work required to save a trace of the present for future generations. The size of the Web always exceeds expectations. Documents deposited into this vast global network have not been organised by anyone, save the producers of information themselves. One now speaks of autoritativité 4, literally “authoritativity”, to describe this new phenomenon that makes authors the sole decision-makers of a publication, for example by clicking their blog’s “publish” button. The editorial process, which transforms a document into a book or other publicly accessible medium, has been replaced by the immediacy of publication/broadcast. And services whose goal is to save memory are put in the position of filling a bottomless pit: who should “judge” the “value” of a document and save it for the future? This has never been the library’s prerogative, or its mission. Should we leave this role to an algorithm that measures and evaluates based on usage (number of clicks, number of links, etc.), at the risk of sentencing to oblivion major works whose worth only time and the patience of readers can measure? This has happened so often in the past 5 that we know popularity is not necessarily synonymous with quality. Finally, in this universe of algorithmic performance, how can documents written or spoken in minority languages fight back? The media model cannot be used as a guide for the actions of librarians/archivists.
2 This does not include the entirety of websites published in France, but only about a third.
3 Olivier Ertzscheid, “Twitter, un patrimoine superflu(x)” [“Twitter, a superfluous heritage/stream”], Affordance, 9 May 2010, http://affordance.typepad.com/mon-weblog/2010/05/twitter-le-patrimoine-du-superflux-.html
4 “Autoritativité”, in Dictionnaire des concepts info-documentaires, Savoirs CDI, http://www.cndp.fr/savoirscdi/index.php
THE LIBRARY FUNDAMENTALS
As we have seen, “digital libraries” must be distinguished from “web archives”. The function of the former is not to record the Web, but to extract and organise its document content for future readership. It is also a way to gauge the world’s digitisation by relating it to the documents themselves. Building digital libraries, which store, organise and make available these types of documents, is becoming an urgent task to ensure the development and sharing of knowledge. When we speak of storage, we must also mean preservation, long-term if possible, meaning that document encoding formats are regularly refreshed (for videos, images, and even digital books).
To understand the functions of digital libraries, it is first important to recall the library fundamentals. A library is a public institution whose functions include building document collections, describing them (through cataloguing and indexing), conserving them, and offering them up for reading to a specific readership. All the terms of this definition are important.
First, the library is not governed by commercial imperatives. The whole of society, through public and therefore collective financing, feels the need to create spaces that guarantee long-term access, available to all, to the products of human knowledge. Like any collective solution, this has its drawbacks, notably a “reaction time” that differs from the interest and attention that the media can engender. But it ensures that the documents forming bibliographic stores are selected so that each idea, theory, perspective and language is fairly represented, while time and criticism allow the most significant documents to emerge.
5 Samuel Beckett, for example, who was later awarded the Nobel Prize in Literature, sold only 150 copies in France of his play Waiting for Godot in the year it was published.
Second, a library is a collection of documents that is organised, coherent, and representative of a collective will. Digital production, and even more so the Web, tends to be judged by collection size. One speaks of millions of digitised books, millions of photographs, the quantity of video posted each minute, and so on. Is this race, this desire to “up the numbers”, effective? Is it not a tangent deriving from the simple fact that it has become technically possible? A collection, on the other hand, pursues a goal. It seeks to ensure either the completeness of a restricted domain (as with research libraries), or public satisfaction (the diversity necessary to meet the needs of a public library situated in the heart of an informal neighbourhood). In the case of digital production, one must distinguish the time of the collection’s creation from the time of document access. Access can be provided by the digital library’s own catalogue, but is usually provided through collective catalogues, a role often performed by actors external to the libraries that formed the collections. Search engines can index documents in several collections. Protocols are in place to promote the creation of global indexes, despite the specificity of each collection.
Thus, the OAI-PMH 6 protocol allows external search engines to create indexes from the metadata of each document in a collection that is open to such “harvesting”.
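To make the harvesting mechanism concrete, here is a minimal sketch, in Python, of the two sides of an OAI-PMH exchange: building a ListRecords request for a repository (the base URL below is a hypothetical placeholder, not a real repository), and extracting Dublin Core titles from the XML response a repository returns. The sample response is abridged for illustration.

```python
# Minimal sketch of OAI-PMH metadata harvesting.
# The repository URL used below is a hypothetical example.
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace

def build_request(base_url, metadata_prefix="oai_dc"):
    """Build the URL of a ListRecords harvesting request."""
    return f"{base_url}?verb=ListRecords&metadataPrefix={metadata_prefix}"

def parse_titles(response_xml):
    """Extract the dc:title of each record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [t.text for t in root.iter(f"{DC_NS}title")]

# An abridged ListRecords response, following the structure of the OAI-PMH spec.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>En attendant Godot</dc:title>
        <dc:creator>Beckett, Samuel</dc:creator>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

print(build_request("https://example.org/oai"))
print(parse_titles(sample))
```

An external search engine would issue such requests against many collections and merge the harvested metadata into one global index, which is exactly the division of labour the protocol was designed for.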
Finally, library collections are audience-specific. The mission of a university library, which caters to students, is not the same as that of a specialised laboratory or, in the opposite direction, of actions promoting reading and literacy conducted by “street” or neighbourhood libraries, or in sites of exclusion (prisons, hospitals, and so on). By defining a specific audience, one obliges the acquisition of staff and services tailored to that audience; a library is above all a set of services, from the reception desk to its guiding mission: helping readers find documents they would not otherwise be able to discover.
By this logic, a library is defined more by its audience, activities, positioning and project than by the documents themselves, which are, let us recall, “duplicates” existing in multiple copies in many places worldwide. Should we not maintain this approach in the digital world? Rather than focusing on documents, the number of pages scanned, and catalogue size, should we not return to the relationship between the collection and the public, a relationship guaranteed apart from any commercial pressure on the librarian? And in any case, is this not the only approach that not only emphasises the conservation of documents in minority languages and cultures, but also offers the speakers of these languages access to documents from other linguistic universes suited to their needs?
6 François Nawrocki, “The OAI Protocol and Its Uses in Libraries” (Le protocole OAI et ses usages en bibliothèque), Ministry of Culture, France, February 2005, http://www.culture.gouv.fr/culture/dll/OAI-PMH.htm
METADATA
The traditional library is not just a set of books, or even a collection tailored to its readership. It also performs two tasks: inventory and description on the one hand, and organising the knowledge included in its collection on the other. It is through metadata management that we encounter these tasks in digital libraries.
Metadata denotes all the information that we can obtain about a document. It is thus primarily descriptive cataloguing information: author, illustrator and prefacer references; edition indicators (dates of publication and, if applicable, of re-release; collection, publisher references, date of scanning, etc.); the work’s collation (number of pages; particulars of the copy, especially marks indicating its ex-libris history; scanning method; file format; duration of sound or audiovisual recordings, etc.); and finally, references regarding the work’s filiations with other works in the same “family” (including the indication “original edition” if it is an original with translations, a multi-volume work, and so on). Descriptive cataloguing provides a work with a material and editorial context, whether it was first pressed onto media (CD, DVD) or published directly online.
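To illustrate, the descriptive elements enumerated above can be sketched as a structured record. The field names below are hypothetical, chosen loosely to mirror the list in the text; they are not a standard schema.

```python
# Illustrative sketch of a descriptive metadata record.
# Field names are hypothetical, mirroring the elements listed in the text.
record = {
    "title": "En attendant Godot",
    "creators": {"author": "Samuel Beckett"},           # author/illustrator/prefacer references
    "edition": {"publisher": "Les Éditions de Minuit",  # edition indicators
                "published": 1952,
                "scanned": None},                       # date of scanning, if digitised
    "collation": {"pages": None, "file_format": None},  # material description
    "filiations": {"original_edition": True,            # links to the same "family" of works
                   "translations": ["Waiting for Godot"]},
}

def catalogue_line(rec):
    """Build a one-line catalogue entry from the descriptive metadata."""
    return f'{rec["creators"]["author"]}, {rec["title"]} ({rec["edition"]["published"]})'

print(catalogue_line(record))
```

Even this toy record shows why context matters: the filiation fields are what would let a digital library link the French original to its translations rather than present each file as an isolated text.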
This question of context is even more important when it comes to digital documents. Too often, the ease of using the text itself to retrieve documents makes us forget the need to place a given document in its broader context (date and terms of publication, document type and genre, etc.). Let me add that digital production, with its unique ability to link information, allows us to go even further than is usually possible in material libraries.
Thus, links to author biographies, photographs, the reproduced covers of all linguistic versions of the same book, or a list of available critical readings, can increase a book’s range of perspective, placing it in the context of an entire publishing venture.
Metadata also includes information describing or summarising the knowledge contained in a book, thereby ensuring the consolidation of books or data covering the same topics. First, there is classification, the ability to place a given work in the same semantic field as others. In this way, we can scan a subset of knowledge to measure its complexity, and to discover within it works with a unique viewpoint alongside the classics of a given field. All scientific sectors have classifications adapted to the production of documents in their area of knowledge. Besides the grouping ability of classifications, which promotes inadvertent or serendipitous discovery, keywords and tags offer precise indicators that a document addresses a specific subject. For this, there are two different approaches that can be complementary in the case of digital libraries:
• Follow established descriptor rules by selecting from a pre-existing and closed list that is shared by several libraries (the concept of the “authority list”), or in accordance with a pre-established framework (for example if a document addresses a particular place, person, event, historical period, and so on);
• Let each person decide on tags, possibly adding them to a list. This model is called a “folksonomy”. It creates independent information from each tag or descriptor, provided by the readers themselves, which raises questions of validity and disperses searches, but by the same token considerably enriches description through the action of anonymous amateurs 7.
Here as well, the digital network allows for all these approaches to cohabit.
Metadata professionals can transform their work, validating the experience and knowledge of readers and using free tags to build structured descriptors. They thus offer a framework that allows amateurs and fans to exercise their collaborative will to share knowledge.
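One way this cohabitation can work in practice is sketched below: free reader tags are folded into an authority list where a known synonym exists, and the remainder is kept as folksonomy tags. The authority list, the synonym mapping, and the tags are all invented for illustration; real systems would use shared vocabularies and human validation.

```python
# Sketch of combining a closed authority list with free reader tags
# (a "folksonomy"). All vocabulary below is hypothetical illustration.
from collections import Counter

# Closed descriptor list shared by several libraries (the "authority list").
AUTHORITY = {"theatre", "absurdism", "french literature"}

# Mapping from common reader tags to authorised descriptors (hypothetical).
SYNONYMS = {"absurd": "absurdism", "plays": "theatre"}

def consolidate(reader_tags):
    """Fold free tags into authority descriptors where possible; keep the
    rest as folksonomy tags. Both sides are counted by reader usage."""
    authorised, free = Counter(), Counter()
    for tag in reader_tags:
        t = SYNONYMS.get(tag.lower(), tag.lower())
        (authorised if t in AUTHORITY else free)[t] += 1
    return authorised, free

auth, free = consolidate(["absurd", "plays", "theatre", "waiting", "absurd"])
print(auth.most_common())   # descriptors validated against the authority list
print(free.most_common())   # remaining reader-supplied folksonomy tags
```

The design point is the split return value: validated descriptors feed the structured catalogue, while the free tags remain visible as candidate descriptors for professionals to review, rather than being discarded.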