7 The fruitfulness of this approach has been demonstrated with heritage photographs, when communities of anonymous individuals add information to the photographs, enabling librarians to improve the accuracy of the information. The Library of Congress’s experience of making numerous photographs available on Flickr is significant, as is PhotosNormandie, established by a group of historians, about the Battle of Normandy :
Patrick Peccatte, “PhotosNormandie at five years – a record in the form of FAQ” (PhotosNormandie a cinq ans – un bilan en forme de FAQ), Culture Visuelle, 27 January 2012. http://culturevisuelle.org/dejavu/
For the Common Good : The Library of Congress Flickr Pilot Project, 30 October 2008. http://www.loc.gov/rr/print/flickr_report_final.pdf
Hervé Le Crosnier
Finally, metadata is akin to a fact sheet that accompanies a book. One may then create a catalogue by grouping those sheets and making them searchable. Or one can facilitate discovery by allowing a reader to browse between abstracts or summary sheets before diving into an entire book or document. Lastly, metadata make it possible to track a document’s history, both physically (for example, the history of a found work that has been scanned, such as the Timbuktu Manuscripts 8) and intellectually (by linking to translated versions, audio versions or films, access to preparatory documents, or even a manuscript’s digital archives).
As a separate sheet of paper, the presentation of the metadata itself can also promote multilingualism and shared understanding. Regardless of a document’s language, the description can preliminarily provide a multilingual approach, allowing for the discovery of a book written in a foreign language. This is especially useful for publicising scientific research written in local languages. If the metadata record is translated into the scientific community’s several major languages, a document intended for local users (students, young researchers, civil society and policy makers) can place itself within the global field of knowledge. It can also be translated later, depending on the interest it generates.
Metadata should be exchanged between multiple systems, and should be deployed in different ways to meet specific needs. They are the basis of the semantic web, and as such, they are written primarily in a computer-readable format. The most widespread model is currently RDF (Resource Description Framework) 9, which is standardised and extensible, and can handle all the languages of the world, each piece of information being tagged with a language code in multilingual cases. Even though RDF is a computer-readable metadata format, it still lacks software that can easily capture the contextual information. The format’s flexibility on the one hand, and the growing quantity of information that we maintain in the descriptive record on the other, mean that metadata management systems are not yet entirely user-friendly.
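To make the idea of language-coded metadata concrete, here is a minimal sketch that writes a multilingual record as RDF statements in the N-Triples syntax, where the language code is a tag attached to each text value. It uses only the Python standard library. The document address and the titles are invented for illustration, and the Dublin Core vocabulary is our choice for this sketch, not something the text above prescribes.

```python
# Namespace of the Dublin Core terms vocabulary (a real, widely used vocabulary).
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples does not allow raw inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject_uri: str, fields: dict[str, dict[str, str]]) -> list[str]:
    """Return one N-Triples statement per (property, language) pair.

    `fields` maps a Dublin Core term name to {language code: text}.
    """
    lines = []
    for term, translations in fields.items():
        for lang, text in translations.items():
            lines.append(
                f'<{subject_uri}> <{DCTERMS}{term}> "{escape_literal(text)}"@{lang} .'
            )
    return lines


# Hypothetical record: the URI and the titles are assumptions made for this example.
record = {
    "title": {
        "fr": "Les bibliothèques numériques",
        "en": "Digital libraries",
    },
}
triples = to_ntriples("http://example.org/doc/42", record)
for triple in triples:
    print(triple)
```

A catalogue that receives these statements can show the French or the English title depending on the reader, without touching the document itself.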
There is thus a contradiction between the results obtained by software alone, such as optical character recognition (OCR) followed by full-text indexing in search engines ; and the results from working directly with the metadata. The first method is of the “industrial” variety, with a digitisation process chain. It is economically more efficient, and works well as long as one accepts viewing “the book divided into pages”, in the words of Jean-Noël Jeanneney 10. Conversely, the use of metadata is close to artisanal or skilled labour, with every professional being able to add information to the record, including translations of terminology, titles, and summaries. This type of work can also be opened to users themselves in a process known as crowdsourcing.
8 Tombouctou Manuscripts Project, http://www.tombouctoumanuscripts.org/
9 RDF primer, 2004, http://www.w3.org/TR/rdf-primer
Documents are often translated, re-edited, or subtitled. Printed works tend to disperse the various translated versions of a text, just as CDs or DVDs do with video and sound. But computerised catalogues, based on extensible metadata records, can compensate for this dispersion, offering the reader a list of the various versions. This concept was developed by libraries in the late 1990s under the acronym FRBR (Functional Requirements for Bibliographic Records 11). It offers a mechanism for discovering the translated versions of a work best suited to each reader, and can be extended to digital documents, articles from scientific journals, blog pages, or videos with overdub and sign language animation 12. In the opposite direction, adopting the FRBR model to access documents will strengthen the desire to translate. We often hear of a document through a title or other criterion linked to a specific language, usually that of the original document, and everyone searches for that version. However, a version may exist in the reader’s own language, which can be discovered using FRBR.
This process of discovery reinforces interest in translation, quite simply because there will be readers to take up the translated version.
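The discovery mechanism described above can be sketched as a small data structure: one abstract work groups all of its versions, and the catalogue picks the version that best matches the reader’s languages. This is a simplified illustration, not the full FRBR model. The class names, the `best_for` method and the sample titles are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expression:
    """One version of a work: the original text or a translation."""
    language: str
    title: str


@dataclass
class Work:
    """An abstract work that groups every known version of the same text."""
    original_language: str
    expressions: list = field(default_factory=list)

    def best_for(self, preferred_languages: list) -> Optional[Expression]:
        """Return the first version in the reader's languages, else the original."""
        by_language = {e.language: e for e in self.expressions}
        for lang in preferred_languages:
            if lang in by_language:
                return by_language[lang]
        # No preferred language is available: fall back to the original, if recorded.
        return by_language.get(self.original_language)


# Hypothetical catalogue entry; the titles are invented for illustration.
work = Work(
    original_language="fr",
    expressions=[
        Expression("fr", "Titre original"),
        Expression("en", "English translation"),
    ],
)
choice = work.best_for(["es", "en"])
print(choice.language, choice.title)
```

A reader who searches for the original French title but prefers Spanish or English is offered the English version here, because no Spanish one has been recorded.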
DIGITISATION
Digitising consists of reproducing pre-existing analogue documents in digital form. One may digitise books, films, videos, photographs, or sound recordings. Digitisation means controlling two elements : the digital file formats, and rendering the digitised document searchable.
10 Jean-Noël Jeanneney, When Google Defies Europe : The Case for a Jump Start (Quand Google défie l’Europe : plaidoyer pour un sursaut), Fayard/Mille et une nuits, 2005.
11 Modèles FRBR, FRAD et FRSAD, Bibliothèque nationale de France. http://www.bnf.fr/fr/professionnels/modelisation_ontologies/a.modele_FRBR.html
Barbara Tillett, What is FRBR? A Conceptual Model for the Bibliographic Universe, Library of Congress, 2004. http://www.loc.gov/cds/downloads/FRBR.PDF
12 See in this book : Annelies Braffort & Patrice Dalle, Accessibility in Cyberspace : Sign Languages.
Digitising transforms an original into a digital copy. One obtains an image whose quality corresponds to the technical means available at the time of scanning, and those means are advancing rapidly. Scanned images from only twenty years ago are a far cry from the definition and quality of those scanned today. In addition, so-called “bitmap” files, which retain colour information for every pixel (a single point of the image), are quite memory heavy and difficult to send. More manageable systems and compression formats, such as .jpeg or .png, have been developed. The same phenomenon exists for audio (.mp3, .flac, .vorbis, etc.) and video (.mpeg4, .ogg-theora, etc.). A format is always a balance between quality and manageability. One of the functions of digital libraries is then to maintain the original digital file in the highest possible definition, and to manage format changes to make the best use of the available technological environment (network quality, screen resolution, and so on).
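A quick calculation shows why uncompressed bitmap files are so heavy. The sketch below multiplies the number of pixels by the colour depth. The page size (A4 scanned at 300 dots per inch, roughly 2480 × 3508 pixels) and the 24-bit colour depth are assumptions chosen for illustration, and the figure ignores file headers and row padding.

```python
def uncompressed_size_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Size of the raw pixel data, rounded up to a whole number of bytes."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# Assumed example: one A4 page scanned at 300 dpi in 24-bit colour.
size = uncompressed_size_bytes(2480, 3508)
print(f"{size} bytes, about {size / 2**20:.1f} MiB")  # 26099520 bytes, about 24.9 MiB
```

At roughly 25 MiB per page, a 300-page book would approach 7.5 GiB before compression, which is why a library may keep the master file and serve lighter derived formats.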
Making files searchable and findable is more complex. For example, it means recognising characters, words, sentences, and, beyond that, the meaning of a scanned text 13. It may mean transcribing an audio stream into text, or identifying a video’s sequences or an image’s regions. We can then retrieve documents based on words (text search), or even by submitting an image to retrieve similar images 14.
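Word-based retrieval usually relies on an inverted index, which maps each word to the documents that contain it. The following is a minimal sketch under simplifying assumptions: the page identifiers and texts are invented, and the tokeniser simply splits on non-word characters, which does not work for languages written without spaces between words.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case the text and split it into words (naive, space-based)."""
    return re.findall(r"\w+", text.lower())


def build_index(documents: dict) -> dict:
    """Map every word to the set of document identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical recognised text for three scanned pages.
pages = {
    "page-1": "The library digitises manuscripts",
    "page-2": "Manuscripts from Timbuktu",
    "page-3": "Oral archives and songs",
}
index = build_index(pages)
print(sorted(search(index, "manuscripts")))
print(sorted(search(index, "Timbuktu manuscripts")))
```

The index answers only word queries. Finding similar images, or understanding the sense of a text, requires other techniques.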
Even more complex are the operations that add contextual information and metadata to digitised files and compress them to make them manageable, so that they can be retrieved via automatic indexing. That indexing can then help to reorganise pages into a book, a collection of photos into a catalogue, or videos into their various dubbed and subtitled translations. These actions require human intervention, with global coordination, to benefit from the leverage of the library network. But these are the operations that ensure that scanned documents will be treated with the same respect as their physical originals : that they will be kept in an orderly form, are fit for public consumption, and can thus constitute the digital library collection.
It is also in this third, often neglected phase that multilingualism comes into play, for its indexing utility (for example, to have the names of cities in a country’s language and script in addition to names in other languages), and for its ability to link documents with a common original source (translation), or documents dealing with similar topics (classification).
13 This process differs from language to language. See in this book : Pann Yu Mon & Madhukara Phatak, Search Engines and Asian Languages.
14 See for example http://tineye.com
THE E-BOOK
Besides web pages and their archives, and in addition to documents that are scanned for online distribution, we are now seeing the emergence of the e-book. This new digital book is a documentary object that has inherited from web technology the ability to be read on machines or computers. E-books are most often read on dedicated electronic devices called e-readers (Nook, Kobo, Kindle) ; tablets (iPad, Kindle Fire, Samsung) ; and smartphones that function both as a phone and a pocket computer (iPhone, Android). But as “books”, they have two qualities :
portability—being easy to carry and use without any network connection ;
and, similarly to printed books, all of their content “resides between two covers” 15. The e-book’s techniques may come from the online experience, but its core concept – organising content and shaping reflection – comes from the world of publishing.
E-books today are often designed as complementary versions of printed books. The book you are currently reading, which is available in both printed and digital formats, is no exception. However, we will increasingly see books come out that have a book’s organization and ease of exchange and citation, but are only published in a digital version. In particular, this will be used for confidential professional exchange, essays, text collections, rapidly-evolving documents like guides or tutorials, and even textbooks.
Libraries will be faced with the question of how to store and maintain the availability of these sorts of public documents. This challenge will expand their roles and responsibilities, including obligating them to develop a system of distance lending. As far as that goes, we are witnessing the birth of “chronodegradable” books, which cannot be read after a specified loan period. Libraries might also offer remote e-readers (usually within web browsers).
But the big challenge as regards language is whether published e-books are mono- or multilingual. These documents’ ease of creation, together with the ability to improve their content over the course of re-publishing and to provide the books with tools that support reading, such as dictionaries, glossaries and notes, makes it easier to read books in minority languages. These e-books can also integrate text with videos or pictures. Finally, some e-readers include a Text-To-Speech feature that facilitates content access for the visually impaired.
15 Michel Melot, Nicolas Taffin (ill.), Books (Livres), L’œil neuf éditions, 2006, p. 27.
The proliferation of creation tools 16 and the potential for professionals to truly agree on e-book formats are two factors that will determine whether these materials spread widely and whether digital books develop, especially in countries without printing infrastructure or access to paper or bookstores. Libraries thus have a major role to play. Some online bookstores like Amazon are trying to pass off their book selling as akin to the function of a “library”, reducing the latter’s work to that of a commercial transaction. This reinforces the need for actual libraries that handle digital documents to remain independent from the book market (and more widely the document market), as institutions that serve all of society, and guarantee the balance of ideas and the rights of readers.
AUDIO AND VIDEO RECORDINGS
Recording tools are advancing. This will propel the formation of audio and video collections, which in turn will enter the digital libraries.
This means that familiarity with, and use of, the three steps discussed above will become vital : managing formats and compression ; maintaining and updating files so that documents are retained in the best possible condition for later reissuing ; and finally, adding metadata to provide context and summarise ideas and knowledge, so that readers can more easily survey and select documents.
The proliferation of these recording tools provides an opportunity for libraries to enrich and diversify their collections from a linguistic viewpoint. The standardisation of grammar in writing is far from having an oral equivalent. French, for example, changes in pronunciation, intonation, inflection and even flavour between the inhabitants of Paris, Lille, Marseille, Quebec, Senegal and the Antilles. Even within the major languages, there exist strong differences that grammar books and academic works cannot always capture, insofar as these different ways of speaking spring from a linguistic community’s history. Additionally, dialects evolve rapidly, especially under the standardising force of the media, which construct recognised and distinguished forms of pronunciation.
16 See for example the online tool Polifile (http://polifile.com) for creating digital books in ePub format.
The capacity to build an oral archive is also a means of transmitting the knowledge stored in endangered languages, or of rebuilding, to the best of our ability, collections of extinct or endangered languages and dialects from original recordings 17.
This process of gathering oral information 18, which aims to collect popular phrases from stories, voices, songs and music, has long been part of the mission of libraries and ethnographic museums. It is “language preservation” in the sense that it maintains a record of historical developments and changes in linguistic phenomena. The proliferation of technology, its democratisation (lower costs), and the ability to build these collections and make them available will strengthen this work. Libraries, whose mission and culture project them into the temporal dimension of documents, have a special role to play in this process.