An essay by V. Cardigan, as provided by Emma Tonkin
Art by Dawn Vogel
The theoretical concept of a provably complete library, generated by an infinitely parallelised random process, is well-known. In this article, I report on the results of a sample implementation of Borel’s well-known “typing monkey” thought experiment. Through analysis and evaluation of our practical findings, I identify best practices, issues encountered, and potential future developments in the field.
Ever since the publication of Émile Borel’s contribution to the field of modern librarianship in his well-known 1913 article, “Mécanique Statistique et Irréversibilité,” the construction of a provably complete library (PCL) has been tantalisingly within humanity’s grasp.
Borel’s pioneering vision described the employment of an infinitely large number of “singes dactylographes,” usually rendered into English as “typing monkeys,” each of which would generate a unique text on its own typewriter. Although the majority of such texts would naturally contain nothing but gibberish, a small subset would consist of all existing (and possible) works of literature. The aggregate works of an infinite set of monkeys would therefore far exceed the reach of copyright libraries, institutions that are entitled to a copy of every book ever published in the host nation. Such a library would also contain valid textual encodings of all possible binary formats, software packages, scripts, and multimedia works. The PCL is a provably complete textual, image, multimedia, and software library.
Borel’s work has generally been dismissed as purely theoretical due to the difficulties inherent in the practical construction of an infinite dataset. In particular, although contemporary computing platforms are able to simulate a series of individual typing monkeys, they do not permit the simulation of the requisite infinite set of typing monkeys in a realistic period of time. In the past, parallelisation has failed to significantly relieve this difficulty. The best parallel architectures commercially available today cannot provide the infinite parallelism required by Borel’s Gedankenexperiment.
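A single singe dactylographe is, of course, trivial to simulate on a conventional machine; it is only the infinite set that defeats us. A minimal Python sketch (the alphabet and the function name are my own illustrative assumptions, not part of Borel’s specification):

```python
import random
import string

# Keys available to one "singe dactylographe": lowercase letters,
# a space, and a little punctuation, struck uniformly at random.
KEYS = string.ascii_lowercase + " .,;'"

def typing_monkey(length, seed=None):
    """Simulate one monkey producing a random text of the given length."""
    rng = random.Random(seed)
    return "".join(rng.choice(KEYS) for _ in range(length))

# One monkey, one page; the PCL requires infinitely many of these.
page = typing_monkey(400, seed=1913)
print(page[:60])
```

One monkey yields one page per run; the thought experiment simply demands that this loop be replicated across an infinite set of workers, which is where conventional hardware gives out.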
A practical application of infinitely parallel computing
Last year, we in the Information and Library Services department were pleased to learn of the eighteen-billion-pound development of a new infinitely parallel quantum computing (IPQC) facility within our university. This construction made extensive use of existing facilities provided by the Physics Department’s very high energy physics research group and was to be operated by the cryptography research group, a collaboration between the Schools of Computing and Mathematics.
Whilst I am not an expert on the system’s theoretical underpinnings, it is my understanding that, although a quantum computer only requires a single universe in which to operate, this facility effectively transcends the limitations of our universe by breaking through into an infinitely parallel local multiverse. Hence, it allows the programmer to draw upon a true infinite-dimensional Hilbert computational space. Metaphorically, the IPQC facility can be viewed as an uplink to the wireless networking endpoint of the monolithic high-performance computing facility provided by our local multiverse.
The system’s potential as an exploitable resource for the Information and Library Services department was immediately clear. Under the leadership of our department head, Dr McTavish, we therefore made a successful funding application to the University. In this, we drew attention to the fact that the IPQC was the first system in existence capable of underpinning a real-life PCL. We additionally noted that such a library was more than simply a curiosity, since it would permit us to:
– Reduce software licensing costs. The full set of all possible software packages represented in the PCL, McTavish suggested, must include all software currently employed within the university and all possible equivalent packages and upgrades. We therefore proposed that the University identify and adopt alternative software packages drawn from within the PCL, phasing these into general use over the next three financial years. This was projected to save the University a significant sum of money in software licensing fees.
– Eliminate academic journal subscriptions. Despite the considerable savings achieved over the last decade by the University’s librarians, our academic journal subscriptions draw on increasingly scarce financial resources. Whilst the copyright implications of automatically generated article duplicates are not entirely clear, no legal precedent could be found that prohibits their use.
– Replace our existing institutional repository installation. Since this library would contain all possible papers ever to be published by all staff members, it would avoid any need for manual deposit of preprints by the authors, thus representing a significant cost and time saving on the part of our librarians. Furthermore, since academics and researchers could simply search the PCL for their future works, then submit them to relevant journals, it would also free considerable time for staff development and administrative activities. This time saving would permit us to provide better value for money to research funders, as well as increasing research impact and staff satisfaction.
McTavish also noted that the PCL could, in principle, be used to identify and preemptively file patents, to beat others to publication, to preview unpublished papers, and so on. It was agreed with the University that, following the initial implementation phase, the relevant ethical issues would be explored by a focus group led by the University’s School of Law.
Method and implementation
Using the IPQC, an entirely functional simulation of an infinite set of singes dactylographes was readily accomplished. Each of the resulting documents was stored across the multiverse in locally available storage. At this stage, in principle, our work had fulfilled the specifications laid down in the University’s grant.
Following the completion of the document generation phase, it rapidly became clear to us that the storage of digital information is not the key challenge. Although necessary, the storage of an infinite set of documents is not sufficient. For usability and accessibility reasons, it is also necessary to support practical methods of searching for and accessing information.
Ideally, I hoped that we might develop application programming interfaces able to provide local developers with access to the PCL, so that the Complete Library service could be queried via the University intranet. To do this, we would first need to develop an infinitely parallel search algorithm capable of operating within infinite-dimensional Hilbert space. This was a daunting task.
However, Professor Whitloaf (Chair of studia generalia) suggested to me that there was no need to write this algorithm by hand. Since the necessary algorithm was itself textual, it would already exist somewhere in our PCL dataset. We needed only to create a bootstrap algorithm capable of identifying and extracting valid sorting algorithms, using straightforward unit testing of all materials within the Library. We could then search the Library using an algorithm retrieved directly from the Library itself. This example of “eating our own dogfood” struck the team as parsimonious, elegant, and achievable. We therefore designed and ran a bootstrap algorithm to identify, extract, and list compatible sorting algorithms from the Library’s dataset.
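The bootstrap filter can be sketched in miniature. The following Python toy uses a finite, hand-picked candidate list and the hypothetical name `candidate_sort` standing in for whatever interface the production bootstrap expected; it illustrates only the unit-testing idea, namely executing each candidate text and keeping those that define a function passing the tests.

```python
# Finite stand-ins for texts drawn from the Library's dataset.
CANDIDATES = [
    "qwlekjqwe lkj monkey gibberish",                  # noise: fails to parse
    "def candidate_sort(xs):\n    return sorted(xs)",  # a valid sorting algorithm
    "def candidate_sort(xs):\n    return xs",          # parses, but fails the tests
]

def passes_unit_tests(source):
    """Return True if `source` defines a working candidate_sort function."""
    namespace = {}
    try:
        exec(source, namespace)            # reject texts that do not even parse
        fn = namespace["candidate_sort"]
        return (fn([3, 1, 2]) == [1, 2, 3]
                and fn([]) == []
                and fn([5, 5, 1]) == [1, 5, 5])
    except Exception:
        return False

valid = [src for src in CANDIDATES if passes_unit_tests(src)]
print(len(valid))  # exactly one of the three candidates survives
```

Over a finite candidate list the filter terminates; over the Library’s infinite dataset, as we were to discover, the list of survivors never ends.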
Results and discussion
An optimal sorting and indexing algorithm was successfully retrieved within milliseconds. I provide a listing for this algorithm, which, according to our analysts, is suitable for use on an infinite-dimensional Hilbert space (see Listing A). We then received another two hundred and fifty thousand equivalent algorithms within the next minute, a rate that has subsequently continued unabated, forcing us to conclude that our bootstrap software contains a bug.
Dr Leona Butler, a postdoctoral researcher who conducted a thorough analysis of the returned data, observed that the set of valid indexing algorithms would itself be infinitely large. We were disturbed by this finding. Although the IPQC offers infinitely parallel processing capacity, meaning that it took very little time to unit-test appropriately sized objects within the system, the finite rate of intra-universe input-output serialisation remains a significant bottleneck. Specifically, bandwidth limitations resulting from the present high-energy particle physics uplink implementation limit input-output capacity to under a hundred megabits per second on average. Unfortunately, we had not thought to build a system interrupt into the bootstrap algorithm, so we are unable to halt the ongoing listing process.
We have successfully developed a Provably Complete Library facility and tested it by retrieving algorithms suitable for information indexing and retrieval. However, we are unable to implement the resulting algorithms on the University’s IPQC, as it is now projected to continue broadcasting an infinitely large number of subtle variants of this indexing algorithm until the end of time. We have proposed that a second IPQC be funded, but as we are aware of only one suitable local infinite-dimensional Hilbert space in which to compute, this option is entirely impractical unless someone figures out how to press the reset button on the multiverse.
Although I have not been able to make the PCL available to university faculty or students, I propose that our result be viewed as a concrete and successful contribution to a separate field, namely, the Search for Extraterrestrial Intelligence (SETI) programme. Any future species within range of the local multiverse with access to an adequate level of technology, and which chooses to connect to and analyse the output of the computational Hilbert space that our realities share, will discover the eternal beacon that the human race has left behind.
Very little future work is now conceivable in this domain. It is a shame, of course, that our first and only information-retrieval operation on this unique and irreplaceable resource has dropped it into an irretrievably infinite loop. Still, as Professor Whitloaf says, it could’ve been worse. At least we got it working. Once.
This work was carried out with the support of the University and the European Union. A full list of relevant grants is published in Appendix 2.
Vera Cardigan received a Master’s in Library and Information Science from the University of the West of Peterborough and is a member of the International Library Society’s Experimental Librarianship Interest Group. She is employed as a senior librarian at St Alexander’s University College, formerly Oxford Agricultural Polytechnic.
Emma Tonkin is an engineer with a PhD in computer science and a lingering fascination with classical studies. She is employed in a research project in the sub-basement of a University building. She likes to write fiction and sometimes even manages it on purpose.
Dawn Vogel has been published as a short fiction author and an editor of both fiction and non-fiction. Although art is not her strongest suit, she’s happy to contribute occasional art to Mad Scientist Journal. By day, she edits reports for historians and archaeologists. In her alleged spare time, she runs a craft business and tries to find time for writing. She lives in Seattle with her awesome husband (and fellow author), Jeremy Zimmerman, and their herd of cats. Visit her website at http://historythatneverwas.com.