today, work in this direction is mainly advanced by J. Bengtson. However, no EL is currently available for Na-Dene, and further research is needed to verify this hypothesis. Occasional attempts to broaden the scope of the family by including extinct languages of Eurasia such as Sumerian or Etruscan, as well as other language families in the Americas (e.g. Salishan), have not been successful so far.

Eurasiatic

In the 1960s V. M. Illich-Svitych presented a set of arguments in favour of a genetic relationship among several families of Eurasia and North Africa; the proposed superfamily was called Nostratic [Illich-Svitych 1971]. Evidence in favour of Nostratic includes regular correspondences between daughter proto-languages, large etymological lists that rely heavily on the basic lexicon, and a small inventory of reconstructed grammatical morphemes. Several decades later S. Starostin proposed that it is better to talk about two coordinate superfamilies: Eurasiatic and Afroasiatic (for the former the old name Nostratic is sometimes retained, to avoid confusion with Greenberg's Eurasiatic, a similar classification proposal, although based mainly on the mass comparison method, that excludes Dravidian and Kartvelian). The EHL etymological network includes several databases of Eurasiatic languages:

the Altaic comparative dictionary, published in 2003; its electronic version contains the proto-language database (2805 entries) supported by five subordinate reconstructions: Turkic, Mongolian, Tungus-Manchu, Korean, and Japanese;

the Eskimo database (1774 entries); according to O. Mudrak [Mudrak 1984], Eskimo forms part of Eurasiatic¹⁸ and within it is particularly close to Altaic. It should be mentioned, however, that no database on Aleut is available so far;

a detailed database on comparative Dravidian, prepared by G. Starostin (2211 entries), based primarily on etymological data available in [Burrow & Emeneau 1984], but incorporating a modified variant of the traditional reconstruction;

the Indo-European database by S. Nikolayev is arguably the largest collection of proto-morphemes of this well-studied family (3178 entries). Two subordinate databases, Germanic and Baltic, were also compiled by the same scholar;

the Uralic database (1898 entries) is loosely based on K. Rédei's etymological dictionary [Rédei 1988] with significant additions from a team of Uralic linguistics scholars working in Moscow. No subordinate databases are available so far, but the daughter families are undergoing intensive study;

the Kartvelian database (1310 entries) is based on G. Klimov's etymological dictionary [Klimov 1998] with some additions by S. Starostin that reflect his study of the external contacts of this family.

¹⁸ The Eurasiatic origin of the Eskaleut languages was also proposed by J. Greenberg.

Murray Gell-Mann, Ilia Peiros, George Starostin

No other B- or A-type families can be demonstrated to belong to Eurasiatic (one possible hypothesis suggests that the Chukchi-Kamchatkan languages also form part of it, but it is also possible that in reality they belong to a different superfamily; the whole issue needs very serious work).

The combined Eurasiatic database brings together reliable etymologies both from already published sources and those newly discovered by members of the EHL team (altogether 2077 entries of varying quality, but including core ones).

The age of Eurasiatic and its major descendants may be crudely estimated as:

Eurasiatic      12 KYA
Dravidian       5 KYA
Uralic          6 KYA
Altaic          8 KYA
Eskimo          2 KYA
Indo-European   7 KYA
Kartvelian      4–3 KYA

Afroasiatic

Afroasiatic is arguably the only superfamily that is generally recognized by mainstream linguistics, despite (or, perhaps, due to) the fact that its languages are actually far less studied than those of Eurasiatic. It is formed by five daughter families (Semitic, Berber, Chadic, Cushitic, and Omotic) and one isolate (Egyptian). None of these families except for Semitic can boast a comprehensive etymological dictionary, and even for Semitic with its lengthy tradition of study, existing ELs are either outdated [Cohen 1974] or, as of now, still incomplete [Militarev & Kogan 2000–2003]. The situation with other branches of Afroasiatic is even less fortunate.

As a result, daughter families are represented differently in the Afroasiatic network of EHL.

The Semitic database (compiled by A. Militarev) contains 2852 etymologies, each found in at least two languages of the family. Next to it is the Egyptian one (also compiled by Militarev) with entries. The other databases contain several hundred etymologies each.

The main Afroasiatic database consists of 3212 etymologies of varying quality. Its compilers, A. Militarev and O. Stolbova, brought together acceptable published etymologies (from dictionaries such as [Orel & Stolbova]) and then enriched them with hundreds of newly discovered ones.

Further research on the family in general is tightly connected with improving existing reconstructions for Cushitic, Omotic, and Chadic.

The ages of Afroasiatic and its descendants are roughly estimated as:

Afroasiatic     12 KYA
Cushitic        9 KYA
Semitic         7 KYA
Omotic          7 KYA
Chadic          7 KYA
Berber          3 KYA

Austric

The existence of the Austric superfamily is less clearly shown than that of the others. In contrast to Eurasiatic, Sino-Caucasian, and Afroasiatic, no detailed proto-Austric glossaries or equally detailed tables of correspondences between the various daughter branches of Austric have been produced. The number of these proposed daughter branches is four, and they are grouped in pairs: Austronesian / Tai-Kadai and Austroasiatic / Miao-Yao¹⁹. Until recently none of these daughter families was represented by a comprehensive etymological dictionary. In terms of known etymologies Austronesian fares significantly better than the others, reflecting a fairly long tradition of scholarship. The work, however, is mostly on languages of Western Indonesia, the Philippines, and parts of Eastern Oceania.

Other territories are underrepresented, which often makes it difficult to attribute an etymology to the appropriate chronological level. An EHL Austronesian database is still under construction.

The situation with Tai-Kadai is slightly different. A representative collection of etymologies for one of its principal branches, Zhuang-Tai, has been published [Li 1977] and is available (with a few modifications) on the EHL network (1329 entries). A limited number of common Tai-Kadai forms (taken mainly from [Peiros]) are also available electronically.

The Austroasiatic family is represented much better. A preliminary version of its comparative dictionary by I. Peiros contains 2457 entries, identified mainly through comparison of ten lower-level families that include nearly all of Austroasiatic²⁰.

¹⁹ See [Peiros 1998].

²⁰ Etymologies from the Austroasiatic Etymological Dictionary by H. Shorto have not yet been included.

Distant Language Relationship: The Current Perspective

For Miao-Yao no etymological dictionaries or convincing phonological reconstructions are available; as a result, no corresponding database is found in the EHL network either. All the Miao-Yao proto-forms used in Austric comparisons are taken from [Peiros 1998].

A systematic search for Austric etymologies has not yet been conducted. In 2004–2005 S. Starostin and I. Peiros collected some lexical similarities in support of Austric, included in the highly provisional Austric database. The 900 proposed etymologies represent mainly lexical similarities between Austroasiatic and Austronesian, although they are actually supported by basic (relatively simple) phonological correspondences. Thus, we can say that our knowledge of Austric is much more limited than that of other superfamilies; however, the discovery of a number of comparanda that fit the core etymology requirements makes the Austric hypothesis quite plausible.

The approximate age of Austric and its descendants may be estimated thus:

Austric         10 KYA
Austronesian    5 KYA²¹
Austroasiatic   7 KYA
Tai-Kadai       5 KYA
Miao-Yao        4 KYA

Borean?

The EHL network of databases presents what we consider extremely strong evidence in support of the four superfamilies discussed above. Two of them, Sino-Caucasian and Eurasiatic, are based on reconstructions performed in full accordance with the comparative method, with regular phonetic correspondences established between reconstructed intermediate proto-languages. Knowledge of the other two, Afroasiatic and Austric, has not so far reached the same level of confidence; however, since it is possible to discuss the two theories in terms of regular phonetic correspondences rather than mere similarities, both can be considered scientific, and further work on them is promising.

Since Afroasiatic and Austric are still to be reconstructed and the other two superfamilies are still in need of serious improvements, it is not yet possible to apply strict comparative methods of investigation to even deeper chronological levels. However, the obstacles here are technical rather than theoretical. The widespread idea that comparative research has an impassable threshold of about 10 KYA (such a period of time is claimed to be enough to make related languages lose all relevant similarity) does not take into account the fact that the main objects of research in this case are not modern languages, but reconstructed proto-languages which turn out to be more similar to one another than their modern-day descendants. Thus, solid reconstructions for C-type families like Eurasiatic, Sino-Caucasian, and others should eventually help²².

At the present time it is possible to discuss such ultra-deep relationships only on a very speculative level. Numerous morphemic similarities between various language families of Eurasia have already been spotted in the past as potential indication of such a relationship; many, if not most, of these similar forms (traced back to high-level reconstructions) were compiled by S. Starostin into a special database and later supplemented by some of his own findings. Since such morphemic comparisons are rather numerous (several hundred at least), chance resemblances are not very probable, and a Borean super-superfamily hypothesis, open to bona fide discussion, has been formulated²³. Statistical analysis of attested similarities shows that if such a taxon really exists, its initial division was as follows:

(i) Eurasiatic and Afroasiatic (Illich-Svitych's Nostratic);

(ii) Sino-Caucasian;

(iii) Austric.

The estimated age of Borean would be around 15–17 KYA.

²¹ Our calculations exclude Austronesian languages of New Guinea and some surrounding islands.

²² It should be noted that the amount of information recoverable for protolanguage states always remains in direct proportion to the number of languages used for comparison. Since lexical loss and replacement, in most cases, occur independently in daughter languages, the probability of any given morpheme disappearing without a trace in three of them is less than in only two, and so on. This ensures that, given a sufficient number of languages or language branches, the morphemic inventory of the reconstructed ancestral language will be just as large as (sometimes even larger than!) that of any single one of its descendants. Such is the case for commonly accepted families like Indo-European, Turkic, Semitic, etc., and there is little reason to suppose that a different situation has to be proposed for Type C families.
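The independence argument in this note can be made concrete with a small calculation. The sketch below is only an illustration, not part of the EHL methodology: it assumes that every daughter branch loses a given morpheme with the same probability and independently of the others, and the function names and the loss rate of 0.5 used in the usage note are hypothetical.

```python
def _check(loss_prob: float, branches: int) -> None:
    """Validate the (hypothetical) model parameters."""
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must lie between 0 and 1")
    if branches < 1:
        raise ValueError("branches must be at least 1")


def survives_somewhere(loss_prob: float, branches: int) -> float:
    """Chance that a morpheme is retained in at least one daughter branch.

    Assumes each branch loses the morpheme independently with the
    same probability `loss_prob` (an idealization).
    """
    _check(loss_prob, branches)
    # The morpheme vanishes without a trace only if every branch loses it.
    return 1.0 - loss_prob ** branches


def attested_in_two_or_more(loss_prob: float, branches: int) -> float:
    """Chance that a morpheme is retained in at least two daughter branches,
    i.e. that a comparison between branches is possible at all."""
    _check(loss_prob, branches)
    lost_everywhere = loss_prob ** branches
    kept_in_exactly_one = branches * (1.0 - loss_prob) * loss_prob ** (branches - 1)
    return 1.0 - lost_everywhere - kept_in_exactly_one


if __name__ == "__main__":
    # Illustrative loss rate only; no empirical claim is intended.
    for n in (2, 3, 5, 10):
        print(n, survives_somewhere(0.5, n), attested_in_two_or_more(0.5, n))
```

With the illustrative loss rate of 0.5, the chance of a morpheme surviving somewhere rises from 0.75 for two branches to 0.875 for three, which is the direction of the effect described above; the second function applies the stricter requirement that a form be attested in at least two branches before it can be compared.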

²³ The term was originally used by H. Fleming for a somewhat different linguistic entity [Fleming 1991].

Other families

The Borean hypothesis currently links together the four superfamilies described above. At the same time, the real scope of Borean remains unknown, since we still lack deep-level reconstructions of many families in the Americas, Africa, the Indo-Pacific region, and Australia. This means that we have no certain means of verifying whether some of these families can also form part of the hypothetical Borean.

So far, preliminary research has been carried out for Khoisan in South Africa, resulting in a classification and a set of provisional reconstructions by G. Starostin [2003, 2008], whose Khoisan databases are already incorporated into the EHL etymological network. Comparison with the Borean data has not produced any conclusive results, suggesting that Khoisan, at least, cannot be included in Borean, although genetic connections on an even deeper level might be possible.

The preliminary lexicostatistical study of Sub-Saharan African languages identifies at least the following other superfamilies: Niger-Congo (not quite identical to the Niger-Congo super-family proposed by J. Greenberg), East Sudanic, Central Sudanic, and Kordofanian, plus a number of smaller branches whose position is so far very unclear (such as Songhay or Atlantic languages). We still need to investigate how these superfamilies are connected to each other and to Borean. This can be done, however, only through an extensive etymological study of available data.

The situation with native languages of the Americas is different, and still far from being resolved. According to J. Greenberg, they can all be classified into two small stocks (Eskimo-Aleut, Na-Dene) and one huge super-family, Amerind. We have significant evidence that the Eskimo-Aleut family is part of Eurasiatic, while Na-Dene seems to be related to Sino-Caucasian languages (although in the latter case a completely convincing demonstration is still lacking).

The status of the Amerind proposal remains unclear. The main source for lexical lookalikes between these languages remains J. Greenberg's monograph. We have already mentioned that it was and still is heavily criticized, sometimes for good reasons; this, however, does not eliminate the problem itself: why exactly are there so many lexical similarities found not just between some of the proposed daughter families, but frequently between all the families of the hypothetical Amerind superfamily? Even if we rule out everything that is non-core (i.e. forms with scarce distribution, far-flung semantics, etc.), explaining all the rest away as chance resemblances would simply be closing our eyes to the problem. This leaves us with two options: either the similarities are of common genetic origin, or they result from intensive language contact, which is a less probable option given the peculiarities of their dispersal.

A very preliminary proposal based on EHL exploratory studies suggests that in the Americas one can find at least three types of grouping:

(i) The Almosan superfamily (Algic, Salishan, Wakashan, and some other languages) might be related to Chukchee and Nivkh languages of North Asia, forming a so-called Beringian superfamily; connections with Borean have been noticed as well²⁴.

(ii) A number of families (Penutian, Hokan, Mayan, Mixe-Zoque, Maipuran, Pano-Tukanoan, etc.) presumably form a different superfamily, also with resemblances to Borean²⁵.

(iii) At the same time, we were not able to detect any external relations for such well-established families as Siouan, Gulf, or Otomanguean.

Distant relationships among the Papuan (= Non-Austronesian) languages of New Guinea and aboriginal languages of Australia remain to be investigated. It is possible that in that region we could distinguish up to 4–6 superfamilies (Trans-New-Guinea, Australian, East Papuan, etc.), none being properly reconstructed. Some lexical similarities have also been spotted between Trans-New-Guinea morphemes and some of the alleged 'Borean' roots, but these remain too scarce to establish a firm connection.

7. The bottleneck scenario.

Anatomically modern human beings seem to have evolved in Africa around two hundred thousand years ago [Alemseged, Coppens, Geraads 2002]. It is not known for certain when they acquired language of the modern type. The second wave of migration of anatomically modern human beings out of Africa, according to genetic and archaeological data, seems to have taken place around 60 to 50 KYA.

Compared with the previous known wave (which occurred on the order of a hundred and thirty KYA and is known to have reached as far as Palestine) it was definitely more successful, populating Eurasia and the Indo-Pacific region, including Australia. In Western Europe it gave rise to the Aurignacian culture, with its remarkable paintings, engravings, and sculpture in addition to the widespread Upper Paleolithic tools, suggesting that the human beings of the second wave were behaviorally as well as anatomically modern. That makes it likely that they already possessed a language or languages of the modern type.

²⁴ Proposed by S. Nikolaev, who is now working on a detailed justification.

²⁵ Proposed by S. Nikolaev and I. Peiros in their survey of existing linguistic classifications.

For tens of thousands of years afterwards, the usual kinds of linguistic transformation presumably took place, producing daughter languages, which themselves gave rise to daughter languages, and so forth. In that way a considerable degree of linguistic diversity would have been achieved.

However, that amount of diversity need not necessarily be reflected in the diversity of attested languages. Instead, we may be dealing here with a bottleneck effect, in which a great many languages (but not necessarily all of them) descend from a single ancestor.

The climatic changes near the height of the last Ice Age some twenty thousand years ago drastically shrank the territories suitable for human habitation, with ice caps and deserts occupying a large fraction of the land mass. We may picture the human beings of that time confined to refugia often separated by hostile areas. Under those conditions linguistic diversity could have been greatly reduced, and it may therefore be the case that all or most of the languages of subsequent times are descended from a single ancestor, the tongue of a particular refugium. If the similarities of attested languages are found to suggest a common origin for all or most of them, that origin could well be a speech that survived the height of the Ice Age when most others did not. With the improvement of climatic conditions, humans began to move out of their refugia, colonizing territories previously unsuitable for permanent occupation. This led to growth and subsequent division of their communities, resulting in the development of new languages.

This is a different story from that of monogenesis, according to which the latest common ancestor of all or most attested languages would be the earliest human language of the modern type. According to the bottleneck scheme, by contrast, all or most of the diversity of attested languages would have developed over some twenty thousand instead of fifty thousand years, making it somewhat more plausible that one might discover evidence of common descent if such common descent actually occurred. In addition, the bottleneck idea allows the age of modern language to be pushed back to any time between fifty thousand years ago and two hundred thousand years ago, when anatomically modern humans appeared.

If it is true, as mentioned above, that an enlarged Borean taxon embraces most of the African superfamilies as well as the Amerind languages, then the age of such a super-superfamily would agree very roughly with the 20 000 years we attribute to the bottleneck. If some or all of the global roots, found across most of the world's super-families²⁶, are genuine, then they could relate to such a time horizon. Another feature of the bottleneck scheme with a protolanguage less than 20 000 years old is that the migrations of the speakers of the descendant languages (such as Eurasiatic, Afroasiatic, etc.) need have nothing to do with the 'out of Africa' notion, which refers to much earlier times.

The phenomenon of linguistic bottleneck is encountered on smaller scales, as in the case of the Australian aboriginal languages. It is believed that humans reached the continent at least 40 KYA. Yet it is generally agreed that all or nearly all of the attested Australian languages form a single superfamily, and the similarities among them make it extremely difficult to believe that the proto-language of that superfamily could be older than twelve thousand years or so²⁷. It seems that a single language or the descendants of a single language spread over all or nearly all of Australia at that comparatively recent date. It could have been a local language of Australia or of New Guinea; if we had to guess, we would probably choose the latter, on the basis of some lexical similarities. In any case, one can search for genetic or cultural traits that might have been introduced along with the language in question.

A very familiar example of a bottleneck is the domination of Europe by Indo-European languages.

Here most of the currently spoken European languages, with but a few exceptions like Basque (whose native speakers constitute a small linguo-geographic refugium), are descended from a single language spoken, probably in Southern Russia, some six or seven thousand years ago. Since we are used to that idea, we should be able to entertain the possibility of bottlenecks having occurred on much wider scales.

²⁶ For more on global roots, see e.g. [Ruhlen 1994]. Many of the particular connections proposed by Ruhlen, as well as other researchers, have been heavily and often justly criticized (e.g. by D. Ringe, L. Campbell, and others), yet a decisive statistical demonstration that would once and for all reject all of the accumulated evidence as non-evidence is still missing. We apply the same cautionary approach to the issue of global roots as we do towards Greenberg's Amerind and all the other theories arrived at through the mass comparison method: all of the comparative data obtained that way are valid as material for further research, to be gradually accepted or discarded as the families in question are subjected to standard comparative analysis.

²⁷ I. Peiros, in his unpublished lexicostatistical study of aboriginal Australian languages, estimates the age of Proto-Australian at around 10,000 to 12,000 years.

8. Conclusion

The data accumulated within the EHL etymological network vary in quality and convincing force. The program, however, manages to collate the results of half a century's research on distant language relationships (so far, mostly within Eurasia) with results recently achieved by the EHL team, which has the benefits of better data access, methodology that is modernized (but still firmly rooted in tradition), and computer handling of data. Much remains to be done, but even now certain prehistoric scenarios of linguistic development within the last 20 000 years can be drawn up, with the most probable one looking as follows:

(i) At the height of the Ice Age humans were forced to take refuge in one or several zones suitable for survival, causing a decrease of the linguistic diversity that presumably existed before.

(ii) With the improvement of climatic conditions, humans began to move out of their refugia, colonizing territories previously unsuitable for permanent occupation. This led to growth and subsequent division of their communities, resulting in the development of new languages.

(iii) In the process of spreading, various linguistic groups suffered different fates; some disappeared with or without any traces, while others expanded, spreading their languages over vast territories or shifting from one language to another²⁸.

(iv) One of the most successful survivors of the Ice Age may be a hypothetical Borean super-superfamily, whose age is estimated as 15–17 KYA. Inconclusive, but significant evidence for Borean is provided by preliminary comparison of four super-families, the historical reality of which is no longer questioned by EHL members: Eurasiatic, Afroasiatic, Sino-Caucasian and Austric.

(v) Preliminary data also indicate possible connections between Borean and some superfamilies of Africa, America, and the Indo-Pacific region, not included in the four superfamilies mentioned above. Further research into distant relationships of languages is needed to find out whether these additional superfamilies are related to 'Borean' on a higher level or are hitherto unidentified branches of 'Borean'. The question of the original 'Borean' homeland also remains open.

Further research into distant relationships of languages is needed to find out whether there are other Ice Age survivors that are related to Borean on a higher level or turn out to be potentially undiscovered members of the family itself. The question of the original Borean homeland also remains open.

Acknowledgments

We would like to thank all participants of the EHL program for providing data and sharing their knowledge with us. We also gratefully acknowledge the generous support for the EHL project on the part of Mr. Jerry Murdock and the Bryan J. and June B. Zwan Foundation.


