Some etymological notes on words for invertebrates

Words for insects and other invertebrates are well known to be etymologically unstable: they are easily replaced, and also susceptible to divergent developments such as irregular sound changes and folk-etymological distortions. Despite this, a few such words in the Uralic languages can be traced back to Proto-Uralic. I will discuss four new or rehabilitated etymologies here.

Saami *ću(o)rō– ‘fly’ ~ Khanty *sī̮r ‘fly eggs’ ~ Mansi *sūrǝ ‘parasitic worm, woodworm’

A Proto-Saami word for ‘fly’ is preserved by all Saami languages, although it has not been included in Lehtiranta’s (1989) reconstruction of the Proto-Saami lexicon. The reason for its exclusion may be that western and eastern Saami languages show different variants of the word which do not correspond to each other completely regularly:

  • SaaS tjovrehke ~ tjovrege ~ tjoere (!), U tjuvrake, P tjuruk, L tjurok ~ tjirok (!) (gen. tjuroga ~ tjiroga), N čurot (gen. čuroha) ~ čurrot, čuru (gen. čurroha ~ čurroga) ‘fly’ (< PSaa *ćurōkkē ~ *ćurōk : *ćuruke̮-)
  • SaaI čuáruš (gen. čuárruš) ‘fly eggs’, Sk čuâraš ~ čuâr (gen. čuõr’ru), K čuõraš (gen. čuõrraš) ~ čuõr (gen. čuurrâ), T čïïʹres (gen. čïõrraz) ‘fly’ (< PSaa *ćuorōś : *ćuoruśe̮- ~ *ćuorōj : *ćuoru.e̮- ~ *ćuorēs : *ćuorāse̮-)

The different morphological formations notwithstanding, the only irregularity here is the correspondence between western Saami *u and eastern Saami *uo. This is a minor irregularity, though, considering that we are dealing with an insect name. Presumably the diphthong *uo is original, as there is also a trace of it in the SaaS variant tjoere (< *ćuore̮); this may have originated as a back formation of some kind, as it differs from all other variants in not containing any suffix. On a further note, in several dialects of North Saami the word for ‘fly’ is homonymous with čuru ~ čuoru ~ čurrot ‘unripe cloudberry’, but this seems to result from folk-etymological contamination, because most eastern Saami languages point to an original *o in the first syllable of this word: SaaI čoorooh, Sk čååraǩ, K čorâh (< PSaa *ćorōkkē) ~ T čurag ‘unripe cloudberry’ (< PSaa *ćurōkkē).

In phonological terms the PSaa stem *ćuor(V)- would presuppose PU *će̮rV- or *ćara-. There are completely straightforward comparanda in Khanty which can reflect the latter form:

  • Vj si̮r ‘fly eggs’, Irt sir-wojǝ ‘a large black fly’ (wojǝ ‘animal’) (< PKh *sī̮r)
  • O seri ‘fly eggs’ (< PKh *sī̮r-ī̮)
  • Ni Kaz sĭrǝnt- ‘lay eggs (of flies); get infested with maggots or fly eggs (of meat)’ (< PKh *sī̮r-ǝnt-)

This comparison was indeed made by Collinder (1977: 30) but has been abandoned in subsequent research; there are good grounds for rehabilitating it, though. The default reflex of PU *a(–a) is PKh *ā, whereas PKh *ī̮ is the high grade of underlying *ā in the Khanty system of ablaut. Therefore, we can assume that PKh *sī̮r earlier contained a second-syllable vowel which functioned as an umlaut trigger. Although the background of this assumed vowel remains unclear for the time being, there are several parallels for PU *a(–a) developing into PKh *ī̮ despite the absence of an overt umlaut trigger:

  • PU *aŋa- ‘take off, open’ > PKh *ī̮ŋk- (> Irt eŋx-, Ni eŋx-, Kaz εŋx-, O eŋx- ‘untie, unbind, take off (clothes)’)
  • PU *kačka- ‘bite’ > PKh *kī̮č- (> VVj Sur ki̮č-, Irt xeč-, xeš-, Ni Kaz xĭš- ‘hurt, ache; sting (of nettles)’) (Aikio 2014: 7–8).
  • PU *kara- ‘dig’ > PKh *kī̮r- (> Irt xer-, Ni Kaz xĭr-, O xir-) (Aikio 2015: 55).
  • PU *kaδʹa- ‘leave (tr.)’ > PKh *kī̮j- ~ *ki̮j- (> VVj kăj-, Sur ki̮j-, Irt xăj-, Ni xĭj-, Kaz O xăj-); the short vowel which occurs in most varieties is irregular, but the long vowel is preserved throughout Khanty in the derivative *kī̮ć- ‘stay, remain’ (> VVj Sur ki̮ť-, Irt xeť-, Ni xĭś-, Kaz xĭś-, xăś-, O xiś-).
  • PU *ńanča- ‘stretch’ > PKh *ńī̮ṇč- (> VVj Sur ńi̮ṇč-, Irt ńinč-, Ni ńĭš-, Kaz ńĭṇš-, ńĭš-, O ńins-, ńis-)
  • PU *parma ‘gadfly’ > PKh *pī̮rǝm (> Ni Kaz pĭrǝm, O purǝm) (Aikio 2015: 65).
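
The parallels listed above can be cross-checked mechanically. The following is only a minimal sketch, not part of the original argument: it tabulates the PU and PKh forms cited in this section and verifies that each PU form has *a in the first syllable while each PKh form has *ī̮. The names `first_vowel`, `PARALLELS` and `HIGH_GRADE` are hypothetical helpers introduced for illustration, and the vowel inventory in `VOWELS` is an assumption that covers only the forms cited here.

```python
import unicodedata

# Assumed vowel inventory; sufficient only for the forms cited in this section.
VOWELS = set("aeiouyǝɔε")


def first_vowel(form: str) -> str:
    """Return the first vowel of a reconstructed form together with its diacritics."""
    decomposed = unicodedata.normalize("NFD", form)
    for i, ch in enumerate(decomposed):
        if ch in VOWELS:
            j = i + 1
            # Collect the combining marks that belong to this vowel.
            while j < len(decomposed) and unicodedata.combining(decomposed[j]):
                j += 1
            return unicodedata.normalize("NFC", decomposed[i:j])
    return ""


# (PU form, PKh form) pairs as cited in the text, without asterisks and hyphens.
PARALLELS = [
    ("ćara", "sī̮r"),
    ("aŋa", "ī̮ŋk"),
    ("kačka", "kī̮č"),
    ("kara", "kī̮r"),
    ("kaδʹa", "kī̮j"),
    ("ńanča", "ńī̮ṇč"),
    ("parma", "pī̮rǝm"),
]

# The high-grade vowel of the Khanty ablaut system discussed above.
HIGH_GRADE = unicodedata.normalize("NFC", "ī̮")

# Pairs that fail to show PU *a in the first syllable and PKh *ī̮.
mismatches = [
    (pu, pkh)
    for pu, pkh in PARALLELS
    if first_vowel(pu) != "a" or first_vowel(pkh) != HIGH_GRADE
]

print("mismatches:", mismatches)
```

Such a check only confirms that the tabulated forms are internally consistent; it says nothing about whether the reconstructions themselves are correct.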

Therefore, a common PU stem *ćara- can be quite straightforwardly reconstructed for the Saami and Khanty word-sets, and there should be no doubt that they are cognate. Collinder also cites further comparanda from Mansi and Samoyed. From the latter branch he adduces Kam šurǝľār, šürijar, and Mat kürär(ä) ‘fly’, which cannot be related, however, as they have been formed from PSam *kür (> Ngan kir ‘fly’, Kam šür ‘worm, maggot’) (Janhunen 1977: 79). From Mansi he cites K ser-wöärǝp, W sēr-wārǝp, N sēri-wārǝp ‘fly’ (< PMs *sīrī-wǟrǝp). The first part of this compound is W sēr, N sēri ‘white mold (in meat or fish)’ (~ T sēr ~ sēri ‘maggots, insect eggs’ < PMs *sīrī), and it is derived from the verb T K W N sēr- ‘spawn (of fish), lay eggs (of insects)’ (< PMs *sīr-). Although these match the Khanty words in meaning, the front vowel *ī makes cognation phonologically impossible. Instead, the Mansi words could be explained as early borrowings from Western Khanty, where a sound change *ī̮ > *ī took place. Because the regular reflex of PU *a(–a) is PMs *ū, the true inherited reflex of PU *ćara- instead seems to be PMs *sūrǝ > K sūr ‘parasitic worm (in humans or animals)’, W N sūr ‘woodworm’. The meaning is quite close: we only need to assume a straightforward change from the concept of a fly larva (infesting spoiled foodstuffs and the like) to another kind of larva or worm infesting bodies of live animals or rotten wood.

Interestingly, the reconstructed PU stem *ćara- ‘fly’ turns out to be homonymous with *ćara- ‘shit, defecate’ (> MdE śeŕńems, M śarǝndǝms ~ śärǝńďǝms, MariE šoram, W šaram, Hung szarik ‘shit, defecate’; cf. also Hung szar ‘shit (noun)’). It seems very probable that these word-sets are etymologically related; the semantic connection could be explained in two ways. First, the word could simply have been motivated by the fact that many flies are attracted to fecal matter; for this reason a word for ‘shit’ can appear as a modifier in compound words such as SaaS bajhke-tjovrege and Fin paska-kärpänen ‘blowfly’ (literally “shit-fly”). Another possibility is that the verb meaning ‘shit, defecate’ became semantically extended to refer to the laying of eggs by flies, as has happened in the case of SaaS bejhkedh, L bajkket, N baikit, Sk pâʹškked ‘shit, defecate; lay eggs (of flies)’ (< PSaa *pe̮śkē-). This interpretation would be supported by KhNi Kaz sĭrǝnt- ‘lay eggs (of flies)’ (and also ‘get infested with maggots or fly eggs (of meat)’); this derivative appears to be identical with the aforementioned MdE śeŕńems, M śarǝndǝms ~ śärǝńďǝms ‘shit, defecate’ (< PMd *śarǝnd-).

The Uralic verb *ćara- has been recently explained as a borrowing from Indo-Iranian *ćar(H)- ‘shit, defecate’ (< PIE *ḱerH-); this is only attested in the Avestan noun sairiia- ‘manure’, but a verbal cognate is found in Slavic (Russian срать, Polish srać, Serbo-Croatian srȁti ‘shit, defecate’, etc.) (Holopainen 2019: 324–326). If the etymology is correct, the word belongs among the oldest Indo-Iranian loanwords in Uralic. Previously also a different Khanty reflex of PU *ćara- has been proposed: KhNi Kaz śɔr ‘shit’ (< PKh *ćār). However, as pointed out by Holopainen, this cannot have been inherited from *ćara- because the regular reflex of PU *ć is PKh *s.

Mansi *tāńćǝ ‘worm, earthworm’

PMs *tāńćǝ can be reconstructed on the basis of MsT tańś, K tōńś, W tōńś, N tōńś ~ tuńś ‘worm, earthworm’. According to UEW the word has a possible cognate in Finnic: Ludic čünǯ ‘angleworm’ and Veps čunz ‘earthworm’. This etymology is obviously false, however. The phonological shapes of these Finnic forms suggest that we are dealing with an expressive word of recent origin; this data does not allow the reconstruction of any Proto-Finnic form, let alone a Proto-Uralic one. Therefore, another etymology should be sought for the Mansi word.

PMs *tāńćǝ would go regularly back to PU *tońći or *sońći. The latter form allows it to be regularly matched with the Samoyed and Khanty words for ‘common lizard’:

  • NenT tanc° ‘common lizard; (dial.) snake’, EnF tasu, EnT taďu, Ngan (Castrén) tansú ‘lamprey’, PSlk *tüśu ~ *tȫśu (Ta tüši̮, Ty čöž, O tȫs), Kam tonzǝ, Mat tanǯV ‘common lizard’ (< PSam *tånsu)
  • VVj sosǝl, Sur săsaʟ, Irt săs, Ni sŏsǝl, Kaz sŏsǝʟ, O săsǝl ‘common lizard’ (< PKh *sasāl ~ *si̮sāl)

It is well known that this word goes back to Proto-Uralic and also has cognates in more western branches which, however, feature numerous phonological irregularities (cf. SaaN deažžalakkis, Fin sisilisko, MariE šǝŋšalʹe, Komi ćoʒ́ul, Udm keńʒ́alʹi ‘common lizard’). Nevertheless, for the Ob-Ugric and Samoyed words a common proto-form *sońći can be quite regularly reconstructed. The Khanty word contains an opaque derivational element *-(ā)l, which is apparently present in all the more western forms, too. The development PU *ńć > PKh *s is not completely regular, but besides PKh *sasāl ~ *si̮sāl ‘common lizard’ it is attested at least in the following words:

  • PU *kuńći- ‘urinate’ > PKh *kus- (> V Vj Sur kŏs-, Irt Ni Kaz O xŏs-)
  • PU *kVńćV ‘star’ > PKh *kɔ̄s (> VVj kɔs, Sur kos, Irt Ni xus, Kaz xǫs, O xos)
  • PU *peńćä- ‘go numb’ > PKh *pis- (> V Vj Sur Irt pĕs-, Ni Kaz păs-, O pȧ̆s-). The Khanty verb is cognate with MsK pĭńśǝt-ɔw- (pass.) (< PMs *pińćǝt-), W pińśǝml-ɔw-, N pińśaml-awǝ- (pass.) ‘get frostbitten’ (< PMs *pińćǟmlǝ-), and Komi poźav-, KomiJ poʒ́al- ‘go numb’.
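
The correspondence PU *ńć > PKh *s in the words above can be tabulated in the same spirit. This is a minimal sketch under stated assumptions, not a claim about method: `CLUSTER_PAIRS` and `shows_simplification` are hypothetical names, the forms are those cited in this section (without asterisks and hyphens), and the check is purely string-based.

```python
import unicodedata

# (PU form, PKh form) pairs as cited in the text; *sasāl carries the suffix *-(ā)l.
CLUSTER_PAIRS = [
    ("sońći", "sasāl"),
    ("kuńći", "kus"),
    ("kVńćV", "kɔ̄s"),
    ("peńćä", "pis"),
]


def shows_simplification(pu: str, pkh: str) -> bool:
    """True if the PU form contains the cluster ńć and the PKh form has a
    non-initial s and no trace of the cluster (a purely string-based check)."""
    pu_nfc = unicodedata.normalize("NFC", pu)
    pkh_nfc = unicodedata.normalize("NFC", pkh)
    return "ńć" in pu_nfc and "ńć" not in pkh_nfc and "s" in pkh_nfc[1:]


results = {pu: shows_simplification(pu, pkh) for pu, pkh in CLUSTER_PAIRS}
print(results)
```

A regular outcome such as PMs *tāńćǝ, where the cluster is preserved, would fail this check, which is the contrast between Khanty and Mansi that the argument relies on.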

Thus, the addition of PMs *tāńćǝ ‘worm’ to the Uralic cognate set for ‘common lizard’ features no phonological problems. Its different meaning is not a problem either, considering that the Nenets cognate also has the dialectal meaning ‘snake’, and the meanings ‘worm’ and ‘snake’ are frequently colexified; cf., e.g., Est uss, Liv ūška, Veps mado, Old Norse ormr, Old English wyrm, Lithuanian kirmis, Tundra Yukaghir čitnej-godʹe, Chukchi kǝmʔǝɬɣǝn, Central Siberian Yupik nemeghyaq ‘snake; worm’. As regards the development of the meaning ‘snake’, cf., e.g., Old English snaca ‘snake; reptile’ and Czech had ‘snake’ ~ Polish gad ‘reptile; scoundrel’ ~ Serbo-Croatian gȁd ‘scoundrel; snake; lizard’ ~ Old Church Slavonic гадъ ‘reptile’.

Previously another Mansi word has been included in the Uralic cognate set for ‘common lizard’: MsN (Upper Lozva) sosla ‘some kind of mythical animal’, (Sosva) sosǝl ~ susǝl ‘some kind of mythical animal; common lizard’. UEW, however, considers it possible that the word was borrowed from Northern Khanty. This is obvious indeed, considering that the distribution in Mansi is limited, the Sosva form shows irregular variation between o and u, and the change *ńć > *s has occurred in Khanty only.

Saami *kikse̮ ‘larva (esp. one that infests skin clothing and foodstuffs)’

This Saami word can be reconstructed on the basis of SaaL N giksa, I kiksâ, Sk ǩihss, T kïkks ‘the black larva of a species of beetle which infests foodstuffs and skin clothing’; in Inari and Skolt Saami the word is also used in the general sense of ‘larva’ or ‘worm’. No etymology has been proposed for the word, but in phonological terms it could be projected back to Pre-PSaa *kīksi. This kind of phonological structure is not possible for a Proto-Uralic simplex stem, but it could have regularly developed from a derivative of the shape *küj-ksi, with the denominal noun suffix *-ksi attached to the consonant stem of PU *küji ‘snake’ (> Fin kyy ‘adder’, MdM kuj, Komi and Udm ki̮j, SlkTa šǖ ‘snake’). As a phonological parallel one can mention SaaS jæjhka, U ihkkuo, I iho, etc. ‘at night’ (< PSaa *ikō); this fossilized adverb goes back to Pre-PSaa *īko < *üj-ko, a prolative form of the noun *üji ‘night’ (> PSaa *ije̮ > SaaS jïjje, U jïjja, I ijjâ, etc.). The different meaning of the Saami word poses no problem for the etymology: as noted in the previous etymology (Mansi *tāńćǝ), ‘snake’, ‘worm’ and ‘larva’ are meanings that are easily interchanged.

Samoyed *sǝ̑jwå ‘worm / botfly larva’

Gusev (2012: 78) regards NenF xæwa ‘worm’ as cognate with SlkTa solʹči̮ ‘botfly larva’; we can postulate the Proto-Samoyed form *sǝ̑jwå for them. The comparison is quite compelling: although there is no surface similarity, the developments PSam *s- > NenF x-, PSam *-ǝ̑j- > NenF æ, and PSam *-jw- > PSlk *-lʹć- (> SlkTa -lʹč-) are fully regular. Previously Janhunen (1977: 132) had compared NenF xæwa to Mat simǝrendä ‘snake’ and reconstructed the variant PSam forms *sǝ̑jmå ~ *sǝ̑jwå; Helimski (1997: 340), however, finds the appurtenance of the Mator word questionable because of its irregular m and because -ndä seems to be a participle suffix.

PSam *sǝ̑jwå would go regularly back to PU *ćujwa, *ćulwa or *ćuδʹwa. The first option allows us to connect it with the following cognate set in UEW:

  • MariE šüɣö ~ šüwö, W šǝɣǝ ‘woodboring beetle’ (< PMari *šü̆gǝ ~ *šü̆wǝ)
  • KhVVj soɣ, Sur săɣʷ, Irt sȧw, Ni Kaz sɔw, O săw ‘caterpillar, worm (esp. in plants, leaves, dry wood; not in meat or fish)’ (< PKh *saw)
  • MsK sǝw, (Munkácsi) ‹såu›, ‹jiw-såu› ‘woodworm’ (jiw ‘tree, wood’) (< PMs ?*saw ~ ?*si̮w)
  • Hung szú (acc. szút ~ szuvat) ‘woodworm, woodboring beetle’

UEW further adduces Komi pu-će̮j and Udm pi̮-ćej ‘woodworm’, which are however problematic due to their word-initial ć- instead of ś-, and proposes several alternative proto-forms (*ćuɣV / *ćukV / *śuɣV / *śukV). Nothing, however, seems to oppose the reconstruction of the form *ćujwa presupposed by Samoyed; indeed, the sequence *-uj- also neatly explains the front vowel *ü̆ < *ü in Mari. Semantically the comparison is straightforward; the cognates in Mari and Ugric refer to insects that consume wood, but even this meaning is found also in the NenF compound ṕa-xæwa ‘woodworm’ (cf. ṕa ‘tree, wood’).

4 Responses to Some etymological notes on words for invertebrates

  1. Alexander Savelyev says:

    One can probably add Hill Mari šar ‘aphid’ as a reflex of PU *ćara ‘fly’ that you reconstruct. Cf. also Meadow Mari šar-γeń-ćĕ, Hill Mari šar-γeńə, NW Mari šar-γen-cə ‘louse eggs’; in my view, what we have here is the same PMr *šɔr with diminutive suffixes, and -a- in Meadow Mari is most likely due to contamination with *šar ‘small, fine’ (attested in different Meadow Mari compounds).

    • Ante Aikio says:

      Thanks for the comment! Indeed, Hill Mari šar does fit perfectly as the reflex of *ćara-. But is it really clear that the same stem šar- occurs in *šargeńćǝ ‘louse eggs’ (can *-geń- really be segmented as a suffix)? As you no doubt know, the latter word has also been compared to Finnish saivar, Mordvin *śarkǝ, Permic *śerVl ‘louse eggs’. The correspondences are, of course, quite irregular, but do you then think that the Mari word is not related to any of these other words at all?

      On the other hand, now that I think of it, Mordvin *śarkǝ ‘louse eggs’ could actually be quite straightforwardly etymologized to PU *ćara- + suffix *-kkA (cf. Mordvin *äŕkǝ ‘lake’ < *jäwrä + *-kkA); the Finnic and Permic forms would then have to be of different origin, I suppose.

      I now notice one suggestion I had overlooked in my post: Khanty *sī̮r- and Mansi *sīr- have also been compared to the cognate set including Finnish saivar, etc.; but to me, this seems unlikely for phonological reasons.

      • Alexander Savelyev says:

        It seems obvious that the shape of *šɔrγeńćə is too complex for an underived stem, so either *-γeńćə or *-eńćə should be segmented away. As far as I can see, the sequence *-eńćə in Mari almost never appears after consonants other than *γ, so what we have here looks like *-γeń(ə) + *-ćə, a double diminutive construction. The clearest example of *-γeń(ə) as a diminutive suffix is Hill Mari pi-γeń- ‘*small dog’ as used in the compound piγeń-iγə ‘puppy’.

        I cannot say anything about the Finnic and Permic terms for ‘louse eggs’, but your interpretation of Md. *śarkə looks good to my eye. So, we can probably assume that there was an areal Mari-Mordvin pattern of deriving the terms for ‘louse eggs’ based on *ćara + diminutive suffixes.

        And I still believe it is safe to explain the irregular vocalism of Meadow Mari šarγeńćĕ by contamination with šar ‘*very small, fine’ as found in šar-γü ‘pebble, gravel’, šar-šuδŏ ‘knotgrass, lowgrass’, šar-γož ‘spruce with particularly narrow annual rings’. Meadow Mari šarγeńćĕ is even used metaphorically as a term for small kids (‘малявка, малышня’), according to Большой марийско-русский словарь. Meadow Mari šar is probably of Turkic origin, cf. Meadow Mari šara=βura ‘small things’ and Tat. šara=bara, Chuv. šara=para ‘small things; small kids’.

  2. Olle K says:

    A fun read, thanks! A very minor comment: the Lule Saami forms should have a short second vowel, i.e. tjuruk : tjuruga ~ tjiruk : tjiruga

