projet shtooka

SWAC Metatags

Recent technological developments have made the systematic recording of words and expressions and the creation of language audio collections possible. With specific tools, it is now possible to record 1000 words in less than an hour.

These audio collections can be used for:

  • Linguistic research (record and compare the pronunciation in different regions)
  • Didactics (e.g. «English Irregular Verbs»)
  • Illustration (for electronic dictionaries)

The Internet has made exchanging audio files much easier: files can be copied and downloaded without difficulty. However, a recording can only be indexed or used properly if it is associated with other data (what is the word or expression? what language is it in?), so it is useful to have a standard recording format that contains the associated data. In this way, audio collections can easily be produced by different software, on different platforms and by different people.

We suggest a simple and practical way to associate data with the audio recording. The aim is to show how the data should be described rather than to define the data itself.

The Vorbis Comment metadata system allows you to store additional data in Ogg Vorbis, FLAC and Ogg Speex files. This solution is well suited to building audio word collections: it is an existing, free and widely supported technology. Audio file transfer is easy and, because the associated data is already inside the audio file, in the form of metadata in a Vorbis Comment tag, there is no need for a separate description file.
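To illustrate how the associated data travels with the audio, here is a minimal sketch of the Vorbis Comment block layout: a vendor string followed by a list of `NAME=value` entries, each prefixed by a 32-bit little-endian length and encoded in UTF-8. It covers only the comment block itself, not the surrounding Ogg or FLAC container, and the vendor string `swac-example` is an arbitrary placeholder. In practice a tagging library or tool would normally do this work.

```python
import struct


def encode_vorbis_comments(tags, vendor="swac-example"):
    """Serialize (name, value) pairs as a Vorbis Comment block body."""
    vendor_bytes = vendor.encode("utf-8")
    parts = [
        struct.pack("<I", len(vendor_bytes)),  # vendor string length
        vendor_bytes,
        struct.pack("<I", len(tags)),          # number of comments
    ]
    for name, value in tags:
        entry = f"{name}={value}".encode("utf-8")
        parts.append(struct.pack("<I", len(entry)))
        parts.append(entry)
    return b"".join(parts)


def decode_vorbis_comments(data):
    """Parse a Vorbis Comment block body into (vendor, [(name, value), ...])."""
    (vendor_len,) = struct.unpack_from("<I", data, 0)
    pos = 4
    vendor = data[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    tags = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        name, _sep, value = data[pos:pos + length].decode("utf-8").partition("=")
        # Field names are case-insensitive; normalize them to upper case.
        tags.append((name.upper(), value))
        pos += length
    return vendor, tags


block = encode_vorbis_comments([("SWAC_TEXT", "house"), ("SWAC_LANG", "eng")])
vendor, tags = decode_vorbis_comments(block)
```

Because values are UTF-8, texts such as « друзей » or « 啊 » round-trip through the same functions unchanged.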

Below is a proposed list of standard field names with a description of their intended use. We recommend that a community producing and using audio word collections adopt the same standard field names, on the same principle as the Vorbis Comment field recommendations: for example, in a music collection, you do not have to complete a field that gives the name of the artist but, if you do, you must call the field “ARTIST” and not “BAND” or anything else.

None of these fields are intended to be mandatory, although we believe that no real automated processing can be done without the SWAC_TEXT and SWAC_LANG fields.
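A tool that processes a collection automatically might therefore start by checking for these two fields. The following is only a sketch: the function name `missing_core_fields` and the choice to treat empty values as missing are assumptions made for illustration, not part of the proposal.

```python
# The two fields the text singles out as needed for automated processing.
CORE_FIELDS = ("SWAC_TEXT", "SWAC_LANG")


def missing_core_fields(tags):
    """Return the core SWAC fields that are absent or blank in a tag mapping."""
    return [name for name in CORE_FIELDS if not tags.get(name, "").strip()]


complete = {"SWAC_TEXT": "house", "SWAC_LANG": "eng"}
partial = {"SWAC_TEXT": "house"}

for record in (complete, partial):
    missing = missing_core_fields(record)
    if missing:
        print("skipping record, missing:", ", ".join(missing))
```

Since no field is mandatory, a record that lacks them is still valid; it is simply skipped by this kind of processing.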


1. Data for the pronounced text

Text pronounced by the speaker
  • « house »
  • « it's raining cats and dogs ! »
The language of the word pronounced (ISO 639-3)
« rendezvous » eng
« rendez-vous » fra
« crocodile » eng
« crocodile » fra
Items that allow programs to generate the alphabetical index of the audio collection automatically. The separator is «|» (U+007C)
« house » (eng) house
« It's raining cats and dogs! » (eng) rain|cat|dog
« I am » (eng) be
« 啊 » (chi) ā
« se laver » (fra) laver (se)
« j'ai faim » (fra) avoir|faim
« ett fönster » (swe) fönster
« telefonul » (ron) telefon
When the recording is a derived or inflected form of a word, this field indicates the base word
« I was » (eng) to be
« je vais » (fra) aller
« друзей » (rus) друг
When the SWAC_BASEFORM is defined, this field indicates the name of the form
« je vais » (fra) Present. 1p.S.
« друзей » (rus) Gen. Pl.
Name of the reference scheme used by the SWAC_FORM_NAME field (such as LMF codification)
Index which can help the user to differentiate homographs in the audio collection. The SWAC_HOMOGRAPHIDX is based on the grammatical difference between homographs.
« пропа́сть » (rus) verb
« про́пасть » (rus) noun
« os » (fra) /os/ sing
« os » (fra) /o/ plur
It can also be a translation into another language (typically English) or a short explanation when the difference is not grammatical.
« мука́ » (rus) flour
« му́ка » (rus) pain
« bass » (eng) fish
« bass » (eng) music
Name of the reference scheme used by the SWAC_HOMOGRAPHIDX field.
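The alphabetical index items described in this section, separated by «|», could be turned into an index as sketched below. The function name `build_alphabetical_index` and the `(text, index_items)` input shape are assumptions made for this example. Sorting uses plain code point order, which is only an approximation, since proper alphabetical order depends on the language.

```python
from collections import defaultdict


def build_alphabetical_index(records):
    """Map each index item to the list of recorded texts that declare it.

    `records` is an iterable of (text, index_items) pairs, where
    index_items uses "|" (U+007C) as the separator.
    """
    index = defaultdict(list)
    for text, index_items in records:
        for item in index_items.split("|"):
            item = item.strip()
            if item:  # ignore empty items such as a trailing separator
                index[item].append(text)
    # Plain code point sort; real collation is language-specific.
    return dict(sorted(index.items()))


records = [
    ("house", "house"),
    ("It's raining cats and dogs!", "rain|cat|dog"),
    ("I am", "be"),
]
index = build_alphabetical_index(records)
for entry, texts in index.items():
    print(entry, "->", texts)
```

One recording can thus appear under several entries, as « It's raining cats and dogs! » does under rain, cat and dog.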

2. Speaker data

Speaker's name
  • « Jacques Durand »
  • « Иван Иванович Иванов »
Speaker's gender [M/F]
  • M: male
  • F: female
Speaker's year of birth

(Format: YYYY)

Speaker's native language

(ISO 639-3)

Country where the speaker acquired the SWAC_SPEAK_LANG


Region where the speaker acquired the SWAC_SPEAK_LANG
  • « Pays basque »
Location of the SWAC_SPEAK_REGION (format: WGS 84 DM)
  • N 48°52.233 E 2°24.232
General note about the pronunciation of the speaker (for example, a speech defect)
Code of the country where the speaker lives


Town where the speaker lives
  • « Saint-Jean-Pied-de-Port »
Contact data for the speaker
  • « »
Free note about the speaker
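The WGS 84 DM (degrees and decimal minutes) location format used in this section can be converted to decimal degrees for mapping or distance calculations. The sketch below assumes that values always follow the shape of the example given above (hemisphere letter, degrees, «°», decimal minutes, latitude first); the function name `parse_wgs84_dm` is hypothetical.

```python
import re

# Matches e.g. "N 48°52.233 E 2°24.232": latitude first, then longitude.
_DM_PATTERN = re.compile(
    r"^\s*([NS])\s*(\d{1,2})°\s*(\d{1,2}(?:\.\d+)?)\s+"
    r"([EW])\s*(\d{1,3})°\s*(\d{1,2}(?:\.\d+)?)\s*$"
)


def parse_wgs84_dm(text):
    """Convert a degrees/decimal-minutes string to (latitude, longitude)."""
    match = _DM_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a WGS 84 DM coordinate: {text!r}")
    lat_hemi, lat_deg, lat_min, lon_hemi, lon_deg, lon_min = match.groups()
    latitude = int(lat_deg) + float(lat_min) / 60
    longitude = int(lon_deg) + float(lon_min) / 60
    # Southern and western hemispheres are negative in decimal degrees.
    if lat_hemi == "S":
        latitude = -latitude
    if lon_hemi == "W":
        longitude = -longitude
    return latitude, longitude


latitude, longitude = parse_wgs84_dm("N 48°52.233 E 2°24.232")
print(f"{latitude:.5f}, {longitude:.5f}")
```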

3. Word pronunciation data

Note about the intonation
« oh » Surprise
« oh » Realization
Pronunciation speed [1/2/3]
  • 1: slow pronunciation for pedagogical use
  • 2: normal pronunciation
  • 3: fast
Comments on the pronunciation of the word by the speaker
« abasourdir » (fra) /ʁ.diʁ/ Academic pronunciation
« abasourdir » (fra) /ʁ.diʁ/ Popular pronunciation
« догово́р » (rus) Standard pronunciation
« до́говор » (rus) Popular pronunciation in the south of Russia
Phonetic transcription (using the International Phonetic Alphabet, IPA)
Specific phonetic transcription in the language system concerned
« мука » (rus) мука́ (with the diacritic symbol)
« 啊 » (chi) ā (the pinyin transcription)

4. Audio collection data

  • « Base Audio Libre De Mots Français »
Section in the audio collection
Description of the collection
Organization producing the audio collection
URL for data on the organization producing the audio collection
License which applies to the collection
Audio collection copyrights
Audio collection authors
URL for general information on the collection

5. Technical data

Audio Quality [1/2/3/4/5]
  • 1: very poor
  • 2: poor
  • 3: normal
  • 4: good
  • 5: very good
Date of recording

(Format: YYYY-MM-DD)

The program used to record the sound

Note about the Vorbis Comment specifications:

Please consult the Vorbis Comment home page for more information on general comment tag specifications.

The content of tags such as TITLE, DESCRIPTION, LICENSE and COPYRIGHT can be set to any value. These fields can be completed automatically using data provided by SWAC fields. We recommend, however, that the GENRE field be set to « Speech ».

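As a sketch of such automatic completion, the function below fills standard tags from SWAC fields. Copying SWAC_TEXT into TITLE is an assumption chosen for illustration, since the proposal leaves the content of TITLE open; only the GENRE value « Speech » comes from the recommendation above. The function name `derive_standard_tags` is hypothetical.

```python
def derive_standard_tags(swac_tags):
    """Build standard Vorbis Comment tags from a mapping of SWAC fields."""
    standard = {"GENRE": "Speech"}  # value recommended by the proposal
    text = swac_tags.get("SWAC_TEXT")
    if text:
        # Assumption: use the pronounced text as the track title.
        standard["TITLE"] = text
    return standard


swac_tags = {"SWAC_TEXT": "rendez-vous", "SWAC_LANG": "fra"}
all_tags = {**swac_tags, **derive_standard_tags(swac_tags)}
```

Generic audio players then show a meaningful title and genre even though they know nothing about SWAC fields.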

The general Vorbis Comment specifications allow the use of additional fields. This enables SWAC fields to coexist with other specific data. For example, electronic dictionaries can use specific fields such as « OMEGAWIKI_ARTICLEIDX » to link audio items to their articles.

Note about the ID3v2 Tagging Format:

Since version 2.4 of the ID3 Tagging Format, it has been possible to store Unicode character strings in MP3 audio files. We do not recommend the use of this tagging format, but SWAC fields can be stored as « TXXX » frames.

Please consult the ID3 Tagging Format home page for more information.
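For readers who nevertheless need MP3 files, the following sketch shows how one SWAC field could be laid out as an ID3v2.4 « TXXX » (user-defined text) frame: a 10-byte frame header followed by an encoding byte, the field name as description, a null terminator and the value. The function names are hypothetical, the frame flags are left at zero, and the surrounding ID3v2 tag header is not produced; an established tagging library is the safer choice in practice.

```python
def syncsafe(number):
    """Encode an integer as a 4-byte ID3v2 syncsafe integer (7 bits per byte)."""
    if not 0 <= number < 1 << 28:
        raise ValueError("value does not fit in a 28-bit syncsafe integer")
    return bytes((number >> shift) & 0x7F for shift in (21, 14, 7, 0))


def txxx_frame(description, value):
    """Build one ID3v2.4 TXXX frame holding a description/value pair."""
    body = (
        b"\x03"                          # text encoding 3 = UTF-8
        + description.encode("utf-8")    # e.g. the SWAC field name
        + b"\x00"                        # terminator after the description
        + value.encode("utf-8")
    )
    header = b"TXXX" + syncsafe(len(body)) + b"\x00\x00"  # ID, size, flags
    return header + body


frame = txxx_frame("SWAC_TEXT", "друг")
```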