Corpus description

CHEU-lex is a trilingual parallel and comparable corpus of texts in German, French and Italian. It comprises three subcorpora, one per language, each composed of two datasets: 1) bilateral agreements concluded between Switzerland and the EU from 1972 to 2017, and 2) federal legislation representing the reception of these agreements. Because the reception of agreements into federal legislation is not simultaneous and takes a variable amount of time, we collected only those agreements that had been implemented by the time the texts were downloaded (January 2020). As a result, the corpus covers agreements signed between 1972, when the first agreement between the EU and Switzerland was signed, and December 2017.

In total, CHEU-lex includes 444 bilateral agreement texts and 348 national legal acts (laws and ordinances), i.e. 148 agreements and 116 acts in each of the three languages.

Table 1. The CHEU-lex corpus

Language   Text type              Texts   Tokens*
German     Bilateral agreements     148     726 773
German     Federal legislation      116     792 639
French     Bilateral agreements     148     903 247
French     Federal legislation      116   1 081 794
Italian    Bilateral agreements     148     822 414
Italian    Federal legislation      116     939 847
Total                               792   5 266 714

*As defined by SketchEngine.
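
The totals in Table 1 can be cross-checked with a few lines of Python. This is only an illustrative sketch: the figures are copied from the table, while the names `TABLE_1`, `totals` and `texts_by_type` are hypothetical and not part of any CHEU-lex tooling.

```python
# Figures copied from Table 1: (language, text type) -> (texts, tokens).
# The variable and function names are illustrative assumptions.
TABLE_1 = {
    ("German", "Bilateral agreements"): (148, 726_773),
    ("German", "Federal legislation"): (116, 792_639),
    ("French", "Bilateral agreements"): (148, 903_247),
    ("French", "Federal legislation"): (116, 1_081_794),
    ("Italian", "Bilateral agreements"): (148, 822_414),
    ("Italian", "Federal legislation"): (116, 939_847),
}


def totals(table):
    """Return (total texts, total tokens) summed over all rows."""
    texts = sum(n_texts for n_texts, _ in table.values())
    tokens = sum(n_tokens for _, n_tokens in table.values())
    return texts, tokens


def texts_by_type(table):
    """Sum the number of texts per text type across the three languages."""
    counts = {}
    for (_, text_type), (n_texts, _) in table.items():
        counts[text_type] = counts.get(text_type, 0) + n_texts
    return counts


print(totals(TABLE_1))         # (792, 5266714)
print(texts_by_type(TABLE_1))  # {'Bilateral agreements': 444, 'Federal legislation': 348}
```

The per-type sums correspond to 148 agreements and 116 legal acts in each of the three languages.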


Topics

Topics are the subject areas covered by the agreements. They can be queried with the advanced search function, which offers two options: Macro-topic and Micro-topic. The categories reproduce the Fedlex heading classification as determined by the Swiss Confederation. Below is a list of the macro-topics as defined in the Classified Compilation of Federal Legislation (SR) for domestic and international law. For micro-topics, please refer to the individual macro-topics listed in Fedlex.

Table 2. Macro-topics as listed in Fedlex

Federal law                                                                      International law
1. State - People - Authorities                                                  0.1. International law in general
2. Private law - Administration of civil justice - Enforcement                   0.2. Private law - Administration of civil justice - Enforcement
3. Criminal law - Administration of criminal justice - Execution of sentences    0.3. Criminal law - Legal assistance
4. Education - Science - Culture                                                 0.4. Education - Science - Culture
5. National defence                                                              0.5. War and neutrality
6. Finance                                                                       0.6. Finance
7. Public works - Energy - Transport                                             0.7. Public works - Energy - Transport
8. Health - Employment - Social security                                         0.8. Health - Employment - Social security
9. Economy - Technical cooperation                                               0.9. Economy - Technical cooperation

Legislation can also be queried via text ID. This code is assigned by the SR platform to the “revised compilations” of the Official Compilation of Federal Legislation (AS). The “revised compilation” is organised by subject heading and is regularly updated. The SR also comprises the cantonal constitutions currently in force.
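
As an illustration of how a text ID relates to the macro-topics in Table 2, the sketch below derives the macro-topic from an SR-style number. It assumes that the text ID follows the SR numbering, in which international-law numbers begin with "0." and the next digit identifies the macro-topic. The function `macro_topic` and the two lookup tables are hypothetical helpers, not part of the corpus interface, and the sample IDs are only examples of the SR number format.

```python
# Macro-topic labels copied from Table 2. All names here are illustrative
# assumptions; the corpus interface itself is not modelled.
FEDERAL = {
    "1": "State - People - Authorities",
    "2": "Private law - Administration of civil justice - Enforcement",
    "3": "Criminal law - Administration of criminal justice - Execution of sentences",
    "4": "Education - Science - Culture",
    "5": "National defence",
    "6": "Finance",
    "7": "Public works - Energy - Transport",
    "8": "Health - Employment - Social security",
    "9": "Economy - Technical cooperation",
}
INTERNATIONAL = {
    "0.1": "International law in general",
    "0.2": "Private law - Administration of civil justice - Enforcement",
    "0.3": "Criminal law - Legal assistance",
    "0.4": "Education - Science - Culture",
    "0.5": "War and neutrality",
    "0.6": "Finance",
    "0.7": "Public works - Energy - Transport",
    "0.8": "Health - Employment - Social security",
    "0.9": "Economy - Technical cooperation",
}


def macro_topic(text_id):
    """Return (branch, code, label) for an SR-style number such as '0.142.112.681'."""
    parts = text_id.strip().split(".")
    if parts[0] == "0":
        # International law: the macro-topic is the first digit after "0."
        if len(parts) < 2 or not parts[1]:
            raise ValueError(f"incomplete international-law number: {text_id!r}")
        code = "0." + parts[1][0]
        branch, labels = "International law", INTERNATIONAL
    else:
        # Federal law: the macro-topic is the first digit of the number
        if not parts[0]:
            raise ValueError(f"empty text ID: {text_id!r}")
        code = parts[0][0]
        branch, labels = "Federal law", FEDERAL
    if code not in labels:
        raise ValueError(f"no macro-topic for code {code!r} in {text_id!r}")
    return branch, code, labels[code]


print(macro_topic("0.142.112.681"))  # ('International law', '0.1', 'International law in general')
print(macro_topic("142.20"))         # ('Federal law', '1', 'State - People - Authorities')
```

Micro-topics are not covered here, since their headings are listed only in Fedlex.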

Text structure

Texts can be explored as a whole or by section. The “Body” and “Annex” sections are further split into “subsections” to keep track of their internal structure (i.e., articles and (sub-)annexes).

Table 3. Documents' internal structure

Sections     Subsections
Title        --
Title info   --
Preamble     --
Body         Article title
             Article text
Annex        Annex title
             Annex text
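
The hierarchy in Table 3 can be pictured as a simple mapping from sections to their subsections. The sketch below is a minimal illustration under that reading. The names `STRUCTURE`, `is_valid_path` and `searchable_units` are hypothetical, and the actual markup used in the corpus is not reproduced here.

```python
# Section -> subsections, as listed in Table 3. An empty list stands for
# the "--" entries (sections without subsections). Names are illustrative.
STRUCTURE = {
    "Title": [],
    "Title info": [],
    "Preamble": [],
    "Body": ["Article title", "Article text"],
    "Annex": ["Annex title", "Annex text"],
}


def is_valid_path(section, subsection=None):
    """Check that a section (and optional subsection) exists in Table 3."""
    if section not in STRUCTURE:
        return False
    if subsection is None:
        return True
    return subsection in STRUCTURE[section]


def searchable_units():
    """List every (section, subsection) pair; subsection is None where there is none."""
    units = []
    for section, subsections in STRUCTURE.items():
        if subsections:
            units.extend((section, sub) for sub in subsections)
        else:
            units.append((section, None))
    return units


print(is_valid_path("Body", "Article text"))      # True
print(is_valid_path("Preamble", "Article text"))  # False
print(len(searchable_units()))                    # 7
```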

Pseudo-XML structure

The figure below illustrates the full pseudo-XML structure applied to the corpus texts. Strikethrough text in green marks text that was edited out during processing.

Figure 1. Example of pseudo-XML structure