CHEU-lex is a trilingual parallel and comparable corpus of texts in German, French and Italian. It comprises three subcorpora (one per language), each of which is in turn composed of two datasets: 1) the bilateral agreements entered into by Switzerland and the EU between 1972 and 2017, and 2) the federal legislation through which these agreements were received into Swiss law. The reception of an agreement into federal legislation is not simultaneous and takes a variable amount of time. For this reason, we collected only those agreements that had already been implemented when the texts were downloaded (January 2020). As a result, the corpus includes only agreements signed between 1972, when the first agreement between the EU and Switzerland was signed, and December 2017.
In total, CHEU-lex includes 444 bilateral agreements and 348 national legal acts (laws and ordinances).
Table 1. The CHEU-lex corpus
Language | Text type | Texts | Tokens* |
German | Bilateral agreements | 148 | 726 773 |
German | Federal legislation | 116 | 792 639 |
French | Bilateral agreements | 148 | 903 247 |
French | Federal legislation | 116 | 1 081 794 |
Italian | Bilateral agreements | 148 | 822 414 |
Italian | Federal legislation | 116 | 939 847 |
Total | | 792 | 5 266 714 |
*Tokens as counted by Sketch Engine.
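The totals in Table 1 can be cross-checked with a short script. So can the counts of 444 agreements and 348 legal acts given above. This is only an arithmetic sketch: the row figures are copied from the table. The per-text-type sums are derived from those figures and are not reported separately in the corpus documentation. The names `TABLE_1`, `totals_by_text_type` and `grand_total` are illustrative.

```python
# Text and token counts copied from Table 1 (tokens as counted by Sketch Engine).
TABLE_1 = {
    ("German", "Bilateral agreements"): (148, 726_773),
    ("German", "Federal legislation"): (116, 792_639),
    ("French", "Bilateral agreements"): (148, 903_247),
    ("French", "Federal legislation"): (116, 1_081_794),
    ("Italian", "Bilateral agreements"): (148, 822_414),
    ("Italian", "Federal legislation"): (116, 939_847),
}


def totals_by_text_type(table):
    """Sum texts and tokens per text type across the three languages."""
    result = {}
    for (_language, text_type), (texts, tokens) in table.items():
        prev_texts, prev_tokens = result.get(text_type, (0, 0))
        result[text_type] = (prev_texts + texts, prev_tokens + tokens)
    return result


def grand_total(table):
    """Sum texts and tokens over every row of the table."""
    texts = sum(t for t, _ in table.values())
    tokens = sum(k for _, k in table.values())
    return texts, tokens


if __name__ == "__main__":
    for text_type, (texts, tokens) in totals_by_text_type(TABLE_1).items():
        print(f"{text_type}: {texts} texts, {tokens:,} tokens")
    print("Total:", grand_total(TABLE_1))  # Total: (792, 5266714)
```

Run on the figures above, the script gives 444 texts and 2,452,434 tokens for the bilateral agreements. It gives 348 texts and 2,814,280 tokens for the federal legislation. Both sums are consistent with the "Total" row.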
Topics
Topics are the subject areas covered by the texts. They can be queried with the advanced search function at two levels: Macro-topic and Micro-topic. Both levels reproduce the heading classification used in Fedlex, as determined by the Swiss Confederation. The table below lists the macro-topics defined in the Classified Compilation of Federal Legislation (SR) for domestic and international law. For micro-topics, please refer to the individual macro-topics in Fedlex.
Table 2. Macro-topics as listed in Fedlex
Federal law | International law |
1. State - People - Authorities | 0.1. International law in general |
2. Private law - Administration of civil justice - Enforcement | 0.2. Private law - Administration of civil justice - Enforcement |
3. Criminal law - Administration of criminal justice - Execution of sentences | 0.3. Criminal law - Legal assistance |
4. Education - Science - Culture | 0.4. Education - Science - Culture |
5. National defence | 0.5. War and neutrality |
6. Finance | 0.6. Finance |
7. Public works - Energy - Transport | 0.7. Public works - Energy - Transport |
8. Health - Employment - Social security | 0.8. Health - Employment - Social security |
9. Economy - Technical cooperation | 0.9. Economy - Technical cooperation |
Legislation can also be queried by text ID. This ID is the number that the SR platform assigns to the “revised compilations” of the Official Compilation of Federal Legislation (AS). The SR is arranged by subject heading and is regularly updated. It also includes the cantonal constitutions currently in force.
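The numbering in Table 2 suggests that a text's macro-topic can be read off the leading digits of its SR number. International law uses numbers beginning with "0.", and federal law uses numbers beginning with 1 to 9. The sketch below assumes that convention. The function name `macro_topic` and the example IDs are hypothetical and are not part of the CHEU-lex query interface.

```python
# Macro-topics from Table 2, keyed by the leading part of the SR number.
FEDERAL_LAW = {
    "1": "State - People - Authorities",
    "2": "Private law - Administration of civil justice - Enforcement",
    "3": "Criminal law - Administration of criminal justice - Execution of sentences",
    "4": "Education - Science - Culture",
    "5": "National defence",
    "6": "Finance",
    "7": "Public works - Energy - Transport",
    "8": "Health - Employment - Social security",
    "9": "Economy - Technical cooperation",
}

INTERNATIONAL_LAW = {
    "0.1": "International law in general",
    "0.2": "Private law - Administration of civil justice - Enforcement",
    "0.3": "Criminal law - Legal assistance",
    "0.4": "Education - Science - Culture",
    "0.5": "War and neutrality",
    "0.6": "Finance",
    "0.7": "Public works - Energy - Transport",
    "0.8": "Health - Employment - Social security",
    "0.9": "Economy - Technical cooperation",
}


def macro_topic(text_id: str) -> tuple[str, str]:
    """Return (domain, macro-topic) for an SR-style text ID.

    Assumption (hypothetical helper): IDs starting with "0." are international
    law, classified by the digit after "0."; all other IDs are federal law,
    classified by their first digit.
    """
    text_id = text_id.strip()
    if text_id.startswith("0."):
        key = text_id[:3]  # e.g. "0.1" from "0.142.112.681"
        table, domain = INTERNATIONAL_LAW, "International law"
    else:
        key = text_id[:1]  # e.g. "1" from "142.20"
        table, domain = FEDERAL_LAW, "Federal law"
    if key not in table:
        raise ValueError(f"Unrecognised text ID: {text_id!r}")
    return domain, table[key]
```

Under this assumption, the illustrative ID `macro_topic("0.142.112.681")` returns `("International law", "International law in general")`. The illustrative ID `macro_topic("142.20")` returns `("Federal law", "State - People - Authorities")`.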
Text structure
Texts can be explored as a whole or by section. The “Body” and “Annex” sections are further split into subsections to preserve their internal structure (i.e., articles and (sub-)annexes).
Table 3. Documents' internal structure
Sections | Subsections |
Title | -- |
Title info | -- |
Preamble | -- |
Body | Article title; Article text |
Annex | Annex title; Annex text |
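When working with exported texts, the hierarchy in Table 3 can be mirrored with a small nested data model. In this model, plain sections hold text directly, while "Body" and "Annex" hold a list of title/text subsections. The sketch is hypothetical: the class names, the `article_titles` helper and the sample document are illustrative assumptions, not part of the corpus or its interface.

```python
from dataclasses import dataclass, field
from typing import Optional

# Sections as listed in Table 3. Only "Body" and "Annex" are split into
# subsections (articles and (sub-)annexes, respectively).
SECTIONS = ("Title", "Title info", "Preamble", "Body", "Annex")
SPLIT_SECTIONS = {"Body", "Annex"}


@dataclass
class Unit:
    """An article (in Body) or a (sub-)annex (in Annex): a title plus its text."""
    title: str
    text: str


@dataclass
class Section:
    name: str
    text: str = ""  # plain content of Title, Title info and Preamble
    units: list[Unit] = field(default_factory=list)  # subsections of Body/Annex

    def __post_init__(self) -> None:
        if self.name not in SECTIONS:
            raise ValueError(f"Unknown section: {self.name!r}")
        if self.units and self.name not in SPLIT_SECTIONS:
            raise ValueError(f"Section {self.name!r} has no subsections")


@dataclass
class Document:
    text_id: str
    sections: list[Section]

    def get(self, name: str) -> Optional[Section]:
        """Return the first section with the given name, or None."""
        return next((s for s in self.sections if s.name == name), None)

    def article_titles(self) -> list[str]:
        """List the article titles found in the Body section."""
        body = self.get("Body")
        return [unit.title for unit in body.units] if body else []


if __name__ == "__main__":
    doc = Document(
        text_id="hypothetical-id",  # illustrative only, not a real text ID
        sections=[
            Section("Title", text="Example agreement title"),
            Section("Body", units=[Unit("Art. 1", "Example text"), Unit("Art. 2", "Example text")]),
            Section("Annex", units=[Unit("Annex I", "Example annex text")]),
        ],
    )
    print(doc.article_titles())  # ['Art. 1', 'Art. 2']
```

Validating section names in `__post_init__` catches, for example, an attempt to attach subsections to the Preamble. Table 3 gives the Preamble none.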
Pseudo-XML structure
The figure below illustrates the full pseudo-XML structure applied to the corpus texts. Text struck through in green was removed during processing.
Figure 1. Example of pseudo-XML structure
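The tag names used in CHEU-lex appear only in Figure 1. The sketch below therefore uses hypothetical tags (`doc`, `title`, `body`, `article_title`, `article_text`) to show how sections and subsections could be extracted from a pseudo-XML file. Pseudo-XML is not guaranteed to be well-formed XML. For that reason, the sketch assumes non-nested elements of the same name and uses regular expressions rather than an XML parser.

```python
import re

# Hypothetical pseudo-XML sample: the tag names are illustrative and are NOT
# taken from Figure 1.
SAMPLE = """
<doc id="example">
<title>Example agreement</title>
<body>
<article_title>Art. 1</article_title>
<article_text>Text of article 1.</article_text>
<article_title>Art. 2</article_title>
<article_text>Text of article 2.</article_text>
</body>
</doc>
"""


def extract(tag: str, source: str) -> list[str]:
    """Return the stripped contents of every <tag ...>...</tag> element.

    Assumes elements with the same tag are not nested inside each other.
    """
    name = re.escape(tag)
    # Optional attributes after the tag name; lazy match up to the first close tag.
    pattern = re.compile(rf"<{name}(?:\s[^>]*)?>(.*?)</{name}>", re.DOTALL)
    return [content.strip() for content in pattern.findall(source)]


def articles(source: str) -> list[tuple[str, str]]:
    """Pair each article title with the article text in document order."""
    return list(zip(extract("article_title", source), extract("article_text", source)))


if __name__ == "__main__":
    print(extract("title", SAMPLE))  # ['Example agreement']
    for title, text in articles(SAMPLE):
        print(title, "->", text)
```

Pairing titles and texts with `zip` assumes that every article title is matched by exactly one article text. An article without a title would shift the pairs, so real exports may need a stricter pass.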