2024-03-04 - 2024-03-08 Bern Winter School - Natural Language Processing

Natural Language Processing in Mürren

Bern Winter School on Natural Language Processing (NLP)
Learn machine learning in the mornings and spend the afternoons skiing or working.

About
With large language models based on transformer architectures, Natural Language Processing is currently the most dynamic field within Machine Learning and Artificial Intelligence; ChatGPT and similar models are prominent examples. The disruptive potential of these technologies seems limited only by imagination.

Natural Language Processing (NLP) refers to the production, preparation and analysis of textual data in natural (as opposed to programming) languages. NLP can be used for a variety of purposes, such as the identification of persons, places or dates (Named Entity Recognition), the annotation of word forms (Part-of-Speech tagging) or the analysis of dependencies within sentence constructions (for example subject-verb-object relations). In computational linguistics, NLP was long rule-based, with explicit and formal language models. Over roughly the past five years, solutions based on machine-learned language models have become the accepted standard. Within a very short time they have not only matched but significantly exceeded the results of rule-based NLP. The most recent deep learning architecture (the Transformer) and the pre-trained systems built on it (e.g. OpenAI's ChatGPT/GPT-3/4 or BERT) produce texts that are hardly distinguishable from human-written ones.
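
As a small illustration of the tasks mentioned above, here is a minimal sketch of Named Entity Recognition and Part-of-Speech tagging using the open-source spaCy library (an illustrative choice, not necessarily the tooling used in the course; it assumes the small English model en_core_web_sm is installed):

    # Minimal NER and POS tagging sketch with spaCy (illustrative only).
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Angela Merkel visited Bern in March 2024.")

    # Named Entity Recognition: persons, places, dates, ...
    for ent in doc.ents:
        print(ent.text, ent.label_)   # e.g. labels like PERSON, GPE, DATE

    # Part-of-Speech tags and syntactic dependencies
    for token in doc:
        print(token.text, token.pos_, token.dep_)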

Within this module we prepare texts and, in a second step, train self-built neural networks to annotate them. We focus on three levels: 1) the preparation and segmentation of text [pre-processing]; 2) forms of annotation and their evaluation [information extraction]; 3) epistemological foundations and theoretical classification [critical text analysis].
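
For level 1, a minimal sentence segmentation sketch, again using spaCy (an illustrative assumption; the actual course tooling is announced on the first day):

    # Minimal pre-processing sketch: rule-based sentence segmentation.
    # Assumes: pip install spacy (no model download needed for this).
    import spacy

    nlp = spacy.blank("en")        # empty English pipeline
    nlp.add_pipe("sentencizer")    # rule-based sentence boundary detection

    doc = nlp("NLP is fun. It covers tagging, parsing and much more.")
    for sent in doc.sents:
        print(sent.text)           # one line per detected sentence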

You may work with texts from your own field or topic area. Ideally, the texts are prepared in txt format and brought to the course. In addition, texts will be made available to the participants.

Methods
Tokenization of text (depending on the language), pre-training of language models (vectorization), use of Jupyter Notebooks/Google Colab [details will be communicated on the first day of class] to train neural networks, and practical studies on available (or self-constructed) text corpora.
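
To illustrate tokenization and vectorization, a minimal sketch with the open-source Hugging Face transformers library (the checkpoint bert-base-uncased is an illustrative choice, not necessarily the one used in class):

    # Minimal tokenization/vectorization sketch with Hugging Face transformers.
    # Assumes: pip install transformers torch; the checkpoint is illustrative.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Mornings for NLP, afternoons for skiing.",
                       return_tensors="pt")
    # Subword tokens, e.g. word pieces marked with "##"
    print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))

    with torch.no_grad():
        outputs = model(**inputs)
    # One contextual embedding vector per token (768-dimensional for BERT base)
    print(outputs.last_hidden_state.shape)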

Goal
Application of a chosen NLP subtask to text data, evaluation and presentation of the outcome

Learning outcomes
After the course, participants can
- understand NLP and know different methods and applications
- perform basic preprocessing and segmentation of text for NLP purposes
- perform basic information extraction (know forms of annotation and corresponding evaluation)
- perform basic critical text analysis (epistemological foundations and theoretical classification)
- apply neural networks (transformers like GPT and BERT) for NLP tasks
Target Group
- Researchers, PhD students, postdocs
- Industry, administration
- Other interested people
Prerequisites 
- Basic familiarity with Python and Jupyter notebooks
- Own laptop
Methods
  • Theoretical introductions
  • Hands-on tutorials with Jupyter notebooks
  • Project work with presentation
Format
  • About 15-20 hours presence (online possible)
  • About 30 hours project work (voluntary)
  • Assessment as oral presentation of project work
Certificate
  • A certificate with 2 ECTS will be issued to participants who have attended the whole training and presented a successful project
Coaches
- The coaches are local or external experts
Time: 2024-03-04 - 2024-03-08 (afternoons for work, skiing, wellness or whatever)
Fee students and UNIBE staff: 600 CHF (course fee) + 900 CHF (covering private room with shared bathroom on the hallway, breakfast, coffee breaks, lunch bag, dinner, and social program). Additional consumption at the hotel must be paid directly upon check-out.
Fee others: 1100 CHF (course fee) + 900 CHF (covering private room with shared bathroom on the hallway, breakfast, coffee breaks, lunch bag, dinner, and social program). Additional consumption at the hotel must be paid directly upon check-out.

Language: English
Participants: max. 20
Registration: mandatory
Responsible: PD Dr. Sigve Haug
Monday (Arrival)
14:00 - 17:00 Machine Learning Review (Ahmad)
17:00 - 19:00 Apéro
19:00 - 20:00 Dinner (Regina)

Tuesday (Ahmad)
08:15 - 09:00 Introduction to NLP tasks
09:00 - 10:15 Different methods & approaches in NLP
10:15 - 10:45 Break
10:45 - 12:30 Word embedding
12:30 - 17:00 Skiing, work or whatever
17:00 - 18:30 NLP Open-source libraries in Python
19:00 - 20:30 Dinner (Regina)

Wednesday (Mykhailo)
08:15 - 09:00 Neural networks for NLP 1
09:00 - 10:15 Neural networks for NLP 2
10:15 - 10:45 Break
10:45 - 12:30 Neural networks for NLP 3
13:10 - 14:30 Curling (meeting point is Regina reception)
17:00 - 18:30 Neural networks for NLP 4
19:00 - 20:30 Dinner (Regina)

Thursday (Sukanya)
08:15 - 09:15 Introduction to Transformer architecture
09:15 - 09:45 Tutorial: Hugging Face pipelines (see the sketch after the schedule)
09:45 - 10:15 Break
10:15 - 11:00 Introduction to Transformer architecture contd.
11:15 - 12:30 Tutorial: Train your own BERT
12:30 - 17:00 Afternoon break
17:00 - 18:00 Text Summarization with Transformers
18:45 - 21:30 Cheese Fondue at restaurant Allmendhubel (meet at the hotel reception Regina at 18:45)
21:30 - 22:00 Sledge ride down to the hotel
22:00 - 02:00 "Bliemli Chäller"

Friday (Sukanya)
08:15 - 09:00 Introduction to Large Language Models
09:00 - 09:45 Tutorial: Try out LLMs
09:45 - 10:15 Break and checkout
10:15 - 11:00 Tutorial on Retrieval Augmented Generation with LLMs
11:15 - 12:00 Project and discussion session
12:00 - 12:15 Wrap up (Sigve)
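
For reference, a minimal sketch of the kind of Hugging Face pipeline covered in Thursday's tutorial (the tasks and model checkpoints are illustrative choices, not necessarily those used in class):

    # Minimal Hugging Face pipeline sketch (illustrative only).
    # Assumes: pip install transformers torch; checkpoints are example choices.
    from transformers import pipeline

    # Sentiment analysis with the pipeline's default model
    classifier = pipeline("sentiment-analysis")
    print(classifier("The winter school in Mürren was fantastic."))

    # Summarization with an explicitly chosen checkpoint
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    article = ("Natural Language Processing prepares and analyses textual data. "
               "Transformer-based models have recently exceeded rule-based "
               "results on most standard tasks.")
    print(summarizer(article, max_length=30, min_length=5)[0]["summary_text"])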

Project presentation slots Friday April 26 (Zoom and Room H4 102):
10:00 Jule
10:30 Marine, Karthana
11:00 Yosuke
11:30 Genghui
For the 2 ECTS certificate you need to do a project:

Goal: Apply what has been learned in the tutorials to a similar or different task (T) on your own or public data (E) and ideally assess the performance (P) of your solution.
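
For the performance (P) part, a minimal evaluation sketch using scikit-learn (an illustrative choice; the gold labels and predictions below are made-up placeholders):

    # Minimal evaluation sketch for a classification-style NLP task.
    # Assumes: pip install scikit-learn; labels below are hypothetical.
    from sklearn.metrics import classification_report

    y_true = ["PER", "LOC", "O", "LOC", "O"]   # gold annotations (placeholder)
    y_pred = ["PER", "O",   "O", "LOC", "O"]   # model predictions (placeholder)

    # Precision, recall and F1 per class, plus averages
    print(classification_report(y_true, y_pred, zero_division=0))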

Expected effort: 30 hours

Result: a 15-minute presentation (your notebook, optionally with some slides) to be uploaded to Ilias together with the Jupyter notebook or Python script used (naming convention: surname_1-surname_2-projectname.pdf/ipynb)

Teamwork: Please work and present in teams of two (or three). Exceptionally, you may work alone.

Slots for presentations will be agreed upon during the course week. 

Assessment: You will get feedback (15 minutes) right after your presentation. If you have given it a good try (~30 h), your project will pass. There is no further grading. The project together with school attendance yields 2 ECTS credit points.
Registration: If you have an ILIAS or AAI account (people affiliated with a Swiss higher education organisation), please log in and join the course. Otherwise, please write an email to info.dsl@unibe.ch.
You are of course free to bring family and friends (not participating in the school) if there are free rooms. Extra accommodation costs must be paid directly to the hotel upon check-out.
Cancellation: Editions with fewer than 15 registrations may be cancelled one month in advance. Cancellation is possible only until February 12th, 2024. No refunds will be made for cancellations received later or for no-shows; moreover, such participants will be charged by the University of Bern for the accommodation costs at Hotel Regina. The notice of cancellation must be submitted in written form to info.dsl@unibe.ch.
Arrival: Monday, 4 March 2024. The school starts at 14:00 and dinner is at 19:00.
Departure: Friday, 8 March 2024, at noon (unless you stay longer for your own pleasure).

Travel: About 2 hours from Bern by public transport (sbb.ch). Mürren is a car-free village. You can park in Lauterbrunnen or Stechelberg.
Mürren can be reached from the Lauterbrunnen Valley via two connections:
* From Lauterbrunnen by cable car and a mountain railroad via Grütschalp to Mürren BLM.
* From Stechelberg by cable car via Gimmelwald to Mürren Schilthornbahnen LSMS.
Lauterbrunnen is easily accessible by train from Interlaken. The route via Stechelberg is mainly preferred by motorists because of the parking spaces at the Stechelberg valley station. Stechelberg can also be reached from Lauterbrunnen by post bus.
Leisure: Mürren offers a spa, outstanding ski slopes, a swimming pool, etc. If there is enough snow, there is an approximately 10 km cross-country skiing trail in the Lauterbrunnen valley. Inform yourself: muerren.swiss/en/winter/
Great textbook for NLP: Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin. 
The book is available online for free: https://web.stanford.edu/~jurafsky/slp3/
Dr. Sukanya Nath

Sukanya has a background in Computational Linguistics and Data Science. She has a Ph.D. in Computer Science from the University of Neuchâtel. As part of DSL, she applies NLP to different areas of research.

PD Dr. Sigve Haug (overview, school responsible)


Sigve studied physics in Germany, Spain and Norway. He has been involved in neutrino physics experiments and high energy frontier experiments, often with main focus on the computing challenges related to the large and distributed data from these experiments. Today he is coordinating the Data Science Lab at the University. Beyond science he likes philosophical conversations in the evening, sport and friendly people. 

Dr. Mykhailo Vladymyrov

MSc Ahmad Alhineidi

Ahmad studied Linguistics with a focus on Computational Linguistics at the University of Zurich and is currently working at the Data Science Lab (DSL).