|Abbreviation: RACJEZ||Load: 30(L)
|Lecturers in charge: ||prof. dr. sc. Mario Essert
doc. dr. sc. Tihomir Žilić
|Lecturers: ||doc. dr. sc. Tihomir Žilić
|Course description: Course objectives:
To gain knowledge of natural language and its computer processing through algorithms and programs for morphology, syntax and semantics, with particular emphasis on extracting information from documents and creating related LLOD (linguistic linked open data).
To become familiar with Python modules for word processing and to master pattern recognition techniques.
To apply mathematical and statistical knowledge in modeling and computer processing of the Croatian language.
Enrolment requirements and required entry competences for the course:
Object-oriented programming. Recommended elective course: Web programming.
The material covered in the lectures will be illustrated with programs written in Python and NLTK (Natural Language Toolkit), with the help of a more advanced Python module for the Croatian language (corpus, morphology, syntax, semantics). The practical instruction will include exploring the web2py (http://www.web2py.com) MVC (model-view-controller) framework, which will further enable automatic inclusion of all student seminars in the network (Web) environment. Lectures and exercises are obligatory.
Grading and evaluation of student work over the course of instruction and at a final exam:
The exam will consist of a public presentation of the seminar work, held after the lectures and exercises are completed. In their essays, students will use some of the program modules from well-known computational linguistics sources: http://www.nltk.org/, http://www.clips.ua.ac.be/ and http://scikit-learn.org/stable/. During the semester, students will complete five homework assignments, which will replace the "traditional" written part of the exam.
Methods of monitoring quality that ensure acquisition of exit competences:
5 homework assignments, 2 seminars.
Upon successful completion of the course, students will be able to (learning outcomes):
It is expected that after passing the course the student will:
demonstrate knowledge and understanding that provide the basis for the original development and application of ideas from mathematics and computer science in linguistics;
be able to apply her/his knowledge, understanding and problem-solving skills in a broader context related to linguistics;
be able to integrate new knowledge in the design and modeling of linguistic data.
1. Linguistics. Natural and artificial languages. Syntax and semantics.
2. Words, word forms and POS (part of speech), phrases/utterances, sentences. Sentence grammar.
4. Morphosyntactic tagging (lemma tagging, syntactic SPO tagging, PoS (part-of-speech) tagging, category annotation).
5. Generative or PS grammar (phrase structure grammar) and Dependency grammar.
6. Semantics. Semantic trees.
7. WordNet lexical database (PWN, CroWN).
8. Role & sense tagging. Annotation classes.
9. Word valences.
10. Lexicography. Dictionary and thesaurus.
11. Corpus linguistics.
12. Text mining. Classification and clustering of documents, probabilistic models.
13. Frequency (TF-IDF) and latent semantic analysis (LSA).
14. Ontologies. LLOD (linguistic linked open data).
15. Computational model of natural language.
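As an illustration of lecture topic 13, the following is a minimal Python sketch of TF-IDF weighting. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as the natural logarithm of the number of documents divided by the document frequency); other variants exist. The function name `tf_idf` and the toy corpus are hypothetical and are not taken from the course materials.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (one common variant):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized and lowercased.
docs = [
    ["jezik", "je", "sustav", "znakova"],
    ["python", "je", "programski", "jezik"],
    ["korpus", "je", "zbirka", "tekstova"],
]
weights = tf_idf(docs)
```

In this sketch a word that occurs in every document (here "je") receives weight zero, while a word confined to a single document receives the highest weight within that document.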
1. The Croatian language and the Python programming language.
2. Natural Language Toolkit (NLTK).
3. Using regular expressions in morphology.
4. Categorizing and tagging words.
5. Analyzing sentence structure (context-free grammar, CFG, and dependency grammar, DG).
6. Analyzing the meaning of sentences.
7. Senses and synonyms. WordNet hierarchy and lexical relations.
8. Training various taggers (N-gram, Brill, PWN, ...).
9. Determining the valences of words through the syntax-semantic framework.
10. Distributed processing and handling large datasets.
11. Accessing text corpora and lexical resources.
12. Supervised classification. Extracting information from the text.
13. Measuring precision and recall of various classifiers.
14. SPARQL programming of the Virtuoso server.
15. Modeling linguistic patterns.
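As an illustration of exercise 13, the following is a minimal Python sketch of how precision and recall can be computed for one class from gold-standard and predicted labels. The function name `precision_recall` and the sample tag sequences are hypothetical; returning 0.0 when a denominator is zero is an assumption of this sketch, not a fixed convention.

```python
def precision_recall(gold, predicted, positive):
    """Compute precision and recall for the label `positive`.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Returns 0.0 for a measure whose denominator is zero (an assumption).
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical part-of-speech tags: N = noun, V = verb, A = adjective.
gold = ["N", "V", "N", "A", "N", "V"]
pred = ["N", "N", "N", "A", "V", "V"]
precision, recall = precision_recall(gold, pred, "N")
```

For the sample data, two of the three tokens predicted as nouns are correct and two of the three true nouns are found, so both measures equal 2/3 for the class "N".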
|1. ||Mario Essert, Tihomir Žilić: Python za jezikoslovce, e-textbook, 2016.
|2. ||Roland Hausser: Foundations of Computational Linguistics: Human-Computer Communication in Natural Language, third edition, Springer, 2014.
|3. ||Steven Bird, Ewan Klein, Edward Loper: Natural Language Processing with Python, http://www.nltk.org/book/, O'Reilly Media, 2009.
|4. ||Daniel Jurafsky, James H. Martin: Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition, Pearson Education, 2009.