Speaker
Description
Large language models have changed the way we think about language and are fueling the next industrial revolution. During their evolution, however, the focus quickly shifted from language to data. In this talk, I will briefly summarise how this change has affected linguistics, linguists, and other representatives of related fields in the humanities. In theory, the number of word forms per language is not that high, yet dealing with them in their raw form still poses challenges. Moreover, in natural language processing and corpus linguistics, the noise picked up when collecting real-life texts into corpora also has to be managed, not to mention multilingual environments and non-standard language use. Since the embedding and vector representations of words remain inexplicable within accepted linguistic frameworks, linguists have been averse to these methods. Other areas of the humanities, on the other hand, have begun to benefit from LLMs, as some of the text processing methods they need, which had previously not performed well enough, have improved significantly with the advent of ever bigger and smarter language models. This has led to a decline in interest in some widely used standard tasks in linguistics and will presumably force even the most conservative humanities disciplines to revise their practices. I will give some examples of how this process is unfolding and what problems have arisen along the way.