Opinion Mining & Big Data
Opinion Mining & Big Data is a partnership with the University that aims to develop applications for complex processing of large texts and other data formats. It is aimed at undergraduate and postgraduate students of the Faculty of Automatic Control and Computers and the Faculty of Mathematics and Computer Science, as well as students from other faculties who are interested in the proposed research topics. The results students obtain during the programme are reflected in the grades awarded for their Bachelor’s theses or Master’s dissertations.
The main research topics are extracting opinions from Romanian texts, automatically detecting statements and press releases made by individuals or companies, and processing public data issued by central or local authorities (e.g. data from the Official Journal, data on public procurement and other data available on the www.data.gov.ro platform).
The proposed research topics draw on a wide range of technologies from the following areas:
- Natural Language Processing: lemmatization, POS tagging, affective scores, dependency trees, n-gram models, etc.;
- Information Retrieval: Apache Nutch, Lucene and Solr;
- Machine Learning: Weka, Mallet, clustering (STC, Lingo);
- NoSQL databases: MongoDB, Neo4j.
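As an illustration of one of the techniques listed above, the following is a minimal sketch of a bigram (2-gram) model estimated from raw counts. It is not code from the programme: the function names `ngrams` and `bigram_probability` and the toy English sentence are assumptions made for the example, and a real system would add proper tokenization and smoothing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, first, second):
    """Maximum-likelihood estimate of P(second | first) from raw counts."""
    bigram_counts = Counter(ngrams(tokens, 2))
    # Count `first` only where it can start a bigram (not in final position).
    context_counts = Counter(tokens[:-1])
    if context_counts[first] == 0:
        return 0.0  # unseen context: no estimate without smoothing
    return bigram_counts[(first, second)] / context_counts[first]


# Hypothetical toy input, tokenized on whitespace for simplicity.
tokens = "the court ruled and the court adjourned".split()

print(ngrams(tokens, 2)[:3])
print(bigram_probability(tokens, "court", "ruled"))
```

The estimate is simply the number of times the pair occurs divided by the number of times the first word occurs as a context, which is why unseen contexts return 0.0 here instead of a smoothed value.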
Publications focused on the topics of this programme:
Florea, I.M., Rebedea, T., Chiru, C.G. Parser de dependenţe pentru limba română realizat pe baza parserelor pentru alte limbi romanice/ Dependency Parser for Romanian based on parsers for other Romance languages. Revista Romana de Interactiune Om-Calculator/ Romanian Man-Computer Interaction Magazine 7(1), 1-20, 2014.
Zamfirescu, A.N., Rebedea, T.E. Identificarea entităţilor, citatelor şi evenimentelor în ştiri şi texte din Web-ul social în limba română/ Identifying the entities, quotations and events in the news and texts from the social Web in Romanian. Revista Romana de Interactiune Om-Calculator/ Romanian Man-Computer Interaction Magazine 6(2), 169-192, 2013.
At the moment, we have three major directions for application development:
- Automated media monitoring: detection of mentioned entities, opinions and quotations in Romanian texts
- Analysis of public data for specific projects:
  - Building the graph of Romanian businessmen
  - Bid analysis in Romania
- Building conversational agents that model a historical, scientific or literary figure
- University “Politehnica” of Bucharest – Faculty of Automatic Control and Computers