MicroBlog restarted in 2019

January 2019


Presenting at Hora Informaticae (ÚI AV ČR): Deep Neural Networks in Natural Language Processing (slides)


First-time reviewing for an *ACL event (5 papers for NAACL)


Became an "academic research employee" (akademický výzkumný pracovník) VP2A at ÚFAL. Not sure what it means, except that I should now have 8 weeks of holiday instead of 5 :-)

Old Past (2016-2013)

  • 28th May - 2nd June: attending EAMT in Riga (presenting a Chiméra poster, and substitute-presenting a Czech-Vietnamese SMT paper by Ondra Bojar and Tam Hoang)
  • 3rd June: submit something to EMNLP (if there is something -- seems there won't be... :-/)
  • June: getting married :-)
  • 15th July: submit something to COLING (hopefully there is something)
  • August: hopefully going for a 3-week honeymoon to the absolutely fascinating Iceland!
  • 15th-19th September: organizing the second SloNLP (Slovakoczech NLP) workshop at ITAT, and this time I am attending! Intended for students and early-stage researchers of CL and NLP from Slovakia and Czechia, so do attend if you are one of them!
  • Buying a house and moving to Beroun! :-)
  • 27/05/2016 We already have 5 paper abstracts submitted to SloNLP, woohoo! :-) That's infinitely more than at this time last year! Interestingly, it is mostly by researchers from other institutes, as our immediate colleagues are having some trouble making the deadline... So, we decided to extend the deadline by a week! :-)
  • 15/05/2016 Submitted a system description paper to WMT.
  • 06/05/2016 I presented my croSSSynt research at CIG FIT ČVUT (slides). Wow, the session which had been planned to take one hour eventually took more than 2 hours! Mainly because we got stuck right at the beginning with me not being able to explain how MSTParser is trained, and then me again being very unpersuasive about what dependency parsing is actually good for... But once we got over that it got much better, we finally got to understand each other and interact, and in the end I believe I got most of the attendees quite interested in our ideas on joint morphosegmentation and morphoalignment (and then potentially computing multilingual morph embeddings...)
    Btw. it turns out nobody is a linguist but everybody knows word2vec and deep NNs -- so another thing that deep NNs have done for us is that they finally started bringing computational linguists and machine learning people closer together, because suddenly they speak more or less the same language... I hadn't noticed that until now, when everyone woke up once I got to embeddings and the discussion became more lively. I guess this may mean something for the near future -- if we could finally cooperate with some people who know something about machine learning, that would be great!
  • 28/04/2016 We have bought a house in Beroun! Moving in in June if everything goes smoothly now!
  • 15/03/2016 I came back to ÚFAL after being away for half a year, and I got a desk of my own as a welcome present, yay! :-) Huge thanks to Vincent, who agreed to give up his share of the desk and moved to the desk just opposite, which had been recently abandoned by Loganathan (who left to work for a Czech company whose name I always forget), and is not currently needed by Ivana either, as she is currently on maternity leave with her little son :-)
  • 05/09/2015-13/03/2016: I temporarily left my beloved home institute, together with my country and my girlfriend, to intern for 6 months at Google Zürich. I was in research, in the group of Srini Narayanan, under the lead of and with great help from Yasemin Altun. In my project, I managed to improve automatic anaphora resolution in conversational search queries (the change was launched just as I was leaving), so if you use your Android phone to do voice search on Google in English, and are brave enough to try queries such as "where is the nearest Subway in Prague" and then e.g. "what are their opening hours", then if you get a good result, it is possibly thanks to me! :-)
  • 20/09/2015: I organized (but did not attend) the first Slovakoczech NLP workshop SloNLP 2015 at ITAT in Slovakia
  • 3-4/09/2015: I was a local organizer of the Deep Machine Translation Workshop 2015, and also presented Translation Model Interpolation for Domain Adaptation in TectoMT there
  • 24-26/08/2015: Presented Multi-source Cross-lingual Delexicalized Parser Transfer: Prague or Stanford? and Targeted Paraphrasing on Deep Syntactic Layer for MT Evaluation at Depling in Uppsala
  • 23/08/2015: Attended the first Universal Dependencies meeting in Uppsala and participated in the Coordination and names and Copula and cleft sentences subgroups (after a great railtrip through Hamburg, København, Malmö, Jönköping and Stockholm)
  • 26-31/07/2015: Presented KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer at ACL in Beijing (and then took 2 weeks holiday to do an amazing self-organized trip through China with my girlfriend)
  • 22-24/07/2015: Presented MSTParser Model Interpolation for Multi-source Delexicalized Transfer at IWPT in Bilbao
  • 27/06/2015 12:18: Submitted camera-ready for Depling (but have not slept for more than 24 hours now...)
  • 26/06/2015 We cordially invite all interested researchers to the 1st Deep Machine Translation Workshop :-)
  • 18/06/2015 My IWPT paper got accepted as well! :-) A successful season this has been :-) Btw. I got 4 reviews for some reason (no idea why, it is just a short paper for a small workshop/conference -- maybe too many reviewers and too few papers? :-)), and, for the first (and second at the same time) time in my life, I have found this comment in one of the reviews: "I have nothing to criticize about this paper.", and then another review: "I don't have any reservations."... Like wow, I thought this does not happen in real life?! :-)) Furthermore, it is not any super cool paper or anything, I only reached the accuracy of another method of mine with a different method that performs really badly in its basic version, but when weighting is added (in the same way as in the previous method, so nothing new here), it magically reaches its score (and I have no really good explanation for that, neither in the paper nor in my head, only some rather vague ideas on why this might be)... I was originally thinking about sending it as a negative result paper (the method really did not meet my expectations), but as I eventually reach the same number with a method that is less computationally demanding, I decided to present it that way (an alternative method with identical accuracy and higher efficiency) and send it as a positive result (although I didn't dare send it as a long paper)... Well, like why not, but really? :-)) It's fun though :-)
  • 15/06/2015 I have successfully passed the state exam! :-) The committee was quite pleased with my performance and they unanimously agreed on the result :-)
  • 10/06/2015 The short KLcpos3 paper got accepted to ACL!!! :-)
  • 29/05/2015 Got a paper accepted to Depling! :-) The title is "Multi-source Cross-lingual Delexicalized Parser Transfer: Prague or Stanford?" -- so if you want to know the answer, come to my presentation ;-) And the reviews are nice (but I need to learn how to do significance testing for parsing, as all 3 reviewers agreed that this is missing in the paper -- well, I agree, I just didn't know how to do it at that time and didn't have time to learn it... o:-) )
    Also, another paper of Petra, which I coauthored, got accepted as well! :-) It's "Targeted Paraphrasing on Deep Syntactic Layer for MT Evaluation", and since Petra will be at Google from June to August, maybe I will present both of the papers... Well we'll see about that later...
  • 30/04/2015 Woohoo I got a pay rise! :-) Well technically it is not really a pay rise, generally it only means that I get the bonuses throughout the year instead of getting them in June and December... but hey, it looks really nice! :-) (And also, for the first time in my life, I am now above the average wage!)
  • 30/04/2015 Submitted a delex parser transfer paper to ACL short. It is built upon the rejected long paper, but it is shorter, some inconclusive experiments were removed, and I think it's much better than the original long paper, as I got a bunch of useful suggestions from the reviewers and tried to incorporate all of them, and also got some other useful comments from colleagues, plus I did some more experiments... I really hope that it can get through!
  • 27/04/2015 I invite all colleagues to SloNLP, the Slovakoczech NLP workshop at the ITAT conference, which I am organizing together with Petra!
  • 25/04/2015 ACL long rejected, but it could get through as a short...
  • 22/04/2015 Submitted a delex parser transfer paper to DepLing! We found something rather interesting about the treebank annotation styles in a cross-lingual setting...
    Also a paper on paraphrasing using both a-layer and t-layer (built upon a Czech tecto roundtrip in Treex), written together with Petra.
  • 10/04/2015 It seems we have something interesting for DepLing... :-)
  • 18/03/2015 Woohoo, I got accepted for an internship at Google Zurich, into the group of Srini Narayanan! I should start on 7th September and finish on 11th March next year. Really looking forward to it! :-) So now it's time to find some nice accommodation, probably on Airbnb. And to get a map of Switzerland and plan some nice railtrips for the weekends... :-)
  • 12/03/2015 Today was the last lecture of Joakim Nivre's parsing course. It was a great course, I recommend watching the videos to anyone interested in syntactic parsing. Although I am not at all new to parsing, it helped me both to get the big picture and to better understand various details. Moreover, Joakim had many insightful comments regarding common practices in parsing, as well as possible future research directions; there are likely very few people as well placed to answer such questions, so this was very special and in some respects even more valuable than the rest of the course.
  • 10/03/2015 For the first time in my life, I've got wages above the median salary! :-)
  • 28/02/2015 I agreed to an increase of the part-ness of my part-time here from 50% to 80% since February, which also goes together with a corresponding increase of salary of course :-)
  • 28/02/2015 Submitted my delex parser transfer paper to ACL. I think it's quite neat, so I'm hoping to get it through (at least as a short).
  • 27/02/2015 I have just learned that I succeeded at the technical interviews last Tuesday, and there is already even a potential host for me! Seems I might actually get to intern at Google... :-)
  • 12/02/2015 I've achieved +7 BLEU (!!!) in Czech-to-English TectoMT on QTLeap corpus by implementing question word ordering (a huge part of the dataset are questions and they had affirmative word order before that).
  • 01-04/02/2015 I visited Edinburgh for the kick-off meeting of the HimL project (Health in my Language, i.e. providing healthcare information that is now available in English only to Czechs, Germans, Poles and Romanians). A really nice team we have, featuring University of Edinburgh (Barry Haddow, Maria Nadejde and others), Ludwig-Maximilians-Universität München (Alex Fraser et al.), and three companies (Lingea, NHS 24, and Cochrane); and of course us, represented by Ondřej Bojar, Dan Zeman, and myself (and hopefully also Aleš in future).
    Seems Depfix will be revived no matter what, as HimL strongly encourages me to implement Depfix for English to German and/or Polish and/or Romanian. It'll be fun I guess, since we have only one Romanian on the team (Maria) and no Pole at all -- so I finally will most probably have to actually implement the machine-learned version of Depfix... It's been more than 1 year since I last touched that, so probably I'll have to start from scratch with that (but I have learned a bit of Python since, so I should be able to use full-fledged machine learning by SciKit, instead of mucking with Perl machine learning, which is limited at best).
    And of course, revisiting Edinburgh after a year was a pleasure -- I managed to catch up with all I missed last time, including the deep fried Mars bar (pretty bad, but surprisingly not as bad as it sounds!), and sampling many of the Islay whiskies (with Bowmore becoming my new favourite!).
  • 19-23/01/2015 I attended PARSEME 1st Training School, with Joakim Nivre's talks on dependency parsing being my clear winner :-) Now I am planning to attend his syntactic parsing course, and maybe also visit Uppsala in future :-) Notably, DepLing 2015 will take place in Uppsala...
    Although originally I was not part of the organization team, I ended up doing the recording of the lectures; so, thanks to me, you can watch most of the Parseme lectures (except for the ones at the beginning cause it took some time to think up that we want to do recordings, and then to learn how to actually do it... :-)) Sorry for the sound, we just used the built-in microphone of the camera, as we had enough trouble setting up the camera alone...  So after the school was over, I also started a guide to using ÚFAL's camera...
  • 20/01/2015 -- I have become a member of Parseme, a networking project focused on parsing and MWEs, which may enable me to go on some short research visits to other participating countries (most of Europe + Stanford + Haifa). Sounds good :-)
  • 12-14/12/2014 -- Attended TLT in Tübingen (Germany). I was pleasantly surprised by the high level of the conference -- although small, partially local, and with a high acceptance rate, the level of most of the papers is very high. Moreover, the conference is very focused on parsing, which is the topic of my research, so it has probably been the most interesting conference for me so far!
  • 02-06/12/2014 -- Attended QTLeap annual project meeting in Lisbon. It was an exceptionally constructive and productive project meeting! We even managed to create Depfix for English-to-Basque in one afternoon!! (Well, it has 3 rules at the moment and probably only one of them really does something useful -- but hey, it's there!)
  • 26/11/2014 -- I presented the chatbot Pohádkové dítě (Fairytale Child) at the Matfyz Open Day -- you can have a look at the 4-hour Czech dialogue that occurred there (of course, the users kept changing, but for the Child it was just one long dialogue).
  • 20/11/2014 -- It seems we have discovered something new regarding delex parser transfer. Hopefully more on that at ACL! ;-)) (Unless we e.g. find out that it has already been discovered by the McD team or other scientific competitors :-))
  • 14/11/2014 -- I presented TectoMT and Moses (and briefly the whole Chimera) to students of PLIN, which both they and I seem to have enjoyed; it is nice to try to add my bit to breaking the barrier between Praha and Brno :-) Also, I got a cap, a mug and a USB flash drive, so thanks guys, it's been a pleasure presenting for you :-)
  • 11/11/2014 -- My citation h-index on Google Scholar has risen to 6, yay! :-)
  • 31/10/2014 -- My TLT paper got rejected; moreover, I had never had such bad reviews before... So OK, I should've had more sanity not to submit a work that is only in the stage of ideas based mainly on my intuition... (Also, it is somehow hard for me to write an "extended abstract" for review, as I always fail at choosing what to include and what to omit... Have to train that a bit.) Sorry to the reviewers who had to spend their precious time reviewing that instead of doing something useful -- I promise I will have more self-control next time. At least the reviewing was double-blind, so unless some of the reviewers come across this post, I do not have to be ashamed when attending conferences...
  • 23/10/2014 -- Jindřich Libovický and I gave a couple of popularization presentations at Gymnázium Kladno, i.e. the school where we both had studied. We weren't exactly successful though -- it seems most of the people neither understood most of what we said, nor were they motivated to consider studying at Matfyz... Next time, we have to be simpler and more fun!
  • 07/10/2014 -- I led the first lesson of NLP Technology (Zdeněk didn't even show up; hope he does next week). Most of the students seem to know at least a bit of Linux and Bash and they had no problem doing the tasks, but at the same time all of them claimed to have learned something new in the lesson, so it seems the course will run well :-)
  • 26/09/2014 -- I presented the Fairytale Child Chatbot at the ITAT conference in Demänovská Dolina (Slovakia). The transcript of the fairy tale session from the presentation will be soon posted here.
  • 14-16/09/2014 -- I attended ÚFAL seminar in Prčice, and got a ÚFAL best paper award for HamleDT 2.0: Thirty Dependency Treebanks Stanfordized! :-) Seriously, I think there were better papers this year though, such as the ACL demo by Straka and Straková (Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition).
    Also, I had the honor to chair an interesting round table with the topic Do we need parsing? Turns out we don't really know -- nobody was able to come up with an application of parsing where parsing is necessary, or at least makes things much easier or better... But one interesting outcome was that probably parsing is not useful because it is not very accurate -- accuracy around 85% is often so low that it prevents a meaningful usage of parsing (which, interestingly, is not true for ASR, although it often has an even lower accuracy); and that's still parsing newspaper text, which is often not what we actually need in the application. So, the key thing to do is either to make parsing better :-), or at least to work on adapting parsers to real world texts, or, maybe, to modify the task of parsing in such a way that it can be done with a much higher accuracy while retaining the usefulness of the resulting analyses (sounds nice, but we haven't got any further since then).
    We also discussed parsing of ASR outputs with Filip Jurčíček and David Mareček. We think that the best approach might be to take a Malt-parser-like approach, but to try to produce a sequence of dependency treelets (i.e. a dependency forest) instead of one tree (which does not make as much sense for speech, since our intuition is that people don't speak in nice sentences, but rather produce a continuous flow of words that has some local structure, but long-range dependencies are basically non-existent, and the general structure is basically a linear chain...)
  • 05-13/09/2014 -- I attended The Ninth MT Marathon in Trento (Italy), together with a bunch of ÚFAL colleagues (including my girlfriend); I presented Depfix, a Tool for Automatic Rule-based Post-editing of SMT, and I helped Aleš with the Moses lab.
  • 14/08/2014 -- My Depfix paper has been accepted to be presented at MT Marathon and published in PBML, hooray! :-)
  • 14/07/2014 -- My ITAT paper has been accepted! :-D It's about a very simple chatbot; unfortunately it is only a console application now and requires Treex to run, but I will make it available on my pages anyway once I submit the camera ready paper. Maybe I could make it require only the CPAN-installable part of Treex, then it would be rather easy to run it for anyone :-)
  • 03/07/2014 -- We had a meeting with MJ, who has been assigned to supervise our ProSŠ propagation project. The plan is to make people interested in what we do and not to scare them too much; for that reason, we intend to make it more or less secret from which faculty we actually are, so that we don't lose their attention before we even start... :-))
  • 23/06/2014 -- 27/06/2014 I attended ACL in Baltimore, together with a few ÚFAL colleagues. I presented our system description poster at WMT (CUNI in WMT14: Chimera Still Awaits Bellerophon -- paper, poster).
    I was particularly impressed by Miloš Stanojević's new MT metric called BEER, which is simple enough and partially language-universal, and at the same time was one of the top performing metrics (and definitely looked the most usable amongst them). Great work!
    I also got the impression that the future of NLP is in distributional semantics and neural networks, both of which are rapidly advancing these days. It kind of makes sense -- we still have not agreed on what a good representation of language is, but theory tells us that whatever it is, it can be expressed by vectors, so why not keep the actual meaning of the representation more or less latent, and simply use some vectors, task-specifically tuned so that they work for whatever application we have (probably the boldest thing I have seen was the idea of creating a vector from an image and then using it to generate a text description of the image -- a very cool idea indeed!). And neural networks constitute a similar approach for the modelling part. I think I should start getting to know these before it is too late...
    Oh and the Baltimore Aquarium is amazing!! And the Railway Museum was very nice and interesting as well! Otherwise, there isn't really much to see or do in Baltimore. But Washington DC is near, and I must say it has become my favourite US city -- finally a city centre full of green and water, where one can stroll around and see nice things, not the skyscraper-concrete-jungle style which I have seen elsewhere and which I hate so much (especially when it is hot and the concrete environment turns into a living hell)! (But still, I prefer Iceland ;-))
  • 19/06/2014 -- We have received the PROGMA grant for our ProSŠ project, which is about promoting our faculty among high school students! You can look at the proposal (in Czech).
  • 18/06/2014 -- I have just written and submitted my first review of a paper. I originally wanted to reject the paper, but there is still a chance that the approach actually works well and the authors only did not include the relevant experiments because they did not realize that the reported results don't actually show anything... However, I made it clear that the missing experiments must be added and that a second review is needed, so hopefully it does not get accepted if the experiments don't turn out well... (Maybe I should have simply rejected it, but the journal does not seem to be a high quality one anyway...)
  • 26/05/2014 -- 31/05/2014 I attended LREC in Reykjavík, Iceland, together with a number of ÚFAL colleagues. I presented our new HamleDT paper (HamleDT 2.0: Thirty Dependency Treebanks Stanfordized -- paper, poster). It got quite a bit of attention, it seems that HamleDT is gaining popularity rather fast :-) Also, it seems that people are starting to understand that a treebank which is not freely available is of a very limited value. Moreover, some people are converting their TBs to Stanford Dependencies or even annotating new TBs directly in SD, so it seems there will finally be a de facto standard for treebank annotation (I hope we helped a bit there). That's simply great news :-) We are definitely interested in adding more TBs to HamleDT (although it is quite a bit of work, so it might take some time), and we are even more interested in obtaining some more permissive licences for the ones we already have so that we can redistribute them freely.
    (I also helped Petra present her poster on Improving Evaluation of English-Czech MT through Paraphrasing -- I became a co-author only because Depfix was used there. And we took part in QTLeap dissemination at the project stand.)
    Btw., Iceland is the best country I have been to, at least from the tourist point of view. Definitely coming back one day! Jón Gnarr had an encouraging (and exceptionally entertaining) talk at the opening session, suggesting that there might be some money going into Icelandic NLP in future -- if so, I will be happy to help! ;-)
  • 21/05/2014 I have just agreed to review a paper for "Proceedings of the National Academy of Sciences, India Section A: Physical Sciences" (I do not do physical sciences, but it seems computational linguistics belongs to physical sciences there for some reason). I was actually asked to do the review several days ago, but the e-mails had many signs of being spam and so I just ignored them; only when they wrote today that, because I did not reply, they would ask someone else, did I notice that the title of the paper is very related to my work and that it probably was not real spam... Well, the first e-mail (from a trustworthy sender called "em nasa 0 3b2044 4dee3724") was:
    Dear Dr Rudolf Rosa, (I am no doctor)
    You have just been registered (didn't request that) by Prof. Jai Pal Mittal (never heard of), Editor in Chief, as user for Proceedings of the National Academy of Sciences, India (never heard of, + see web) Section A: Physical Sciences. (I'm not in physical sciences)
    Looks like 100% spam to me... And the website of the institute really helps to build trust: http://www.nasi.org.in/ (so nostalgy! very 90s! wow! bgsound broken?)
  • 19/05/2014 WDS talk (presentation to appear)
  • 28/04/2014 We have come up with a funny idea of using flow networks for joint parsing and alignment, so David is currently doing some experiments with that.
  • 23/04/2014 We submitted several H2020 grant project proposals (QTpublic, Caption, HimL).
  • 23/04/2014 I've just realized there is a video of me giving a Deepfix talk at ACL in Sofia last year -- all with the slides that change automatically and the "Switch off the lights" button... cool! :-) A good reason to start a Video lectures subsection on my homepage :-)
  • 22/04/2014 I gave a popularization talk (in Czech) for the participants of the Czech Linguistics Olympiad. I explained how phrase-based statistical machine translation works, what problems it has, and how they can be fixed with Depfix. There were a lot of questions afterwards, so it seems that at least some people followed it and were interested; I hope to meet some of them at ÚFAL in a few years. It was funny when a girl got the hang of it and asked me: "But it surely is not as simple as that?" Well, it IMHO really is that simple; phrase-based SMT especially is even frustratingly simple, I dare say. And Depfix is also simple -- once you have a tagger, a parser, and a morphological generator... But hey, there are languages for which all of these are just off-the-shelf tools! It was real fun for me btw., although it was rather difficult talking about SMT and Depfix without using words like MLE or parsing... I think I'll do more of that :-) I will probably try to arrange a similar talk at my old grammar school in Kladno. But first I'll wait to see whether I get any feedback, so that I can improve the talk for next time; hopefully someone gets in touch, I gave them my e-mail.
    Slides (PDF, in Czech).
  • 18/04/2014 MLFix ACL short paper got rejected -- ACL encourages researchers to send in work in progress as short papers, but MLFix is probably too much in progress to be accepted to ACL. Well, we'll probably try to carry it further and submit it to EMNLP.
  • 01/04/2014 We submitted two system description papers to WMT - one for the Khresmoi medical translation task, and another as a joint description of TectoMT, Moses and Depfix submissions (as there wasn't that much new stuff)
  • 24/03/2014 I got a GAUK for HamleDT! Details on the grant page or on ÚFAL Wiki.
  • 22/03/2014 We submitted HamleDT 2.0: Thirty Dependency Treebanks Stanfordized (camera-ready) to LREC (but the data will be released at the end of May as there are still some improvements to be done). Christopher Manning was so kind as to send us the draft of their LREC paper (Universal Stanford Dependencies: A cross-linguistic typology), and so we follow their new USD instead of inventing our own based on their old SD and Google's UDT -- so, we are doing our best to promote convergence of annotation schemes! We are now even thinking about adapting our Prague dependencies to be more similar to USD, because we like some things in USD better than in PRG (and also to make the conversions less lossy).
  • 15/03/2014 We went to Potrati, a puzzle hunt co-organized by David Mareček, my Master's thesis advisor. We were not very successful.
  • 13/03/2014 I submitted a short paper on MLFix to ACL. (I probably managed to prove that trees help for APE, yeay!) MLFix still does not perform as well as Depfix rules, but at least it seems to have a positive effect on the translation quality now. And it does not require the tons of manual work on designing and tuning the rules. Which was like the main point of it.
    Btw. I realized that you should never try to do serious machine learning in Perl. Really. It's probably more efficient to learn Python from scratch and do it in Python than trying to do it in Perl even if you already know Perl well. There are some CPAN modules, but after you manage to find a module that does what you want and get through its API to make it really do something, you will find that it has many shortcomings, such as dying if the data is not separable, being so ridiculously inefficient that 100GB RAM is not enough for it, etc. Plus the code is often so messy that trying to fix it would probably be harder than rewriting it from scratch.
    I also realized that Naive Bayes is kind of cool, as it often works well even if the independence assumption does not hold, might be among the best performing algorithms for a problem, plus is extremely fast both for training and at runtime.
  • 10/03/2014 We discussed my current work on MLFix (machine-learned flavour of Depfix) at the Monday reading group. I got valuable feedback, especially based on experiences of the attendees with manually evaluating parts of MLFix output on WMT12 dataset.
  • 05/03/2014 I attended the faculty ball -- this year it was really good!
  • 04/03/2014 We submitted several systems for WMT 14 Translation task (I am only involved in EN to CS since Depfix only works for that translation direction). Nadir beats us in BLEU, but let's see what manual evaluation shows :-)
  • 29/01/2014 I gave a talk about Depfix at the Edinburgh SMT group weekly meeting; actually, it was more of a discussion accompanied by the slides (and we only managed to get to slide 15, although the meeting took much longer than planned). It was definitely the best audience Depfix ever had, as ILCC currently has the world's top MT research group. I got a tremendous amount of feedback and suggestions, and I hope that the group also took some benefit from the meeting since many of the attendees seemed to be genuinely very interested in my work.
  • 24/01/2014 Our HamleDT paper on stanfordization of 30 treebanks got accepted to LREC! And so did Petra's paper on paraphrasing for MT evaluation that I co-authored (because we are employing Depfix in the paraphrasing pipeline).
  • 13/01/2014 -- 24/02/2014 I am at ILCC, University of Edinburgh, as a visiting PhD student, thanks to an invitation from professor Bonnie Webber and thanks to financial support from ÚFAL. I am taking part in meetings, auditing lectures, and generally trying to learn how it works here, what the people do here, make some contacts, etc. I am also trying to learn more about Bayesian learning, as this is probably one of the best places in the world to do so. And, of course, I am also spreading the word about what we do in Prague, as always with a bit of a focus on my all-time interest, Depfix.
  • 07/12/2013 I am starting to work on MLFix, which is an attempt to replace (at least partially) the manual labour of designing Depfix rules by doing that automatically with machine learning. If this works, it should be possible to rather easily port Depfix to other languages (among other benefits this would bring).
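
A side note on the Naive Bayes remark in the 13/03/2014 entry above: part of why it is so fast is that training is just counting, and prediction is just summing a few log-probabilities. Below is a minimal sketch in plain Python, assuming categorical features and add-one smoothing. The class name, the toy features (a POS tag and an agreement flag) and the "fix"/"keep" labels are all made up for illustration; this is not the actual MLFix code or feature set.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (illustrative sketch only)."""

    def fit(self, rows, labels):
        # Training is nothing more than counting labels and (label, feature, value) triples.
        self.label_counts = Counter(labels)
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen for each feature position
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus one log likelihood per feature, treated as independent.
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[label][i][value]
                # The extra +1 in the denominator reserves mass for unseen values.
                score += math.log((seen + 1) / (count + len(self.values[i]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (POS tag, agreement with parent) -> should the word form be fixed?
rows = [
    ("noun", "mismatch"), ("noun", "match"), ("adj", "mismatch"),
    ("adj", "match"), ("verb", "match"), ("noun", "mismatch"),
]
labels = ["fix", "keep", "fix", "keep", "keep", "fix"]

model = NaiveBayes().fit(rows, labels)
print(model.predict(("adj", "mismatch")))
print(model.predict(("verb", "match")))
```

The two toy features are obviously not independent in real data, but the classifier only needs the ranking of the labels to come out right, not well-calibrated probabilities, which is one common explanation for why it often holds up anyway.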