TranslatorBank is free, user-friendly software for collecting specialized texts from the Web (building a corpus), extracting monolingual terminology and collocations, and carrying out linguistic analyses during the translation process. It is designed for use in a professional translation environment or in translation classes to analyse reference texts.
If you use this software in your papers, please cite: "Revisiting corpus creation and analysis tools for translation tasks". In: Gallego-Hernández, Daniel and Rodríguez-Inés, Patricia (eds.), Special Issue: Corpus Use and Learning to Translate, almost 20 years on. Cadernos de Tradução, 36(1), 62-87 (2016).
Operating System: Windows.
For translators and interpreters, analysing concordances allows you to:
Key features:
If you don't have a set of texts, you can create your own corpus from the Web using the CorpusCreator utility. The user manual for CorpusCreator can be found here. You can start CorpusCreator from the menu Corpus | Create corpus from the Web (CorpusCreator). Texts collected with CorpusCreator are downloaded and converted into XML files, which TranslatorBank can import directly into a corpus database.
When you have a set of suitable documents for your project, you can import them into a corpus database. All texts must be saved in the same folder, which will eventually be your project folder. You can import:
Once you have a folder with .txt and/or .xml files, you can import them into a corpus database:
Click on Corpus | Open corpus database. Note: when you open TranslatorBank, the last corpus you used is opened automatically. The name of the open corpus is shown at the top of the window.
TranslatorBank uses a query system similar to the Google search engine. To start querying your corpus, type a word, part of a word or a phrase in the Search mask and press Enter. The concordances for your string will be displayed. By default, Proximity search is activated: if you enter two words, the tool searches for sentences that contain both words within a distance of 5 words of each other. To search for exact words and phrases, enclose them in double quotes (""). To see the full text context of a particular result, double-click the corresponding concordance ID (the row number on the left). In that view the search word is highlighted. You can access the original file with the links provided.
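To make the idea of a proximity search concrete, here is a minimal Python sketch. It is not TranslatorBank's own code: the function names, the simple tokenizer and the reading of "distance" as a difference in word positions of at most 5 are all assumptions made for illustration.

```python
import re


def tokenize(sentence):
    """Split a sentence into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", sentence.lower())


def proximity_match(sentence, word_a, word_b, max_distance=5):
    """Return True if both words occur at most max_distance word positions apart.

    Assumption: "distance" is the difference between the two word positions.
    """
    tokens = tokenize(sentence)
    positions_a = [i for i, tok in enumerate(tokens) if tok == word_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == word_b.lower()]
    return any(
        i != j and abs(i - j) <= max_distance
        for i in positions_a
        for j in positions_b
    )


sentence = "The court may grant an injunction before the final hearing takes place."
near = proximity_match(sentence, "court", "injunction")  # 4 positions apart
far = proximity_match(sentence, "court", "hearing")      # 8 positions apart
```

In this sketch the first query matches and the second does not, because "hearing" lies more than 5 positions away from "court".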
To sort the results alphabetically, open the Options panel by clicking the green +, then use the drop-down list to select Left or Right and the distance of the word by which you want the results to be ordered.
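Sorting by a word to the left or right of the search word can be pictured with the following Python sketch. It only illustrates the principle: the data layout (a token list plus the index of the search word) and the function names are hypothetical and do not reflect how TranslatorBank stores its concordances.

```python
def context_word(tokens, node_index, side="Left", distance=1):
    """Return the word `distance` positions to the left or right of the search word.

    An empty string is returned when that position falls outside the line.
    """
    offset = -distance if side == "Left" else distance
    i = node_index + offset
    return tokens[i].lower() if 0 <= i < len(tokens) else ""


def sort_concordances(lines, side="Left", distance=1):
    """Sort (tokens, node_index) pairs alphabetically by the chosen context word."""
    return sorted(lines, key=lambda line: context_word(line[0], line[1], side, distance))


# Three concordance lines for the search word "contract" (index 2 in each line).
lines = [
    (["the", "final", "contract", "was", "signed"], 2),
    (["a", "binding", "contract", "is", "required"], 2),
    (["this", "draft", "contract", "expires", "soon"], 2),
]

by_left = sort_concordances(lines, side="Left", distance=1)
by_right = sort_concordances(lines, side="Right", distance=1)
```

Sorting on Left with distance 1 groups the lines by the word directly before "contract" (binding, draft, final), which makes recurring modifiers easy to spot. Sorting on Right does the same for the word that follows.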
We are still working on rules and heuristics for the individual languages, so please consider the results a "work in progress".
Although the idea of TranslatorBank is to offer you an out-of-the-box solution, the extraction module involves so many different languages that you need to help the software speak your language. To extract the terminology from your corpus, TranslatorBank uses three resources:
We cannot provide all language-dependent resources for every language. This is why you may need to add some files (for English everything is already installed, although the grammar rules, stopwords and common vocabulary can be changed, i.e. improved, by the user). If a file for your language is missing from the installation, TranslatorBank will display a pop-up message. In this case you need to:
You also need the rules files. Rules are provided for English, Italian and German (but you can change, i.e. improve, them). Morphological rules describe what terms are. Unfortunately I do not speak that many languages, so I cannot say what counts as a term in French, Spanish or Russian. You will need to write these rules yourself (or ask me for support and we can do it together). If you have successfully written your rules, it would be nice if you could send them to me so that I can make them available with the next release.
For English, for example, we can say that a term can be a NOUN. In the rules file for English, we have to write "<1gram>;NN;", where "1gram" means that the term consists of only one word and "NN" is the part-of-speech tag used in the tagset documentation (associated with the parameter file of the TreeTagger) to describe nouns. If we want to extract terms of the form "adjective+noun", we add a new line to the rules file for English: <2gram>;ADJ,NN; (please note the use of commas and semicolons).
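The following Python sketch shows one way such rule lines could be read and applied to part-of-speech tagged text. It is an illustration under assumptions, not TranslatorBank's actual extraction code: the function names are hypothetical, tags are compared by exact equality, and the tag names simply follow the two example rules above (the tags available in practice depend on the tagset of the tagger parameter file you use).

```python
import re


def parse_rule(line):
    """Parse a rule line such as '<2gram>;ADJ,NN;' into a list of tags."""
    head, pattern, *_ = line.strip().split(";")
    match = re.fullmatch(r"<(\d+)gram>", head)
    if match is None:
        raise ValueError(f"unrecognized rule header: {head!r}")
    tags = pattern.split(",")
    if len(tags) != int(match.group(1)):
        raise ValueError(f"rule {line!r}: header and number of tags disagree")
    return tags


def extract_candidates(tagged_tokens, rules):
    """Return every word sequence whose tag sequence equals one of the rules."""
    found = []
    for tags in rules:
        n = len(tags)
        for i in range(len(tagged_tokens) - n + 1):
            window = tagged_tokens[i:i + n]
            if [tag for _, tag in window] == tags:
                found.append(" ".join(word for word, _ in window))
    return found


rules = [parse_rule("<1gram>;NN;"), parse_rule("<2gram>;ADJ,NN;")]

# Hand-tagged example; in practice the tags would come from the tagger.
tagged = [
    ("the", "DT"), ("corporate", "ADJ"), ("governance", "NN"),
    ("of", "IN"), ("the", "DT"), ("company", "NN"),
]

candidates = extract_candidates(tagged, rules)
```

With these two rules the sketch returns the single nouns "governance" and "company" and the adjective+noun sequence "corporate governance". The check in parse_rule mirrors the note about commas and semicolons: a line whose n-gram size does not match its number of comma-separated tags is rejected.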
Rule files are saved in a folder called Rules. You can open it with the menu "Tool|Open directory with language". See the format of the English rules file to understand how a rules file is structured.
If you have problems adding your language, write me an e-mail and I'll help you create the rules file. I'll need some help from you in order to build it!
TranslatorBank and all its modules are copyright of Claudio Fantinuoli, University of Mainz. Third-party modules are the property of their respective owners. For more information, contact the author of this software.