Our Package for Cleaning Up Language Data

  • You may have a large body of legacy language data that you would like to convert into good-quality texts in an up-to-date format.
  • These legacy data may have restrictions such as being written in capital letters only, or lacking umlauts in German.
  • The texts contain many orthographic errors, and spaces are inserted or omitted more or less arbitrarily between words and punctuation marks.
  • Due to input field limitations, the texts contain many arbitrarily abbreviated items.
  • Standards for measurements, item numbers, or screw thread designations are not observed, let alone terminological consistency or quality standards for technical documents.

Thus, it is time to get your language data cleaned up!

Data Cleansing Based on Linguistic Analysis Procedures

Whatever problems you have with your legacy language data, our data cleansing tools address them with sophisticated linguistic analysis procedures. Step by step, your corrupted data are converted into high-quality texts that comply with linguistic, terminological, and editorial quality standards. Your texts will then be ready for processing by modern language technologies such as translation memories and authoring memories.

Our procedures include:

  • normalisation of whitespace
  • standardisation of special data types
  • correction of orthographic issues
  • harmonisation of terminology
  • detection of variants
  • quality metrics
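
To give a rough idea of what the first two procedures involve, here is a minimal Python sketch. It is an illustration only and does not show our actual tools: the function names, the punctuation rules, and the short unit list are all assumptions made for this example, and real legacy data usually needs linguistic analysis that goes well beyond regular expressions.

```python
import re

def normalise_whitespace(text: str) -> str:
    """Collapse whitespace runs and tidy spacing around punctuation (hypothetical helper)."""
    # Collapse any run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Drop spaces in front of punctuation: "M8 ," -> "M8,"
    text = re.sub(r"\s+([,;:.!?])", r"\1", text)
    # Add a space after punctuation when a letter follows: ",LENGTH" -> ", LENGTH".
    # Digits are left alone so that decimal commas such as "0,5" survive.
    text = re.sub(r"([,;:!?])(?=[^\W\d_])", r"\1 ", text)
    return text

# Assumed, deliberately tiny unit list; a real inventory would be far larger.
_MEASUREMENT = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm|kg|m|g)\b", re.IGNORECASE)

def standardise_measurements(text: str) -> str:
    """Write measurements as '<number> <lower-case unit>' (hypothetical helper)."""
    return _MEASUREMENT.sub(lambda m: f"{m.group(1)} {m.group(2).lower()}", text)

def clean(text: str) -> str:
    """Apply both steps in sequence."""
    return standardise_measurements(normalise_whitespace(text))

if __name__ == "__main__":
    print(clean("SCREW  M8 ,LENGTH 40MM ;WEIGHT 0,5 KG"))
    # prints: SCREW M8, LENGTH 40 mm; WEIGHT 0,5 kg
```

Note that the sketch leaves the thread designation "M8" and the capital-letter spelling untouched. Restoring proper casing, expanding abbreviations, or harmonising terminology requires knowledge of the language and the domain, which is where linguistic analysis comes in.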


If you have further questions, or if you would like to know how we can help you with your specific requirements, do not hesitate to contact us.
