2 August 2014

How to translate YouTube subtitles quickly

Translating 1200+ words of rich, technical text every week on top of our daily work is far from easy, so we had to develop a very streamlined workflow. I'll share it here, in case you ever need to do something similar.

The biggest hurdle with subtitles is timing. Splitting text into chunks long enough to be readable on screen is a whole profession, and we really didn't have the time and resources for it.

Our (admittedly brutal) solution was to use YouTube's own automatic subtitles. Every time a video is uploaded, YouTube uses a speech recognition system in order to turn audio into text, and then split it automatically into subtitles.

As you can see, it's not perfect, but it gets the job done.

In order to extract the subtitles we use an open source tool called Google2SRT, which you can download here. It's a Java application, so it runs everywhere (Windows, Mac, Linux) and without installation.

The tool is pretty self explanatory: you just need to input the URL of the video, press read, then pick the desired language(s) and click Go!

Yep, I have my Windows set in Italian for work purposes. Although it's actually a Japanese copy of Windows 8.1. Long story... Yep, I have my Windows set in Italian for work purposes. Although it's actually a Japanese copy of Windows 8.1. Long story...

At that point, you should have a .SRT file, which you can open with any plain text editor like notepad and looks like that:

SRT file

Again, pretty self explanatory. Each entry is marked by an ordinal number, start and end times, and the subtitle itself.

As usual, we work inside memoQ, so we import the file as plain text and then, in order to get the control codes out of the way fast, we just sort everything in alphabetical order, select the control codes, copy to target and lock.

SRT translation with memoQ

If you don't have memoQ, you can achieve a similar result with Excel.

- Copy the whole content of the .SRT file and paste it into column B of a new Excel sheet
- In column A type 1, 2, 3 in the first three lines, then drag and drop until all the lines in column B have a number on their side (Excel will continue the series automatically)
- Sort column B alphabetically, then select all the cells that contain the control codes and drag them into column C
- Sort column A alphabetically in order to restore the original order
- VoilĂ ! All the control codes are now copied into column C. You can now start translating into column C, and you can even apply a "Blank lines" filter in order to remove codes from view
- When you're done, just copy and paste column C into the original .SRT file

During the translation process you will meet three main issues.

The speech recognition engine is pretty good (especially with a series like EC which has only Dan's voice and no music background), but it does make some mistakes you should probably correct in the source.

Also, YouTube applies no punctuation at all. Actual movie subtitles tend to remove some of the punctuation too, so our solution was to implement only the punctuation signs that are strictly necessary. For example, we use full stops to separate sentences that appear together on screen, but we don't put them at the end of each sentence. It's a detail, but it makes the text more readable while reducing our workload.

The final issue is subtitles that end up being too short. In the example below, it doesn't make much sense to have "room" appearing alone at line 124, so we just merged it into the translation above.

SRT translation

Once you are done (and have proofread & spellchecked thrice), you are ready to export as .SRT

As a final step, you should "absorb" the subtitles you decided to remove, by copying their end time on the subtitle before and then removing them. Incidentally, this also means that your .SRT file will skip an ordinal number, but YouTube doesn't seem to mind.

[su_row][su_column size="1/3"]Step 1[/su_column] [su_column size="1/3"]Step 2[/su_column] [su_column size="1/3"]Step 3[/su_column][/su_row] [su_row][su_column size="1/3"]31 00:01:34,620 --> 00:01:37,659 di giocare un titolo molto coinvolgente nella quiete della vostra stanza,

32 00:01:37,659 --> 00:01:38,070

33[/su_column] [su_column size="1/3"]31 00:01:34,620 --> 00:01:38,070 di giocare un titolo molto coinvolgente nella quiete della vostra stanza,

32 00:01:37,659 --> 00:01:38,070

33[/su_column] [su_column size="1/3"]31 00:01:34,620 --> 00:01:38,070 di giocare un titolo molto coinvolgente nella quiete della vostra stanza,

33[/su_column][/su_row]

That's it! All you need to do now is upload your SRT file on YouTube

As you can see from the video above, a couple of timings can be slightly off, but tweaking them is still way faster and easier than doing a full timing.

Thanks for reading and enjoy the show!

(cover credits)