From: Murray Stokely
To: FreeBSD doc list
Date: Fri, 29 Jan 2010 13:01:06 -0800
Subject: Re: Proposed new doc hierarchy for closed-captions / transcripts from conferences

No comments? I will proceed with this plan then..

  - Murray

On Sun, Jan 17, 2010 at 11:57 PM, Murray Stokely wrote:
> As some of you might be aware I have been working on getting closed
> captions for the videos of FreeBSD related talks at conferences.  In
> the last month I've started using the YouTube Machine Learning to
> produce the first automatic transcript and then paying human editors
> through Amazon Mechanical Turk to improve the technical vocabulary /
> general editing of the transcripts.
>
> There are now four videos in the BSD Conferences YouTube channel with
> relatively good quality human-edited English language transcripts.
> (e.g. pointers at
> http://freebsd.stokely.org/2010/01/improved-conference-captions-from.html)
>
> The caption files themselves are simple ASCII text files with one line
> for the start/end time of the text to be displayed, 1 or 2 lines for
> the text to be displayed, and a blank line to separate the next
> record.
>
> I would like to start checking in these text files under
> doc/en_US.ISO8859-1/captions/ for a number of reasons.
>
> 1. I want to make it easier for others to correct any mistakes in the
>    captions.
> 2. I want to make it easier for translators to produce localized
>    captions for the most popular videos.
> 3. Keep a centralized repository of the captions outside of YouTube,
>    so other hosting sites or systems are able to use them.
> 4. Increase discoverability of technical content discussed in the
>    conference talks with indexable transcripts open to search engines.
>
> The blog post above has some example text files that I'd like to check
> in.  It then becomes a matter of choosing the hierarchy.
>
> I might suggest:
>
> doc/${LANG}/captions/${YEAR}/${CONFERENCE}/${TALK}
>
> e.g.
>
> doc/en_US.ISO8859-1/captions/2009/asiabsdcon/mckusick-kernelinternals.sbv
>
> Thoughts?
>
>     - Murray
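
For anyone who wants to script against these files, here is a minimal Python sketch of the record layout and directory hierarchy described in the quoted message. It is an illustration, not part of the proposal. The message only says that the first line of each record carries the start/end time; the `H:MM:SS.mmm,H:MM:SS.mmm` layout assumed below, and the names `Caption`, `parse_timestamp`, `parse_captions` and `caption_path`, are assumptions made for this example.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass
class Caption:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # the caption text, one or more lines


def parse_timestamp(stamp: str) -> float:
    """Convert a timestamp to seconds (assumed layout: H:MM:SS.mmm)."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_captions(data: str) -> List[Caption]:
    """Split a caption file into records separated by blank lines."""
    captions = []
    normalized = data.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        # First line: start and end time, assumed to be comma-separated.
        start, end = lines[0].split(",")
        # Remaining lines: the text to display for that interval.
        captions.append(
            Caption(parse_timestamp(start), parse_timestamp(end),
                    "\n".join(lines[1:]))
        )
    return captions


def caption_path(lang: str, year: int, conference: str, talk: str) -> PurePosixPath:
    """Build doc/${LANG}/captions/${YEAR}/${CONFERENCE}/${TALK}."""
    return PurePosixPath("doc", lang, "captions", str(year), conference, talk)
```

Calling `caption_path("en_US.ISO8859-1", 2009, "asiabsdcon", "mckusick-kernelinternals.sbv")` reproduces the example path from the message. A translator's tool could use the same function with a different `lang` value to find where a localized copy would go.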