Tool for XML to MP3 conversion
Project name
AudioLlibre  (MyWay)
Customer
UOC


XML TO MP3 CONVERSION, PACKAGED IN DAISY FORMAT
A tool that automatically converts XML to MP3, with the option of downloading the material packaged in DAISY format.

Tools used
XSLT 2.0
Loquendo (Catalan, Spanish, and English)
Verbio
Xalan
Java
C++ & STL
DaisyPlayer
Subversion
LAME

Sample XML to MP3 conversions:
Andersen tales:
L'anaguet lleig
El Rossinyol


What is Daisy?
DAISY (Digital Accessible Information System) is a standard for digital talking books, developed by an international consortium of libraries that provide materials for blind people.

The standard was developed to give blind people access to books with the same ease that a sighted person has when reading a printed book. DAISY books can be distributed on CD, DVD, and other media.

These books can be played on so-called DAISY readers, which are Walkman-like devices (similar to museum audio guides), or on a PC using software available on the web, for example DaisyPlayer, distributed by ONCE.

Technical description

For this project, two data formats were analyzed: the input (XML) and the output (DAISY).

The DAISY format consists of the following parts: a file that tells how many books are on the CD (discinfo.html, the index of books), a file that describes how each book is structured (ncc.html), audio files in MP3 format, text files, and synchronization files in SMIL (Synchronized Multimedia Integration Language) format that set the order in which the audio files are played. All these files must be placed in a specific directory structure.

How to build a DAISY package:

1 - First we need to know the DAISY directory structure:


Book_name
    discinfo.html
    index.html
    /chapter_1
       ncc.html
       /smil
           smil files
       /wav
           mp3 files - mp3 files generated using loquendo
       /txt
           text content files
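
The directory skeleton above could be created programmatically before the generated files are copied in. The following is only a minimal Java sketch: the class name DaisyLayout, the method createSkeleton, and the assumption that chapters are numbered chapter_1..chapter_N are illustrative and not taken from the original tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: creates the empty directory skeleton for one DAISY book. */
class DaisyLayout {

    /**
     * Creates parent/bookName/chapter_1..chapter_N, each with smil, wav and txt
     * subdirectories, and returns the book root directory.
     */
    static Path createSkeleton(Path parent, String bookName, int chapters) throws IOException {
        Path bookRoot = parent.resolve(bookName);
        // createDirectories also creates missing parents and does not fail
        // when the directory already exists
        Files.createDirectories(bookRoot);
        for (int i = 1; i <= chapters; i++) {
            Path chapter = bookRoot.resolve("chapter_" + i);
            for (String sub : new String[] {"smil", "wav", "txt"}) {
                Files.createDirectories(chapter.resolve(sub));
            }
        }
        return bookRoot;
    }
}
```

The files discinfo.html, index.html and ncc.html are not created here; they would be written afterwards by the XSLT step.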

discinfo.html
Stores how many books (chapter_1..n) we have.
ncc.html
Describes the structure of one book, in this case chapter_1. The structure is built with h1-h5 tags, and some meta tags are mandatory, such as <meta scheme="hh:mm:ss" content="02:34:25.657687500000975" name="ncc:totalTime"/>, which holds the total duration of the book in that format. We calculate this time using Java inside XSLT.

smil
One SMIL file for each MP3, with mandatory meta tags, for example the duration of the MP3 file in seconds. As in the previous section, we calculate it using Java inside XSLT.

mp3 files
MP3 files generated using Loquendo
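
The ncc:totalTime meta tag mentioned above expects the duration in the hh:mm:ss scheme. As a minimal sketch of that formatting step, assuming the total duration is already known in milliseconds, the Java below produces such a value. The class and method names are hypothetical, and the three-digit fraction is a simplification of the longer fraction shown in the sample.

```java
import java.util.Locale;

/** Hypothetical helper for the ncc:totalTime value. */
class NccTime {

    /** Formats a duration given in milliseconds as hh:mm:ss.SSS. */
    static String formatTotalTime(long totalMillis) {
        long hours = totalMillis / 3_600_000;
        long minutes = (totalMillis / 60_000) % 60;
        long seconds = (totalMillis / 1_000) % 60;
        long millis = totalMillis % 1_000;
        // Locale.ROOT keeps the digits and separators independent of the system locale
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d", hours, minutes, seconds, millis);
    }
}
```

For example, NccTime.formatTotalTime(9265657L) returns "02:34:25.657".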


How to get the mp3 files

To reach this final format, we first need to convert the entire text of the XML material into audio. For this operation we use a text-to-speech (TTS) converter, which can be Loquendo or Verbio; both TTS engines require a special input format called SSML.

Using XSLT 2.0 we transform the XML documents into SSML (Speech Synthesis Markup Language), taking advantage of the features of this format (emphasis, pauses, exclamations, volume, background music, etc.).

SSML Sample:

<speak xml:lang="es">

  <voice name="jorge"> DAISY format tool</voice>
  <break time="3000ms"/>
</speak>
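
The project produces this SSML with XSLT 2.0. Purely as an illustration of the output shape, the Java sketch below builds the same speak/voice/break structure for one piece of text and escapes characters that would otherwise break the XML. The class SsmlBuilder and its methods are hypothetical and are not part of the original stylesheets.

```java
/** Hypothetical helper that mirrors the SSML sample above. */
class SsmlBuilder {

    /** Escapes the characters that are not allowed in XML text or attribute values. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Wraps one piece of text in speak/voice and appends a pause after it. */
    static String toSsml(String lang, String voice, String text, int pauseMillis) {
        return "<speak xml:lang=\"" + escapeXml(lang) + "\">\n"
             + "  <voice name=\"" + escapeXml(voice) + "\">" + escapeXml(text) + "</voice>\n"
             + "  <break time=\"" + pauseMillis + "ms\"/>\n"
             + "</speak>\n";
    }
}
```

Calling SsmlBuilder.toSsml("es", "jorge", "DAISY format tool", 3000) yields a document with the same elements as the sample.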

Next, Loquendo converts the SSML directly, using the name attribute of the voice tag to select the voice. The output is stored in WAV format.
The program that performs this conversion was developed in C++ using the STL (Standard Template Library) and calls the Loquendo DLL.
This process yields a set of WAV files. Because WAV files take up a lot of space, we convert them to MP3 using the open-source LAME encoder.
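
The WAV to MP3 step can be driven from code by invoking the LAME command-line encoder once per file. The following Java sketch (Java 9 or later) is an assumption about how that could look, not the original build script: it uses only the basic form "lame input.wav output.mp3" with default encoder settings, assumes the lame executable is on the PATH, and the class name LameEncoder is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

/** Hypothetical wrapper around the LAME command-line encoder. */
class LameEncoder {

    /** Builds the command "lame <input.wav> <output.mp3>", with the MP3 next to the WAV. */
    static List<String> buildCommand(Path wav) {
        String name = wav.getFileName().toString();
        String base = name.toLowerCase(Locale.ROOT).endsWith(".wav")
                ? name.substring(0, name.length() - 4)
                : name;
        Path mp3 = wav.resolveSibling(base + ".mp3");
        return List.of("lame", wav.toString(), mp3.toString());
    }

    /** Runs LAME and returns its exit code (0 means success). */
    static int encode(Path wav) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(wav))
                .inheritIO() // show LAME's progress output in the console
                .start();
        return process.waitFor();
    }
}
```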

Creating the NCC and SMIL synchronization files

Once the audio files have been generated, we need to generate the files that tell the DAISY player in which order to play the MP3 files: the NCC and SMIL files. These files are generated with XSLT 2.0.

The NCC structure has two parts, the mandatory meta tags and the HTML structure:

<html xmlns="http://www.w3.org/1999/xhtml" lang="ca" xml:lang="ca">
<head>
<title>Teoria i tècniques de les relacions públiques I</title>
<meta content="Teoria i tècniques de les relacions públiques I" name="dc:title"/>
<meta content="Universitat Oberta de Catalunya." name="dc:creator"/>
<meta content="Daisy 2.02" name="dc:format"/>
<meta content="UOC" name="dc:publisher"/>
<meta content="DTB-code" name="dc:identifier"/>
<meta content="code" name="dc:source"/>
<meta content="2009-04-15" name="dc:date"/>
<meta content="ca" name="dc:language"/>
<meta content="0" name="ncc:footnotes"/>
<meta content="0" name="ncc:pageFront"/>
<meta content="0" name="ncc:pageNormal"/>
<meta content="0" name="ncc:pageSpecial"/>
<meta content="0" name="ncc:prodNotes"/>
<meta content="0" name="ncc:sidebars"/>
<meta content="1 of 1" name="ncc:setInfo"/>
<meta content="0" name="ncc:maxPageNormal"/>
<meta content="Editorial" name="ncc:sourcePublisher"/>
<meta content="Universitat Oberta de Catalunya" name="ncc:producer"/>
<meta content="25" name="ncc:tocItems"/>
<!-- Total time, mp3 duration -->
<meta scheme="hh:mm:ss" content="02:34:25.657687500000975" name="ncc:totalTime"/>
<meta content="Loquendo 6.5" name="ncc:narrator"/>
<meta content="iso-8859-1" name="ncc:charset"/>
</head>
<body>

The HTML structure mirrors the structure of the MP3 document:
We build it with h1 to h5 tags, each with a unique id and a reference to the SMIL file that points to the corresponding MP3 file.

<h1 class="title" id="heading_02_title"> <a href="smil/code_title.smil#bookid_02_title">La direcció de projectes de relacions públiques . </a> </h1>
[1] <h1 class="section" id="heading_02_0i"> <a href="smil/code_0i.smil#bookid_02_0i">introduction</a> </h1>
<h1 class="section" id="heading_02_1"> <a href="smil/code-1_1.smil#bookid_02_1">1_1.chapter 1</a> </h1>
<h2 class="section" id="heading_02_1_1"> <a href="smil/XX06_18007_02885-2_1_1.smil#bookid_02_1_1">1_1_1. Section 1</a> </h2>
<h2 class="section" id="heading_02_1_2"> <a href="smil/code-2_1_2.smil#bookid_02_1_2">2_1_2. Section 2</a> </h2>
<h1 class="section" id="heading_02_2"> <a href="smil/XX06_18007_02885-2_2.smil#bookid_02_2">2_2.Chapter 2</a> </h1>
</body>
</html>

SMIL files:
Sample file corresponding to [1], code_0i.smil

<?xml version="1.0" encoding="iso-8859-1"?>
<smil>
<head>
<meta content="Daisy 2.02" name="dc:format"/>
<layout>
<region id="txt-view"/>
</layout>
</head>
<body>
<!-- mp3 file duration -->
<seq dur="667.6275s">
<par endsync="last">
<!-- Where the text file can be found -->
<text id="bookid_02_0i" src="../modul_2.html#2_0i"/>
<seq>
<!-- we have to split the mp3 files into parts of 60 seconds, which works better on DAISY devices -->
<audio id="audio2_0i_0" clip-end="npt=60s" clip-begin="npt=0s" src="../wav/code-2_0i.mp3"/>
<audio id="audio2_0i_1" clip-end="npt=120s" clip-begin="npt=60s" src="../wav/code-2_0i.mp3"/>
<audio id="audio2_0i_2" clip-end="npt=180s" clip-begin="npt=120s" src="../wav/code-2_0i.mp3"/>
<audio id="audio2_0i_3" clip-end="npt=240s" clip-begin="npt=180s" src="../wav/code-2_0i.mp3"/>
<audio id="audio2_0i_4" clip-end="npt=300s" clip-begin="npt=240s" src="../wav/code-2_0i.mp3"/>
</seq>
</par>
</seq>
</body>
</smil>
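
The 60-second splitting shown in the audio elements above can be expressed as a small loop over the total duration. The Java sketch below is illustrative only (the project generates these elements with XSLT): the class SmilClips and its methods are hypothetical, and it assumes the last clip simply ends at the total duration of the MP3 file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that generates the audio elements of a SMIL file. */
class SmilClips {

    /** Returns one audio element per clip of at most clipSeconds seconds. */
    static List<String> audioClips(String idPrefix, String src, double totalSeconds, int clipSeconds) {
        if (clipSeconds <= 0) {
            throw new IllegalArgumentException("clipSeconds must be positive");
        }
        List<String> clips = new ArrayList<>();
        // An integer index avoids accumulating floating-point error in the clip start
        for (int i = 0; (double) i * clipSeconds < totalSeconds; i++) {
            int begin = i * clipSeconds;
            double end = Math.min(begin + clipSeconds, totalSeconds);
            clips.add(String.format(Locale.ROOT,
                    "<audio id=\"%s_%d\" clip-end=\"npt=%ss\" clip-begin=\"npt=%ds\" src=\"%s\"/>",
                    idPrefix, i, seconds(end), begin, src));
        }
        return clips;
    }

    /** Prints whole seconds without a decimal part (60 rather than 60.0). */
    static String seconds(double value) {
        return value == Math.rint(value) ? String.valueOf((long) value) : String.valueOf(value);
    }
}
```

With the duration from the sample, SmilClips.audioClips("audio2_0i", "../wav/code-2_0i.mp3", 667.6275, 60) produces twelve elements; the sample file lists only the first five.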

These generated files must be placed in the corresponding directory inside the DAISY structure. Once this is done, the structure can be burned to a CD and played.

XSLT Utils

Using Java with XSLT, this template calculates the duration of a WAV file with Xalan:
<xsl:template name="duracio-wav-java"
              xmlns:file="java.io.File"
              xmlns:audio="xalan://javax.sound.sampled">
  <xsl:param name="filename"/>

  <!-- Only run when a non-empty file name was passed in -->
  <xsl:if test="normalize-space($filename)">
    <xsl:variable name="fitxer-so" select="file:new($filename)"/>
    <xsl:variable name="stream" select="audio:AudioSystem.getAudioInputStream($fitxer-so)"/>
    <xsl:variable name="format" select="audio:getFormat($stream)"/>
    <xsl:variable name="frameLength" select="audio:getFrameLength($stream)"/>
    <xsl:variable name="rateLength" select="audio:getFrameRate($format)"/>
    <!-- Duration in seconds = number of frames / frames per second -->
    <xsl:value-of select="$frameLength div $rateLength"/>
  </xsl:if>
</xsl:template>
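
The template above calls into javax.sound.sampled through the XSLT extension mechanism. For reference, the same calculation can be written as plain Java. This is a minimal sketch and the class name WavDuration is hypothetical; the library calls are the same ones the template uses.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical standalone equivalent of the duracio-wav-java template. */
class WavDuration {

    /** Returns the duration of a WAV file in seconds (frames / frames per second). */
    static double seconds(File wav) throws IOException, UnsupportedAudioFileException {
        // try-with-resources closes the stream, which the XSLT version cannot do
        try (AudioInputStream stream = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat format = stream.getFormat();
            return stream.getFrameLength() / (double) format.getFrameRate();
        }
    }
}
```

This only applies to formats that the installed audio file readers understand, such as uncompressed WAV.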

With Saxon, a template that calculates a file size:
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:saxon="http://saxon.sf.net/"                    
                              xmlns:uoc="http://www.uoc.edu"   
                              xmlns:xs="http://www.w3.org/2001/XMLSchema"
                              xmlns:file_uoc="java.io.File"
                              exclude-result-prefixes="saxon uoc xs">

  <xsl:template name="mida-fitxer">
    <xsl:param name="filename"/>

    <xsl:if test="normalize-space($filename)">
      <xsl:variable name="fitxer" select="file_uoc:new(string($filename))"/>
      <xsl:variable name="mida" select="file_uoc:length($fitxer)"/>
      <xsl:value-of select="$mida"/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

Next generation of DAISY documents

The next step is to add MathML support for technical books (physics, mathematics, ...).

Sample
Openedtech 2008


Customer project: UOC MyWay project
The UOC promotes accessibility, which includes making its materials accessible to blind people; the DAISY format was chosen to carry out this task.
This project consisted of converting the UOC materials, which are in XML format, to create a new syndication format designed for blind people, as one piece of the MyWay project.

MyWay is a project developed by the UOC that transforms content to allow greater accessibility by giving each person the data format that suits them best.


The following references cover devices, computers, and software aimed at DAISY, as well as the technical specifications of the format:
http://www.visuaide.com
http://www.plextor.com
http://www.dolphinse.com
http://www.once.es/cidat

Links
Awards IMS Learning
MyWay
La Vanguardia - Los apuntes que se oyen

Contact ajuhe@omaonk.com.