While customising Tim Geyssens' MailEngine, I was looking for an accurate way to automatically create a plain-text version of the HTML emails sent out by the site. Further reading brought Markdown to my attention, and after some hunting around with a little help from my friend Google I found a Markdown XSLT file. Using the XSLT I could transform my HTML emails to plain text with relative ease and accuracy. Of course, this requires a valid XML document, but as my pages were already valid XHTML I had no problems there.
Here is my method for doing the conversion. All it requires is that you pass in the HTML you want to convert, which must be valid XML:
[code lang="csharp"]using System.IO;
using System.Xml;
using System.Xml.Xsl;

/// <summary>
/// Converts HTML to plain text.
/// </summary>
/// <param name="HTML">The HTML to convert; must be well-formed XML.</param>
/// <returns>The plain-text representation of the HTML.</returns>
private static string ConvertToText(string HTML)
{
    // Load the HTML to convert.
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(HTML);

    // Load the Markdown stylesheet from the site.
    XmlDocument xsl = new XmlDocument();
    xsl.Load(System.Web.HttpContext.Current.Server.MapPath("/xslt/Markdown.xslt"));

    // Create the transform from the stylesheet.
    XslTransform xslt = new XslTransform();
    xslt.Load(xsl, null, null);

    // Transform the XML and return the result as a string.
    using (StringWriter writer = new StringWriter())
    {
        xslt.Transform(xmlDoc, null, writer, null);
        return writer.ToString();
    }
}[/code]
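One thing to watch for with the "must be valid XML" requirement: XML only predefines five named entities (&lt;, &gt;, &amp;, &quot; and &apos;), so an HTML entity such as &nbsp; makes LoadXml throw unless the document's DTD declares it. A minimal sketch of one possible workaround is to swap the named entity for its numeric equivalent before parsing. The EntityHelper class and ReplaceNbsp method are hypothetical names used for illustration, and only &nbsp; is handled:

[code lang="csharp"]using System;
using System.Xml;

public static class EntityHelper
{
    // Hypothetical helper: replaces the HTML-only entity &nbsp; with the
    // numeric character reference &#160;, which any XML parser accepts.
    // This is a blunt string replacement, so it also touches any literal
    // "&nbsp;" text inside CDATA sections.
    public static string ReplaceNbsp(string html)
    {
        return html.Replace("&nbsp;", "&#160;");
    }
}[/code]

In the method above, that would mean calling xmlDoc.LoadXml(EntityHelper.ReplaceNbsp(HTML)) instead of xmlDoc.LoadXml(HTML).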
Download the XSLT file I used from here:
http://www.getsymphony.com/download/xslt-utilities/view/20573/
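If you are on .NET 2.0 or later, note that XslTransform is marked obsolete there in favour of XslCompiledTransform. Here is a rough sketch of the same conversion using the newer class. The PlainTextConverter class name and the stylesheetPath parameter are assumptions for illustration (the path is passed in rather than resolved with Server.MapPath inside the method), and the Markdown stylesheet would need testing against it:

[code lang="csharp"]using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class PlainTextConverter
{
    // Sketch only: stylesheetPath is a hypothetical parameter holding the
    // physical path to Markdown.xslt (e.g. the result of Server.MapPath).
    public static string ConvertToText(string html, string stylesheetPath)
    {
        // The input must still be well-formed XML.
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(html);

        // Compile the stylesheet directly from the file.
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(stylesheetPath);

        // Write the transform output to a string.
        using (StringWriter writer = new StringWriter())
        {
            xslt.Transform(xmlDoc, null, writer);
            return writer.ToString();
        }
    }
}[/code]

Taking the path as a parameter also means the method no longer depends on HttpContext, so it can be called outside a web request.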
I would love to hear from anyone who does this differently, or who can find any problems with the method I have chosen for this solution.