
Convert HTML to Plain Text in C# using Markdown

Posted in Umbraco, .Net
While working on my customisations to Tim Geyssens' MailEngine, I was looking for an accurate way to automatically create a plain-text version of the HTML emails being sent out by the site. Further reading brought Markdown to my attention, and after some hunting around, with a little help from my friend Google, I managed to find a Markdown XSLT file. Using the XSLT I could transform my HTML email to plain text with relative ease and accuracy. Of course, to do this I needed a valid XML document, and as my pages were already valid XHTML I had no problems there.

Here is my method for doing the conversion. All it requires is that you pass it the HTML you want to convert, which must be valid XML:

[code lang="csharp"]
using System.IO;
using System.Xml;
using System.Xml.Xsl;

/// <summary>
/// Converts HTML to plain text.
/// </summary>
/// <param name="HTML">The HTML to convert; must be well-formed XML.</param>
/// <returns>The plain-text representation of the HTML.</returns>
private static string ConvertToText(string HTML)
{
    // Load the HTML; LoadXml throws an XmlException if it is not well-formed XML.
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(HTML);

    // Load the Markdown stylesheet from the site's /xslt folder.
    XmlDocument xsl = new XmlDocument();
    xsl.Load(System.Web.HttpContext.Current.Server.MapPath("/xslt/Markdown.xslt"));

    // Create the transform from the stylesheet.
    XslTransform xslt = new XslTransform();
    xslt.Load(xsl, null, null);

    // Transform the XML into the writer and return the result.
    using (StringWriter writer = new StringWriter())
    {
        xslt.Transform(xmlDoc, null, writer, null);
        return writer.ToString();
    }
}
[/code]

Download the XSLT file I used from here: http://www.getsymphony.com/download/xslt-utilities/view/20573/

I would love to hear from anyone who does this differently, or who can find any problems with the method I have chosen for this solution.
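
One variation worth mentioning: from .NET 2.0 onwards, XslTransform is marked obsolete in favour of XslCompiledTransform. A minimal sketch of the same conversion using the newer class might look like the following. The HtmlToText class name and the stylesheetPath parameter are my own illustrative assumptions, not part of the method above, and I have not verified the Markdown XSLT against it.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class HtmlToText
{
    // Hypothetical variant of ConvertToText: the stylesheet path is passed in
    // by the caller instead of being resolved through Server.MapPath.
    public static string ConvertToText(string html, string stylesheetPath)
    {
        // The input must still be well-formed XML (e.g. valid XHTML).
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(html);

        // Compile the stylesheet from disk.
        var xslt = new XslCompiledTransform();
        xslt.Load(stylesheetPath);

        // Run the transform and capture the output as a string.
        using (var writer = new StringWriter())
        {
            xslt.Transform(xmlDoc, null, writer);
            return writer.ToString();
        }
    }
}
```

In an ASP.NET site the path could still come from Server.MapPath("/xslt/Markdown.xslt"), as in the original method. Passing it in simply makes the method usable outside a web request.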
