Some wacky HTML generation code I wrote the other week

On a site I’m working on, we offer content authors a multi-line textbox whose content later is rendered to end-users by way of replacing newline-like tokens with <br />s. It was then decided that we needed to add support for bullets points. So we needed something to take the ‘markup’ on the left and convert it to the HTML on the right (or similar):

1 markup

Owing to extreme time constraints and the fact that we wanted to exclude all other forms of rich content (and other reasons), I decided to roll my own quick and nasty HTML generation code rather than use/customise something like CKEditor/Markdown. So here it is – I felt a little dirty writing it but hey, it took next to no time, it works like a dream and is even unit-tested (bonus!). My only issue with it is that we’re using <br /> elements instead of <p> elements with margins, but on the other hand, content authors will seldomly be using line breaks and it’s something which can be easily changed later.

Note that RegexReplace(), StringJoin(), IsNullOrEmptyOrWhiteSpace() and Remove() are all extensions which we’ve added as we prefer a more functional/fluent way of programming. Excuse the lack of constants 😀

static string ConvertMarkupToHtml(string markup)
{
  if (markup == null)
    return null;

  return markup
    // "* [something]" will eventually be translated to <li>[something]</li>, so remove the space now
    .Replace("* ", "*")
    // Replace *[something] linebreak with <ul><li>[something]</li></ul>
    .RegexReplace(@"(\*(?<content>[^\r\n]*)(\r\n|\n))", "<ul><li>${content}</li></ul>")
    // Replace *[something]END with <ul><li>[something]</li></ul>END
    .RegexReplace(@"(\*(?<content>[^\r\n]*)$)", "<ul><li>${content}</li></ul>")
    // Unescape asterisks
    .Replace("**", "*")
    // Replace newline like tokens with <br />s
    .Replace("\r\n", "<br />")
    .Replace("\n", "<br />")
    // Combine adjacent lists
    .Remove("</ul><ul>")
    // Surround all content which is not HTML or inside HTML with <p></p>
    // ... RegexSplit includes the delimiters (bits of HTML) as part of its result - so this returns <br />s, <ul></ul>s and raw content
    .RegexSplit(@"(<br />|<ul>.+?</ul>)")
    .Where(c => !c.IsNullOrEmptyOrWhiteSpace())
    // ... Let markup pass through - wrap non-markup in <p></p>
    .Select(chunk => chunk.StartsWith("<br />") || chunk.StartsWith("<ul>") ? chunk : "<p>" + chunk + "</p>")
    .StringJoin()
    // Whenever 1-2 <br />s exists between two paragraphs, amalgamate the paragraphs preserving the <br />s
    // ... Why doesn't this work? (<br />){1, 2}
    .RegexReplace("</p>(?<brs>(<br />|<br /><br />))<p>", "${brs}")
    // Whenever 3+ <br />s exist between two paragraphs, remove two of them to compensate for p element spacing
    .RegexReplace("</p><br /><br /><br />(?<brs>(<br />)*)<p>", "</p>${brs}<p>")
    // Whenever some number of <br />s exists between a </p> and a <ul>, remove one of them
    .RegexReplace("</p><br />(?<brs>(<br />)*)<ul>", "</p>${brs}<ul>");
}

About Nathan Pitman

Undergrad software engineer at the University of Auckland
This entry was posted in C#, Web and tagged , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *