Hi there
It is difficult for me to do that in code. I usually process Word documents with a word processing tool, which supports converting Word to HTML. As for cleaning up HTML in C#, I have never tried that.
But you could try adding a Word processing tool to your project to help. Such tools offer detailed tutorials and sample code for new users.
Hope this helps.