Wednesday, March 24, 2010

std::string vs. std::wstring: which C++ string datatype best supports UTF-8?

More bug resolution!

I managed to get the ht-en part of the site working. The issue was related to a somewhat confusing configuration management bug.

I modified the site so it handles user translation submissions in a more context-sensitive manner.

I also added an icon to the site, which gets rid of the CherryPy icon that was previously being rendered by default.

I'll be fiddling with the ht-en side of things over the next week until I can figure out that mess.

std::string vs. std::wstring

I found out there are issues with UTF-8 encoded strings crossing into the python-moses backend. I've been told that a C++ std::string can handle Unicode, but since the STL has a separate std::wstring data type for "wide strings"... I'm led to believe otherwise. I even think SWIG supports std::wstring.

In any case, that just means a little more fiddling on my part with this aspect of the python-moses mojo.

Any tips on converting Python unicode strings into something C++ can process (std::string or std::wstring? That is the question) would be welcome!
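For reference, here is a minimal sketch of one approach to try: encode the unicode string to UTF-8 on the Python side and hand the backend plain bytes, which a std::string can carry as-is because it is just a sequence of chars. The helper names to_backend_bytes and from_backend_bytes are made up for illustration; they are not part of python-moses or SWIG. This also assumes the wrapper accepts a byte string. It is written in Python 3 syntax; in Python 2 the equivalent call is u"...".encode("utf-8").

```python
def to_backend_bytes(text: str) -> bytes:
    """Encode a Python unicode string as UTF-8 bytes for a byte-oriented C++ API."""
    return text.encode("utf-8")


def from_backend_bytes(raw: bytes) -> str:
    """Decode UTF-8 bytes coming back from the backend into a unicode string."""
    return raw.decode("utf-8")


# Hypothetical sample input: "kreyòl", where ò is U+00F2.
sample = "krey\u00f2l"
raw = to_backend_bytes(sample)

print(len(sample))  # number of code points in the unicode string
print(len(raw))     # number of bytes; ò takes two bytes in UTF-8
print(from_backend_bytes(raw) == sample)  # round trip keeps the text intact
```

The catch is that the byte length and the character count differ, so anything on the C++ side that indexes or slices the std::string by position has to be aware that one character can span several bytes.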

So far I've seen the following blurbs on this issue:


ct
