Strings and String conversions (in Haskell)

Representation of Strings is a sore spot in Haskell, unfortunately.

The fundamental problem is that the ‘default’ String, the String type from the standard library, is a linked list of characters. Nicely enough it is unicode capeable and handles special characters nicely, however using linked lists as strings is very inefficient.

Therefore marvin uses a more efficient string type called Text. To be precise the Text type in Data.Text.Lazy from the text library.

Functions exposed by the marvin library generally ALL deal with this string type, to make it as easy as possible for the user.

However when you interact with other libraries you might encounter other string types, such as ByteString (often the result of HTTP requests or input for JSON decoding) and String from the standard library, often in the form of FilePath as name for files and directories.

Strict and lazy Text and ByteString

Furthermore both Text and ByteString have a strict and a lazy variant. It is not really necessary to know the difference between the lazy and strict variants of these strings, suffice to say they are not the same thing.

If you need to convert between the strict and lazy variants of these strings there is a fromStrict (converts strict to lazy) and toStrict (converts lazy to strict). This is the case for both Text and ByteString. The two conversion functions toStrict and fromStrict are always contained in the module holding the lazy version of the type. This means for Text it is Data.Text.Lazy and for ByteString it is Data.ByteString.Lazy.

If you want to know which String type you have, look at the module.

  • strict Text comes from either Data.Text or Data.Text.Internal
  • lazy Text comes from either Data.Text.Lazy or Data.Text.Internal.Lazy

ByteString uses the same naming scheme, just replace Text with ByteString.

Converting String

To convert a String value to the string type used in marvin use pack from the Data.Text.Lazy module. To convert a String value from the string type used in marvin use unpack from the Data.Text.Lazy module.

You can use the same functions for FilePath as it is only an alias for String.

Converting ByteString

ByteString values are a lot harder to convert than String, becuase they have no specified encoding.

To convert a ByteString value you, first make sure it is a lazy ByteString. If it is not convert it with fromStrict.

Now you’ll have to decode the ByteString. The text library offers a number of decoding functions in the Data.Text.Lazy.Encoding module. The encode* functions are used to create ByteString``s from text and the ``decode* functions are used to turn ByteString``s into ``Text.

If you are not sure what encoding your source text is try Utf8, its the most common.