Pareto Optimal Dev

Sums of data types should be preferred over "bucket" data types with a `thingType` field

Which are the tradeoffs of these two? Is one better overall?

User data type contains userType :: UserType field with Sum type to differentiate

types

data UserType = FreeUser | PaidUser
data User =
  User { userId :: Int
       , userName :: String
       , userType :: UserType
       , userRewardPoints :: Maybe Int
       }
data ArticleType = FreeArticle | PaidArticle
data Article =
  Article
  { articleId :: Int
  , articleName :: Text
  , articleTitle :: Text
  , articleType :: ArticleType
  }

case

serveArticle :: ArticleID -> Servant.Handler
serveArticle aId = do
  -- ... snip ...
  a <- runDB $ getArticle aId
  case userType u of
    PaidUser ->
      case articleType a of
        PaidArticle -> do
          runDB $ upsert user { userRewardPoints = fmap (+ 5) (userRewardPoints u) }
          displayArticle a
        FreeArticle -> do
          runDB $ upsert user { userRewardPoints = fmap (+ 2) (userRewardPoints u) }
          displayArticle a
    FreeUser -> case articleType a of
        PaidArticle -> Http400 "denied"
        FreeArticle -> displayArticle a

User is a sum type containing different entire different Users and their sometimes duplicate contents

types

data FreeUser =
  FreeUser { freeUserId :: Int
           , freeUserName :: String
           }
data PaidUser =
  PaidUser { paidUserId :: Int
           , paidUserName :: String
           , paidUserRewardPoints :: Int
           }
data User = Free FreeUser
          | Paid PaidUser
data FreeArticle =
  FreeArticle { freeArticleId :: Int
              , freeArticleName :: String
              }
data PaidArticle =
  PaidArticle { paidArticleId :: Int
              , paidArticleName :: String
              }
data Article =  Free FreeArticle | Paid PaidArticle

case

serveArticle :: ArticleID -> Servant.Handler
serveArticle aId = do
  -- ... snip ...
  a <- runDB $ getArticle aId
  case user of
    Paid pu ->
      case a of
        PaidArticle pa -> do
          runDB $ upsert user { paidUserRewardPoints = (+ 5) (paidUserRewardPoints pu) }
          displayArticle pa
        FreeArticle fa -> do
          runDB $ upsert user { paidUserRewardPoints = (+ 2) (paidUserRewardPoints pu) }
          displayArticle fa
    Free _ -> case articleType a of
        PaidArticle _ -> Http400 "denied"
        FreeArticle fa -> displayArticle fa

Comparing the calling code

We case differently on both user and article with these approaches. Does either seem advantageous?

serveArticle :: ArticleID -> Servant.Handler
serveArticle aId = do
  -- ... snip ...
  a <- runDB $ getArticle aId
  case userType u of
    PaidUser ->
      case articleType a of
        PaidArticle -> do
          runDB $ upsert user { userRewardPoints = fmap (+ 5) (userRewardPoints u) }
          displayArticle a
        FreeArticle -> do
          runDB $ upsert user { userRewardPoints = fmap (+ 2) (userRewardPoints u) }
          displayArticle a
    FreeUser -> case articleType a of
        PaidArticle -> Http400 "denied"
        FreeArticle -> displayArticle a
serveArticle :: ArticleID -> Servant.Handler
serveArticle aId = do
  -- ... snip ...
  a <- runDB $ getArticle aId
  case u of
    PaidUser pu ->
      case a of
        PaidArticle pa -> do
          runDB $ upsert user { paidUserRewardPoints = fmap (+ 5) (paidUserRewardPoints pu) }
          displayArticle pa
        FreeArticle fa -> do
          runDB $ upsert user { paidUserRewardPoints = fmap (+ 2) (paidUserRewardPoints pu) }
          displayArticle fa
    FreeUser _ -> case articleType a of
        PaidArticle _ -> Http400 "denied"
        FreeArticle fa -> displayArticle fa

Edit: A suggestion from a user on the Haskell discord that is correct by construction and ergonomic

types

data UserType = FreeUser | PaidUser PaidUserData
data User =
  User { userId :: Int
       , userName :: String
       , userType :: UserType
       }

data PaidUserData = PaidUserData
  { userRewardPoints :: Int
  }
data FreeArticle =
  FreeArticle { freeArticleId :: Int
              , freeArticleName :: String
              }
data PaidArticle =
  PaidArticle { paidArticleId :: Int
              , paidArticleName :: String
              }
data Article =  Free FreeArticle | Paid PaidArticle

this is correct by construction by moving the variation into the sum type

User { userId = 0
     , userName "h4x0r"
     , userType = FreeUser
     }

TODO case

serveArticle :: ArticleID -> Servant.Handler
serveArticle aId = do
  -- ... snip ...
  a <- runDB $ getArticle aId
  case u of
    PaidUser pu ->
      case a of
        PaidArticle pa -> do
          runDB $ upsert user { paidUserRewardPoints = (+ 5) (paidUserRewardPoints pu) }
          displayArticle pa
        FreeArticle fa -> do
          runDB $ upsert user { paidUserRewardPoints = (+ 2) (paidUserRewardPoints pu) }
          displayArticle fa
    FreeUser _ -> case articleType a of
        PaidArticle _ -> Http400 "denied"
        FreeArticle fa -> displayArticle fa

my commentary

User is not correct by construction with the userType example

User { userId = 0
     , userName "h4x0r"
     , userType = FreeUser
     , userRewardPoints = Just 10000 -- free users shouldn't have reward points
     }

User is correct by construction with the sum containing different Users

Free (FreeUser
      { freeUserId = 0
      , freeUserName "valid free"
      })
Paid (PaidUser
       { paidUserId = 1
       , paidUserName "valid paid"
       , paidUserRewardPoints = 10000
     })

TODO conclusion

sum of different types only really big advantage is being correct by construction

  • Another advantage stemming from that is preventing a proliferation of optional fields muddying the purpose of individual types

    • this typically has a side effect of having to laboriously validate this type in calling code over and over, hiding other codes intention infectiously

Sums of data types should be preferred over “bucket” data types with a `thingType` field

Contradicting the previous statement and this page, perhaps the core issue is the variations not always existing in a sum

TODO reconsider conclusions after including new example