So here's a puzzle for you.

You start with an empty string and clipboard. You can perform four operations:

  • Append a character to the string
  • Delete a character from the end of the string
  • Copy the entire string into the clipboard
  • Paste the contents of the clipboard at the end of the string

Each operation takes a different amount of time:

  • Append takes 1 tick
  • Delete takes 1 tick
  • Copy takes 3 ticks (CTRL-A-C)
  • Paste takes 2 ticks (CTRL-P)

The question is "What's the shortest string length that requires a delete to create most efficiently?".

If you wanted to figure this out for youself, well, you came to the wrong place.

In [52]:
-- we'll need these later
import Data.MemoCombinators (memo2, integral) 
import Data.Monoid (mempty)

So let's encode what we know so far

In [53]:
data Operation = Append | Delete | Copy | Paste
  deriving (Eq,Show)

type Length = Int
type Ticks  = Int
type Chain  = [ Operation ]

opCost :: Operation -> Int
opCost Append = 1
opCost Delete = 1
opCost Copy   = 3
opCost Paste  = 2

chainCost :: Chain -> Int
chainCost = sum . map opCost

Let's simplify our problem space. As the problem is stated, we're considering all chains of operations:

AACAAPAAADADCDDP
PDPPCA
AAAAACDP

Note a couple things:

  1. Pasting before we've Copied anything doesn't increase the size of the string.
  2. Since we're only considering length, not content, Pasting and Appending commute. Both PA and AP increase the size of the string by one plus the clipboard length.
  3. Similarly, Pasting and Deleting commute. Both PD and DP increase the size of the string by one minus the clipboard length.

This lets us restrict us to chains of a certain form. We can rule out all the Pastes before a Copy since we're searching for the most efficient solution, and this only adds ticks. We can restrict pastes to only be after a copy. This gives these meta-operations:

  • Append a character to the string
  • Delete a character from the end of the string
  • Copy the entire string into the clipboard and Pasting it zero or more times.

Now all our chains look like this:

AAACPPPACPPD
AADCPADDD
CPPPACPDD

There are futher optimizations that could be made, but this turns out to be sufficient for our purposes.

If we work backwards from a string of length n , it could have been made by

  • Appending to a string of length n-1 in 1 tick,
  • Deleting from a string of length n+1 in t-1 tick,
  • or, for all d and q such that q * d = n, by Copying a string of length d and Pasting it q -1 times in a total of 2*q + 1 ticks.

That last bit deserves an example. A string of 12 could have been by

  • copying a string of length 3 and pasting it 3 times, in 9 ticks.
  • copying a string of length 4 and pasting it 2 times, in 7 ticks.
  • copying a string of length 6 and pasting it 1 time, in 5 ticks.

So at this point a helper function to discover all the divisors of a number seems useful:

In [54]:
-- find all the pairs that multiply to the given value
divisors :: Int -> [ (Int, Int) ]
divisors = integral $ \n -> [ (d,q) | d <- [1..n], let (q,r) = n `quotRem` d, r == 0 ]

That integral above is my first use of the Data.MemoCombinators library. It memoizes the function, so that the divisors for each input are only computed once (the first time they're requested) and then cached to be reused for future calls.

This saves me some time. For example, if I'm used divisors inside an m-fold loop, the loop would be O(m * n) without memoization, but only O(m + n) with memoization.

Consider again working backwards from a string of length n. Though the last meta-operation may have operated upon a shorter (Append, Copy/Paste) or longer (Delete) string, if were looking at the chain operations that leads most efficiently to our string of length n, than the string that the meta-operation acted upon must be, by definition, more efficient, since we had to spend further ticks after reaching it. So if we know how to efficiently construct strings in less than t ticks, we can use that to efficiently construct strings in exactly t ticks.

This opens us up to Dynamic Programming, which is the real reason I broke out Data.MemoCombinators. Below we use memo2 to memoize a two argument function that calculates the chains required to generate a string of length n in exactly t ticks using the exact working backward method discussed above.

Note that chainsToIn calls itself at least four times, so if we used straight recursion, this would lead to a combinatorial explosion of work. Memoization saves that, letting us calculate a single chainsToIn n t in O(t2) time and all chainsToIn n t for 0 <= n < N and 0 <= t < T in O(NT + T2) time.

In [55]:
-- figure out how to get a string of the given length
-- in exactly the given amount of ticks
chainsToIn :: Length -> Ticks -> [ Chain ]
chainsToIn = memo2 integral integral $ \n t -> case (n,t) of
    (0,0) ->  return []
    (_,t) | t <= 0 -> mempty
    (c,_) | c <= 0 -> mempty
    _     ->  -- helper function to extend chains that generate
              -- the given length by the given suffix
              -- to create chains to the current location
              let moveFrom :: Length -> Chain -> [ Chain ]
                  moveFrom n' c = map (++c) . chainsToIn n' $ t - chainCost c
              -- find all the ways to get to the current location
              in concat [ moveFrom (n-1) [Append]
                        , moveFrom (n+1) [Delete]
                        , do
                            (d,q) <- divisors n
                            moveFrom d $ Copy : replicate (q-1) Paste
                        ]

Now that we can find all the ways (if any) of generating a string of length n in t ticks, we can determine the most efficient ways to generate a string of length n by simply iterating from t=0 up until we find a value of t that gives us a non-empty set of ways to find a string of that length.

In [56]:
-- find the optimal chain to e
optimalChainsTo :: Length -> [ Chain ]
optimalChainsTo = integral $ \n ->
  head . dropWhile null $ map (chainsToIn n) [ 0.. ]

Our solution, then, is shortest length where all the chains of operations that generate that length in optimal time include a Delete.

In [57]:
-- our solution only has chains with a Delete in it
solution :: Length
solution = head $ filter (all (elem Delete) . optimalChainsTo) [ 0.. ]

Let's see what it is!

In [58]:
solution
53

And what was so special about the chain of operations that generate it so efficiently?

In [59]:
optimalChainsTo solution
[[Append,Append,Append,Append,Append,Append,Copy,Paste,Paste,Copy,Paste,Paste,Delete]]

And how many ticks did it require?

In [60]:
map chainCost it
[21]

Out of curiousity, lets see what the optimal chains look like for strings up to length 64.

In [61]:
-- need a few more libraries
import Text.Printf (printf)
import Control.Monad (forM_)
In [66]:
putStrLn "num ticks chains"
forM_ [ 0 .. 64 ] $ \i ->
  let cs@(c:_) = optimalChainsTo i
  in printf " %2d    %2d %s\n" i (chainCost c) (unwords $ map (map $ head . show) cs)
num ticks chains
  0     0 
  1     1 A
  2     2 AA
  3     3 AAA
  4     4 AAAA
  5     5 AAAAA
  6     6 AAAAAA
  7     7 AAAAAAA
  8     8 AAAAAAAA
  9     9 AAAAAAAAA
 10    10 AAAAAAAAAA AAAAACP
 11    11 AAAAAAAAAAA AAAAACPA
 12    11 AAAACPP AAAAAACP
 13    12 AAAACPPA AAAAAACPA
 14    12 AAAAAAACP
 15    12 AAAAACPP
 16    13 AAAAACPPA AAAACPPP AAAAAAAACP
 17    14 AAAAACPPAA AAAACPPPA AAAAAAAACPA AAAAAACPPD
 18    13 AAAAAACPP
 19    14 AAAAAACPPA
 20    14 AAAAACPPP
 21    14 AAAAAAACPP
 22    15 AAAAAAACPPA
 23    16 AAAAAAACPPAA AAAAAACPPPD AAAAAAAACPPD
 24    15 AAAAAACPPP AAAAAAAACPP
 25    16 AAAAAACPPPA AAAAAAAACPPA AAAAACPPPP
 26    17 AAAAAACPPPAA AAAAAAAACPPAA AAAAACPPPPA AAAAAAAAACPPD AAAACPPACP AAAAAACPACP
 27    16 AAAAAAAAACPP
 28    16 AAAAAAACPPP
 29    17 AAAAAAACPPPA
 30    17 AAAAAACPPPP AAAAAAAAAACPP AAAAACPCPP AAAAACPPCP
 31    18 AAAAAACPPPPA AAAAAAAAAACPPA AAAAACPCPPA AAAAACPPCPA AAAAAAAACPPPD
 32    17 AAAAAAAACPPP
 33    18 AAAAAAAACPPPA AAAAAAAAAAACPP AAAAACPACPP
 34    19 AAAAAAAACPPPAA AAAAAAAAAAACPPA AAAAACPACPPA AAAAAAACPPPPD AAAAACPPAACP AAAACPPPACP AAAAAAAACPACP AAAAAACPPDCP
 35    18 AAAAAAACPPPP
 36    18 AAAAAAAAACPPP AAAACPPCPP AAAAAACPCPP AAAAAACPPCP
 37    19 AAAAAAAAACPPPA AAAACPPCPPA AAAAAACPCPPA AAAAAACPPCPA
 38    19 AAAAAACPPACP
 39    19 AAAACPPACPP AAAAAACPACPP
 40    19 AAAAAAAACPPPP AAAAAAAAAACPPP AAAAACPCPPP AAAAACPPPCP
 41    20 AAAAAAAACPPPPA AAAAAAAAAACPPPA AAAAACPCPPPA AAAAACPPPCPA AAAAAAACPCPPD AAAAAAACPPCPD
 42    19 AAAAAAACPCPP AAAAAAACPPCP
 43    20 AAAAAAACPCPPA AAAAAAACPPCPA
 44    20 AAAAACPPCPPD AAAAAAAAAAACPPP AAAAACPACPPP AAAAAAACPPACP
 45    19 AAAAACPPCPP
 46    20 AAAAACPPCPPA
 47    21 AAAAACPPCPPAA AAAACPPCPPPD AAAAAACPCPPPD AAAAACPPACPPD AAAACPPPCPPD AAAAAAAACPCPPD AAAAAACPPPCPD AAAAAAAACPPCPD
 48    20 AAAACPPCPPP AAAAAACPCPPP AAAAACPPACPP AAAACPPPCPP AAAAAAAACPCPP AAAAAACPPPCP AAAAAAAACPPCP
 49    21 AAAACPPCPPPA AAAAAACPCPPPA AAAAACPPACPPA AAAACPPPCPPA AAAAAAAACPCPPA AAAAAACPPPCPA AAAAAAAACPPCPA
 50    21 AAAAAAAAAACPPPP AAAAACPCPPPP AAAAAACPPPACP AAAAAAAACPPACP AAAAACPPPPCP
 51    21 AAAAACPPAACPP AAAACPPPACPP AAAAAAAACPACPP AAAAAACPPDCPP
 52    21 AAAACPPACPPP AAAAAACPACPPP
 53    21 AAAAAACPPCPPD
 54    20 AAAAAACPPCPP
 55    21 AAAAAACPPCPPA
 56    21 AAAAAAACPCPPP AAAAAAACPPPCP
 57    21 AAAAAACPPACPP
 58    22 AAAAAACPPACPPA AAAAAAACPPPACP
 59    22 AAAAACPPCPPPD AAAAACPPPCPPD
 60    21 AAAAACPPCPPP AAAAACPPPCPP
 61    22 AAAAACPPCPPPA AAAAACPPPCPPA
 62    22 AAAAAAACPPCPPD
 63    21 AAAAAAACPPCPP
 64    22 AAAAAAACPPCPPA AAAAACPPACPPP AAAACPPPCPPP AAAAAAAACPCPPP AAAAAAAACPPPCP