.
This is a fun and instructive exercise that
combines some elements from 15-112,
15-122, 15-150, and 15-251. We will use normal Python integers
to represent lists, sets, maps, strings, and finite state
machines. While there are more efficient ways to represent
all those things, it is fascinating that arbitrary-length
Python integers alone are sufficient for all of that and more!
Since this problem has many parts, and it would not be fair if you only got
credit if you completed ALL of them, there are several checkpoints that you
can complete to get chunks of the 4 points available from this problem:
- Background
Since we plan to include multiple integers in a single
integer, we will start by encoding integers with a prefix
of the number of digits in that integer. For example, we might
encode 25 as 225, where the first 2 says there are 2 digits.
But then we'd have to encode 1234567890 with a count of 10. But
then we'd have to know how many digits our count itself has!
And we are now recursing. Oh no!
Our solution, which is not fully general but which is good enough
for our purposes, is to first have a count-count of the number
of digits in the count.
That first count-count will always be exactly one digit,
so the count itself can be between 1 and 9 digits. So the largest
digit count is 999,999,999. So this approach does not allow
numbers with one billion or more digits.
We can live with that restriction.
We also have to deal with the sign (+/-) of the number. Normally
this might be 0 for positive and 1 for negative, but leading 0's
can cause confusion, so we'll use 1 for positive and 2 for
negative.
Thus, we will encode numbers as such:
[sign-digit] [count-count] [count] [number]
.
So, for example, to encode 789, the sign-digit is 1 (positive).
The count is 3, so the count-count is 1. Thus, the entire encoding of 789 is 113789.
For another example, to encode -1234512345, the sign-digit is 2 (negative). The count is 10, so the count-count is 2. Thus,
the entire encoding of -1234512345 is 22101234512345.
- lengthEncode(n)
Time to write some code! Write the function lengthEncode(n) that takes a possibly-negative Python integer and returns the length prefix encoded integer as described above. Here is a test
function:
def testLengthEncode():
print('Testing lengthEncode()...', end='')
assert(lengthEncode(789) == 113789)
assert(lengthEncode(-789) == 213789)
assert(lengthEncode(1234512345) == 12101234512345)
assert(lengthEncode(-1234512345) == 22101234512345)
assert(lengthEncode(0) == 1110)
print('Passed!')
Hint: while not required, we found it very helpful to write the
helper function intCat(n, m) that took two non-negative integers
and returned their concatenation. So, for example,
intCat(123, 45) returns 12345.
- lengthDecode(n)
Now write lengthDecode(n) that performs the inverse operation
of lengthEncode(n), so:
def testLengthDecode():
print('Testing lengthDecode()...', end='')
assert(lengthDecode(113789) == 789)
assert(lengthDecode(213789) == -789)
assert(lengthDecode(12101234512345) == 1234512345)
assert(lengthDecode(22101234512345) == -1234512345)
assert(lengthDecode(1110) == 0)
print('Passed!')
Hint: while not required, we found it very helpful to write a
helper function that could extract a "substring" of an integer,
i.e. if I wanted to extract digits 2 through 4 of 1511242
it would return 112.
- lengthDecodeLeftmostValue(n)
The whole point of using the length prefix was to allow us
to place multiple integers one after another inside a single
larger integer. For example, we encode 2 as 1112 and we encode
789 as 113789, so we can encode a 2 followed by a 789 as
simply 1112113789.
Thus, we need a way to decode just the leftmost value.
For that, write the function
lengthDecodeLeftmostValue(n) that takes a number with
one-or-possibly-more encoded values and decodes just the
leftmost value. This function has to return two values:
the decoded value, and also the rest of the encoded values!
So lengthDecodeLeftmostValue(1112113789) returns (2, 113789),
meaning that 2 was the leftmost value, and after removing
the 2, 113789 still remains.
Thus, we call this function using this pattern:
encodedValues, nextValue = lengthDecodeLeftmostValue(encodedValues)
With that in mind, write lengthDecodeLeftmostValue(n)
so that the following test function passes:
def testLengthDecodeLeftmostValue():
print('Testing lengthDecodeLeftmostValue()...', end='')
assert(lengthDecodeLeftmostValue(111211131114) == (2, 11131114))
assert(lengthDecodeLeftmostValue(112341115) == (34, 1115))
assert(lengthDecodeLeftmostValue(111211101110) == (2, 11101110))
assert(lengthDecodeLeftmostValue(11101110) == (0, 1110))
print('Passed!')
Hint: you may want to think carefully about the case where
the second-to-leftmost value is a 0. The last two tests
above deal with that case.
PS: We highly reccomend completing this function BEFORE lengthDecode
- Lists
We can now use a sequence of length-prefix-encoded integers
to represent a list of integers. We will just prefix the
entire list with the length of that list.
So, for example, the Python list [9, 8888]. This list is
of length 2, so we will encode it as the values 2, 9, 8888.
Since these are encoded as 1112, 1119, and 1148888,
respectively, we will encode the entire list
as 111211191148888.
Note that an empty list is just a list of length 0, and so
it is just the length-prefix-encoded value of 0, which is 1110
(the result of calling lengthEncode(0)).
With that, here are the functions you need to write:
- newIntList(): takes no argument, returns an empty list.
- intListLen(L): takes a list, returns its length (the leftmost encoded value).
- intListGet(L, i): takes a list and an index, and returns the decoded value at that index. Returns the string 'index out of range' as needed.
- intListSet(L, i, v): takes a list and an index, and returns a new list with the value at the given index changed to be the encoded value of v. Returns the string 'index out of range' as needed.
- intListAppend(L, v): takes a list and a value, and returns a new list with that value (encoded) appended.
- intListPop(L): takes a list and removes the last value
("pops" it).
It then returns two values:
the list without that popped value,
and that popped value itself.
And here is the test function to pass:
def testIntList():
print('Testing intList functions...', end='')
a1 = newIntList()
assert(a1 == 1110) # length = 0, list = []
assert(intListLen(a1) == 0)
assert(intListGet(a1, 0) == 'index out of range')
a1 = intListAppend(a1, 42)
assert(a1 == 111111242) # length = 1, list = [42]
assert(intListLen(a1) == 1)
assert(intListGet(a1, 0) == 42)
assert(intListGet(a1, 1) == 'index out of range')
assert(intListSet(a1, 1, 99) == 'index out of range')
a1 = intListSet(a1, 0, 567)
assert(a1 == 1111113567) # length = 1, list = [567]
assert(intListLen(a1) == 1)
assert(intListGet(a1, 0) == 567)
a1 = intListAppend(a1, 8888)
a1 = intListSet(a1, 0, 9)
assert(a1 == 111211191148888) # length = 2, list = [9, 8888]
assert(intListLen(a1) == 2)
assert(intListGet(a1, 0) == 9)
assert(intListGet(a1, 1) == 8888)
a1, poppedValue = intListPop(a1)
assert(poppedValue == 8888)
assert(a1 == 11111119) # length = 1, list = [9]
assert(intListLen(a1) == 1)
assert(intListGet(a1, 0) == 9)
assert(intListGet(a1, 1) == 'index out of range')
a2 = newIntList()
a2 = intListAppend(a2, 0)
assert(a2 == 11111110)
a2 = intListAppend(a2, 0)
assert(a2 == 111211101110)
print('Passed!')
- Sets
A set is similar to a list, only with the restriction that
values can only appear once, even if they are added multiple
times. While there are more efficient approaches, we can represent
sets using lists. In particular, when you add a value, we can
first check if it is already in the set, and if so, do nothing.
So we only add values that are not already in the set. That's it!
With that, here are the functions you need to write:
- newIntSet(): returns a new empty set (which is the same as a new empty list, which is the same as the length-prefix-encoded value for 0).
- intSetAdd(s, v): takes a set and a value, and returns the same set if the value is already in it, or a new set that appends the value to the end of the set s.
- intSetContains(s, v): takes a set and a value, and returns True if that set contains that value, and False otherwise.
And here is the test function to pass:
def testIntSet():
print('Testing intSet functions...', end='')
s = newIntSet()
assert(s == 1110) # length = 0
assert(intSetContains(s, 42) == False)
s = intSetAdd(s, 42)
assert(s == 111111242) # length = 1, set = [42]
assert(intSetContains(s, 42) == True)
s = intSetAdd(s, 42) # multiple adds --> still just one
assert(s == 111111242) # length = 1, set = [42]
assert(intSetContains(s, 42) == True)
print('Passed!')
- Strings
We can convert individual characters into numbers using
the Python function ord(c), and we can undo that and convert
those numbers back into characters using chr(n). Thus,
we can store a Python String as a single integer by creating
an encoded list of those ord values.
With that, here are the functions you need to write:
- encodeString(s): takes a Python string s and
returns a length-prefixed list of the ord values of
the characters in s.
- decodeString(L): takes a length-prefixed list L
and returns the corresponding Python string.
And here is the test function to pass:
def testEncodeDecodeStrings():
print('Testing encodeString and decodeString...', end='')
assert(encodeString('A') == 111111265) # length = 1, str = [65]
assert(encodeString('f') == 1111113102) # length = 1, str = [102]
assert(encodeString('3') == 111111251) # length = 1, str = [51]
assert(encodeString('!') == 111111233) # length = 1, str = [33]
assert(encodeString('Af3!') == 1114112651131021125111233) # length = 4, str = [65,102,51,33]
assert(decodeString(111111265) == 'A')
assert(decodeString(1114112651131021125111233) == 'Af3!')
assert(decodeString(encodeString('WOW!!!')) == 'WOW!!!')
print('Passed!')
- Maps
A map is a data structure that
converts keys into values (in Python this is called
a "dictionary").
For example, we could use a map
m to store country capitals. We could
call intMapSet(m, 'France', 'Paris') to set the capital of 'France'
to 'Paris'. Then, we could retrieve the capital by
calling intMapGet(m, 'France'), which should return 'Paris'.
Maps convert keys to values. So in this example,
'France' is the key, and 'Paris' is the value.
Here, we are only using integers, but the idea is the same:
a map will convert keys to values. And while there are
more efficient approaches, we will store a map as a list
of alternating values: key1, value1, key2, value2,...
Thus, an empty map is just an empty list. Then, if we map the
value 42 to 73, we get the list [42, 73], which is encoded
as 11121124211273. Then, if we re-map the value 42 to 98765,
since 42 can only be present once as a key, we change the 73
to instead be 98765, so we get the list [42, 98765].
With that, here are the functions you need to write:
- newIntMap(): takes no arguments and returns an empty map, which is an empty list.
- intMapContains(m, key): takes a map and a key, and returns
True if that map contains that key (as a key, not a value),
and False otherwise.
- intMapGet(m, key): takes a map and a key, and returns
the value associated with that key in that map, or
the string 'no such key' as appropriate.
- intMapSet(m, key, value): takes a map, a key and a value,
and returns a new map which is the same as m only
with the given key mapped to the given value. If the key
was already in m, this involves modifying the value the
key maps to. Otherwise, this involves adding the key and
value to the end of the list.
And here is the test function to pass:
def testIntMap():
print('Testing intMap functions...', end='')
m = newIntMap()
assert(m == 1110) # length = 0
assert(intMapContains(m, 42) == False)
assert(intMapGet(m, 42) == 'no such key')
m = intMapSet(m, 42, 73)
assert(m == 11121124211273) # length = 2, map = [42, 73]
assert(intMapContains(m, 42) == True)
assert(intMapGet(m, 42) == 73)
m = intMapSet(m, 42, 98765)
assert(m == 11121124211598765) # length = 2, map = [42, 98765]
assert(intMapGet(m, 42) == 98765)
m = intMapSet(m, 99, 0)
assert(m == 11141124211598765112991110) # length = 4, map = [42, 98765, 99, 0]
assert(intMapGet(m, 42) == 98765)
assert(intMapGet(m, 99) == 0)
print('Passed!')
- Finite State Machines (FSM's)
A Finite State Machine (FSM) is a theoretical machine that
is useful for modeling certain kinds of limited computation.
You can read about them online. The basic ideas are these:
- An FSM has states. Ours will be numbered.
- An FSM has a special start state. Ours will be state 1.
- An FSM has transitions, which are rules about how to move from one state to another. A transition is specified by 3 values -- a startState, a symbol, and an endState. If
the FSM is in the startState when it sees that symbol, it
will transition to the endState. In our case, the startState and the endState will both be integers, and
the symbol will be a single digit.
- An FSM has accepting states. If the FSM finishes
reading symbols and it is in an accepting state,
we say that the FSM accepts those symbols.
Our FSM will work in this way.
We will represent our FSM as a list of two values:
a map of transitions, and a set
of accepting states. Each
transition itself maps a startState to yet another map,
that second map mapping symbols (digits) to endStates.
This is a bit complicated, and you may need to read this
a few times, think about it, and carefully study the
examples in the test code below to fully understand it.
With that, here are the functions you need to write:
- newIntFSM(): takes no arguments and returns an empty FSM, which contains an empty map of transitions and an empty
set of accepting states.
- isAcceptingState(fsm, state): takes an fsm and a state,
and returns True if that state is in the set of accepting
states, and False otherwise.
- addAcceptingState(fsm, state): takes an fsm and a state,
and returns a new FSM that is the same except that the
given state is added to the set of accepting states.
- setTransition(fsm, fromState, digit, toState):
returns a new FSM that is the same as the given fsm
except with this new transition added. This involves
updating two maps -- the outer map (mapping each fromState
to its own map), and the inner map (mapping each
digit to its toState). Look closely at the examples
in the test function for details.
- getTransition(fsm, fromState, digit):
returns the toState that is mapped to by this fromState
on this digit, or the string
'no such transition' as appropriate.
- accepts(fsm, inputValue): takes an fsm and an inputValue,
and returns True if the FSM accepts that inputValue,
and False otherwise. To do this, it starts the FSM in
its startState (1), then it transitions on each symbol
from left-to-right in the inputValue. When it is done
with all the values, it checks if it is in an
accepting state. Note that if any transition is missing,
this is not an error, it just means that the FSM does
not accept the input.
- states(fsm, inputValue): takes an fsm and an inputValue,
and does basically the same thing as accepts(fsm, inputValue), only here instead of returning True or
False, this returns a list (length-encoded-prefix list)
of all the states that the FSM visited while computing
the solution. This is mainly for testing purposes,
so the autograder can verify your machine is working
properly. Again, see the test code for details.
And here is the first test function to pass:
def testIntFSM():
print('Testing intFSM functions...', end='')
fsm = newIntFSM()
assert(fsm == 111211411101141110) # length = 2, [empty stateMap, empty startStateSet]
assert(isAcceptingState(fsm, 1) == False)
fsm = addAcceptingState(fsm, 1)
assert(fsm == 1112114111011811111111)
assert(isAcceptingState(fsm, 1) == True)
assert(getTransition(fsm, 0, 8) == 'no such transition')
fsm = setTransition(fsm, 4, 5, 6)
# map[5] = 6: 111211151116
# map[4] = (map[5] = 6): 111211141212111211151116
assert(fsm == 1112122411121114121211121115111611811111111)
assert(getTransition(fsm, 4, 5) == 6)
fsm = setTransition(fsm, 4, 7, 8)
fsm = setTransition(fsm, 5, 7, 9)
assert(getTransition(fsm, 4, 5) == 6)
assert(getTransition(fsm, 4, 7) == 8)
assert(getTransition(fsm, 5, 7) == 9)
fsm = newIntFSM()
assert(fsm == 111211411101141110) # length = 2, [empty stateMap, empty startStateSet]
fsm = setTransition(fsm, 0, 5, 6)
# map[5] = 6: 111211151116
# map[0] = (map[5] = 6): 111211101212111211151116
assert(fsm == 111212241112111012121112111511161141110)
assert(getTransition(fsm, 0, 5) == 6)
print('Passed!')
That test function verifies that the FSM is setup properly.
This next test function verifies that it runs properly:
def testAccepts():
print('Testing accepts()...', end='')
fsm = newIntFSM()
# fsm accepts 6*7+8 => See the diagram
fsm = addAcceptingState(fsm, 3)
fsm = setTransition(fsm, 1, 6, 1) # At state 1, receive 6, move to state 1
fsm = setTransition(fsm, 1, 7, 2) # At state 1, receive 7, move to state 2
fsm = setTransition(fsm, 2, 7, 2) # At state 2, receive 7, move to state 2
fsm = setTransition(fsm, 2, 8, 3) # At state 2, receive 8, move to state 3
assert(accepts(fsm, 78) == True)
assert(states(fsm, 78) == 1113111111121113) # length = 3, list = [1,2,3]
assert(accepts(fsm, 678) == True)
assert(states(fsm, 678) == 11141111111111121113) # length = 4, list = [1,1,2,3]
assert(accepts(fsm, 5) == False)
assert(accepts(fsm, 788) == False)
assert(accepts(fsm, 67) == False)
assert(accepts(fsm, 666678) == True)
assert(accepts(fsm, 66667777777777778) == True)
assert(accepts(fsm, 7777777777778) == True)
assert(accepts(fsm, 666677777777777788) == False)
assert(accepts(fsm, 77777777777788) == False)
assert(accepts(fsm, 7777777777778) == True)
assert(accepts(fsm, 67777777777778) == True)
print('Passed!')
Here is a diagram explaining this FSM:
And another diagram visualizing the integer data structure form of the FSM
and its relationship to the various states and transitions.
- In Summary
11241112421124211242112321128911311111311711232113109112971131001131011123211310511311611233112321127111311411310111297113116112321131061131111129811233112321128711311111311111310411311111311111233112331123311232112421124211242
That's it. Have fun!!!