Wednesday 3 December 2014

Induction

Finally, we are introduced to induction in this class. I don't really understand why we were not introduced to induction when we were first doing proofs, given that it is so useful. Induction seems so simple and straightforward that its power is easy to underestimate at first; I certainly underestimated it. Upon closer examination, though, I realized that induction justifies the kind of "for all" reasoning we leaned on in our previous proofs. For example, one of our first proofs was "for all natural numbers n, if n is odd, then n^2 is odd." We proved that by first assuming that n is an arbitrary natural number and that n is odd. But how do we know a claim like that holds for all natural numbers? Do we plug every natural number in and check the result? We could do that until the end of time and still never reach a conclusion. Therefore, we need a more powerful logical tool to help us out.

Induction is really an axiom, and like all axioms, it is taken as self-evident. Let's break it down.

  • let P(n) be a predicate
  • if P(0) is true, and for all natural numbers n, P(n) => P(n+1) is true
  • then for all natural numbers n, P(n) is true!
So if P(0) is true, and P(0) => P(1) is true, then P(1) is true; since P(1) => P(2) is true, P(2) is true; since P(2) => P(3) is true, P(3) is true... and so on, for every natural number.

Now let's prove the following proposition:
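For every natural number n, 0 + 1 + 2 + ... + n = n(n+1)/2.

Here is a sketch of the induction argument (a reconstruction of the standard proof, not necessarily word-for-word what we did in class):

    Let P(n) be the predicate "0 + 1 + ... + n = n(n+1)/2".
    Base case: P(0) says 0 = 0(0+1)/2, which is true.
    Inductive step: assume P(n) holds for an arbitrary natural number n.
        Then 0 + 1 + ... + n + (n+1) = n(n+1)/2 + (n+1)
                                     = (n+1)(n/2 + 1)
                                     = (n+1)(n+2)/2,
        which is exactly what P(n+1) claims.
    By induction, P(n) is true for all natural numbers n.
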
It's interesting to note that I didn't really introduce anything new here... We were already given the equation 0 + 1 + ... + n = n(n+1)/2, and we simply proved that it is true. Induction did not give us that equation, nor did it explain where it comes from. Is that satisfying?

Visualizing Big-O

The formal definitions of Big-O, Big-Omega, and Big-Theta can be a little daunting to absorb and understand when you first meet them. I admit that I was overwhelmed by all of the information. So how does one make sense of these definitions and use them for algorithm analysis and proofs?

I don't think there is only one way; different people learn through different techniques. What works for me is to visualize the definitions as graphs rather than as symbols, since the picture expresses the same information anyway. For Big-O, I imagine two curves that grow at different rates, and a point on the x-axis as the 'breakpoint', such that beyond that point one curve always stays below the other. It stands to reason that the slower-growing curve is in the Big-O of the faster-growing curve.
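
To see that breakpoint idea in numbers, here is a small sketch; the specific functions 100n + 5 and n^2, and the constant c = 1, are my own made-up choices rather than anything from the course.

    # Two made-up curves: f grows linearly, g grows quadratically.
    def f(n):
        return 100 * n + 5

    def g(n):
        return n * n

    c = 1
    # Find the first "breakpoint" n0 past which f(n) <= c * g(n).
    n0 = next(n for n in range(1, 1000) if f(n) <= c * g(n))
    print(n0)                                                  # 101
    print(all(f(n) <= c * g(n) for n in range(n0, 100000)))    # True: past n0, f stays below g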

Now that we understand Big-O, the scaffold is set, and we can easily understand Big-Omega with just one minor alteration. Instead of saying that the slower curve is in the Big-O of the faster one, we say that the faster curve is in the Big-Omega of the slower one. That effectively makes the slower curve a lower bound. Finally, Big-Theta is simply the conjunction of Big-O and Big-Omega.

As mentioned before, we use Big-O notation to help us describe the complexity of an algorithm. In practice, we want algorithms that run in O(n) or O(log n) rather than O(n^2) or O(n^3), because quadratic or cubic run-times grow very quickly as n gets large.

Sunday 30 November 2014

Complexity and Big-O

Previously, we just talked about algorithm run-time, but we did not talk in detail about how run-time is determined. Run-time is expressed as a function relating the size of the input to the number of steps the algorithm takes, and how that function grows is the algorithm's complexity. How do we describe complexity? We use Big-O notation.

Best and Worst

In terms of complexity, we must first account for best and worst cases. The best case is the input on which the algorithm takes the minimum running time to complete. For example, in linear search, if the first item in the list is the item we are searching for, then we have a best case. The worst case, on the other hand, is when the item we are searching for isn't in the list at all, in which case the algorithm takes the maximum running time to complete. We typically care about the worst case, because that is what causes algorithms to take a long time to run in the real world. Although it seems pessimistic to look at the worst case all the time, we should be aware that it happens often.
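
For instance, a simple (hypothetical) linear search makes the best and worst cases concrete:

    def linear_search(lst, target):
        """Return the index of target in lst, or -1 if target is not present."""
        for i, item in enumerate(lst):   # one comparison per item examined
            if item == target:
                return i
        return -1

    data = [7, 3, 9, 4, 8]
    print(linear_search(data, 7))   # best case: target is first, only 1 comparison
    print(linear_search(data, 5))   # worst case: target absent, len(data) comparisons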

There is also an average case, which is the expected running time over random inputs drawn from some probability distribution. This is rather difficult to determine, because it depends on that distribution, and what distribution do real inputs actually follow?

Big-O notation

What is this Big-O notation that we have been hearing about? When relating complexity and algorithm run-time, what we really care about is GROWTH. How does the run-time grow based on the input? Big-O notation is a method for characterizing that growth. Let's break it down.
  • running time is proportional to the number of steps as a function of input size (n). 
  • what we really care about is how this number GROWS as a function of n
  • we don't really care about the absolute number of steps, so 2000 and 2004 are the same.
  • if we compare two functions 2n^2 and 100n^2, we only care about the n^2
  • so constant factors don't matter
  • what about 2n^2 + n and n^2? we still only care about n^2, because when n is really big the leftover n term is negligible compared to n^2 (see the short check after this list)
  • think about it like this: as input grows, do you care if an algorithm takes 100 years or 200 years to run? Not really... both are too long...
  • we end up using a model of asymptotic growth, which describes how the complexity grows as the input size goes to infinity
  • typically written as O(f(n))
So, if we see O(n), it means that the algorithm's running time grows no faster than n, up to a constant factor. More specifically, if we see that f(n) is in O(n^2), it means that f(n) is upper-bounded by a constant multiple of n^2 once n is large enough.
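
Here is a short numeric check of those claims; the specific functions 2n^2 + n and n^2 are just the ones from the bullet points above.

    # As n grows, 2n^2 + n is essentially a constant multiple of n^2,
    # which is why we drop constant factors and lower-order terms.
    for n in (10, 1000, 100000):
        print(n, (2 * n**2 + n) / n**2)   # ratio approaches the constant 2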

There are many classes of functions, such as linear, logarithmic, and exponential. Linear and logarithmic functions grow at a rate that is manageable in practice. If an algorithm's complexity grows faster than these, say exponentially, then its run-time blows up very quickly. In real life, we don't want that!

Algorithm Run-time

From this point on, the course switches gears a little and begins to discuss algorithms, efficiency, and complexity. First, let's talk about efficiency and orders of growth.

Although the answer may sound obvious, when first thinking about efficiency one might ask, "why is efficiency important?"
    The answer is that efficiency determines how long our programs take to run, which we call run-time.

This is not to be confused with the time in seconds an algorithm takes to run. For instance, if one algorithm took 2 seconds to run and another took 0.5 seconds, can you say that the second is faster than the first? No, because I did not specify many necessary factors that need to be taken into account. For one, we need to think about the speed of the processor, as well as the particular Python implementation. It also depends on the amount of input data; on extremely large data sets, some algorithms might take a millennium to run!

More importantly, efficiency is about algorithms, not about coding structure or clever coding.

So the way we normalize across these factors is to talk about efficiency in terms of the number of steps an algorithm takes to produce a result for a given input. A "step" is an abstraction of a physical operation: we assume each step takes constant time.
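
To make "steps" concrete, here is a toy example of my own (not code from the course) that counts the constant-time operations a simple loop performs:

    def count_steps(n):
        """Count the constant-time steps taken to sum the numbers 0..n-1."""
        steps = 0
        total = 0
        for i in range(n):
            total += i     # one constant-time step per iteration
            steps += 1
        return steps

    print([count_steps(n) for n in (10, 100, 1000)])   # [10, 100, 1000]: steps grow linearly with n

No matter how fast the processor is, doubling the input doubles the number of steps, and that growth is the property we care about.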

Proof by Cases

Some proofs need to be broken down into different cases, and proving each case separately establishes the whole claim. This makes complex proofs easier to manage. In fact, I think we already used this method for proving an equivalence. In class, we showed that an implication "P implies Q" is equivalent to its contrapositive "not Q implies not P" by showing:

P  Q  |  P implies Q  |  not Q implies not P
T  T  |      T        |          T
T  F  |      F        |          F
F  T  |      T        |          T
F  F  |      T        |          T
In each of the four cases above, we showed that "P implies Q" is true if and only if "not Q implies not P" is true. Even when P is true and Q is false, both the implication and its contrapositive are false, so the equivalence still holds. Also note that proof by cases works in a much broader setting than propositions involving boolean variables.

Proofs and non-Boolean Functions, disproofs, and false proofs!

Non-Boolean Functions

In class, we discussed non-boolean functions such as the floor function, which takes in a number and returns the largest integer that is less than or equal to that number. Non-boolean functions can be used to set up more interesting predicates to evaluate. For example, consider the claim that n^2 + 3n + 127 is a prime number for all natural numbers n. Take n = 0: 127 is prime. What about n = 1? Then the expression equals 131, which is also prime. n = 2? That gives 137, still prime. But n = 3 gives 145 = 5 * 29, which is not prime, so the proposition is false. So for predicates built from non-boolean functions, some input values satisfy the predicate while others do not. The main takeaway is that just because a predicate seems to be true for some values doesn't mean the whole proposition is true.
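
A quick script (my own check, not something from class) shows exactly where the predicate breaks down:

    def is_prime(k):
        """Return True iff k is a prime number."""
        if k < 2:
            return False
        return all(k % d != 0 for d in range(2, int(k ** 0.5) + 1))

    for n in range(6):
        value = n**2 + 3*n + 127
        print(n, value, is_prime(value))
    # n = 0, 1, 2 give 127, 131, 137 (all prime), but n = 3 gives 145 = 5 * 29,
    # so "n^2 + 3n + 127 is prime for all natural numbers n" is false.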

Disproving

Now, how do we disprove something? For example, how do we disprove the statement 2 + 2 = 5? We just need to prove that 2 + 2 does not equal 5. So to disprove any statement S, we need to prove that not S is true. This is very useful, because we will not always be handed true propositions; we also need to know how to disprove statements that are false. The important thing is to first notice that a proposition is false; then we can negate it and prove the negation. If we fail to notice that a proposition is false and try to prove it anyway, we will get nowhere, and it would be a very bad thing to "prove" something false to be true.

False Proofs

I think this is the funniest topic in this course. We must remember that the real world exists outside the realm of mathematics, and there are many kinds of "proofs" out there, some of which do not actually prove anything! I think the most prevalent false proof is the proof by gut feeling. Our intuition is good for rough estimates and quick predictions based on scant evidence, which was useful in the wild when we needed to escape predators lurking in the bushes, but it is not good for establishing truths.

Saturday 29 November 2014

Indirect Proofs - Proof by Contradiction

Proof by contradiction starts by assuming that the proposition in question is false. A series of logical deductions then leads to a contradiction of something we all know to be true, at which point we can conclude that the proposition must be true. In simpler terms: if the proposition were false, then something false would be true, and since something false cannot be true, our proposition had better not be false.

This all sounds very abstract... so let's break it down.
Let P be a proposition
To prove P is true,
    we assume P is false (not P is True)
        this leads to some falsehood (such as 2 is not an even number)
    then we have shown that "not P -> falsehood" is True (how? remember truth tables?)
        an implication with a False conclusion can only be True if its hypothesis is False, so not P is false, which means P is true.
This is the same as proving the contrapositive of "not P -> falsehood", which is "True -> P".
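
For a concrete instance of this shape, here is a sketch of a standard example (not necessarily the one we used in class): proving that the square root of 2 is irrational.

    Let P be "sqrt(2) is irrational".
    Assume P is false: sqrt(2) = a/b for integers a and b with no common factor.
    Squaring both sides gives 2b^2 = a^2, so a^2 is even, so a is even; write a = 2c.
    Then 2b^2 = 4c^2, so b^2 = 2c^2, so b is even as well.
    But then a and b share the factor 2, contradicting "no common factor" (a falsehood).
    So not P leads to a falsehood, and P must be true: sqrt(2) is irrational.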