|
On February 20 2019 10:34 travis wrote: I have an assigned question to determine the last two digits of 99^14 by using modulus 100. So I know 99 is congruent to -1 (mod 100), which means 99^14 is congruent to (-1)^14 (mod 100), which means the last two digits must be 01 because (-1)^14 = 1
did I do this correctly?
Hi - yes you did.
You can check this property because if a = bm+c and g = hm+k, then ag = (bm+c)(hm+k) = (hbm+ch+bk)m+ck. Thus ag \equiv ck (mod m).
Applying this to your problem, you have 14 99s, and for each one, c = k = -1 (without loss of generality). So you combine the first two, and get that their product is equivalent to 1, then you take 99^2 and 99 together. Well 99^2 \equiv 1 (mod 100) so that means that its remainder must be 1. So now you have c = 1, k = -1. Thus 99^3 \equiv -1 (mod 100). We can use induction to show that what you said is correct.
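For anyone who wants to double check the arithmetic, here is a minimal Python sketch of the same idea: multiply the remainders one factor at a time and reduce mod 100 after each step. The function name `last_two_digits` is made up for this illustration.

```python
def last_two_digits(base: int, exponent: int) -> int:
    """Return base**exponent mod 100 by multiplying remainders step by step."""
    remainder = 1
    for _ in range(exponent):
        # Only the remainders matter: ag is congruent to ck (mod m).
        remainder = (remainder * (base % 100)) % 100
    return remainder


# Print with a leading zero so the result reads as two digits.
print(f"{last_two_digits(99, 14):02d}")
```

Python's built-in three-argument `pow(99, 14, 100)` does the same reduction and should agree with the loop.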
|
Thanks Rodya.
I've got some more review checking to do, this one is probability.. I've always been especially bad at probability, it confuses me pretty badly sometimes.
the question says that a student takes an hour long exam (so x is time, where x is {0,1}), with a probability of finishing within time x = x/2.
It then says that the student is still working after .75 hours, and what is the probability the student will take the entire hour to finish?
edit: I removed the crap I wrote because I realized I must be doing this question wrong... I will come back and fill in what I try doing
edit2:
I think I figured it out. I need P(x=1 | x>=.75)
which = P(x >= .75 and x = 1)/P(x >= .75)
which = P(x = 1)/P(x >= .75)
which are
(1-1/2)/(1-.75/2) = .8
the key was I forgot I need to subtract both values from one because the original probability was the chance of finishing it in that time, not of going past that time. I need to pay more careful attention when I am reading
|
On February 21 2019 04:26 travis wrote: Thanks Rodya.
I've got some more review checking to do, this one is probability.. I've always been especially bad at probability, it confuses me pretty badly sometimes.
the question says that a student takes an hour long exam (so x is time, where x is {0,1}), with a probability of finishing in time x = x/2.
It then says that the student is still working after .75 hours, and what is the probability the student will take the entire hour to finish?
I've tried to work it out and I've tried to get some guidance online but I am not making sense of it
What I did is:
P(A) = .75/2 P(B) = 1/2
I think I want P(B|A)
And I think P(B|A) = P(A and B)/P(A)
But that's where I am getting really confused. Isn't P(A and B) = P(B) ?
Which gives
P(B|A) = (1/2)/(.75/2)
which is 1.33333.. which is greater than 1. What am I doing wrong lol
Took me a few minutes to understand. Just to clarify, let me state what I guess the unwritten assumptions are: You meant to say P(t <= x) = x/2, where t is the time of finishing the exam, for x in [0,1) (instead of {0,1})? And that P(t = 1) = 1/2, because the student is forced to finish the exam once the hour has passed?
If A = "student still working at 0.75 hours" and B = "student takes full hour" then you are indeed looking for P(B|A). However, when calculating the probability of A, you are apparently confusing A with its complementary event.
|
Yeah, your assumptions are correct, sorry for having you read through that only to find I had edited it afterwards.
|
On February 21 2019 04:47 travis wrote: Yeah, your assumptions are correct, sorry for having you read through that only to find I had edited it afterwards.
No problem, I have to get back into this stuff anyway, otherwise I wouldn't even have bothered
|
You are right, but I'd mark off a point for your not-quite-right derivation. Note first that f(x) = x/2 is the cdf of the finish-time variable, which I will call y. The pdf of y must be g(x) = 1/2. Probability must sum to 1, so the integral of (1/2) over the domain of g should be 1. That means the domain of g runs from 0 to 2. Thus there is a 50% chance that the exam is not completed within the hour.
Now, p(y=1 | y>=.75) = p(y=1)/p(y>=.75) = (1/2)/(5/8) = 0.8. The calculation is helped by the fact that p(y>=.75) = 1 - p(y<.75) = 1 - f(.75) = 5/8. You got the right numbers but you got them the wrong way. Equivalently, p(y>=.75) = the integral from .75 to 2 of the pdf g(x) = 1/2, which also comes out to 5/8.
|
I am not understanding your explanation of why you would mark a point off.
|
You calculated the wrong thing. P(x=1) is the probability that you finish WITHIN the hour, not ON the hour.
|
The best way to think about your problem is a hybrid distribution with both continuous and discrete components. Namely you have a continuous probability density on [0,1) and then a discrete probability mass of a half at 1. In terms of the cdf function that would be F(x) = x/2 for x in [0,1) and then a jump at 1 so F(x) = 1 at x=1. It doesn't make any sense to try to extend your sample space to [0,2] in the manner that Rodya is saying.
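A quick simulation can make the hybrid picture concrete. This is only a sketch under the assumptions stated in the thread (F(x) = x/2 on [0,1) plus a point mass of 1/2 at x = 1); the function names and the fixed seed are made up for illustration.

```python
import random


def sample_finish_time(rng: random.Random) -> float:
    """Draw one finish time from the mixed distribution."""
    u = rng.random()  # uniform on [0, 1)
    # Inverse of F(x) = x/2 on [0, 1): x = 2u when u < 1/2.
    # The remaining probability 1/2 is the discrete mass at x = 1.
    return 2.0 * u if u < 0.5 else 1.0


def estimate_full_hour_given_still_working(n: int, seed: int = 0) -> float:
    """Estimate P(x = 1 | x >= 0.75) by counting samples."""
    rng = random.Random(seed)
    still_working = 0
    full_hour = 0
    for _ in range(n):
        t = sample_finish_time(rng)
        if t >= 0.75:
            still_working += 1
            if t == 1.0:
                full_hour += 1
    return full_hour / still_working


print(estimate_full_hour_given_still_working(200_000))
```

With enough samples the estimate should land close to the 0.8 computed earlier in the thread.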
|
On February 21 2019 05:49 phwoar wrote: The best way to think about your problem is a hybrid distribution with both continuous and discrete components. Namely you have a continuous probability density on [0,1) and then a discrete probability mass of a half at 1. In terms of the cdf function that would be F(x) = x/2 for x in [0,1) and then a jump at 1 so F(x) = 1 at x=1. It doesn't make any sense to try to extend your sample space to [0,2] in the manner that Rodya is saying.
I agree with Rodya for subtracting the point. And I agree with you that this makes more sense as having a continuous and a discrete component.
Travis, if you don't understand your mistake, try doing the exact same exercise with the cdf F(x) = x/3.
+ Show Spoiler +You got lucky that 1 hour was the halfway point, and therefore P(x = 1) = P(x < 1) = 0.5. With the cdf x/3 this is no longer the case: P(x >= 0.75) = 1 - 0.75/3 = 3/4, so you have to use P(x = 1 | x >= 0.75) = P(x = 1) / P(x >= 0.75) = (2/3) / (3/4) = 8/9, which is different from the (1/3) / (3/4) = 4/9 you get by putting P(x < 1) = 1/3 in the numerator....
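The exercise can also be checked with exact fractions. The sketch below assumes the model from the thread: F(x) = slope * x on [0,1), with all remaining probability sitting at x = 1. The function name and the `slope` parameter are introduced here for illustration only.

```python
from fractions import Fraction


def p_full_hour_given_still_working(slope: Fraction, t: Fraction) -> Fraction:
    """P(x = 1 | x >= t) for F(x) = slope * x on [0, 1) with a jump to 1 at x = 1."""
    p_at_one = 1 - slope            # size of the jump of the cdf at x = 1
    p_still_working = 1 - slope * t  # P(x >= t) = 1 - F(t) for t in [0, 1)
    return p_at_one / p_still_working


t = Fraction(3, 4)
print(p_full_hour_given_still_working(Fraction(1, 2), t))  # cdf x/2
print(p_full_hour_given_still_working(Fraction(1, 3), t))  # cdf x/3
```

With slope 1/2 the jump at 1 happens to equal F just below 1, which is why the original calculation gave the right number; with slope 1/3 the two quantities differ.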
|
I see the mistake, thanks!
Now on to something a bit less common.
I need to know how to differentiate matrices... but this is something we've literally never done, and everything I am reading online is tough to follow, and i am confused to heck.
Starter problem:
compute the derivative of the function f(w) with respect to w_i, where w,x are in R^D and:
f(w) = 1/(1+e^(-w^t*x))
I honestly have no idea what to do. What's got me confused here is "with respect to w_i". I am expecting this problem is actually simpler than the later problems, but I am not sure what it's asking for. It's asking for the derivative of a given element of w ? god i can't wrap my head around this I don't have a strong enough basis in calculus
edit: am I literally just finding the derivative of 1/(1+e^(-x * w_i))?
edit2: or is it the summation of said derivatives?
|
On February 22 2019 00:42 travis wrote: I see the mistake, thanks!
Now on to something a bit less common.
I need to know how to differentiate matrices... but this is something we've literally never done, and everything I am reading online is tough to follow, and i am confused to heck.
Starter problem:
compute the derivative of the function f(w) with respect to w_i, where w,x are in R^D and:
f(w) = 1/(1+e^(-w^t*x))
I honestly have no idea what to do. What's got me confused here is "with respect to w_i". I am expecting this problem is actually simpler than the later problems, but I am not sure what it's asking for. It's asking for the derivative of a given element of w ? god i can't wrap my head around this I don't have a strong enough basis in calculus
edit: am I literally just finding the derivative of 1/(1+e^(-x * w_i))?
edit2: or is it the summation of said derivatives?
The 'derivative' of a linear transformation matrix A is itself.
I'm guessing w^t is the transpose of w? And I suppose the * in w^t * x is the dot product? For future reference, most people would prefer you write w^T not w^t. Allow me to write out the function more explicitly for you.
f(w_1, w_2, ..., w_D) = 1/(1+e^(-(w_1 x_1+w_2 x_2 + ... + w_D x_D)))
Now, the problem wants you to take the partial derivative of this function with respect to w_i. Hope this helps.
|
I still don't understand. Up to this point it's pretty much what I thought, but I still don't get what it means to take the partial derivative with respect to w_i
if I do that, how do I even represent the rest of the vector entries in the derivative?
Ah.. hrm.. is the answer:
+ Show Spoiler +
x_i*e^(-w^Tx) / (1+e^(-w^Tx))^2
?
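One way to sanity check a candidate answer like this is to compare it against a finite-difference estimate. The sketch below treats every entry of w other than w_i as a constant, which is what "partial derivative with respect to w_i" means. The vectors w and x are made-up example values, and the helper names are hypothetical.

```python
import math


def f(w, x):
    """f(w) = 1 / (1 + e^(-w^T x)), with w^T x written out as a sum."""
    dot = sum(w_j * x_j for w_j, x_j in zip(w, x))
    return 1.0 / (1.0 + math.exp(-dot))


def partial_f(w, x, i):
    """Candidate closed form: x_i * e^(-w^T x) / (1 + e^(-w^T x))^2."""
    dot = sum(w_j * x_j for w_j, x_j in zip(w, x))
    e = math.exp(-dot)
    return x[i] * e / (1.0 + e) ** 2


def numerical_partial(w, x, i, h=1e-6):
    """Central difference: nudge only w_i, keep the other entries fixed."""
    w_plus = list(w)
    w_minus = list(w)
    w_plus[i] += h
    w_minus[i] -= h
    return (f(w_plus, x) - f(w_minus, x)) / (2.0 * h)


w = [0.4, -0.6]  # example values only
x = [3.0, 1.0]
for i in range(len(w)):
    print(i, partial_f(w, x, i), numerical_partial(w, x, i))
```

If the two printed columns agree to several decimal places for each i, the closed form is very likely right.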
|
okay fantastic. Now is where it gets a lot harder. I will be back soon lol I really appreciate the help, they really threw us in the deep end here, I think the upcoming questions are typically graduate level stuff and im not even good at math for an undergrad
it's for a deep learning course, if you're curious. the course isn't even really a theory class or anything but I guess it's good for me to get some understanding of this stuff
|
Okay, next, compute the derivative of J(w) with respect to w
J(w) = 1/2 (sum from i=1 to m) of: |w^Tx^(i) - y^(i)|
the notation here might seem weird, it does to me at least, they are using ^(i) to represent the ith data point
I believe, the derivative of a sum is the sum of derivatives? I think I read that? So do I pull out the 1/2 and summation, and do d/dw|w^Tx^(i)-y^(i)|
and then that derivative is equal to x(w^(i)x^(i)-y^(i)) / |w^(i)x^(i)-y^(i)| ?
To be honest, i am really confused by w^T*x^(i). x and w are both vectors here...? what does w^Tx^(i) even mean? If we have
w^T = [.4, .6] and x^T = [3, 7] and i = 1... then what is w^Tx^(i) saying to do?
|
Usually, w^T means the transpose of w.
That means, if w is a column vector, w^T is the row vector with the same numbers
(w1)
(w2)
(w3)
becomes (w1, w2, w3)
That means that w^T x is simply the scalar product of w and x. The reason people use this way of writing is that you don't have to specifically define the scalar product which you want to use, and can instead use the normal rules of matrix multiplication. (And it is more generally useable)
https://en.wikipedia.org/wiki/Transpose
|
I understand that, but since x is a vector, it's saying it wants the scalar product of the entire w vector with just entry i of vector x ?
like.. w_1*x_i + w_2*x_i ..... + w_m*x_i ?
like, back to my example:
w^T = [.4, .6] and x^T = [3, 7] and i = 1... then what is w^Tx^(i) saying to do?
|
Firstly, you are right that the sum of derivatives is equal to the derivative of the sum. One way to remember this is by remembering that the derivative operation is *linear*.
Secondly, w^T*x^(i) basically means the dot product of w and x^(i). They are both vectors, so taking the dot product makes sense. The reason one of them is transposed is so that you can ignore dot products and just do matrix multiplication (yes, vectors are matrices). For your example we have w^T*x^(i) = 3(0.4) + 7(0.6) = 5.4.
Thirdly, when we talk about the derivative of a function with respect to a vector, we are talking about the gradient of that function. You can google it, but the gradient is just a vector containing all the first order partial derivatives of the function. I would be wrong about this, however, if your book has decided to use the word "derivative" to mean "total derivative".
Fourthly, the derivative that you gave is close to being correct. You are right that the functional form of the derivative of |f(x)| is equal to f'(x)f(x)/|f(x)|, however you need to take into account the third point (so your answer should be a vector). Also, you may want to know that this derivative is not defined at points w for which w^Tx^(i)-y^(i) = 0. But wherever the derivative is defined... you will have d|f(x)|/dx = f'(x)f(x)/|f(x)|.
Lastly, a couple other problems with your answer. w^(i) doesn't make sense - only x and y are indexed by i. x^(i) is a vector, not the i'th entry of x. The j'th entry of x^(i) is denoted (x^(i))_j. Also, that x on the far left of your answer should be (x^(i))_j or something, since you are taking a PARTIAL derivative with respect to w_j, not w itself.
I said a lot but hopefully I didn't confuse you, you seem to pretty much get it. The third point I wrote is really the only important one.
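The points above can be put together in a short Python sketch. It assumes J(w) = 1/2 * sum over i of |w^T x^(i) - y^(i)| as written in the question, and takes "derivative with respect to w" to mean the gradient. The second data point and both y values are invented for illustration; only w = [.4, .6] and x = [3, 7] come from the thread.

```python
def dot(w, x):
    """w^T x: multiply matching entries and add them up."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def J(w, xs, ys):
    """J(w) = 1/2 * sum_i |w^T x^(i) - y^(i)|."""
    return 0.5 * sum(abs(dot(w, x) - y) for x, y in zip(xs, ys))


def grad_J(w, xs, ys):
    """Gradient of J: entry j is 1/2 * sum_i sign(r_i) * (x^(i))_j."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        r = dot(w, x) - y  # residual for data point i
        if r == 0:
            raise ValueError("J is not differentiable where a residual is 0")
        sign = r / abs(r)  # same as f(x)/|f(x)| in the thread
        for j in range(len(w)):
            grad[j] += 0.5 * sign * x[j]
    return grad


w = [0.4, 0.6]
xs = [[3.0, 7.0], [1.0, -2.0]]  # x^(1) from the thread, x^(2) made up
ys = [5.0, 1.0]                 # made-up targets

print(dot(w, xs[0]))      # the worked example 3(0.4) + 7(0.6)
print(grad_J(w, xs, ys))  # a vector with one entry per w_j
```

Note that the output of `grad_J` has the same length as w, while each w^T x^(i) is a single number.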
|
On February 22 2019 06:41 Rodya wrote: Firstly, you are right that the sum of derivatives is equal to the derivative of the sum. One way to remember this is by remembering that the derivative operation is *linear*.
Secondly, w^T*x^(i) basically means the dot product of w and x^(i). They are both vectors, so taking the dot product makes sense. The reason one of them is transposed is so that you can ignore dot products and just do matrix multiplication (yes, vectors are matrices). For your example we have w^T*x^(i) = 3(0.4) + 7(0.6).
I said a lot but hopefully I didn't confuse you, you seem to pretty much get it. The third point I wrote is really the only important one.
you definitely aren't making my confusion worse, only better, but this is the part that I don't understand.
is my confusion here just from a lack of understanding of the notation? I am wondering if this is something you guys are trying to tell me.
Lastly, a couple other problems with your answer. w^(i) doesn't make sense - only x and y are indexed by i. x^(i) is a vector, not the i'th entry of x. The j'th entry of x^(i) is denoted (x^(i))_j.
to further clarify my confusion, if x^(i) is a vector, then we are saying that x is potentially a > 1 dimensional matrix (so it potentially has more than one vector)? and i is the column we are multiplying w by?
so if that's the case then in this problem, W may or may not have more than one row? but then our output is a vector, not a scalar? (I think you said that)
edit:
to check another problem, it asks, "find delta_w of f, where f(w) = tanh[w^Tx]"
I am guessing the delta_w of f means "derivative of f with respect to w".
so to answer this I did chain rule u = w^Tx
d/dw(w^Tx) = x (I looked it up.. I don't know why it's x ... im guessing order of multiplication matters here. not really going to spend the time going into it)
so anyways, answer = x*sech^2[w^Tx]
did i screw this one up or is it really that straightforward?
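On why d/dw(w^Tx) = x: since w^Tx = w_1 x_1 + ... + w_D x_D, the partial derivative with respect to w_j is just x_j, and stacking those partials gives the vector x. The sketch below checks the resulting answer x*sech^2(w^Tx) numerically. It is a rough check with made-up example vectors and hypothetical helper names, not part of the assignment.

```python
import math


def f(w, x):
    """f(w) = tanh(w^T x)."""
    return math.tanh(sum(w_j * x_j for w_j, x_j in zip(w, x)))


def grad_f(w, x):
    """Chain rule: sech^2(w^T x) times the gradient of w^T x, which is x."""
    dot = sum(w_j * x_j for w_j, x_j in zip(w, x))
    sech_squared = 1.0 / math.cosh(dot) ** 2
    return [x_j * sech_squared for x_j in x]


def numerical_grad(w, x, h=1e-6):
    """Central differences, one coordinate of w at a time."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += h
        w_minus[j] -= h
        grad.append((f(w_plus, x) - f(w_minus, x)) / (2.0 * h))
    return grad


w = [0.4, -0.6]  # example values only
x = [1.0, 2.0]
print(grad_f(w, x))
print(numerical_grad(w, x))
```

The two printed vectors should match to several decimal places.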
gonna try doing the last two next, they are clearly the most difficult
|