The Idea
Ok, that's a bit of a mouthful, but why not. Pixel Corps big wig and TWiM/MBW host Alex Lindsay has this dream for a website (in his case a video website) that can look at how you rate videos and compare that to how others rate videos, and then when other people with rating trends similar to yours rate things highly, you'll be shown the things that they rated highly. That way you'll only get content that's good. You know it's good because people who like the same things you like said it's good.
Well I think this is a fantastic idea. In fact, I think it should be done for more than just video, it should be done for anything and everything. So I've come up with a basic algorithm for how this could be done. I offer this as an open spec for how this type of system could work.
Tracking Rating Trends Between Two People
Suppose we have two people, Person 1 and Person 2. To track how these two people's rating patterns compare, four pieces of data are required:
1: The number of times both people agreed that something was good,
2: The number of times both people agreed that something was bad,
3: The number of times Person 1 thought something was good, but Person 2 thought it was bad,
4: The number of times Person 2 thought something was good, but Person 1 thought it was bad.
I'll label these "Good to Both", "Bad to Both", "Good to 1", and "Good to 2", respectively. In order to track this, we need to track how people rated a particular item. When Person 1 rates something, his opinion is recorded. Later when Person 2 rates it, Person 1's opinion is compared to Person 2's new rating, and the appropriate number (Good to Both, Bad to Both, Good to 1, Good to 2) is incremented. This provides data about how any two people's rating patterns compare.
Deciding Who to Track
When a person rates a piece of content, the process of tracking the rating trends is performed for pairs formed by that person and everyone else who's rated this piece of content. If two people have never rated the same item before, there won't be any record of the 4 numbers listed above, so a new record would need to be created. This creates a pairing between the two people. Once that's done, or if the two people already have a pairing, then the record is modified to reflect the new information.
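The bookkeeping described above can be sketched roughly like this. It's only a sketch: it assumes an in-memory store instead of a real database, treats ratings as true (good) or false (bad), and the names createStore, pairKey and rateItem are made up for illustration.

```javascript
// In-memory stand-ins for the opinion and pairing records (an assumption;
// a real system would presumably keep these in database tables).
function createStore() {
  return { opinions: new Map(), pairings: new Map() };
}

// One key per unordered pair, so (A, B) and (B, A) share a single record.
function pairKey(userA, userB) {
  return userA < userB ? userA + "|" + userB : userB + "|" + userA;
}

// Record that `user` rated `item` good (true) or bad (false), and update the
// pairing with everyone who rated the item earlier.
function rateItem(store, user, item, isGood) {
  const earlier = store.opinions.get(item) || new Map();
  for (const [other, otherIsGood] of earlier) {
    if (other === user) continue; // a user never pairs with themselves
    const key = pairKey(user, other);
    if (!store.pairings.has(key)) {
      // First item these two have in common: create the pairing.
      const [user1, user2] = user < other ? [user, other] : [other, user];
      store.pairings.set(key, {
        user1, user2, goodToBoth: 0, badToBoth: 0, goodTo1: 0, goodTo2: 0,
      });
    }
    const pairing = store.pairings.get(key);
    if (isGood && otherIsGood) {
      pairing.goodToBoth++;
    } else if (!isGood && !otherIsGood) {
      pairing.badToBoth++;
    } else {
      // Exactly one of the two liked it; credit whichever slot that person holds.
      const liker = isGood ? user : other;
      if (liker === pairing.user1) pairing.goodTo1++;
      else pairing.goodTo2++;
    }
  }
  earlier.set(user, isGood);
  store.opinions.set(item, earlier);
}

const store = createStore();
rateItem(store, "alice", "video-1", true);
rateItem(store, "bob", "video-1", true);  // both liked video-1
rateItem(store, "alice", "video-2", true);
rateItem(store, "bob", "video-2", false); // only alice liked video-2
console.log(store.pairings.get(pairKey("alice", "bob")));
```

After those four ratings the alice/bob pairing holds one Good to Both and one Good to 1 (alice being User 1). Re-rating an item isn't handled here; that would need the old contribution to be undone first.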
Recommending Content
Once a person rates an item, there is now the potential to recommend that item to everyone that he has pairings with, excluding those that voted on the item (because they've already seen it). To generate a recommendation, we look at those four numbers, and at the way that the person rated the item. Assuming the person who is going to make the recommendation is Person 1, we need to look at two numbers.
If Person 1 liked the item, then the important number is Good to 1 + Good to Both. This number is the total number of items that Person 1 thought were good. From this we know what fraction of those Person 2 also thought were good: Good to Both / (Good to 1 + Good to Both). This fraction tells you what proportion of things Person 2 liked, when Person 1 liked them. If this proportion is above a certain user-definable percentage, then you'll be shown the item.
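As a quick worked example of that fraction, here's a minimal sketch. The pairing object and the function names are hypothetical, and returning null when there's no data yet is my own assumption about how to avoid dividing by zero.

```javascript
// Fraction of the items Person 1 liked that Person 2 also liked.
// Returns null when Person 1 hasn't liked anything the two have in common yet.
function likedWhenLiked(pairing) {
  const likedBy1 = pairing.goodTo1 + pairing.goodToBoth;
  return likedBy1 === 0 ? null : pairing.goodToBoth / likedBy1;
}

// threshold is the user-definable fraction, between 0 and 1.
function shouldRecommend(pairing, threshold) {
  const fraction = likedWhenLiked(pairing);
  return fraction !== null && fraction > threshold;
}

const likePairing = { goodToBoth: 8, badToBoth: 3, goodTo1: 2, goodTo2: 1 };
console.log(likedWhenLiked(likePairing));        // 8 / (2 + 8) = 0.8
console.log(shouldRecommend(likePairing, 0.75)); // true
console.log(shouldRecommend(likePairing, 0.9));  // false
```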
Advising Against Content
Alternatively, we can look at the other number, Bad to 1 + Bad to Both, and look at the proportion of times Person 2 agreed that something was bad: Bad to Both / (Bad to 1 + Bad to Both). Here Bad to 1 is the number of items that Person 1 thought were bad but Person 2 thought were good, which is the same count as Good to 2. This fraction tells us how often Person 2 also disliked a piece of content that Person 1 disliked. When Person 1 rates something as bad, we can determine how likely Person 2 is to also think it's bad, and again if the proportion is above a user-definable percentage, we can specifically prevent the item from being recommended to Person 2.
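The same kind of sketch works for advising against an item. One assumption to flag loudly: only four counters are tracked per pairing, so I'm reading "Bad to 1" (Person 1 disliked it, Person 2 liked it) as the stored Good to 2 counter. The names below are hypothetical.

```javascript
// Fraction of the items Person 1 disliked that Person 2 also disliked.
// "Bad to 1" is read here as goodTo2: Person 1 said bad, Person 2 said good.
function dislikedWhenDisliked(pairing) {
  const dislikedBy1 = pairing.goodTo2 + pairing.badToBoth;
  return dislikedBy1 === 0 ? null : pairing.badToBoth / dislikedBy1;
}

// threshold is the user-definable fraction, between 0 and 1.
function shouldAdviseAgainst(pairing, threshold) {
  const fraction = dislikedWhenDisliked(pairing);
  return fraction !== null && fraction > threshold;
}

const dislikePairing = { goodToBoth: 8, badToBoth: 3, goodTo1: 2, goodTo2: 1 };
console.log(dislikedWhenDisliked(dislikePairing));     // 3 / (1 + 3) = 0.75
console.log(shouldAdviseAgainst(dislikePairing, 0.5)); // true
console.log(shouldAdviseAgainst(dislikePairing, 0.8)); // false
```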
Anti-Recommendations
A convenient side effect of this way of generating recommendations is that it doesn't matter if Person 1 and 2 completely agree or completely disagree in what's good and what's bad. If Person 2 always likes the things that Person 1 dislikes, the system will recommend what Person 1 dislikes to Person 2, and vice versa if Person 2 always dislikes what Person 1 likes. This occurs because of the thresholding technique.
Suppose Person 1 rates something as good, but the proportion of things that Person 2 likes out of the things that Person 1 likes is very low, below the threshold. Then that item will be hidden. This makes sense because if the proportion is low, that means Person 2 likes very little of what Person 1 likes. Conversely, if Person 1 rates something as bad, but the proportion of things that Person 2 likes when Person 1 dislikes them is high, then the item is recommended. This also makes sense, because Person 2 usually likes what Person 1 dislikes.
By making recommendations based on the number of items that Person 2 likes, relative to the particular rating by Person 1, we can make smart decisions about what is and isn't going to be liked by Person 2. And you can see that it doesn't matter how similar two people are; even people with completely opposite tastes will be good sources of new content: if they have similar tastes, recommend the good items and hide the bad; if they have opposite tastes, recommend the opposite items. The more similar or dissimilar the better, because the proportions of liked items will be closer to 100%.
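To see the anti-recommendation effect with numbers, here's a small sketch comparing a pair with similar tastes to a pair with opposite tastes. The counts are invented, the function names are hypothetical, and the 0.7 threshold is just an example value.

```javascript
// Of the items Person 1 rated a given way, what fraction did Person 2 like?
function fractionLikedBy2(pairing, person1Liked) {
  const liked = person1Liked ? pairing.goodToBoth : pairing.goodTo2;
  const disliked = person1Liked ? pairing.goodTo1 : pairing.badToBoth;
  const total = liked + disliked;
  return total === 0 ? null : liked / total;
}

// Decide what to do for Person 2 when Person 1 rates a new item.
function decide(pairing, person1Liked, threshold) {
  const fraction = fractionLikedBy2(pairing, person1Liked);
  if (fraction === null) return "no data";
  return fraction > threshold ? "recommend" : "hide";
}

const similarTastes = { goodToBoth: 9, badToBoth: 9, goodTo1: 1, goodTo2: 1 };
const oppositeTastes = { goodToBoth: 1, badToBoth: 1, goodTo1: 9, goodTo2: 9 };

console.log(decide(similarTastes, true, 0.7));   // "recommend"
console.log(decide(similarTastes, false, 0.7));  // "hide"
console.log(decide(oppositeTastes, true, 0.7));  // "hide"
console.log(decide(oppositeTastes, false, 0.7)); // "recommend"
```

The last line is the anti-recommendation: Person 1 disliked the item, and that is exactly why it gets shown to Person 2.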
Database Model
Table 1: Item Opinions
Item ID, User ID, Rating (Good/Bad)
Table 2: Pairings
User 1 ID, User 2 ID, Good to Both, Bad to Both, Good to 1, Good to 2
Table 3: Users
User ID, ..., Threshold
Algorithm
Rate an Item:
for each Item Opinion where Item ID == this item {
    if Pairing where (User 1 ID == this user and User 2 ID == this Item Opinion User ID)
            or (User 1 ID == this Item Opinion User ID and User 2 ID == this user) does not exist {
        create pairing
    }
    current pairing = Pairing where (User 1 ID == this user and User 2 ID == this Item Opinion User ID)
            or (User 1 ID == this Item Opinion User ID and User 2 ID == this user);
    if this Rating == Good and this Item Opinion Rating == Good {
        current pairing good to both ++
    } else if this Rating == Bad and this Item Opinion Rating == Bad {
        current pairing bad to both ++
    } else if this Rating == Good and this Item Opinion Rating == Bad {
        if this user == current pairing User 1 ID { current pairing good to 1 ++ }
        else { current pairing good to 2 ++ }
    } else if this Rating == Bad and this Item Opinion Rating == Good {
        if this user == current pairing User 1 ID { current pairing good to 2 ++ }
        else { current pairing good to 1 ++ }
    }
}
Recommending an Item:
// other user = the member of the pairing who is not this user, i.e. the one receiving the recommendation
// "Bad to 1" is stored as Good to 2 and "Bad to 2" as Good to 1, since only four numbers are kept
// skip a pairing whose denominator is 0 (no relevant ratings in common yet)
for each Pairing where (User 1 ID == this user or User 2 ID == this user) and (User 1 ID and User 2 ID != User ID of current Item Opinions ) {
    if this pairing User 1 ID == this user {
        if this Rating == Good {
            if Good to Both / ( Good to Both + Good to 1 ) > other user threshold { make recommendation }
            else { recommend against }
        } else {
            if Bad to Both / ( Bad to Both + Good to 2 ) > other user threshold { recommend against }
            else { make recommendation }
        }
    } else {
        if this Rating == Good {
            if Good to Both / ( Good to Both + Good to 2 ) > other user threshold { make recommendation }
            else { recommend against }
        } else {
            if Bad to Both / ( Bad to Both + Good to 1 ) > other user threshold { recommend against }
            else { make recommendation }
        }
    }
}
Last week on TWiT's Futures in Biotech, episode 10, Marc Pelletier interviewed Dr. Carla Shatz about the human brain. The most fascinating part of the entire interview was a brief part discussing how the brain manages to connect the retina to the visual cortex without ending up completely random. This wasn't the first time I had read about the mechanism (V. S. Ramachandran has at least mentioned it in one of his books), but I had pretty much forgotten.
What happens is that the retina projects nerves to the back of the brain, to the occipital lobes, to connect into the primary visual cortex. The connections are for the most part random, which is of course bad because there's no sense to the signals. Waves of activation spread across the retina, neighboring neurons lighting up with electrical activity while more distant ones remain quiet, sending signals back to the visual cortex. The visual cortex then is able to coordinate the input, since even though the receiving neurons seem to randomly light up, they only light up because they neighbor one another in the retina. I suspect it uses some form of Hebbian learning, where neurons that fire simultaneously tend more often to be affected by one another's activation and those that don't are less affected.
To illustrate I offer a simple example using a string of characters. Suppose you have a string that's been scrambled, say "     bbeenoooorttt". With information about neighbor status (the first t is next to the start of the string and next to the first o, the r is next to the second o and the third space, the fourth space is next to the second and third t's, etc.) you can reconstruct the original: "To be or not to be". By using the neighbor status of the letters you can very simply say what they are near and thus by knowing what everything is near, you know where everything is.
There are a number of interesting applications that this might have, such as perhaps self-building microchips, but the one I'm particularly interested in is artificial neural networks. Self-organizing properties would make it significantly easier to design neural networks because it would remove the necessity to design the thing down to the tiniest detail. Instead they could be designed in a broader sense, by function rather than specific wiring. All that's left to do is get cybernetic implants to work properly and we'll have working cyberbrains! But that's another project...
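The string example can be tried out in a few lines. This is a toy sketch of the reconstruction idea only, not of anything the brain does: each piece gets an id (my assumption, since repeated letters would otherwise be indistinguishable) and remembers only the id of the piece to its right. The names scramble and reconstruct are made up.

```javascript
// Break a string into pieces that each know their character and the id of
// their right-hand neighbor (null for the last piece), then scramble them.
function scramble(text) {
  const pieces = Array.from(text, (char, i) => ({
    id: i,
    char,
    rightNeighbor: i + 1 < text.length ? i + 1 : null,
  }));
  // Sorting by character throws away the original order.
  return pieces.sort((a, b) => (a.char < b.char ? -1 : a.char > b.char ? 1 : 0));
}

// Rebuild the string using nothing but the neighbor information.
function reconstruct(pieces) {
  const byId = new Map(pieces.map((p) => [p.id, p]));
  const hasLeftNeighbor = new Set(pieces.map((p) => p.rightNeighbor));
  // The first piece is the only one nobody names as a right-hand neighbor.
  let current = pieces.find((p) => !hasLeftNeighbor.has(p.id));
  let text = "";
  while (current) {
    text += current.char;
    current = byId.get(current.rightNeighbor); // undefined after the last piece
  }
  return text;
}

const scrambled = scramble("To be or not to be");
console.log(scrambled.map((p) => p.char).join("")); // letters in sorted order
console.log(reconstruct(scrambled));                // "To be or not to be"
```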
I know this blog has primarily been about my devving, and this post isn't going to be different, but so far I've only focused on Jade and the related projects. I'm going to diverge a bit, now and in my next post, so brace yourselves!
This post is about parallel computations, not in the traditional sense of breaking a problem into multiple independent subtasks and performing them on multiple processors simultaneously, but rather something very different. Instead I mean a system in which the result of computation is independent of the order in which the parts are computed (which does indeed mirror traditional parallel computing), but in which the answer does depend on what inputs are present. So while it is, in an absolute sense, completely true that traditional parallel computing does result in this, it's only in a strict sense. Sure, if you made a list of all the results of computation, that list would be different depending on which computations were and weren't done, but it's only different as a single whole. No two items are dependent on one another.
I contrast this with what I'm talking about in that there is interdependence; the answer may indeed be a collection of data, but each piece of data is influenced by a number of different inputs, not just one, which is the case with traditional computation. It's more a difference between how input maps to output, I suppose you can say. To modify the parallel computing notation, traditional computing models in general are SISO, single-input/single-output. That holds whether it's serial, with one instruction being processed at a time and producing one output, or parallel, with multiple processors doing multiple things, but each providing a single output at any given time. What I refer to instead could be thought of as MISO or MIMO, multiple-input/single-output or multiple-input/multiple-output. (Interestingly, you could also have a SIMO, single-input/multiple-output model, which creates divergent results through some process, perhaps random mutation, and this would model one of the processes involved in evolution.)
The reason I bring this topic up is because of a particular problem. If you try to simulate a non-SISO computation on a SISO machine, it's relatively easy to do. Because the results are order independent, the calculations can be done granularly in any particular order. But trying to do the reverse, in particular, trying to simulate a SISO computation on a MIMO machine, seems to be especially difficult, because a MIMO machine has no instruction queue like a SISO machine does. Serial, and traditionally parallel, computations have some predefined list of instructions, but non-SISO parallel computations have no such predefined list of instructions, so how can you simulate SISO computations on non-SISO machines? It's an interesting problem, especially because it's essentially the question of how the brain, which is a massively parallel MIMO device, can create our conscious experience, which seems to have some serialized nature, especially with things like mathematics, which is inherently serial. This difficulty in simulating SISO computations on a MIMO device might even explain why math is such a hard thing to learn.
One possible method, the only one that I've been able to think of, is that you might have a processor that produces a particular output from a particular input, and you might have a second processor that produces the next input, thus creating a feedback loop. This might not necessarily require that the entire input-output flow be mapped beforehand, where you already know what outputs lead to what inputs, and aren't really computing anything new. Instead you could set it up so that the simple presence of the output, along with the current state of the processor and the input generator, together produce a new input. This way all you'd have to do is have an input generator that depends on the completion of one process, and on the information of which input it previously provided, to generate a new input in a sequential manner. Of course it would be difficult to truly get a process-based system working, perhaps even impossible, since at no point does the process really stop performing the computation. Every input to the processor changes the output, and the same goes for the input generator, so there's a constant feedback.
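Just to make the data flow of that feedback loop concrete, here's a toy sketch. Big caveat: it runs on an ordinary serial machine, so it only illustrates the wiring between the two parts, not a real MIMO device. The function name runFeedbackLoop, the maxSteps safety limit, and the summing example are all my own assumptions.

```javascript
// processor:      (input, previousOutput) -> output
// inputGenerator: (output, previousInput) -> next input, or null to stop
function runFeedbackLoop(processor, inputGenerator, firstInput, maxSteps) {
  const trace = [];
  let input = firstInput;
  let output; // undefined until the processor has produced something
  for (let step = 0; step < maxSteps && input !== null; step++) {
    output = processor(input, output);
    trace.push({ input, output });
    // The next input depends only on the new output and the previous input;
    // no list of instructions is written down anywhere in advance.
    input = inputGenerator(output, input);
  }
  return trace;
}

// Example: add up 1 + 2 + 3 + 4 one step at a time.
const trace = runFeedbackLoop(
  (input, previousOutput) => (previousOutput === undefined ? 0 : previousOutput) + input,
  (output, previousInput) => (previousInput < 4 ? previousInput + 1 : null),
  1,
  100
);
console.log(trace.map((t) => t.output)); // [ 1, 3, 6, 10 ]
```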
I think this is a beginning for how it might work, not necessarily the way it does in fact work. But it is a starting point, something to use in creating models of computation.
So JavaScript's this Object Oriented language without classes, as such. Well, in a way it does have classes, because you can use constructor functions that end up behaving like class definitions:
function MyClass(){
    var foo = 5; // private variable
    this.bar = -1; // public variable
}
Which is very nice and efficient and I think much better than working with prototypes, which end up creating only public variables, and no unique objects for properties:
function MyClass(){ ... }
MyClass.prototype.foo = 5;
MyClass.prototype.bar = {}; // all instances reference this same object! bad bad bad
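Here's a small runnable demonstration of that shared-object problem. The constructor names SharedBar and OwnBar are made up for the example.

```javascript
function SharedBar() {}
SharedBar.prototype.bar = {}; // one object, shared by every instance

function OwnBar() {
  this.bar = {}; // a fresh object for each instance
}

const s1 = new SharedBar();
const s2 = new SharedBar();
s1.bar.color = "red";
console.log(s2.bar.color); // "red": s2 sees the change made through s1

const o1 = new OwnBar();
const o2 = new OwnBar();
o1.bar.color = "red";
console.log(o2.bar.color); // undefined: o2 has its own object
```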
But how do you inherit the private variables? Well it turns out that it's very easy. JavaScript provides you with the ability to call one function in the context of a specific object, so that the "this" keyword refers to that object. You do it with the "apply()" method of functions. So you can have two constructors, one which calls apply on the other, and you get inherited private variables:
function MyClass(){
    var foo = 5;
    this.getFoo = function(){ return foo; }
}
function OtherClass(){
    MyClass.apply(this);
}
var o = new OtherClass;
alert( o.getFoo() ); // alerts 5
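A slightly fuller sketch of the same trick, with made-up names (Counter, NamedCounter). It uses call(), the sibling of apply() that takes its arguments one by one, to pass a constructor argument along.

```javascript
function Counter(start) {
  var count = start; // private: only reachable through the closures below
  this.increment = function () { count += 1; return count; };
  this.getCount = function () { return count; };
}

function NamedCounter(name, start) {
  Counter.call(this, start); // run Counter with `this` bound to the new object
  this.name = name;
}

const a = new NamedCounter("a", 10);
const b = new NamedCounter("b", 0);
a.increment();
console.log(a.getCount()); // 11
console.log(b.getCount()); // 0, each instance gets its own private count
console.log(a.count);      // undefined, the variable is not a property
```

One thing to keep in mind: code written inside NamedCounter can't touch count directly either. It's only reachable through the methods that Counter attached, so "inherited" here really means "inherited the accessors".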
Now I have to go back and recode everything to take advantage of this. My task:
Turn this:
...
this.property = someValue;
...
into this:
...
var property = someValue;
...
this.property = function(){ return property; }
...
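To show what that conversion changes for the code that uses the object, here's a before and after sketch. PublicPoint and PrivatePoint are hypothetical names.

```javascript
// Before: a public property that any caller can overwrite.
function PublicPoint(x) {
  this.x = x;
}

// After: the value lives in a closure and is exposed through an accessor.
function PrivatePoint(x) {
  var storedX = x;
  this.x = function () { return storedX; };
}

const before = new PublicPoint(3);
before.x = 99;           // silently changes the value
console.log(before.x);   // 99

const after = new PrivatePoint(3);
console.log(after.x());  // 3, and callers now need the parentheses
```

Every place that used to read obj.property has to become obj.property(), which is where the few thousand lines come in. A caller could still replace the accessor function itself, but the stored value can no longer be changed from outside.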
A few thousand lines of code, that's all... It will be fun, I'm sure...
Today I demoed my almost-completed scroll view class to a friend. He immediately discovered a glitch that I hadn't seen. When you moused down on the scrollers anywhere other than the scroller knob, and then moused up outside of the scroller, weird things would happen. Doing this to one of the line buttons (increment or decrement line) would start the autoscrolling feature that should only occur when you maintain a mouse down over the button for a period of time. When you did this over one of the page regions (increment or decrement page), the scroller would seem to do the page down or page up but then get stuck and flicker and do other weird stuff. And when you made the scroller knob proportion small enough, the same autoscroll behavior emerged from the stickiness and flickering from clicking on the page regions.
Well I investigated this phenomenon, wondering what I did wrong. What happened was this: When you moused down on the scroller, if you moused down on the knob, the scroller informed the application that all mouse events needed to be directed to the scroller. But this wasn't true when you moused down outside the scroller knob, so if you were to mouse down on the line down button, and then mouse up outside of the scroller, the scroller never got the mouse up. It still thought you were mousing down. Well this was simple to fix, just change it so that the scroller always received the mouse events, regardless of whether you clicked the scroll knob itself or anywhere else.
Ah sweet bliss, I'm thinking. How lovely it is that the solution was so simple! But alas, I created another minor problem: when you moused down on anything other than the knob, and then dragged the mouse around, weird stuff happened! All sorts of flickering and junk. And the scroller didn't scroll up or down a line or page like it should have. Hmm, where could this come from? Well of course the origin wasn't hard to find: because the scroller was receiving all mouse events at all times during a mouse down, it was getting sent mouse drag events even when the line up and down buttons (and page up and down regions) were clicked and then dragged. This normally would drag the scroller knob, so the scroller was trying to drag the scroller knob around even though you never clicked on it. Solution? Don't do anything during a mouse drag unless the scroller knob was clicked.
The moral of the story? Little things can have big impacts. And sometimes solving little problems creates other little problems that weren't there before the solution. Think ahead, and pay attention to the effect that your design decisions will have. It's better to know beforehand the consequences of your decisions because you've mapped every possibility out, rather than try to discover them later by trial and error because you weren't looking hard enough when you wrote the code.
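The two fixes boil down to something like the sketch below. This is not the actual scroller code; it's a stripped-down, hypothetical Scroller that models only the mouse tracking, with invented part names like "lineDown".

```javascript
// Hypothetical, stripped-down scroller: only the mouse-tracking logic is modeled.
function Scroller() {
  var pressedPart = null; // "knob", "lineUp", "lineDown", "pageUp", "pageDown" or null
  var knobPosition = 0;
  var capturing = false;  // true while the scroller receives all mouse events

  this.mouseDown = function (part) {
    pressedPart = part;
    capturing = true; // first fix: capture the mouse whichever part was pressed
  };
  this.mouseDragged = function (delta) {
    if (pressedPart !== "knob") return; // second fix: only the knob can be dragged
    knobPosition += delta;
  };
  this.mouseUp = function () {
    // Because of the capture, this arrives even if the mouse left the scroller.
    pressedPart = null;
    capturing = false;
  };
  this.knobPosition = function () { return knobPosition; };
  this.isCapturing = function () { return capturing; };
}

const scroller = new Scroller();
scroller.mouseDown("lineDown");
scroller.mouseDragged(25); // ignored: the knob wasn't what got pressed
scroller.mouseUp();
console.log(scroller.knobPosition(), scroller.isCapturing()); // 0 false

scroller.mouseDown("knob");
scroller.mouseDragged(25);
scroller.mouseUp();
console.log(scroller.knobPosition()); // 25
```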
I might've mentioned this before, but Jade is modeled on OpenStep, the OO API that resulted from an effort between NeXT and Sun back in 1993. The API is amazing, it really is. The approach that you take to writing programs when you use it is really different than with other frameworks. In a way, it has what DHH says is the freedom of restrictions. It doesn't let you wander all over the map, doing things 50 different ways; it says here's how you do this, and that's it. And it really helps because it lets you focus on the task at hand rather than focusing on how to implement this particular behavior.
But alas, JavaScript is not Objective-C, and browsers themselves are not computers. The environment that we work in when we make something for a browser is vastly different than the one we work in when we make an executable application. There are new freedoms and new restrictions. And I have chosen to forgo strict adherence to the OpenStep API when it makes sense to do so because of the nature of the browser. I haven't deviated much at all, only in minor ways, but had I not done so life would be much harder.
In a way, using OpenStep instead of creating an API ground up is also using lack of freedom to my advantage. By not having to recreate everything, I have the ability to simply get done what needs to be done. I suppose that's what the purpose of the framework is as well: free developers from the need to work on unnecessary problems by solving them ahead of time, and providing the solutions as a restriction on what they can do when developing. Instead of hurting their work, it provides a useful structure around which they can build apps.
And that's today's musings. Please comment, I love to hear what you think!
Good Enough vs. Just Right
There's a huge dichotomy in the development world. No, not Object Oriented vs. Functional; not Java vs. C++; not Windows vs. Mac; not CSS vs. Tables. The dichotomy is between Good Enough and Just Right.
Good Enough is when your product is not fully functional, or lacks interesting features, and is really subpar, but it satisfies the demand and offers some core features that make it more desirable, at least at first. Just Right, on the other hand, is when your product is overall a superb product, well designed, fully functional, lots of great features.
This distinction comes up all over the place: MySpace is Good Enough, Vox is Just Right; Windows is Good Enough, Mac OS is Just Right; etc. And we see who's won: Good Enough. But why? Why is it that people settle for something that's Good Enough when there's something that's Just Right? Perhaps it's timing; getting something to be Just Right takes extra time, extra effort, whereas getting it to be Good Enough means you get out onto the market sooner.
Good Enough is not Version 1.0
Now you might think, "Oh, so that means that in order for a product to be successful I need to make it in a rush." Well no, not exactly. Getting something out the door early is a good thing, yes. We see this all the time in the Web 2.0 world where products always seem to hit Beta real early. Paul Graham has noted in his essay on startups that releasing early is essential. Merlin Mann did an entire netcast, First Time Sex & the Beauty of 1.0, specifically about this idea.
But the big difference between Good Enough and Version 1.0 is that Good Enough is a final product. Contrastively, a Version 1.0 product is the first version of many, and it'll get better as time goes on. Good Enough ships with bugs, glitches, and problems, but works well enough to attract people. Version 1.0 is not necessarily feature laden but it's stable, and it works properly.
Beating Good Enough
Good Enough wins because it's first. It gets out there, attracts users, and holds onto them despite the eventual release of better stuff. People are too afraid to change. We all want to beat Good Enough, but sometimes it's hard to do when they're already entrenched. I've got some ideas though on how we can win.
- One thing we can do is literally beat them to market with our products. Do as they say: Release early and release often. Get your product out there, get it known, and make sure that you never let it stagnate once you get people using it, lest it become Good Enough. But sometimes this isn't possible to do because there's something out there already that's Good Enough.
- In that case, you need to make your product sufficiently better to make people see that things aren't so great where they are. Ruby on Rails makes development so much easier to do that it's pulling in PHP developers like crazy. Where it used to be hot to code PHP, now it's old fashioned. Ruby on Rails is the stuff of the future, or so it seems. But sometimes switching is just too hard.
- Whether it's switching from Windows to Mac or from MySpace to anything else, it's an unpleasant process to have to go through. You could lose all of your data, things no longer work right, important programs aren't available; your social network is no longer accessible, your blog has to start from scratch, you have to redesign it to feel like your site again. Anything that can make this process simpler is a good thing.
- Get known. I know it sounds like an obvious point to make, but really it has to be made. If you've got this great product that's better than your competition and easy to switch over to, then you need to make people aware of it. And this doesn't mean advertisement, no. This means let people play with it, show some people who are popular, get them to give it a try. If they like it, they'll talk about it and tell their friends. The only reason I know about Vox is because I heard Leo Laporte and Amber MacArthur talk about their Vox accounts.
So...
Good Enough is our biggest enemy. We strive to create great products but we're often beaten by inferior products that seem to become popular for whatever reason. But with the right approach, we might be able to beat them, and create great products that succeed.
The other day I was talking with a friend, trying to gauge why he uses MySpace and why he's so attached to it. What is so desirable about MySpace? It looks horrible, makes it damn near impossible to customize, and is full of ads, so why are so many people enthralled with this service?
The answer turns out to have two parts. First is why they join: All of their friends are there too, so why not them? The second is why they stay: You try migrating your entire online presence away from MySpace; vendor lock in is a bitch. This second point really is true of every hosted site service, so it's not uncommon, but it's definitely important. To see why, we have to first look at the first point.
When you create your MySpace site, one of the biggest concerns is networking with your friends. You do this by finding their sites on MySpace and simply clicking a link to add them as a friend. But what if your friend is on LiveJournal, Blogger, or Vox? Sorry, only MySpace members can be your friend. The same is true of so many other services. So of course people are joining MySpace; you've got to in order to be able to network with your friends who are already on MySpace.
But this, as I said, is true of every hosted service, so why is MySpace itself so popular? Who knows. Perhaps it had to do with the extra content it provided early on. I don't really know. But someone had to come out on top and that was MySpace. The second point now becomes easier to understand. If all of your friends are on MySpace, and you can't network with outsiders, then no one's going to leave MySpace because that would amount to leaving their social network. Not only that but you'd lose all of your customizations, etc. It's really impossible to make your online presence portable.
Essentially, the hassle of the migration process makes it not worth doing. This is true of so many things, including switching from PCs to Macs. But does it have to be this way? I don't think so. I doubt that many of the relevant companies would support the idea, but if there were an open API for certain things like site profiles, preferences, style/layout, blog content, and most importantly, social networks, then I think the migration process could be easier.
I'm sure that the companies would fear that it would reduce their member count, and maybe this is true, but I suspect that it could only hurt you if you're mistreating your customers. If you treat your customers like people, and provide them with the tools they need to do what they want with their online presence, then I think they'll be loyal to your service, even if those tools include utilities to make it easy to leave your service, or interact with other services without forcing people to join.
The key to this, as I said earlier, is the social networking element. Things like blog posts can be accessed via RSS of a sort, so it's not a huge issue to migrate them over. But the social network is difficult for a few reasons.
The obvious feature of social networking is being able to have a list of friends. This by itself isn't difficult, it's just a link to their site, basically. But along with that usually come things like displaying their profile picture and their name or nickname automatically. Getting this data would require a standard API for accessing their site and retrieving name or nickname and profile picture. Putting this into an RSS feed located somewhere like "../PsygnisFive/profile/feed" would make the process immensely easier.
I imagine the process would go something like this: You click an add link which uses JS to pop up a little lightbox-like container with two fields: your service (Vox, in my case here), and your username on that service, or perhaps just the location of your personal site on the service. Then it would direct you to a REST-ful url, in my case perhaps "psygnisfive.vox.com/neighborhood/add". The GET or POST data would contain the URL of your friend's page, and that would be used to then get whatever info your service needs to add a person to your friends list.
Now that you've entered your service and your username, this gets stored in a cookie, so the next time you click the add button on someone's site, it's all done instantly. The service would then attempt to send a message back to the other user's page indicating a befriending, maybe by posting to something like "../messages/befriendedBy", and the procedure would be complete. I don't see there being any technical reason why this can't be done.
Some other issues to resolve would be the ability to make your site friends only. Doing this might require an implementation of the OpenID system so that when you want to view your friend's site, you don't need an account on their service, they just need to have you as a friend because your service acts as your ID server. Again cookies could be used to make this process invisible after the first time. This might even promote something of a blending of your online profile, your OpenID, and other forms of online identification like vCards, into something that makes a lot more sense, a truly unified online identity.
But current systems don't support this stuff, so how can people migrate over currently? Well it's not going to be easy at first, this is a given. The friends-only issue won't be solved until people support some standard. But adding friends can be done in a semi-manual sort of fashion by simply pointing your service to their site and having their data scraped. This would create at least a one way network, connecting you to people on closed services. Getting the reverse to work will be something that the services must do themselves.
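The redirect step of that flow could look something like the sketch below. Everything here is hypothetical: no service actually offers this API, the function name buildAddFriendUrl and the "friend" parameter name are invented, https is assumed, and example.livejournal.com is just a placeholder for a friend's page.

```javascript
// Build the URL that a friend's "add" link would send the visitor to.
// homeSite is the visitor's own site on their own service.
function buildAddFriendUrl(homeSite, friendPageUrl) {
  const url = new URL("/neighborhood/add", homeSite);
  // Carry the friend's page along as GET data so the home service can
  // look up whatever it needs (name, picture) from there.
  url.searchParams.set("friend", friendPageUrl);
  return url.toString();
}

const addUrl = buildAddFriendUrl(
  "https://psygnisfive.vox.com",
  "https://example.livejournal.com/"
);
console.log(addUrl);
console.log(new URL(addUrl).searchParams.get("friend")); // the friend's page URL again
```

URL and URLSearchParams take care of escaping the friend's address, so the home service gets back exactly the page that was clicked from.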
Frameworks for GUI widgets gotta have a look, right? You can't have a graphical user interface without graphics! So of course I've got to have a skin for this thing. And I don't want it to be some silly looking skin like EYLO's garish blue, Microsoft's Windows Vista with its pointless transparencies, or, god forbid, Tonka's Microsoft's MyFirstOS XP; I want it to be a well designed GUI that's functional but not distracting.
So what I've come up with, at least for how the panels will look, is this:
I'm also considering another design that's similar but without any special gradients or anything, just flat white/grey, or white/black with a subtle transparency, reminiscent of Apple's pro apps. I've enlisted Brian's help in coming up with a proper skin for the framework, so I probably won't be using the one above as is, but who knows.