As I thought about this deep problem (policing the boundaries of free speech), I realised that *moderate* means both to be free from excess and to decide whether or not something stays on a forum. Say what you want about the U.S., free speech *maximalists* ought to be glad that the internet (and with it a lot of internet culture) started in the U.S., what with its long history of constitutionally protected free speech.
If an internet forum is to self-police and not rely on the actual bobbies to police its content, then it makes sense that content on that forum be policed in as transparent, fair, and, dare I say it, democratic a manner as possible.
This post is half political, half confessional, and half dyscalculical.
Few like having their core beliefs called into question. That’s why philosophy is a tough old slog. That’s not to say that the many other parts of life that require critical thinking are not a tough old slog. Generally speaking, though, few disciplines compel a person to be wary of their unexamined beliefs to quite the same extent that philosophy does†. We look at the world through the window of our beliefs, which is why people come to radically different conclusions about identical circumstances. The more fearless the philosopher, the more transparent and free of defect the window. This is one reason to mistrust technocrats, formalists, and all those who would instruct with mathematical, formal, or technical language.
I recently did a search on the big G for “Hacker News” and “Machine Learning” to see which posts had attracted the most search attention. I thought it might be the recent announcement of TensorFlow from the aforementioned Google or the even more recent announcement of the Distributed Machine Learning Toolkit (DMTK) from Microsoft. These two multinational corporations are not the only tech giants to have entered this arena. Amazon Machine Learning has been in this space since April, although they’ve taken their traditional SaaS route: technically speaking they provide machine learning services, but they don’t have an open-source toolkit offering à la Google and Microsoft. Rather, TensorFlow and DMTK follow on the heels of community offerings Torch and Theano.
Sometimes it’s hard to spot a trend that’s right under your nose. It will be interesting to see the worlds of humanities computing and machine learning collide.
Anyway, no single posting caught my eye. What I did notice is that several companies have written about the classification of Hacker News (HN) posts. The three articles I noticed were this one about news categorizing by MonkeyLearn, this one about algorithmic tagging by Algorithmia, and this one about autotagging by Dato. There appear to be supervised and unsupervised versions of these algorithms. The supervised version relies on a predefined list of categories and on training data, whereas the unsupervised version does not need any training data. Dato call the unsupervised approach autotagging and the approach with a training dataset simply classification. Being new to the machine learning camp, I couldn’t say whether these terms are standard or not. All three articles are informative, and interesting for their different takes on things.
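To make the supervised/unsupervised distinction concrete for myself, here is a minimal sketch in plain Python. It is not how MonkeyLearn, Algorithmia, or Dato do it; the function names, the tiny stopword list, the example titles, and the word-overlap scoring are all my own assumptions, chosen only to show that one approach needs labelled examples and the other does not.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "to", "and", "with", "on", "is", "from"}

def tokenize(title):
    """Lowercase a title and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]

def train(labelled_titles):
    """Supervised step: count how often each word appears under each category."""
    counts = {}
    for title, category in labelled_titles:
        counts.setdefault(category, Counter()).update(tokenize(title))
    return counts

def classify(title, counts):
    """Pick the predefined category whose training words overlap the title most."""
    words = tokenize(title)
    scores = {cat: sum(c[w] for w in words) for cat, c in counts.items()}
    return max(scores, key=scores.get)

def autotag(titles, min_titles=2):
    """Unsupervised step: no labels; tag each title with words shared by several titles."""
    doc_freq = Counter(w for t in titles for w in set(tokenize(t)))
    return {t: sorted(w for w in set(tokenize(t)) if doc_freq[w] >= min_titles)
            for t in titles}

# Made-up training data with two predefined categories.
training = [
    ("Google releases TensorFlow machine learning library", "machine learning"),
    ("Training neural networks with Torch", "machine learning"),
    ("Why free speech matters on forums", "politics"),
    ("Moderation and free speech online", "politics"),
]
model = train(training)
label = classify("A new machine learning toolkit", model)

# Made-up unlabelled titles for the autotagging case.
tags = autotag([
    "TensorFlow tutorial for beginners",
    "TensorFlow versus Theano",
    "Theano tutorial",
])
```

The classifier can only ever answer with a category it was trained on, whereas the autotagger invents its tags from whatever words recur in the posts themselves.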
A more descriptive term than classification (which seems overly general) is topic analysis or topic modeling, and this is the term I have been using in my collaboration with the originators of Saffron (Bordea, 2014). Relatedly, I was looking at the introductory video for TypeScript by Anders Hejlsberg today and was struck by the applicability of the notion of type inference to topic analysis. I think we should call all these classification methods topic inference, and when those topics are related one to the other we have topic modeling, or ontology inference of one stripe or another.
I honestly couldn’t say what I’m trying to get at with this short blog post. It merely amused me that a number of machine learning shops had hit upon the same task to demonstrate their tools and wares.
Imagine that you had to develop a working toy model of reality that captured not all the facts of the matter as they are in our world but instead was required to model solely the epistemological part. You might imagine the world as a boundless two-dimensional spatial plane, something like Flatland perhaps. Let us call this toy model Epistemic Flatland. I am not suggesting an infinite plane; perhaps the plane wraps around on itself like the surface of a sphere, in such a way that it is finite but has no edges. On the plane “live” two-dimensional beings that have two sense organs, one for input, one for output, and a rudimentary “brain” with the faculty of language.
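The "finite but no edges" idea is easy to play with in code. The following is only a sketch under my own assumptions: the grid size and the `step` function are hypothetical, and wrapping both coordinates with modulo arithmetic gives the surface of a torus rather than a sphere, which is simply the easiest edgeless finite surface to program.

```python
# Hypothetical dimensions for a small wrap-around world.
WIDTH, HEIGHT = 8, 8

def step(position, move):
    """Move a being by (dx, dy); walking off one side re-enters on the opposite side."""
    x, y = position
    dx, dy = move
    return ((x + dx) % WIDTH, (y + dy) % HEIGHT)

# A being that keeps walking east never meets an edge; it comes back to where it started.
start = (3, 5)
position = start
for _ in range(WIDTH):
    position = step(position, (1, 0))
```

After `WIDTH` steps east the being is back at `start`, so the world is finite and yet has no boundary to bump into.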
What internal machinery would these toy beings need to perform basic cognition and recognition? What internal machinery would these micro-inforgs require to “speak”, make simple judgements and perform elementary logical operations, perceive and make sense of their world? Would these beings exhibit emotion and display affect as they each internally simulate their own little world and have their expectations met and thwarted? Each being or system would have a permeable boundary that encloses its internal structures and separates the system from its environment but allows data and information to pass through. It seems like an impossibly complex micro-world to construct; it seems like a thought experiment whose realization in the actual world is an impossible task. Nonetheless it is a thought experiment that I have found illuminating and instructive to play with.
In software engineering, confounded programmers routinely ask for help with non-working code on web forums. A common request by the peers of the perplexed is for a minimal working example. That is to say, a snippet of the whole is requested that demonstrates the piece of non-working code or markup. All other non-impinging details are stripped away to reveal the essential workings of the problem. What I am suggesting is that the grand project of epistemology is nothing else but to construct Epistemic Flatland. How much of this world (our universe) can we strip away and yet retain beings with the features of basic learning, basic cognition, basic pattern-matching and semiosis? The beings would not have to be recognisably human in any way but they would have to exhibit the recognisably epistemic features of human beings: language, subjectivity, and so on. How much “cheating” would be allowable? How atomic, in other words, would this micro-world have to get?
I believe that there could be value in creating a global challenge with a substantial monetary reward the better to spur research (something akin to the Millennium Prize Problems) with Epistemic Flatland as the goal.
Imagine if this universe we find ourselves in were just that minimal working example, coinciding with it perfectly! And imagine further if we could prove that to be the case.