Scientific Method
Summary of Tasks
One option: work on some concrete, real-world problems and judge what skills you need. Look at what people do and find some examples where you can get good feedback (this could be hard). Also, analyze the ratio of skill usage in those problems.
Another (not mutually exclusive) option: take individual skills and practice them and thus learn what your scientific method can do. Look at your practice ideas in the binder notebook.
Plus, I need local feedback mechanisms for those tasks.
One task is to collect my scientific method ideas from my notebooks. Also, to work through the Causality book with a commentary.
Domain-independence hypothesis - how much of real-world success is because of information-processing? Exactly which parts involve information-processing? What else matters? Deliberate practice? Information and entropy. Rate of acquiring information. Limits of a causal model of a given size.
Information-processing is one factor in creating value. However, building skills is also a factor (remember skateboarding).
Central Question: What is so hard about applying the scientific method in real life? Designing good variables (by finding conditional independences), finding their values, getting the causal model, etc. The next complication is that information has a cost. Plus, you have limited cognition.
Summary
Figure out the decoupled pieces. Look at things that are orthogonal to one another.
Transcribe the important ideas from my notebooks.
Use each technique several dozen times. Now is the time to push at full speed! Aim: Practice individual scientific techniques a thousand times. Solve useful mini-problems with these techniques. Show that you can outstrip conventional thinking in at least some areas.
What are the most valuable surprising things you’ve discovered about the scientific method? That is all that matters. That is how other people (and you) will judge your progress. It’s ok if you have unanswered questions still. Release the other insights you’ve gained, like the ones about information and probability, or information gain, locality of causality, etc.
Talk in terms of probability theory
If everything were already in the form of causal models or even just probabilities, then it would all be straightforward! It would be obvious what you forbade.
The challenge seems to be to translate everything I see into the world of probability theory (and information). Translate every real-world problem into a probability theoretic problem (or causal problem). Also, know how to translate them back into the real-world. This is common for every problem, I suppose, unless you have to cache some techniques in your head to avoid going back and forth. Then, look at techniques for doing the math more efficiently as a human.
Talking in terms of probability theory requires variables though (which involves categories). So, figure that out really well too.
Corollary: This is probably how you achieve domain-independent reasoning. Once you drop down into the level of mathematics, there is no distinction between programming, painting, or party-planning. In the realm of probability theory (or information theory - they’re two sides of the same coin), there’s nothing but theorems, hypotheses, and evidence.
Actually, dig even deeper. Hypotheses of any kind, like Bayesian networks or causal models, are defined in terms of variables. Theorems and evidence too are defined in terms of variables. When probability theory tells us that P(A) + P(~A) = 1, the A is a variable that we define.
So, we need to figure out how to translate real-world data into a model world of variables and probabilities. There are the skills needed to translate to and from the model world and then there are the skills needed to reduce uncertainty within the model world.
However, the model world doesn’t exist at the beginning. You build it as you go. With an established theory, like that of supply and demand in economics, you just have to figure out the supply and demand curves for your market and then you can calculate the price assuming that things are efficient. Economists have already understood that supply and demand depend only on the price and other factors but not on each other. They know which things are decoupled from each other.
But when you’re starting in a new domain, you have to discover the conditional independences by yourself. All you have is a bunch of apparently unrelated observables. You have to create order out of the chaos. This produces a third task: build a model world from the basic observables.
Therefore, the three key tasks of scientific thinking are building a model world, translating to and from that world, and working within that model world to reduce uncertainty. In microeconomics, for example, the model world consists of supply, demand, price, elasticity, etc. When trying to figure out how well a new ice cream flavour will do, we judge the supply and demand for it by looking at surveys and human psychology and past market behaviour. Finally, we calculate the price of that ice cream in an efficient market by using the law of supply and demand and thus predict the profits we can make.
What is this mythical “model world”? It is just a collection of variables that you can observe. It’s your Thingspace. You may start out with just a few observables and add more as you need them. Note that, as a human, your observables can already be very complex. Your brain does most of the work already in narrowing things down to a binary variable like “male” vs “female”, a simple integer variable like “age”, or even “funny” vs “not funny”. These contain vast amounts of information, most of which we can’t even replicate with our computers. Still, our Thingspace will just have those final variables, ignoring the complexity underneath them.
So, the tasks of scientific thinking are designing categories, recognizing them, and reasoning with them. Get ten examples of categories and observe how they were formed.
Remember that goals dictate abstractions. You have to decide which things are decoupled given your goals.
What am I still uncertain about? Which variables should I make explicitly part of my causal models and which ones should I leave inside other categories?
TODO: How is designing categories doing Bayesian work? How are you reducing your uncertainty? How can you do that efficiently, given what you know about Bayesian thinking? Let’s be clear: going into this “model world” is not some special act exempt from the Bayesian laws. Your whole model world is built of observables organized into categories. There has to be a rhyme and rhythm to designing categories. If you have n bits of uncertainty at first, designing categories probably reduces it by some amount. How much of the information work are you doing by designing categories and how much by causal thinking?
What is the boundary between categories and causal models? When do you stop categorizing and start looking for causes? I suppose you can observe categories well enough. In other words, what is the level of abstraction at which you model the system? I think this might be decided by your tools for interventions.
Look at the space of possible actions you can take. That is severely limited compared to thingspace, especially once you factor in costs.
TODO: I think a category is the value of a variable, not the variable itself. Dog, cat, and zebra are categories or values taken by the variable “animal”.
If much of the information work is done in designing the categories, shouldn’t you take extra care to make sure you get them right? Isn’t that where you should spend most of your time? Otherwise, you’re working on a brittle foundation; you will never be confident that your conclusions are valid.
Let’s think pragmatically. If we only care about categories to a certain level of abstraction (not all the way down to quarks), then why not ask: what is a good category? It’s not like I want to derive the pure essence of categories and then use it to make perfect inferences. I just want to do it better. That’s all.
TODO: What is a good category? How can you make a category better? How can you make it worse? For a given domain, what is a good size for a category? Give me good and bad concrete examples of categories.
Why make categories have low entropy (and thus need shorter messages to describe them)? Because you want reality to communicate the right answer as quickly as possible. Just by hearing the roar of a tiger, you can quickly infer things like “it’s nearby” and “yikes! I’m going to be eaten”. You didn’t need to observe its eyes, shape, fur, teeth, or DNA to know these things.
Here’s what I get from Eliezer’s conceptspace post. You want to infer stuff about stuff that matters to you. For that, you need to know the probability distributions and causal models over other variables. He says that the probability distributions are enough for you to make inferences. So, our goal is to get probability distributions over things that are related to our goals. And he says we get the probability distributions by drawing our boundaries - by creating categories. So, how do categories get you probability distributions? What would you do without categories?
I suspect you try to get the broadest categories that help you infer things about your goal. Keep categorizing as long as it gives you information about your goal variables. I suspect that as your goals change, so will your categories.
In short, categories give you the variables you need for your probability theory. Instead of having a variable with a trillion possible values (based on the configurations of the component variables), you combine the configurations with mutual information into categories and get far fewer possible values (like dog, cat, tiger, etc. for the variable animal).
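To make that concrete, here’s a minimal sketch with made-up numbers: six fine-grained “configurations” get grouped into two categories, and we check that the mutual information with a goal variable (“danger”) survives the compression, because P(danger | configuration) is constant within each category.

```python
from math import log2

# Hypothetical toy numbers: six fine-grained "configurations", grouped into
# two categories. P(danger | config) is constant within each category, so the
# category should keep all the goal-relevant information.
configs = ["c1", "c2", "c3", "c4", "c5", "c6"]
p_config = {c: 1 / 6 for c in configs}                # uniform over configs
category = {"c1": "A", "c2": "A", "c3": "A",
            "c4": "B", "c5": "B", "c6": "B"}
p_danger_given = {"c1": 0.9, "c2": 0.9, "c3": 0.9,    # category A
                  "c4": 0.1, "c5": 0.1, "c6": 0.1}    # category B

def mutual_information(joint):
    """I(X;Y) in bits, from a dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Joint over (config, danger) and over (category, danger)
joint_config, joint_cat = {}, {}
for c in configs:
    for d, pd in [(1, p_danger_given[c]), (0, 1 - p_danger_given[c])]:
        joint_config[(c, d)] = p_config[c] * pd
        key = (category[c], d)
        joint_cat[key] = joint_cat.get(key, 0) + p_config[c] * pd

print("I(config; danger)   =", round(mutual_information(joint_config), 4))
print("I(category; danger) =", round(mutual_information(joint_cat), 4))
# Both come out equal (~0.531 bits): grouping configurations that share
# P(danger | config) throws away detail but no goal-relevant information.
```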
Falling into place
You want to answer only a small set of queries. If your abstractions work well enough, if they consistently make correct, narrow predictions, then your hypothesis is accurate. It will work well for that set of queries.
Look at the purpose you want to achieve and build a hypothesis just complex enough to meet your needs. When reality defies your expectations, add some more details to your hypothesis.
Hmmm… I let myself be fooled by the impressiveness of the statement “goals dictate abstractions”. I failed to see through to its actual meaning. How exactly do goals dictate abstractions? You only make a hypothesis about the variables that can affect your goal variables. So, how do you decide if a “variable” can (directly or indirectly) affect your goal variable? Let’s be clear: your goals are just a bunch of variables that you want to see in certain states. For example, all else being equal, you want your happiness level to be as high as possible, you want the oxygen level in the air to stay above 21%, and so on. This doesn’t tell you anything about which variables are related to these goal variables. You have to come by that knowledge by other means.
It comes down to value of information. You can choose to ask reality about certain variables, but that information (from observations or experiments or just studying) has a cost. So, you look at the decisions you can possibly make and seek only the most valuable information.
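A minimal sketch of that calculation, with hypothetical payoffs and probabilities: the value of an observation is the expected improvement in your decision, and it’s only worth paying for when that improvement exceeds the cost of observing.

```python
# Hypothetical numbers: decide whether to act, given an uncertain state.
p_good = 0.4                          # prior that the state is "good"
payoff = {("act", "good"): 100, ("act", "bad"): -50,
          ("pass", "good"): 0, ("pass", "bad"): 0}

def expected_value(action, p_good):
    return p_good * payoff[(action, "good")] + (1 - p_good) * payoff[(action, "bad")]

# Best you can do without observing anything:
ev_blind = max(expected_value(a, p_good) for a in ("act", "pass"))

# With a perfect observation, you pick the best action in each state:
ev_informed = (p_good * max(payoff[(a, "good")] for a in ("act", "pass"))
               + (1 - p_good) * max(payoff[(a, "bad")] for a in ("act", "pass")))

value_of_information = ev_informed - ev_blind
print(ev_blind, ev_informed, value_of_information)   # 10.0, 40.0, 30.0
# Paying up to 30 for the observation is worthwhile; more than that is not.
```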
Causal models are Disguised Queries
You don’t actually want a “causal model” of the system; you just want to be able to answer a set of queries. Causal models are disguised queries. So, what are the queries to which a causal model is equivalent?
Take a simple causal model, X = f(A, B). What is the total information stored in this causal equation? Once you observe the values of A and B, you can determine the value of X. Similarly, if you set the value of A or B, X will change accordingly. Also, if you observe X, you can infer something about A and B, especially if you observe one of them too. Ignore the causal structure; ignore the directionality. Even if you knew nothing about how X, A, and B fit together in the causal model above, you can still answer every query you have with just the above information. So, if you observe any of X, A, or B, you can infer things about the remainder. Further, if you intervene on A or B, you can infer the value of X. The key here is the Causal Markov condition: once you know about A and B, other variables don’t matter for X.
More generally, you have all the properties of a Bayesian network, namely that you factorize the JPD as P(x_1, …, x_n) = product over i of P(x_i | parents of x_i). You can answer any question that you can answer using the JPD. Then, you have the properties of a causal model, saying that surgical intervention leads to a particular new causal model.
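A minimal sketch of the two kinds of queries on the toy model X = f(A, B) from above, with hypothetical probabilities and X = A or B: an observational query conditions the joint distribution, while an interventional query uses the surgically altered model in which X is cut off from its parents.

```python
from itertools import product

# Hypothetical toy model: A and B are independent causes of X, with X = A or B.
p_a, p_b = 0.3, 0.6

def joint(do_x=None):
    """Enumerate the joint distribution; do_x replaces X's mechanism (surgery)."""
    dist = {}
    for a, b in product([0, 1], repeat=2):
        p = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        x = do_x if do_x is not None else (a or b)
        dist[(a, b, x)] = dist.get((a, b, x), 0) + p
    return dist

def prob(dist, pred, given=lambda e: True):
    num = sum(p for e, p in dist.items() if pred(e) and given(e))
    den = sum(p for e, p in dist.items() if given(e))
    return num / den

obs = joint()
# Observational query: seeing the effect X=1 tells you something about the cause A.
print(prob(obs, lambda e: e[0] == 1, given=lambda e: e[2] == 1))   # ~0.417

# Interventional query: forcing X=1 by surgery cuts X off from its parents,
# so A keeps its prior.
print(prob(joint(do_x=1), lambda e: e[0] == 1))                    # 0.3
```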
Goals, Queries, and Abstraction Levels
TODO: In my phone notes, I wrote that your abstractions are decided by the set of queries you hope to answer. How does that relate to the idea of categories and minimum entropy?
A query is about setting or observing the values of some variables, and asking for the values of others. So, given a set of queries, you can get the variables that you care about. It already contains the level of abstraction at which you should operate.
So, now the question is, how do you get the set of queries you want to answer? Is this decided by your possible decisions? For one, you don’t care about variables that have nothing to do with your goals.
Think in terms of causal models: what kind of variables matter to your decisions? Also, assume that you can’t intervene on some variables. Any intervenable causal ancestor of your goal variables matters. In fact, they are the only ones on which you want to intervene in your decisions. However, to understand the causal structure, you may want to intervene on non-ancestors also. Next, you want to observe ancestors of your goal variables (or of the decision-important variables). You may also want to observe non-ancestors of goal variables or decision-important variables in case you can’t observe those variables directly.
Question: In other words, given a set of goal variables (not necessarily observable), and possible interventions with their costs, what are the queries you want to answer in order to get the most value? (You can also observe certain variables, at some cost.)
Why do you need any queries at all? Why not just intervene on the independent variables? First of all, you don’t know how they will affect the goal (if at all). Next, those interventions have costs. So, given your limited resources, you have to intervene efficiently to get your desired results.
The only choice we have is which interventions to make. Take the space of possible actions. You could intervene on different variables and set them to particular values. Or just leave them be. You want to know the expected value for each action so that you can pick the best one. And the expected value of an action depends on how it affects the goal variables. For each action, you form a probability distribution over the states of the goal variables. So, the job of our model is simply to tell us how our actions will affect the goal variables. And those are the queries that we actually care about!
So, if these are the only queries we care about, then we can use them to decide the abstraction level of our variables.
However, the action-goals links only talk about independent variables. There may still be some dependent variables (observables) that you’re unsure how to abstract. How do you decide how to abstract them? What queries correspond to them? Well, in case the goal variables are not observable, then we need to know how to infer their values from the observables. Also, there may be some observables that come between the independent variables and the goal variables. Roughly, I think we need to talk about the link between observables and the goal variables, given the independent variables.
TODO: So, given goal variables, independent variables, observables, and costs, we have to find out the causal model (or, rather, answer a set of queries). That information has a cost and thus we need to know its value. I suspect that certain pieces of information matter more, even at compile-time. Like if you think that a certain independent variable may have a big effect on an important goal variable, then you should seek that information first.
TODO: Wait, but wasn’t this whole discussion about the level of abstraction for variables? Isn’t that decided by the above set of queries (action-goals link, etc.)? Well, what if you can observe or manipulate at a fine grain? Maybe drinking exactly 221.354 ml of Pepsi will give you huge amounts of joy unlike 220 ml or 500 ml. To abstract confidently, you must know somehow that there’s not much difference between 220 ml and 221.354 ml when it comes to joy. Should you use entropy or value of information there somehow?
Here’s the tricky part: our variables may not be related to the goal variables at all. The independent variables may not be causal ancestors and the observables may be siblings. Anyway, that comes under standard causal modelling - you find out which variables are conditionally independent.
Compile-time Worth
Distinguish between compile-time and run-time costs. Compile-time refers to the making of the causal model, while run-time refers to its usage. When you’re talking about the cost and value of information, it’s information about the causal model! You’re unsure how the model looks, so you’re willing to pay to find out. It’s a one-time cost.
TODO: How much is a causal model worth? Or, rather, how much are the answers to those queries worth? I suppose it’s about the reduction in uncertainty in the action-goals link and the value of that information.
What about observables? When are you justified in paying for them? Is this at compile-time or run-time? I think you get the knowledge of how observables relate to the other variables at compile-time.
TODO: I need three concrete examples about goals, actions, and value.
Note that you can also be uncertain about possible interventions. They’re not all available to you. You could discover a new intervention in the future.
There is a difference between modelling for the sake of better decisions and modelling for the sake of knowledge. In the former, you want to get the most value at the least cost. In the latter, you want to answer a variety of questions, like at the end of a textbook chapter or in an interview. Needless to say, the latter is just a means to the former. You can use tests or exercise problems for training, but at the end you have to take decisions somewhere to get value.
Phone Notes
Why give concrete examples?
Goal for 2016.
Domain-independence Ideas
Some domains require you to do a lot of simple problems but really quickly. Like truck driving, where you have to keep track of several small systems in real-time. I think cooking, sports, and social conversations fall in the same real-time processing category.
Overall Aim: Figure out how to think well independently of the domain and then use that algorithm to go and solve problems in different domains. Master domain-independent thinking.
Why don’t people do this more already? I think it’s because we treat each domain as a special snowflake. We think that each domain is unique and requires a special way of thinking that is not found in any other domain. Of course, most people are not polymaths and so can’t see how their learning algorithms extend to more than one domain, even if they go deep. We feel we can’t learn about some intimidating field (quantum physics or painting or neuroscience) even in principle. It feels almost impossible to learn and, therefore by naive realism, it actually is almost impossible to learn.
So, cognitive science is what I want to study. I want to learn how to think better. Probability theory comes in, as do causality and decision theory. But they apply only insofar as they can help me think better.
Decoupled Ideas
The overall aim is to reduce uncertainty.
Causal models help you factorize your uncertainty and answer interventional queries. It’s all about finding conditional independences or which parts are decoupled from each other. How to do this given some information?
To reduce uncertainty at speed, look at differing predictions. Ideally, focus on high-entropy experiments (but I’m not sure how to calculate this for two given hypotheses).
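One way to make that concrete (a sketch with made-up likelihoods): score a candidate experiment by its expected information gain about the two hypotheses, i.e., the mutual information between the hypothesis variable and the experiment’s outcome. The experiment where the hypotheses disagree sharply is worth far more than the one where they barely differ.

```python
from math import log2

def entropy(ps):
    return -sum(p * log2(p) for p in ps if p > 0)

def expected_info_gain(prior_h1, p_out_h1, p_out_h2):
    """Expected reduction in entropy over {H1, H2} from a binary-outcome experiment."""
    prior = [prior_h1, 1 - prior_h1]
    gain = entropy(prior)
    for outcome_likelihoods in [(p_out_h1, p_out_h2), (1 - p_out_h1, 1 - p_out_h2)]:
        p_outcome = sum(p * q for p, q in zip(prior, outcome_likelihoods))
        posterior = [p * q / p_outcome for p, q in zip(prior, outcome_likelihoods)]
        gain -= p_outcome * entropy(posterior)
    return gain

# Hypothetical experiments: in the first, the two hypotheses disagree sharply
# about the outcome; in the second, they barely differ.
print(expected_info_gain(0.5, 0.9, 0.1))   # ~0.53 bits
print(expected_info_gain(0.5, 0.6, 0.5))   # ~0.007 bits
```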
One problem is that you have to write models explicitly. You ideally want models in a domain-independent format. How can you do that with the minimum effort?
So far, the key ideas seem to be decoupling, entropy, and format.
What else am I uncertain about? What else do I need to get to utilons? Domain-independent information-processing; constraints on interventions and observations.
How to deal with scary domains like biology or math or photography? They seem unimaginably complex. Or, rather, I feel like there’s no way to master them (or at least get a lot of valuable information) without putting in months and years of hard work. To reframe the question: how can I get valuable information from these domains with the least work? “Asked in such fashion, the question answers itself.” I need to look at things that have high value of information.
In fact, I must do nothing else but this. That’s as far as information-seeking is concerned. For maximizing utilons, I need to look at decisions that have high expected value.
Actually, I can’t even talk about the “value” of information without assessing my goals or options. That’s what it all comes down to. There’s no point talking about theory otherwise. Once I know exactly what I want, I can dive into the biology textbook and extract the precious few bits I need, instead of drowning in the sea of apparently irrelevant information.
In short, I’m shrinking away from trying to understand “chordata” or “xylem” or whatever because it seems like too much work. Well, “too much work” depends on how valuable that work is. All else being equal, shovelling wads of free cash into your bag is not too much work - it’s probably the best work you can do.
So, even if I can’t figure out how to multiply my learning efficiency tenfold, it may still be worth it to slog through a textbook if it can pay off. I am a bounded agent and I have to do the best I can with my limited time and cognition. So far I’ve been acting like if I can’t find a way to zoom through entire textbooks in hours, then it’s not worth learning from textbooks at all, like I should just give up if I can’t win big. That’s stupid. Win big or small, I must win as much as I can.
Figure out your goals and limits and then do the best you can with your few years on this planet, even if it isn’t a lot.
Ideas on Information
Programmers (or at least I) tend to undervalue mere information. We judge things by their bits of algorithmic complexity, not their bits of information. This means that we (or at least I) scoff at things that are simple to code or understand. Take Goodreads - it is (roughly) a simple CRUD app plus social networking and suggestions. As a program, it may not have too much complexity. But as information, it has tons of value.
I think the same goes for startups and their apps. As programs, they’re not too complex. But based on information or just plain impact, they’re very valuable!
Corollary: Don’t underestimate something just because it is simple. It could still be valuable!
One more thing is that I tend to overvalue skills and underrate memory. This is the same problem as above: I overvalue the procedures and undervalue the memories. I think it’s important to know how to do things, rather than know trivia about them. Part of this might be a reaction to childhood horror stories of rote learning. The smart guys, it seemed, were always those who understood the ideas instead of memorizing the answers. I’d always looked down upon people who prided themselves on remembering tons of (pointless) details. It seemed like obvious showing off. “If memorizing things is so powerful, why haven’t you achieved much?” was the question in my mind.
However, as a smart guy, you should be able to weaponize memory. In fact, part of what powered the great achievements of the smart people I know was their vast knowledge of their subject. It just felt different from the rote learning that those other trivia masters did. When my friend spoke of all the algorithms and data structures he knew, it felt as natural as a memory of his friends or favourite TV shows. It seemed effortlessly part of him.
Anyway, memory seems to be a vital ingredient in human cognition. It would be wise to memorize lots of topic-relevant things and do so efficiently.
“Mr. Potter, one of the requisites for becoming a powerful wizard is an excellent memory. The key to a puzzle is often something you read twenty years ago in an old scroll, or a peculiar ring you saw on the finger of a man you met only once.”
– Professor Quirrell, Chapter 26: Noticing Confusion
As a thought experiment to stretch your mind, consider how far you could go with just memory.
Key Insights
Causal Bayesian Networks as a bunch of stable, autonomous mechanisms. Locality of causality and factorizing our uncertainty. Being able to talk about interventions. Use observed correlations and experimental evidence to pin down the causal structure. See nothing but causal models and variables. A “hypothesis” is no longer a black box to me. I have a good idea of the pieces inside (be it a full-fledged causal model or a mere JPD).
In short, how is causal thinking different from conventional thinking? Think of causal thinking as Bayesian networks plus interventions and counterfactuals. For one, you think in terms of probabilities. Then, you factorize your uncertainty using the causal markov condition. You seek simple, naturalistic mechanisms behind the effects you see. You automatically get falsifiable hypotheses with narrow predictions and can thus move quickly to the correct answer. Also, you acquire information at the maximum rate by looking for high entropy situations. We test hypotheses smartly by looking at their differing predictions. This includes always looking for ways to test your beliefs and generating alternative hypotheses given a set of evidence.
Here’s the key difference from before: earlier, I thought in terms of constructing entire giant alternative hypotheses, things that differed at every level. However, now that we have locality of causality, we can think locally. Take up parts of the causal model - maybe just a single causal link, or a triplet of variables - and ask how they might be arranged differently.
Thinking locally means thinking in terms of nearby causes and shutting out faraway ones. Once you know about the parents of a variable, you don’t need to care about the ancestors. You have all the information you need. So, instead of asking for abstract general causes of things, ask what happened just before this; ask what happened close to this. Think locally and concretely, not globally and generally.
I can practice creating causal models from scanty evidence by looking at unfamiliar sciences so that I can’t use my existing knowledge.
Problem statement: Given evidence, come up with causal models that fit; given sources of information, eliminate hypotheses quickly; given a scenario, predict what will happen.
However, in learning, you don’t always have to crunch the evidence yourself. Textbooks and other resources may give you the known correct hypothesis. The challenge there is to compress the model while keeping it useful.
I must be able to express any theory using a causal model.
How do you test a given causal model efficiently? Look at the tests with maximum entropy.
Aim: Get a step-by-step algorithm to get to an accurate causal model from zero.
I don’t want to do original scientific research. I can’t do research in psychology or physics or neuroscience without a lot of capital or resources. I just want to solve problems. And the methods of uncertainty reduction seem to be the best for that. For now, focus mainly on learning well. That should take me a long way towards solving problems. I suspect that I don’t need to do original research - get evidence on my own and crunch it. Much of the information I need is probably already out there.
When you have got to a hypothesis with a high prior probability, you won’t expect to lose confidence in it. You have hit the jackpot.
Conditioning on a collider is about setting the lhs of the equation C = f(A, B), thereby constraining A and B.
Major attitude: everything has an explanation! Explain literally everything! Or, rather, anticipation-constrain literally everything! Figure out the cause behind everything. And, of course, when somebody claims that X is the cause, don’t accept it without evidence.
Don’t accept anything without evidence. Somebody: “X is true.” You: “So you say. Show me the evidence.”
Historical Perspective: Curiosity about how things came to be the way they are. Why do we have supermarkets instead of ration shops? Why voting / democracy? Why civil service? Why parliament? These aren’t arbitrary facts. They are tightly governed by the true hypothesis about reality. They are superbly entangled. You shouldn’t be accepting them either way without question: “X happened? Oh, ok.” “X didn’t happen? Oh, ok.” – Wrong! Why does the USA have Hollywood, and why is it so popular around the world? It doesn’t have more people than Europe (I think).
Programming is different and powerful because you can intervene cheaply. Modifying the source code of a program or even just giving it different inputs is easy. So, the main differences that come about between scientific thinking in the real world and in programming would be because of the ease of intervention in the latter.
One of the outstanding questions I have is about how to identify variables from a textbook.
Look at textbooks from economics, decision theory, thermodynamics, or neuroscience.
Even if the mechanisms are complex, the equations might be simple. Like the idea of disproportionate rewards (80/20 Principle) in different fields or just any simple microeconomics model.
Lesson: What would you do without this “key” idea? For example, what if you didn’t have a good theory about categories? How would you meet your inference needs?
Essay topics
Write one essay on what you learned from information theory.
One more on variables and abstractions.
Open questions about domain independence and how much you can learn in general without going into a specific field.
Write a commentary on Pearl’s Causality.
Ideas about confusion, speed, etc.
Practice Ideas
Look for surprises. Use the experiments in some domain to test your grasp of the model. Taboo the major words. Think only in terms of models and variables.
Think in terms of the variables. Don’t deal in abstract concepts. Ask for precise predictions: “A often leads to B” - exactly how often? Specify the exact effect of some cause.
When positing a cause, check if you can manipulate the supposed cause to get the desired effect.
Science is about generating alternative hypotheses that explain the existing evidence equally well and using further evidence to distinguish between them. So, let’s design a basic practice plan. Step one is to collect evidence - correlations, experimental evidence, or just plain observations. Step two is to identify the variables. Step three is to come up with alternative models for each piece of evidence. Step four is to note down where they make differing predictions, i.e., variables for which they posit different causes. Step five is to combine the mini-models to get several complete models.
Get into the habit of solving one problem at a time, no matter how trivial. You can always scale it up. Wrap up each problem before moving to the next.
Maybe start The Great Scientific Method Project - take up one resource or topic every day and extract the relevant, detailed causal models from it and run experiments, if possible.
Maybe extract models from technical books.
Look at the inverse of the variable. Ask what causes X, but then also ask what causes not-X.
Similarly, ask what happens when the alleged cause is turned off. If you do X, then Y happens. What if you do not-X?
Think only in terms of hypotheses. Every single thing in the world is either a hypothesis, experiment, prediction, or evidence. That’s all. Focus on one particular thing at a time - in one pass, look for hypotheses; in the next, look for experiments; and so on.
If some part of a hypothesis doesn’t deal with any evidence, then it has no business being there. This is test-driven development, in other words. Get the evidence, show that your current model fails to predict it, and then postulate some extra cause. So, the key is in finding experiments where your current model doesn’t predict correctly.
It’s not about the number of reps. It’s about the number of reaches. Big difference! You have to actually do something you’re not good at.
Have Fun Failure. You should fail about 80% of the time.
Much of the time, the scientific thinking you’re doing is parallelizable. So, don’t try to go through to the end before you start the next pass. Stop at some convenient point and go to the next step so that you have a short cycle with quick feedback.
Ask: What would falsify your hypothesis?
Taboo “often”, “sometimes”, “usually”, “many”, etc. Be precise. Ask for narrow predictions - “you become stupid after hours of work” - exactly how stupid, after how long?
For each independent variable, ask yourself if you can manipulate it. For each dependent variable, ask yourself if you can measure it. Taboo it, in other words.
For each independent variable, ask which dependent variables it might affect.
It’s not that the independent variables “cause” the dependent variables to change. It’s just that they let you predict the values of the dependent variables.
If you have to add a lot of corner-cases and details to your hypotheses, you should probably go a level deeper and find a more powerful, general hypothesis (like with stuff and signalling).
The key problem is that I need to get as little evidence as possible from the domain, but in general, the main way people present evidence is by showing the correct hypothesis itself. Yes, that encodes within it all the evidence seen, but it ruins the exercise for me.
Ask “what would falsify this?” for every correlation or causal link.
Make sure to create solid, measurable variables. “People will overlook that idea” is not good enough as a variable. Go for “% of people who pursue that idea after hearing it”.
If someone solves a problem, ask what technique they used to get to the right answer? Did they taboo their words or maybe look for differing predictions?
Ask what will happen if you toggle the variable. Do this even for correlations (A+,B+; A-,B-). This makes you realize what you forbid, because if you get the same effect no matter what the value of the variable, then it is not the cause.
Use locality of causality when you’re in a bind. Actually, use it even when you’re not.
Ask about the disguised queries that a phrase represents.
Skills I need to Learn
(Also look at the practice ideas above.)
The skill of not flinching away from thoughts. I somehow manage to not-think about inconvenient thoughts - about how I could be going wrong at the moment, or how I could be doing something better. I need to be more aware of my thought processes, as Harry Potter suggests.
Another skill is the game of Follow the Improbability. It’s what Harry Potter used in one of the later chapters of HPMOR. I don’t fully know how to use it, but I suspect that it is very useful. (Perhaps it uses the fact that high improbability means high information.)
How do I get feedback on my practice? Some measurement - maybe number of reps - number of reps of what?
Practice noticing confusion and noticing surprise. Also, learn to notice naive realism.
Generating alternative hypotheses for some particular variable, given some evidence. The problem statement becomes: given an amount of evidence, what different causal models can you come up with?
Also, making predictions using a given causal model, and noticing where they don’t match with reality.
One more core skill is to use non-experimental evidence to eliminate hypotheses. You don’t always have the luxury of running interventions, let alone randomized controlled trials.
Next, of course, is the skill of running experiments, hopefully ones that give you maximum information.
One skill is to list variables of interest - anything that might cause, be caused, or in any other way be correlated with the others.
Most important of all, connect your models to reality. You form models using experimental evidence, cool. But, now, test it against real life.
Another skill is to enumerate the instances of a class. You may say that “stress” causes decreased “willpower”, but you should be able to predict that when someone is worried about giving a presentation, they might stuff their face with sugary cookies before the event.
Better yet, come up with a test that tells you whether some particular instance is part of a class. In short, each variable is just an interface and you calculate its value by asking whether some instance satisfies that interface. But the question remains, what does this “interface” look like? Is it nothing but the causes of that variable? For example, stress may decrease your success at some willpower challenge, but what constitutes a willpower challenge?
Another important skill to learn: noticing and preventing the conjunction fallacy. Pay attention to every detail added to a story, and add up the absurdity (-log2(p)) of each detail. You should be able to eyeball the absurdity of any given statement or essay.
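A back-of-the-envelope sketch of adding up absurdity, with made-up probabilities for each detail: every detail you tack on adds its own -log2(p) bits, so the conjunction can only get less probable.

```python
from math import log2

# Made-up probabilities for each added detail in a story.
details = {"the project slips":            0.5,
           "it slips because of a vendor": 0.2,
           "the vendor is sued":           0.05,
           "and the CEO resigns over it":  0.02}

total_bits = 0.0
for detail, p in details.items():
    bits = -log2(p)                     # "absurdity" of this detail alone
    total_bits += bits                  # treating the details as independent
    print(f"{detail}: {bits:.1f} bits")

print(f"whole story: {total_bits:.1f} bits, "
      f"P = {2 ** -total_bits:.6f}")    # every added detail makes the story less likely
```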
Another skill: Start from evidence, not hypotheses. Don’t just pick out hypotheses from nowhere and search half-assedly for evidence. I think it is safe to say that you’re only allowed to come up with abstractions and correlations on your own. Getting a causal model from there should be a deliberate process. Human minds are good at coming up with categories, I think. So, notice things that go together and then sit and ask why.
Smoke out hypotheses that don’t forbid anything. That is, hypotheses that no outcome will falsify. How? Take plausible-sounding hypotheses that can explain every outcome in hindsight.
Demand narrow predictions. Spot hypotheses that make falsifiable yet vague predictions.
Take confusing ideas, or ones that you think you understand. Then, taboo adjectives, verbs, and nouns. You must get falsifiable predictions at the end.
Everything boils down to a question of fact. You’re not entitled to your own opinion.
Basically, how do people avoid having to change their mind? They hide behind words and “personal opinions” (which they express in words, but not narrow predictions, and thus are safe from falsification). Or they conveniently explain every outcome in hindsight. Or they make vague predictions that are hard to falsify.
Think locally. Think in terms of causal models. Learn, research, and infer using causal models.
Find out which things are decoupled.
A key skill from programming: look at the complete life-cycle of an object. It’s not enough to know how to use that object when it’s given to you. You must be able to construct it from scratch, and update it with time. Otherwise, you don’t really understand it. For example, take Eliezer’s Thingspace. Once you have a bunch of dimensions, you can reason about things. However, you have to get those dimensions from scratch someday. Imagine you’re a fledgling AI and you just got your first observational tool. That will be your first dimension. How do you go from zero to one dimension? And then from one to n dimensions? Without knowing this, I can’t really apply the Thingspace concept to my life.
Attention control: LW article.
Resources
Elias Barenboim at Purdue
Understanding Science - http://undsci.berkeley.edu/article/howscienceworks_02
A beginner’s guide to the scientific method - Stephen Carey
What is this thing called science?
Introduction to the Philosophy of Science - Cutting Nature at its Seams (concrete examples from biology)
Scientific Method in Practice - Hugh Gauch
How the Great Scientists Reasoned - The Scientific Method in Action - Tibbett
Scientific Methods - An Online Book
Tasks
Simplify your tasks so that you can reason about them.
What is my overall goal? I want to master the techniques of reducing uncertainty.
What am I uncertain about regarding mastering the scientific method?
I’m unsure about the domain-independence hypothesis - that some techniques can get you a long way regardless of the domain. I don’t know the ratio of domain-independent and domain-dependent tasks in any field. Plus, cognitive psychology constraints like the number of chunks we can learn and memorize could play a role too.
I don’t know how to apply the idea of information and entropy. It seems like we should seek out experiments that are high entropy, but how exactly do we do that in practice and how helpful is it?
But these aren’t the main questions.
How do I obtain accurate causal models about the systems I care about? How abstract should they be? Do I need to memorize them or can I write them out and refer to them when needed? How much predictive power does one causal model give you? Given a set of queries and a source of information, how much information do you need to answer them?
How quickly can I get information from a source? How much information can I possibly gather in my life? Thus, what is the maximum expected number of utilons I can get from my life?
Aim
My aim is to design a step-by-step algorithm for thinking scientifically: learning, researching, and problem-solving. The next step is to burn that algorithm into my brain.
So, first, I have to show that the algorithm can actually help you solve problems (which needs me to flesh out the algorithms). Then, I need to show that I can use this algorithm on real-world problems, which are rough and noisy. Then, I need to practice it like hell.
I suspect that a bare skeleton of a real-world problem will be enough to let me infer what kind of problems my algorithm needs to solve. Right now, I have no empirical examples behind the term “real-world problems”. I need to take up a real-world problem - any real-world problem - and actually look at the problems I need to solve along the way.
The thing is I don’t even have a particular problem I want to solve. I’m just looking at a generic algorithm to solve problems in the abstract. Which means that I’m probably not thinking concretely about what my algorithm needs to contain. The time has come, I think, to take up a reasonably challenging project, employ my scientific thinking skills as best I can, and learn from the train wreck that ensues.
What sort of a project should it be? It should have a high standard for success and it should be judged objectively, not by my own mind. I want it to take no more than a week, maybe two at most, so that I can learn quickly from my mistakes and attempt it again. What are some examples of such projects, even if I can’t or won’t attempt them? Writing an essay about a known topic qualifies. So does painting a portrait, composing a song, or preparing a speech. These seem like places where you solve problems using your existing knowledge (of the art of writing, painting, or public speaking). The same goes for tackling exam problems from past college courses, writing a program, or planning a weekend of fun.
A learning project seems harder to pick. I feel you can’t test how well you’ve learnt without solving problems. In fact, you need to know what problems you’ll be solving if you are to extract a useful model. What are some learning projects? Learning about human biology (from a high school textbook), learning category theory concepts and how they can be applied to problems, or learning how to take a good photo.
Hmmm… it seems like you can’t divorce learning from problem-solving. There’s no point “learning” category theory if I can’t write better Haskell programs afterwards. The problems you need to solve define the granularity of your causal model. And the abstraction level of your causal model limits the problems you can solve. You can’t solve biotechnology problems using anatomy concepts - they’re too heavy-handed.
So, I need a project with a bunch of problems that need to be solved and some resources that can help me solve them. One idea is to take up a high school physics textbook - it will have problems to solve, instead of long-answer questions that expect me to describe my model.
Or maybe start from the problems and move backward to the resources needed, for that is indeed a skill you need in the real world. You don’t usually have clear-cut resources with all the answers to a set of questions. That only happens in school, and only because they can’t do any better. In real life, you choose your own problems (or maybe they choose you) and you pick your own learning resources, based on your constraints.
For example, take photography. I’m not a good photographer, by any stretch. What would I need to do to have good photos at the end of the day?
A research project may be easier to test. You need to choose between alternative models and pick one winner at the end. This at least doesn’t demand problem-solving. By having a limited set of models among which to pick, you curtail your problem space.
I need a performance measure for scientific thinking. Feedback, in other words. What are the consequences of scientific thinking?
Think of local alternatives for your algorithm. Don’t consider the thing as a whole. For example, take the skill of judging the value of information for an action before doing it - like checking the use-by date of other bread packets before choosing one. You could have just gone with your first choice and assumed that the others would be the same. Life is made up of such moments. Identify what you want to accomplish but can’t and spot things that can help you. This is analogous to the startup principle of “live in the future and build what’s missing”. Use your algorithm to build what’s missing. This could be chaotic, informal stuff like taking up the mantle in an unstructured group discussion, solving math problems, designing your study plan for the next three months, writing an essay, designing a program, etc.
Similarly, look for local feedback mechanisms. You don’t need a big honking measurement tool to tell you how good a “scientist” you are. Just check if you can taboo well, find differing predictions, and so on.
One way is to use resources with solved problems involving the same variables, like textbooks. The least your scientific method can do is let you solve all the exercise problems.
Lessons from Locality of Causality
Always think in terms of the immediate causes. Instead of talking about grand differences between hypotheses, just zoom in on one variable for which they posit different immediate causes. That’s all it takes. Locality of causality says that if some variable comes out differently according to two hypotheses, then the difference must be because of its parents; the rest of the model doesn’t matter. So, exercise the parent causes that are different and you will get evidence one way or another.
The dual of that is to think in terms of immediate effects. But you can’t really isolate the effects on its direct children, can you? The effects will propagate to all the descendants, which doesn’t simplify your task in any way. Thinking in terms of immediate causes is useful because you can cut the causal graph off at the parents.
Corollary: Don’t think in terms of faraway causes. Keep it up close and personal.
Corollary: Alternative hypotheses are nothing but different immediate causes. So, to formulate an alternative hypothesis, just ask which variable’s immediate causes you are going to change. It’s not magic. Think of possible factors and choose subsets to get distinct hypotheses.
So, the central question is: what are the possible immediate causes for this variable? What happened seconds before this? What happened a few metres around this?
Basically, ignore the path by which the world got here. Just look at the previous state and make all your predictions. It doesn’t matter how a switch gets flipped; once you know it’s flipped, you don’t care.
Could that be the essential difference between scientific and unscientific thinkers? Is it that we humans naturally posit long and hazy connections between things instead of looking at short, specific, immediate causes? (I think I just made a long and hazy connection here, instead of looking at the immediate causes.)
This state-based thinking leads to a dramatic reduction in complexity, I think. This could give us a huge advantage over algorithms that throw away the state-transition information. Instead of dealing with a large web where pretty much everything causes everything, we just look at causes and effects within a small locality in time and space.
Also, don’t accept any variable without knowing its causes. Otherwise, unless it’s some huge thing that doesn’t really change, you’re flying blind.
Test for “A causes B”: Manipulate A directly. B should change. Also, manipulate other things but keep A (and other immediate causes of B) constant. B shouldn’t change.
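A simulation sketch of that test, using a made-up mechanism in which B is driven by A and a second variable Z is causally irrelevant: intervening on A shifts B, while intervening on Z with A held constant leaves B untouched.

```python
import random
random.seed(1)

# Made-up mechanism: B is driven by A (plus noise); Z is causally irrelevant to B.
def run(a, z):
    return 2 * a + random.gauss(0, 0.1)     # B = f(A, noise); Z does not appear

def mean_b(a, z, n=10_000):
    return sum(run(a, z) for _ in range(n)) / n

# Manipulate A directly: B should change.
print(mean_b(a=0, z=0), mean_b(a=1, z=0))   # ~0.0 vs ~2.0

# Manipulate something else (Z) while holding A constant: B shouldn't change.
print(mean_b(a=1, z=0), mean_b(a=1, z=5))   # both ~2.0
```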
Taboo the variables. For example, what does “self-control” really imply? What tasks demand that of you? Does concentration require self-control? Does some physical activity require it?
When considering whether A can cause B, ask about the possible known causes of B. If some other factor unrelated to A is known to determine B, then A cannot cause B, directly or indirectly. The same goes when asking whether something can cause both A and B.
When someone poses a new hypothesis, ask what factors they say won’t affect some variable (directly). For example, the virtue theory of willpower says that time or action won’t affect willpower, whereas the willpower as muscle theory says that actions will affect willpower but virtue probably won’t. This gives us easy distinguishing tests.
So, don’t talk about what affects X. Talk about what doesn’t affect X - the things that are not direct causes.
An experiment basically tells you that the intervention A is a cause of the dependent variable B. So, barring independent variables that come between A and B, no matter what else we change, if you change A, then B will change accordingly.
Keep your variables narrow. For example, willpower available for tasks of type A need not be the same as that for tasks of type B. It just so happens that it is.
Make sure that your variables are observable or manipulable. For example, I thought “does the hypothesis constrain a variable” was a good enough variable. But, later I realized that I had no test for whether that variable was true or false. I couldn’t really observe its value.
Causal Diagram vs Process Description
I somehow feel uncomfortable with this idea of causal links between variables. It seems unsatisfactory to me.
What I really want is a description of the underlying process that reality is using to generate what we see. I want to know the program that reality is running. That is what programs are - descriptions of processes. And that is, I think, what hypotheses are too.
The idea of isolating all the variables (independent or dependent) and looking at their relationships seems too low-level to me, like I’m missing the forest for the trees. I want to know the deep process that generates those surface phenomena. Plus, the variables seem quite fragile; they seem to be based on your current tools for observation or manipulation. I want to go beyond current tools. I want to know “how the universe works on such a deep level that you know exactly what to do to make the universe do what you want”. I want an in-depth understanding of the domain, dammit.
I believe that is what I did instinctively when I summarized book chapters or PG essays - I came up with some simple, sweeping model that could explain all of the observations and more.
You need to use your model of the process to say how the system will proceed given the initial state. That is all. Wait. Also, you need to infer what happened earlier using the evidence.
The domain of programming really helps cement this intuition, because you can’t get away with simplistic relationships between variables. You absolutely have to talk about the program in its vast complexity (?). It defies easy summarization. You can see the difference in predictive power between a person who knows the program inside-out and somebody who is just making a few surface relationships. This is the power of reductionism! Once you know the actual program being executed, you can see everything! There is no question of probabilities or anything. You know damn well exactly what is going to happen at each point.
This is why I feel queasy about using probabilities within a hypothesis. That’s the ultimate sign that you don’t understand the domain too well - you’re basically guessing at what will happen (and will thus get a lower posterior probability than someone who makes sure predictions). If you know the exact model that underpins a domain, you will be able to predict exactly what will happen, no two ways about it.
Maybe we can have probabilities to represent our confidence in different hypotheses (their prior probabilities), because we don’t know which hypothesis reality is using, but within the hypothesis, we should be sure of what we expect.
In short, a tree belief network is just not expressive enough, I think. You need a better language with which to describe processes.
[Ed (December 2015): This is completely explained by the idea of stable, autonomous mechanisms making up the causal model. You may describe the mechanisms however you want, but the model itself is a DAG.]
High Confidence
Having a narrow overall confidence level means that you will make much more useful predictions. You will be thinking technically - you will say this will happen and that won’t. Also, it means that you’re pretty much sure of what will happen.
Yes, the future is uncertain and you don’t know for sure that something won’t come along and turn everything upside down. But the point is, you have no good reason to believe that. On average, your beliefs will continue to be supported by future evidence. If you have a 99% belief in some hypothesis, then you don’t expect, on average, to lose that confidence tomorrow.
No, you cannot say it’s good to be pessimistic and we should try to hedge our bets and whatnot. This is probability theory. If you had extra information of any sort, it would already be incorporated in your final confidence levels. If you knew that tomorrow you would find an unpredicted result, you would already reduce your confidence levels, the same way you would buy a stock today itself if you suspected it would skyrocket tomorrow. So, if after updating on all the evidence using Bayes Theorem, you get to a narrow confidence distribution among your hypotheses, it means you’ve hit the jackpot (assuming you’ve meticulously looked at all the information you have). You have no particular reason to fear becoming falsified tomorrow, just as you have no particular reason to rejoice becoming even further supported tomorrow. That is what narrow, falsifiable hypotheses help you achieve.
And narrow hypotheses help you achieve this at speed. The narrower the predictions, the more each piece of evidence will redistribute your confidence in your hypotheses.
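A tiny numeric check of the claim above, with hypothetical likelihoods: however the evidence could come out tomorrow, the probability-weighted average of your posteriors equals today’s prior, so a 99% belief doesn’t anticipate being shaken.

```python
# Hypothetical numbers: a 99% belief and some test it might face tomorrow.
prior = 0.99
p_pass_if_true = 0.95       # P(evidence | hypothesis)
p_pass_if_false = 0.20      # P(evidence | not hypothesis)

p_pass = prior * p_pass_if_true + (1 - prior) * p_pass_if_false
posterior_if_pass = prior * p_pass_if_true / p_pass
posterior_if_fail = prior * (1 - p_pass_if_true) / (1 - p_pass)

expected_posterior = p_pass * posterior_if_pass + (1 - p_pass) * posterior_if_fail
print(posterior_if_pass, posterior_if_fail, expected_posterior)
# The expected posterior equals the prior (0.99): on average, tomorrow's
# evidence neither confirms nor undermines you.
```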
Conditioning on a Collider: Example
Say you have a causal model like this A -> C <- B, where A and B are both causes of C. If you don’t know anything else, A and B will be uncorrelated. Knowing about one will tell you nothing more about the other.
However, if you know C, A and B will be inversely correlated. Why is this? Assume that A and B are binary variables and that C will be on only when one or more of them is on. Now, say you know that C is on. What can you say about A and B? Well, there are three possibilities: A is on and B isn’t, B is on and A isn’t, or they’re both on. Now, I tell you that A is on; what can you say about B?
P(A) = p; P(B) = q;
off-off - (1-p)(1-q)
off-on - (1-p)q
on-off - p(1-q)
on-on - pq
P(C) = P(A, not-B)P(C|A, not-B) + ... so on
If you know P(C) = c and you assume that only P(C|not-A, not-B) = 0 and all the other conditional probabilities are 1, then you get this equation:
c = (1-p)q + p(1-q) + pq = p + q - pq
c = p + q - pq -- (equation 1)
So, q = (c - p) / (1 - p)
Therefore, once you know about the collider (C in this case), its previously uncorrelated causes (A and B) become correlated, as per equation 1.
Generally, C = f(A, B). So, when you know the value of C, the values of A and B become constrained by the equation.
This article gives a good example. Assume that Hollywood actors have two types of abilities, looks and acting skills. Further assume that casting directors cast those whose total of looks score and acting score crosses a certain threshold. Now, if you observe the pool of actors who have been selected, you will find that their looks and acting are inversely correlated.
Why? Because of the selection pressure: looks-score + acting-score > threshold. I don’t see why they would be perfectly inversely correlated (that would require looks-score + acting-score to equal the threshold exactly). But they would certainly be constrained by the inequality above. And if only a few have been selected (i.e., the threshold is high), then the higher an actor’s looks-score, the lower their acting-score is likely to be, and vice versa.
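A quick way to see this (my own simulation, not from the cited article; the score distributions and threshold are made up) is to draw independent looks and acting scores, keep only the actors whose total crosses a threshold, and measure the correlation inside the selected pool. It comes out clearly negative, though not exactly -1.

```python
import random

random.seed(0)

# Independent looks and acting scores for a large pool of actors (made-up units).
population = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
threshold = 2.0   # only actors with looks + acting > threshold get cast
selected = [(looks, acting) for looks, acting in population if looks + acting > threshold]

def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = (sum((x - mx) ** 2 for x, _ in pairs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for _, y in pairs) / n) ** 0.5
    return cov / (sx * sy)

print("correlation in the whole population:", round(correlation(population), 3))  # ~ 0
print("correlation among the selected:     ", round(correlation(selected), 3))    # clearly negative
```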
Notes about the Scientific Method Sequence
Confusion: when we think that some part of a hypothesis constrains variables, when it actually doesn’t. It doesn’t tell us what will not happen. Anything can happen. It is unfalsifiable and useless. But the danger is that it feels useful. It feels like it’s paying some rent. So, we don’t feel the need to go out and replace it. Test: would we fail to feel surprised by any particular outcome? (examples of confusion?) What causes this? Can that make it go away? Is the answer taboo? (taboo -> confusion -> replacement hypothesis; So, if you’re confused, you won’t seek out alternative hypotheses. You will feel like you have got predictive power in that area; you will feel you understand it (?). At least, you won’t feel like changing your mind.)
confusion = #variables for which you can’t say what won’t happen. No. confusion = can’t make predictions (since you don’t constrain variables), can’t learn from experience (? if you had falsifiable beliefs, you could learn over time. Here, you can’t. Better to be wrong than confused), problem feels like it can’t be solved or feels like it has already been solved (? if you feel it can’t be solved, you won’t try to solve it. Note: if someone else does it in front of you, things will change (Servo example). if you feel you have already solved it, you won’t pay attention to discrepancies.). confusion => can’t or won’t change your mind.
replacement hypothesis = do you look at other hypotheses? can you change your mind?
TODO: Notice surprises.
Aim to falsify your hypothesis as quickly as possible. Come up with alternative hypotheses and use differing predictions to distinguish between them. This is the heart of the scientific method. Either run experiments to test the differences or just use observations to do so. (narrow hypotheses, differing predictions -> more predictive power - lose quickly and win big; also, require very few pieces of evidence -> efficient)
Key Question: What else could it be?
Where do we not try to look at differing predictions? Where are we inefficient? Give me three real-life examples of distinguishing between hypotheses.
One way to do this would be to just test one hypothesis in every scenario, especially in cases where it rules out most of the possibilities. But if it gets the right answer a few times, should you accept it? If not, why not?
Eventually, get to causal models and your best method for solving problems. Show how to use it to solve real, everyday problems. How is it superior to existing methods? (causal models -> narrow hypotheses, low complexity, differing predictions, already tabooed, efficiency, etc.)
Confusion, speed, efficiency, and formulation. To remedy confusion, use taboo and ensure that you constrain variables. To gain speed, make narrower predictions and notice surprises [Ed: I would now say: to get speed, look for high-entropy experiments]. To formulate better hypotheses, keep it simple and use causal models. To use scanty evidence efficiently, do empirical scholarship and use differing predictions to distinguish between hypotheses.
Surprising discovery: Unfalsifiable beliefs get confirmed by every outcome. So, your confidence in them grows over time, no matter what! Take the Fifth Column example. You’ll become more and more confident in a useless belief, and then use that to justify your preferred actions. Conspiracy theories rely on this for their sustenance.
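As a Bayesian counterpoint, here is a small sketch (my own made-up numbers, not from the original text): a belief that “explains” every outcome equally well has to spread its likelihood thin, so against a rival that sticks its neck out, a proper Bayes update makes it lose ground on every observation. Whatever growing confidence we feel can’t be coming from the evidence itself.

```python
def update(prior, lik_h, lik_rival):
    """One Bayes update of the unfalsifiable belief H against a single rival."""
    return prior * lik_h / (prior * lik_h + (1 - prior) * lik_rival)

p = 0.5
for day in range(1, 4):
    # H ("there is a Fifth Column") "explains" sabotage and no-sabotage equally
    # well, so it gives 0.5 to each outcome. The rival ("there is no Fifth
    # Column") predicts "no sabotage" with probability 0.95. Numbers are made up.
    p = update(p, lik_h=0.5, lik_rival=0.95)
    print(f"day {day}, no sabotage observed: P(H) = {p:.3f}")
# P(H) falls every day; the felt confirmation is an accounting error.
```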
Notice surprise and confusion: Make advance predictions. Take the Fifth Column example: you can’t have every outcome support your hypothesis. We don’t get to see all the outcomes at once, so it can be hard to spot when somebody is bending the rules to be unfalsifiable. Antidote: make advance predictions. Also, would you have predicted in advance that this would happen? Like, when I felt that my water purifier’s tap was flowing slower than usual. It might well have been nothing, but why did I have that feeling of “something is not quite right”? Would I have predicted that I would get this feeling? If not, then my model is wrong. Something is wrong with the purifier or I have some unspecified problem in my perception or both. (Turned out that I had turned off the purifier’s power switch earlier and that’s why the flow of water was slowing down.) Corollary: Notice your slight feelings of confusion. Did you predict this in advance? No? Then, raise the alarm! PG - faint feeling at first. Eliezer - design flaw in human cognition.
A more subtle skill is to “pay attention to the sensation of still feels a little forced”. Two weeks ago, during the rainy season here, I noticed mould all around my gas stove. I cleaned it up once. But it came back a week later. I was surprised to see that and couldn’t quite figure out why it even sprouted up in that area. From what I had experienced in the past, mould grew where there was high humidity, places where you didn’t have good ventilation. There was a sink right next to my stove but there wasn’t any mould on or around it. I just ignored the fact that I had no idea what was happening.
Then, finally, I decided to look around the kitchen to see exactly where the mould had spread. I discovered that it was centered around my washing machine, which was in a nook a few feet behind the kitchen platform. Now things fell into place! I had let the drain pipe of the washing machine lie on the ground instead of inserting it into a drain. So, every time I used the machine, water would spread around the basin near the machine and take a while to evaporate. Of course the mould had come because of the moisture! I had fallen as a rationalist by making up a false explanation: “maybe somehow the gas stove has something to do with it”, even though there was no clear pattern around the gas stove and the mould extended behind the stove too. I was forcing a fake explanation onto the data and I didn’t even realize it. I should have just told myself “I don’t know what’s happening” and then sought more empirical evidence.
Confidence even in uncertainty
You can be confident about your choices even in a sea of uncertainty because the laws of probability theory and decision theory guide you. There is a best choice you can make and the closer you get, the better. You don’t have to be worried about truly unknown unknowns; if you have an inkling about something in advance, take action; if you don’t know anything about it, there’s nothing you can do to avoid it!
Given a certain amount of resources (physical or cognitive) and a given state of uncertainty, there is a fixed best action that you can take. There is nothing more you can reasonably do. Your job is done once you crunch the numbers and come to the right choice. (Though, of course, we must stay on guard because of our tendency to justify laziness. However, that too is part of your uncertainty and there is, once again, a right action that you can take. Your mind is not magic; it fits perfectly within the world of uncertainty.)
Canonical Answers, FTW!
By the way, I love canonical answers. One solution to rule them all! In one shot, you eliminate all of the “an-answers” - the ad hoc solutions, the arcane gobbledygook, the cargo cult rituals. You know exactly what works and you can rest safely with that knowledge.
And so - having looked back on my mistakes, and all the an-answers that had led me into paradox and dismay - it occurred to me that here was the level above mine.
I could no longer visualize trying to build an AI based on vague answers - like the an-answers I had come up with before - and surviving the challenge.
– Eliezer Yudkowsky, My Bayesian Enlightenment
Why yes, this might be my own Bayesian Enlightenment as well, knowing that there is a correct answer, that there is a precise dance, with no room for whimsy.
Learn from others’ mistakes
Why should we need years of experience to become as good a detective as Sherlock or as good a body-reader as Cal Lightman (in the TV show Lie to Me)? You can “just” get all the information they got over their years in the field and come to the same model they did, maybe even do better with the benefit of hindsight and improved techniques. PG said “History seems to me so important that it’s misleading to treat it as a mere field of study. Another way to describe it is all the data we have so far.” So, look over their history of work and build your model efficiently. One problem is that humans take time to build skills, so you can’t just slurp it all up in a single week. Another problem is that they would have practiced intervening too; Sherlock would have practiced asking probing questions, and so would Lightman. You can’t get that experience just by reading the data. Still, you can go a long way. It would be unwise to disregard such a potent source of information.
(Note: Sherlock and Lightman are fictional characters. So, it’s not accurate to use them as evidence. Just take them as representative examples of world-class experts.)
Bibliographies form a graph for literature search
One way of searching for information is through the bibliography of some resource that interested you. I’m looking for good resources on categorization in cognitive psychology and other things that can help improve my thinking. None of the web articles or videos I’ve viewed so far have helped. I finally hit on one YouTube video that covered topics of interest to me. Now, I have to look up the professor in that video and go through his website or published papers to see what books he has read to understand cognitive science the way he does, and so on. This applies to bibliographies on people’s personal web pages, research papers, textbooks, and essays.
This is one reason why number of citations is important: people use it to narrow their literature search. Yes, if an obviously important paper cites a resource, they will go read it regardless of its citation count, but all else being equal, they consider a paper with higher citations to be better.
Category theory for Causal models?
TODO: Do they form a category? Any existing work on this?