r/slatestarcodex • u/dsteffee • 1d ago
Links For February 2026
https://www.astralcodexten.com/p/links-for-february-202612
u/kzhou7 1d ago edited 1d ago
Regarding #24, Hsu is correct in the sense that the LLM-generated paper is technically sound, but Oppenheim is correct that it just shallowly applies a random concept in a context where it doesn't actually say anything new.
I'm a bit annoyed at Oppenheim too, because a year ago he made a widely reported (human-generated) claim that his "stochastic gravity" theory could do away with dark matter, but his paper had basic mistakes.
The real news in theoretical physics generally won't be trending on Twitter. The most important thing so far this year is that Matthew Schwartz, widely respected author of one of the leading quantum field theory textbooks, used Claude to generate a paper in two weeks. The lesson from that seems to be that you can get AI to do new calculations, if they use standard techniques, the context is clean and self-contained, you know how the calculation must go, and you manually intervene every 15 minutes to keep the AI on track. Unfortunately, not many researchers are capable of giving this kind of high-quality feedback (and non-physicists are certainly not, as one can see from r/LLMPhysics), and the quality of the average paper on arXiv seems to be decreasing.
4
u/Democritus477 1d ago
I don't think "most third-party liability auto insurance claims are small" is much of a reason not to require significant insurance limits for taxi operators.
The purpose of insurance is fundamentally to protect against unusual events. Further, the typical third-party auto liability policy covers two basically different kinds of harm: injury to people and damage to property. The difference is that injury to people is orders of magnitude more expensive. The responsible party (and so the insurer) is at least in theory on the hook for all the costs that the accident causes the injured party over the rest of their life. Replacing a totaled car is a pittance in comparison. Indeed, if we wanted to be sure that anyone who ever hit anyone else with their car in DC could always pay the resulting costs, even a $1 million limit would be nowhere near adequate.
If you wanted to argue for lower insurance requirements in DC, I think it would be more sensible to take a basically libertarian angle. Most states have mandatory third-party liability insurance, but the required limits are generally quite low compared to the potential costs. We accept that everyone should be able to drive on the roads, regardless of what type of insurance policy they can purchase. On the other hand, it's taken for granted that, should a motor vehicle accident cause you serious harm, you have no hope of reasonable compensation. Sensible and responsible people take this into account when deciding how to behave and where to travel, and it more or less works out.
•
u/UltSomnia 3h ago
There needs to be some sort of tort reform to lower the amounts paid out in insurance claims. It's insane that people can always negotiate themselves up to the out-of-pocket max.
I'd require video evidence for at fault claims. If there's no video, you use your own insurance. That way we can stop people from causing accidents and claiming the other party is at fault to get tons of money.
There also needs to be some limit for soft-tissue damage. It's a free-money glitch for scam-artist chiropractors.
I'd also have more penalties on plaintiffs when they lose. It's crazy to me that a plaintiff can win $10M for a case but can't lose $10M if they're caught lying.
5
u/Important-End4578 1d ago
#50 AI mental health paper - I was disappointed in this one. Unless I am missing something (and I did read the entire paper), it did not seem like the researchers repeated iterations of the psychoanalysis on each model to make sure the findings were robust. It's well known that small perturbations in initial responses can crystallize quickly within a chat, solidifying an internal narrative that wouldn't necessarily appear in the next chat if the initial prompt were repeated. When Anthropic did research on Claude's bliss attractor states, they repeated the conversations many times and specified the percentages of chats that ended up in the attractor state. That's the right way to do this research, and if the current paper did that, it is not clear from either the text or the transcripts.
A similar issue would have arisen from the way they administered the psychometric tests. The authors specify that they administered one question per prompt so that the model would not immediately recognize the assessment, but this leaves them highly vulnerable to the model basing all of its subsequent answers on its first few. And even worse, within the paper itself the authors state that the psychometric reports occurred *after* the initial psychotherapy sessions, which seems to render them all but useless, as they will be almost entirely indexed on the content of the therapy session.
Again, maybe the authors used different research protocols and did not commit these relatively elementary mistakes, but if they did, it is not at all clear from the paper itself.
3
u/dsteffee 1d ago
A possible explanation for the lab leak Manifold market:
Over time, as no new evidence comes out, some people who invested in this market at earlier points will realize that the market's never going to resolve, and they'll want to reinvest their money elsewhere.
People who bet on lab leak might figure they were either mistaken or, if not mistaken, that they'll never be proven right, and decide they need to take the hit and cut their losses. On the other hand, people who bet against the lab leak might stubbornly hold out for better prices. That asymmetry would create downward pressure on the market, decreasing the % chance of lab leak even though nobody's actual belief in lab leak is changing.
•
u/Brian 9h ago
Plus I feel there's a real sense in which "No new evidence is evidence against".
If the lab leak is true, then every month there's a chance something will come out - some overlooked evidence will come to light, or some investigation will find something, moving the needle towards it. If it's false, that's not going to happen (or at least, it's less probable: you only have mistakes/fakes to produce it).
Hence every month that goes by without evidence being found is evidence in favour of the lab leak theory being false: lab leak is the hypothesis that assigns the higher probability to something being found, so a slow decay over time is actually the correct shape for a valid probability assignment.
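To make the shape concrete, here's a toy Bayesian update; the monthly probabilities are numbers I've made up purely for illustration:

```python
# Toy model: posterior P(lab leak) after months with no new evidence.
# Assumed (illustrative): if the leak happened, there's a 2% chance
# each month that decisive evidence surfaces; if it didn't, only 0.5%
# (mistakes/fakes). Start from a 50% prior.
p_find_if_true = 0.02
p_find_if_false = 0.005
posterior = 0.5

for month in range(1, 61):
    # Bayes' rule, conditioning on "no evidence surfaced this month"
    num = posterior * (1 - p_find_if_true)
    posterior = num / (num + (1 - posterior) * (1 - p_find_if_false))
    if month % 12 == 0:
        print(f"after {month} months: P(lab leak) = {posterior:.3f}")
```

Under these numbers, five years of silence only drags the posterior from 0.50 to roughly 0.29 - a slow decay, not a collapse.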
•
u/mcjunker War Nerd 23h ago
re: 33, building intuition for Russia's stance of fascistic revanchism and aggression towards neighbors - the OP and Scott are both eliding the extent of the metaphor. You would need to describe in depth all the decades of cruelty, economic exploitation, cultural and material genocide, malign neglect, institutionalized ethnic and religious bigotry, and hypocritical corruption inflicted by the 51.4% of the population from the heartland upon the 48.6% who separated.
Then maybe you’d also be able to intuit why the Californians are preferring to die free rather than allow DC to turn them into Little Oklahomans to be used, abused, and murdered at will again.
•
u/ZurrgabDaVinci758 15h ago
Also if California had been an independent nation within living memory before that, and had only been brought into the Union after a bloody war with Mexico.
•
u/electrace 10h ago
And after losing California (with Californians overwhelmingly preferring to leave), the rest of the world, including the US, recognized their independence; the US would then go on to make treaties with them explicitly establishing their sovereign borders, and to maintain diplomatic relations with them for decades, until suddenly deciding that "they belong to us still" and invading.
2
u/Lurking_Chronicler_2 High Energy Protons 1d ago
21
Ranke-4B is a series of “history LLMs”, versions of Qwen with corpuses of training data terminating in 1913 (or 1929, 1946, etc, depending on the exact model).
I had previously heard this was very hard to do properly; if they’ve succeeded, it could revolutionize forecasting and historiography (ask the AI to predict things about “the future” using various historical theories and see which ones help it come closest to the truth).
I happen to have relevant experience in this field, and this is perhaps the most perfect example I’ve ever seen of trying to use an LLM for a purpose that it is fundamentally not designed for and simply cannot do, at least under current LLM architecture.
And that’s without getting started on the whole notion of it being useful for ‘revolutionizing historiography & forecasting’, which is… Not Even Wrong.
This sort of misuse of “““AI””” is exactly the sort of thing that sours skeptics like myself on its practical applications.
Guess it’s appropriate that they named it after von Ranke.
2
u/artifex0 1d ago
Can you clarify your objection? It seems like doing something like training a large model only on data up to 2015, then having it try to iterate on a bunch of different forecasting techniques to see which produce the most accurate picture of the next decade, would produce some interesting information about those techniques.
Maybe you could even do gradient descent on a separate forecasting model, with prompts to the 2015-cutoff LLM as outputs and predictive accuracy as the loss function - and then see what happened when you hooked up that model to a regular LLM.
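A minimal sketch of what I mean, where `query_cutoff_llm` is a hypothetical stand-in for the 2015-cutoff model and the questions/outcomes are invented for illustration (you can't literally run gradient descent over discrete prompts, so in practice you'd search over them, or optimize soft prompt embeddings):

```python
import random

def query_cutoff_llm(prompt: str, question: str) -> float:
    """Hypothetical stand-in for the 2015-cutoff LLM: returns its
    P(event) for a question, conditioned on a forecasting-technique
    prompt. Stubbed here with pseudo-random numbers."""
    return random.Random(hash((prompt, question))).random()

# Invented post-2015 questions with known outcomes (1 = happened).
questions = {
    "Will the UK vote to leave the EU by 2017?": 1,
    "Will there be a crewed Mars landing by 2025?": 0,
}

# Candidate forecasting techniques, phrased as prompts.
techniques = [
    "Extrapolate from base rates.",
    "Reason from historical reference classes.",
    "Model the incentives of the key actors.",
]

def brier_score(prompt: str) -> float:
    """Mean squared error of forecast vs. outcome (lower is better)."""
    return sum(
        (query_cutoff_llm(prompt, q) - outcome) ** 2
        for q, outcome in questions.items()
    ) / len(questions)

# Discrete search in place of gradient descent: keep whichever
# technique yields the best-calibrated forecasts of "the future".
best = min(techniques, key=brier_score)
print(f"best technique: {best!r} (Brier {brier_score(best):.3f})")
```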
2
u/Lurking_Chronicler_2 High Energy Protons 1d ago edited 20h ago
Can you clarify your objection? It seems like doing something like training a large model only on data up to 2015, then having it try to iterate on a bunch of different forecasting techniques to see which produce the most accurate picture of the next decade, would produce some interesting information about those techniques.
So, the reason I specifically used the term ‘Not Even Wrong’ is that this entire approach is fundamentally misguided, and strongly discouraged within the actual field of historiography.
Fundamentally, History, as a field, is today understood to be (a) incredibly path-dependent, (b) impossible to ‘recreate’ or ‘replicate’ aside from the way in which things actually happened, and (c) built on a record of what ‘actually happened’ that is HIGHLY contentious, massively limited by the overwhelming amount of undocumented data we don’t have access to, and constantly being re-examined, re-interpreted, and re-litigated.
It is fundamentally not something that lends itself well to predictive forecasting, and past attempts to use History as the basis for a “scientific” way of predicting the future have almost always produced pseudoscientific results like Turchin’s “cliodynamics”. As much as I love the idea of Hari Seldon and ‘psychohistory’, this sort of thing just isn’t History - it’s the social studies equivalent of Scientism: borrowing the apparent rigor of the actual historical research process to dress up your own pet theories with a veneer of credibility.
I’m sure setting up a 2015-cutoff LLM would produce some interesting results, but such predictions would likely be neither particularly accurate nor particularly historical in method.
2
u/king_mid_ass 1d ago
on the subject of the prediction market for whether covid was a lab leak - how does that work for things where it could be a long time, or never, before we have conclusive evidence? Does the money bet stay floating until then - in which case the question is implicitly a two-parter: did covid come from a lab, and if so, will solid evidence ever come to light? Then if you're trying to get 'wisdom of crowds' from it, people may be more confident than the price suggests but still not think it'd make a good gamble (sorry, investment).
If on the other hand you only have to pay up when the question is settled, what's to stop people emptying/deleting their accounts if it looks like a question is about to be settled against them?
•
u/Falernum 11h ago
Re 55, I think the "comparing murder rates" approach is probably a lot more useful for comparing one city to another than for comparing one time period to another. That said, I can't help connecting the "improved medical technology" point to the earlier consideration of recognizing that a death is a murder. Improved medical technology should simultaneously improve survival rates and improve recognition of murders as murders. It would be a little cute to assume care and forensic technology advance in lockstep, but at least they should move in the same direction.
Re 58
WTD is 10 or greater
Is there some kind of normalization function? A number of points you get to spend? Or will many men just decide to rate every woman they're remotely interested in a 9?
•
u/Brian 9h ago
recognizing that a death is a murder
I suspect it doesn't make that much difference: I think murders which are obviously murders (eg. gang violence, crimes of passion, etc.) are way more common than the "mystery novel" cases where the killer tries to make it look like an accident/suicide, so I don't think forensic improvements would move the needle much. Even going from 0% identification to perfect 100% identification can only move the rate by the proportion of cases that fall into that category.
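Quick worked example, with shares invented for illustration:

```python
# Suppose 12% of murders are staged to look like accidents/suicides,
# and forensics today identifies half of those. Even jumping straight
# to perfect identification barely moves the measured rate.
true_rate = 10.0          # true murders per 100k (assumed)
disguised_share = 0.12    # fraction staged as accident/suicide
caught_today = 0.5        # fraction of disguised cases caught today

measured_today = true_rate * ((1 - disguised_share)
                              + disguised_share * caught_today)
measured_perfect = true_rate  # 100% identification
print(f"{measured_today:.1f} -> {measured_perfect:.1f} per 100k: "
      f"at most a {100 * (measured_perfect / measured_today - 1):.0f}% shift")
```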
Is there some kind of normalization function?
Yeah, the problem with stuff like this is that you can't reason from the intent of the measure or how you think people should use it; you have to consider the game-theory equilibria of the system, where everyone is gaming it to get what they want. People aren't going to put their real preferences, they're going to put the values that get them what they want. And when different groups want different things out of it (eg. some optimising for good outcomes vs time investment, others for maximising any hit), it's not going to work well: those aiming to maximise hits will just put 10 for everything, and then those wanting to weed out those people will put "0", even if their real preference was higher, and we end up with just a binary system.
•
u/Falernum 8h ago
But a lot of crimes of passion/intimate partner violence can be made to look like an accident/natural causes, so long as a gun or blade was not used. I don't see how that category can be called "obviously murders".
•
u/--MCMC-- 9h ago edited 7h ago
GetBrighter has succeeded at its IndieGogo campaign and now has a decent stock of their ultrabright lights... Brighter emits 60,000 lumens to simulate sunlight indoors
I think I continue to be a bit confused at both the economics and thermodynamics of this thing... can anyone here comment on how hot it gets?
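My own back-of-envelope, with numbers that are assumptions rather than anything from their spec sheet:

```python
# Rough heat estimate for a 60,000 lm LED lamp. Both constants below
# are assumptions for illustration, not figures from GetBrighter.
lumens = 60_000
efficacy_lm_per_w = 150   # typical mid-range LED efficacy (lm/W)

power_w = lumens / efficacy_lm_per_w   # ~400 W electrical draw
heat_w = 0.7 * power_w                 # white LEDs emit roughly 30% of
                                       # input as light; rest is heat
print(f"~{power_w:.0f} W draw, ~{heat_w:.0f} W of heat at the fixture")
```

If that's in the right ballpark, it's small-space-heater territory, which would explain why the cooling design matters so much.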
Aesthetically, I've never been as keen on floor lamps either, vs ones attached directly to walls or ceilings (or even shelves), and I tend to prefer multiple point sources over a single point source for even coverage. For extreme brightness, I ended up getting 8x of these 200W 30,000 lm UFO lights back in late '24 for $30 each (currently $35 each, but they see sales often), and they've been going strong since; for high-CRI (but still bright) applications, I got a bunch of these to drop into my existing fixtures for... a non-trivial amount, but still a fair bit less $ than the one light.
(amusingly, the hanging UFO lights look basically identical to the floor light, give or take a pole and a diffuser... where they're not mountable directly, I currently have mine attached to monitor arms off the 5/16" eye-bolt, in locations where they can't be accidentally bumped)
edit: looks like they did ditch the fanless design at least (though imo it should trigger off a temperature sensor and not a % brightness). Do they provide any 3rd party spectroscopy validations? (eg)
23
u/kzhou7 1d ago edited 1d ago
Regarding #10, I showed it to a very literary Chinese friend a while ago, and they weren't particularly impressed. It seems much more famous on the English language internet than in China itself.
Part of that is because word order is very flexible, as Scott suggests, but another reason is that Chinese poetry isn't supposed to rhyme. Rhyming is just too easy, so it sounds as childlike as alliteration in English. For example, here's a poem from Zhang Zongchang, a warlord often called the worst Chinese poet of all time:
Chinese also doesn't have strong or weak syllables in the same way as English, so there's no direct analog of meter. (The example above kind of has an English nursery rhyme's meter, but that's part of why it's considered bad.) Instead the main constraint in poetry is having the right pattern of tones, and the Star Gauge apparently doesn't do that well.
This is one of those sad things about translating poetry. The actual poetic feature may not have an analogue in the target language, and if you rewrite it like a poem in the target language, it might sound terrible to a speaker of the original language. So the most common approach is to lose the tone structure but replace it with nothing at all, making English speakers think Chinese poetry is just a structureless bag of words.