• Open

    death of the artist (Sebastião Salgado, 1944-2025)
    Visit the post for more.  ( 8 min )
  • Open

    In Defense of Defensive Forecasting
    Is superforecasting just sneaky accounting?
  • Open

    Unicode CJK Unified Ideographs Extension J and the nature of the sinographic writing system
    Submitted by Charles Belov: I've been browsing through the proposed Unicode 17 changes, currently undergoing a comment period, with interest. While I don't have the knowledge to intelligently comment on the proposals, it's good to see that they are actively improving language access. I'm puzzled that some new characters have been added to the existing […]  ( 11 min )
  • Open

    How to Combine Scikit-learn, CatBoost, and SHAP for Explainable Tree Models
    Machine learning workflows often involve a delicate balance: you want models that perform exceptionally well, but you also need to understand and explain their predictions.

  • Open

    off to Singapore (BayesComp 2025)
    Visit the post for more.  ( 8 min )
  • Open

    Dungan radio broadcasts from 2018-2021
    We've talked about Dungan a lot on Language Log.  That's the northwest Sinitic topolect written in Cyrillic that has been transplanted to Central Asia.  See "Selected readings" below. For those of you who are interested and would like to hear what it sounds like in real life — spoken and sung by male and female […]  ( 11 min )
  • Open

    Positional Encodings in Transformer Models
    This post is divided into five parts; they are: • Understanding Positional Encodings • Sinusoidal Positional Encodings • Learned Positional Encodings • Rotary Positional Encodings (RoPE) • Relative Positional Encodings Consider these two sentences: "The fox jumps over the dog" and "The dog jumps over the fox".

  • Open

    Pegasus bridge half [37ième]
    Visit the post for more.  ( 8 min )
  • Open

    Conversation with a Chinese restaurateur in a west central Mississippi town
    Running down the road in Clarksdale, Mississippi, I screeched to a halt (felt like Road Runner) when I passed by a Chinese restaurant with the odd name Rice Bowl (in Chinese it was Fànwǎn lóu 饭碗楼 — the only characters I saw on the premises).  It was a tiny, nondescript establishment, with six or so […]  ( 17 min )

  • Open

    2025 IMS International Conference on Statistics and Data Science in Sevilla [Dec. 15-18]
    Visit the post for more.  ( 8 min )
  • Open

    ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
    The field of large language models is shifting toward lower-precision computation. This shift necessitates a rethinking of scaling laws to account for the effects of quantization on resulting quantized model...  ( 42 min )
  • Open

    Persian language in the Indian subcontinent
    That's the title of a valuable Wikipedia article.  I have no idea who wrote it, but I'm very glad to have access to this comprehensive article, since it touches on so many topics that concern my ongoing research. Here are some highlights: Before British colonisation, the Persian language was the lingua franca of the Indian […]  ( 13 min )
    Plato's cave
    The first two panels from SMBC a few days ago: The rest of the strip: The aftercomic: The mouseover title: "Would you rather sit with friends watching shadows on the bigscreen or spending your time arguing with Plato about whether poetry should be legal?" This expands on the 9/9/2015 SMBC: Wikipedia explains the Allegory of […]  ( 9 min )
    The linguistic pragmatics of LLMs
    "Does GPT-4 Surpass Human Performance in Linguistic Pragmatics?" Bojic, Ljubiša et al. Humanities and Social Sciences Communications 12, no. 1 (June 10, 2025). Ljubiša Bojić, Predrag Kovačević, & Milan Čabarkapa.  Humanities and Social Sciences Communications volume 12, Article number: 794 (2025) Cite this article Abstract As Large Language Models (LLMs) become increasingly integrated into everyday life as general-purpose […]  ( 10 min )
  • Open

    Strunk and White for Science
    Validity as a style guide for telling stories about correlations
  • Open

    Behind “ANCESTRA”: combining Veo with live-action filmmaking
    We partnered with Darren Aronofsky, Eliza McNitt and a team of more than 200 people to make a film using Veo and live-action filmmaking.  ( 16 min )
  • Open

    Advanced Feature Engineering Using Scikit-Learn Pipelines with Pandas’ ColumnTransformer and NumPy Arrays
    Pandas, NumPy, and Scikit-learn.
  • Open

    The Derivative of msign
    In this post we derive the differentiation formula for the $\mathop{\text{msign}}$ operator. If readers want to, as in "Test-Time Training Done...  ( 6 min )

  • Open

    "More and more less confident"
    From Adam Rasgon and Natan Odenheimer, "U.S. Embassy in Jerusalem Braces for Possible Israeli Strike on Iran" NYT 6/12/2025: More recently, however, Mr. Trump has said he was less convinced that talks with Iran would yield a new nuclear deal. “I’m getting more and more less confident about it,” he told The New York Post […]  ( 10 min )
    Names as verbs
    In a comment on yesterday's post "A 12th-century influencer", Laura Morland wrote: Thanks for sharing "to abelard," the new verb of the month! Note to AP: the grammarians will insist that it be spelled with a lower-case "a". (Verbs are never capitalized, not even in German, I don't believe.) This is one where The Errorist […]  ( 12 min )
    "Good Science"
    The first two panels of today's xkcd: The rest of it: The mouseover title: "If you think curiosity without rigor is bad, you should see rigor without curiosity." It's not just science — today's Tank McNamara: Some extra reading: Gavin Francis, "What Do You Expect?", The New York Review 6/26/2025. A couple of relevant past […]  ( 11 min )
    Sinograph ambigram for "mindfulness"
    From Ting Fen Yik on Facebook: It's been a while since we've posted on ambigrams.  David Moser is the master in Chinese and in English.  See the references below.   Selected readings "Weird characters" (7/7/13) "Orientation-dependent ambiguity" (12/27/18) "Sinographic memory in Vietnamese writing" (4/16/14) — see esp. the last comment "Freemocracy" (6/13/19) "Happy LÓNG year!" […]  ( 9 min )
  • Open

    AI Narratives [book review]
    AI Narratives: A history of imaginative thinking about intelligent machines is a 2020 collective book edited by Stephen Cave, Kanta Dihal, and Sarah Dillon, with about twenty contributing authors, through a series of 16 chapters on the relation between culture (literature, films) and our societal approach to AI, with varying perspectives, some overlap between chapters, […]  ( 10 min )
  • Open

    How we're supporting better tropical cyclone prediction with AI
    We’re launching Weather Lab, featuring our experimental cyclone predictions, and we’re partnering with the U.S. National Hurricane Center to support their forecasts and warnings this cyclone season.  ( 7 min )
  • Open

    Trevisan prize (guest post by Alon Rosen)
    The Trevisan Prize for outstanding work in the Theory of Computing is sponsored by the Department of Computing Sciences at Bocconi University and the Italian Academy of Sciences. The prize is named in honor of Luca Trevisan in recognition of his major contributions to the Theory of Computing. It aims to recognize outstanding work in the field, and to broaden the reach …  ( 13 min )
  • Open

    Navigating Imbalanced Datasets with Pandas and Scikit-learn
    Imbalanced datasets, where a majority of the data samples belong to one class and the remaining minority belong to others, are not that rare.

  • Open

    a journal of the chaos (en cuisine) year
    Read La Fille du Grand Hiver (The Daughter of the Great Winter) by Isabelle Autissier (also a sailor who was the first woman to complete a solo world race in 1991). This is a novelised version of the story of Arnarulunguaq, who accompanied Knud Rasmussen on the Fifth Thule Expedition over several years, all the […]  ( 10 min )
  • Open

    A 12th-century influencer
    From Ada Palmer, "Inventing the Renaissance: The Myth of a Golden Age": The new scholastic method was so exciting! that when Peter Abelard got kicked out of his monastery (for proving its founding saint didn’t exist—that pissed off the abbot, who’d have guessed?) and went to live as a hermit in the wilderness of Champagne, […]  ( 11 min )
    Boop?
    The latest xkcd: Mouseover title: "With a good battery, the device can easily last for 5 or 10 years, although the walls probably won't." The joke worked for me, although I was pretty sure that a (current) MacBook makes no sound when a usb device connects. I checked, and that's true. A current Windows 11 […]  ( 10 min )
    The grammar and sense of a poetic line
    Randy Alexander is not a professional Sinologist, but when it comes to reading Chinese poetry, he's as serious as one can be.  The following poem is by Du Fu (712-770), said by some to be "China's greatest poet".  In the presentation below, I will first give the text with its transcription, and then Randy's translation.  […]  ( 17 min )
  • Open

    Step-by-Step Guide to Deploying Machine Learning Models with FastAPI and Docker
    You've trained your machine learning model, and it's performing great on test data.
  • Open

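    Mistral Magistral: A Reasoning Engine Forged by Pure Reinforcement Learning, Upending the LLM Training Paradigm
    No distillation, no SFT: Mistral uses pure reinforcement learning to achieve a 50% performance leap on math and code reasoning tasks. Recently, Mistral […]  ( 4 min )
    Xiaohongshu dots.llm1: Redefining the MoE Efficiency Frontier, with 14B Activated Parameters Challenging the Limits of 72B Dense Models
    Core breakthrough: a super-brain with minimal activation. Three technical pillars support its SOTA performance. 1. Data engineering: the alchemy of 11.2T high-quality tokens […]  ( 4 min )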
  • Open

    A Few Announcements
    Trevisan Prize 2025 Here is a call for nominations for a new theoretical computer science prize, in memory of  Luca Trevisan. (h/t Alon Rosen.) Three Near Future Events at HUJI While the Erdős Lectures 2025 given by Mehtaab S. Sawhney, … Continue reading →  ( 14 min )

  • Open

    [fool’s] gold standard science
    In this new presidential order of 23 May 2025, Trump pretends to “restore the scientific integrity policies of my first Administration and ensures that agencies practice data transparency, acknowledge relevant scientific uncertainties, are transparent about the assumptions and likelihood of scenarios used, approach scientific findings objectively, and communicate scientific data accurately” repeating his goal in […]  ( 10 min )
  • Open

    Milton Friedman's p-values
    Remind me what happens when a measure becomes a target.
  • Open

    Implementing Vector Search from Scratch: A Step-by-Step Tutorial
    There’s no doubt that search is one of the most fundamental problems in computing.

  • Open

    exceptional OWABI web/sem’inar [19 June, BayesComp²⁵]
    Exceptionally, the next One World Approximate Bayesian Inference (OWABI) Seminar will be hybrid as it is scheduled to take place during BayesComp 2025 in Singapore, on Thursday 19 June at 8pm Singapore time (1pm in Tórshavn) and two talks, one by Filippo Pagani on Approximate Bayesian Fusion Bayesian Fusion is a powerful approach that enables […]  ( 10 min )
  • Open

    How to Optimize Language Model Size for Deployment
    The rise of language models, and more specifically large language models (LLMs), has been of such a magnitude that it has permeated every aspect of modern AI applications — from chatbots and search engines to enterprise automation and coding assistants.

  • Open

    Proto
    That's the title of a brand new (3/13/25) book by Laura Spinney, author of Pale Rider, a noteworthy volume on the 1918 influenza pandemic.  Here she is interviewed (6/7/25) by Colin Gorrie (the interview is too long [58:14] to post directly on Language Log): Proto-Indo-European Origins: A Conversation with Laura Spinney     Follow along with the […]  ( 9 min )
    De(semi)colonization
    Babbel's April 2025 Semicolon Survey looked at students' reactions to the obvious secular decline in semicolon frequency: The semicolon once stood as a symbol of thoughtful, elegant writing, a punctuation mark beloved by literary greats like Jane Austen and Virginia Woolf. But today, the humble semicolon faces an uncertain future. New analysis from Babbel uncovers […]  ( 13 min )
    LLMs that quack like a duck
    A letter to the editor on the essential nature of LLMs from the Times Literary Supplement (5/30/25):  Large language models As someone who has spent the past few years working out what AI means to academic journals, I found Melanie Mitchell’s excellent review of These Strange New Minds by Christopher Summerfield (May 16) full of […]  ( 11 min )
  • Open

    OCEAN day
    Visit the post for more.  ( 9 min )

  • Open

    tenets of quantile-based inference in Bayesian models
    This 2023 paper of Perepolkin, Goodrich, and Sahlin vaguely relates to our insufficient Gibbs work in that a Bayesian analysis is conducted based solely on quantile summaries. Except that here the input is the entire cdf, or the—inverse cdf—quantile function, or—its derivative—the quantile density function, instead of the probability density function—used as the likelihood in […]  ( 10 min )
  • Open

    "Public Universal Friend"
    Stephanie Farr, "The nonbinary Revolutionary leader who preached in Philly during the Revolution", The Philadelphia Inquirer 6/5/2025: Sometimes when I walk the streets of Old City, I imagine the people of colonial times who walked those roads before me, before Philadelphia was Philly and before this nation secured its liberty and identity. I mostly think […]  ( 10 min )
    Drama at the National Spelling Bee
    Faizan Zaki overcomes a shocking, self-inflicted flub and wins the Scripps National Spelling Bee Ben Nuckols, AP (5/30/25) Not what you would expect when the stakes are so high: The favorite entering the bee after his runner-up finish last year — during which he never misspelled a word in a conventional spelling round, only to […]  ( 12 min )

  • Open

    death of a benefactor of humanity (Etienne-Emile Baulieu, 1926-2025)
    Visit the post for more.  ( 9 min )
  • Open

    HuggingFace Safetensors Support in PyTorch Distributed Checkpointing
    Summary  PyTorch Distributed Checkpointing (DCP) is making investments into addressing the interoperability blockers to ensure that popular formats, like HuggingFace safetensors, can work well with PyTorch’s ecosystem. Since HuggingFace has...  ( 38 min )
  • Open

    The agonies of an ABC learning Chinese
    As most readers of Language Log know, ABC means "American-born Chinese".  Depending upon how (in)sensitive their parents are, learning Chinese can be hell, and leave them scarred for life. The actors in this video are brilliant and the tale it tells reveals so much about the trials and pitfalls of learning Chinese overseas. If only […]  ( 10 min )
    "The girls are fighting"
    The news has been full of the Musk-Trump feud. Among the linguistic aspects, there's an interesting amount of explicit or implied gender association — here's Alexandria Ocasio-Cortez in a memic clip widely linked on social media: From the other end of the political spectrum, check out Nellie […]  ( 11 min )
    Buena
    Following up on the issue of English spelling variation, this picture has been making the rounds on social media: I thought of it when I was reminded that the New Jersey borough of Buena is pronounced /ˈbjuːnə/ — so that the first syllable is the same as the first syllable of beauty. It's not clear […]  ( 13 min )
    The gender of gender
    For English speakers, a mind-boggling letter to the editor on linguistic gender from the Times Literary Supplement (3/9/25): Masculine and feminine In Cristina Rivera Garza’s Death Takes Me, reviewed by Lucy Popescu (In Brief, April 18), a character points out that “in Spanish, the word victim, or victima, is always feminine”. This is evidently true, but […]  ( 12 min )
  • Open

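    Computing mclip (Singular Value Clipping) via msign
    Previously, we used two posts, "Newton-Schulz Iteration for the msign Operator (Part 1)" and "Newton-Schulz Iteration for the msign Operator (Part 2)", to discuss the matrix $\newcommand{msign}{\mat...  ( 6 min )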
  • Open

    Dealing with Missing Data Strategically: Advanced Imputation Techniques in Pandas and Scikit-learn
    Missing values appear more often than not in many real-world datasets.
  • Open

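    Qwen3 Embedding Technical Analysis: A New Benchmark for Multilingual Text Embedding and Reranking
    The Qwen3 Embedding model series released by Alibaba's Tongyi Lab, in text embedding and reranking (R […]  ( 4 min )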

  • Open

    optimal importance sampling for stochastic optimisation
    A recent arXival by Liviu Aolaritei, Bart Van Parys, Henry Lam, and Michael Jordan (a co-PI in our ERC Synergy Ocean project) discusses optimal importance sampling schemes for stochastic optimisation, processed by an iterative Robbins-Monro algorithm improvement (with the Polyak-Ruppert improvement). “Despite its popularity, IS is often described as a `double-edged sword.’ Its performance depends […]  ( 10 min )
  • Open

    Introducing the PyTorch Ecosystem Working Group and Project Spotlights
    The PyTorch Ecosystem goes back several years, with some of its earliest projects like Hugging Face, Fast.ai, and PyTorch Lightning going on to grow incredible communities of their own. The...  ( 41 min )
  • Open

    "Artificial Intelligence and its evil twin, Darwinism"
    In Daniel Dennett's 1995 book Darwin's Dangerous Idea: Evolution and the Meanings of Life, the chapter titled "Chomsky contra Darwin, Four Episodes" ends with this provocative sentence: The hostility to Artificial Intelligence and its evil twin, Darwinism, lies just beneath the surface of much of the most influential work in recent twentieth-century philosophy. What Dennett […]  ( 14 min )
    "A tricky little area of semantics"
    Elizabeth Ribbens, "How the use of a word in the Guardian has gotten some readers upset", The Guardian 6/4/2025: ‘Got’ was changed during the editing of an opinion piece, leading to correspondence lamenting a slide into American English. But language isn’t a fortress. In Shakespeare’s Henry VI, Part II, a messenger breathlessly announces to the […]  ( 13 min )
  • Open

    Loss Functions Explained: Understand the Maths in Just 2 Minutes Each
    I must say, with the ongoing hype around machine learning, a lot of people jump straight to the application side without really understanding how things work behind the scenes.
    10 MLOps Tools for Machine Learning Practitioners to Know
    Machine learning is not just about building models.
  • Open

    The Open Marketplace of Ideas
    If peer review only refers to papers, it is not defensible
  • Open

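    Newton-Schulz Iteration for the msign Operator (Part 2)
    In the previous post, "Newton-Schulz Iteration for the msign Operator (Part 1)", we tried to find a better Newton-Schulz iteration for the $\mathop{\text{msign}}$ operator, hoping that within a limited number of iteration steps it could reach...  ( 8 min )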
  • Open

    Erdős Lectures 2025: Mehtaab S. Sawhney, June 5,9 & 11
    Today, at 14:30, Monday June 9 at 11:00 and Wednesday June 11, at 11:20 Mehtaab S. Sawhney will deliver the 2025 Erdős Lectures. Three talks, each representing a monumental achievement!  ( 14 min )

  • Open

    introduction to Bayesian methods for the social sciences (18-22 Aug, Università della Svizzera italiana, Lugano)
    Visit the post for more.  ( 9 min )
  • Open

    Mapping the exposome
    More than 20 years ago, I posted about the explosion of -ome and -omic words in biology: "-ome is where the heart is", 10/27/2004. I listed more than 40 examples: behaviourome, cellome, clinome, complexome, cryptome, crystallome, ctyome, degradome, enzymome, epigenome, epitome, expressome, fluxome, foldome, functome, glycome, immunome, ionome, interactome, kinome, ligandome, localizome, metallome, methylome, morphome, nucleome, […]  ( 11 min )
  • Open

    Open Source AI is Transforming the Economy—Here’s What the Data Shows
    No content preview  ( 40 min )
    Build Responsible AI Products with your own Yellow Teaming LLM
    The tools we use to build AI are evolving fast, with PyTorch at the heart of many advances. But unless we evolve the way we approach building AI systems, we...  ( 43 min )
  • Open

    NumPy Ninjutsu: Mastering Array Operations for High-Performance Machine Learning
    Machine learning workflows typically involve plenty of numerical computations in the form of mathematical and algebraic operations upon data stored as large vectors, matrices, or even tensors — matrix counterparts with three or more dimensions.

  • Open

    a reason why the Dept of Education is needed (and so are independent universities)
    Visit the post for more.  ( 9 min )
  • Open

    Acronymomania, part 2
    A brief collection of "Chinese words for Adults!", with the last one being "KPI", which I had to look up in English. Posted by UFL – University Of Foreign Languages – LE on Monday, May 26, 2025 A performance indicator or key performance indicator (KPI) is a type of performance measurement. KPIs evaluate the success of an organization or of a particular activity […]  ( 10 min )
  • Open

    Advanced audio dialog and generation with Gemini 2.5
    Gemini 2.5 has new capabilities in AI-powered audio dialog and generation.  ( 14 min )
  • Open

    A Defense of Peer Review
    Wait, what?
  • Open

    Ethereum Foundation Talk and Conversation: A Critical View on Quantum Computing & A geometry day honoring Micha Sharir
    Ethereum Foundation talk, today This afternoon (Tuesday, June 3, 2025) at 17:00 Israel time I give a zoom lecture on A Critical View on Quantum Computing. The lecture is hosted by the Ethereum Foundation and the 90 minute events will … Continue reading →  ( 15 min )
  • Open

    10 Python One-Liners That Will Simplify Feature Engineering
    Feature engineering is a key process in most data analysis workflows, especially when constructing machine learning models.

  • Open

    bridging ratio estimators
    Visit the post for more.  ( 9 min )
  • Open

    Word Embeddings in Language Models
    This post is divided into five parts; they are: • Understanding Word Embeddings • Using Pretrained Word Embeddings • Training Word2Vec with Gensim • Training Word2Vec with PyTorch • Embeddings in Transformer Models Word embeddings represent words as dense vectors in a continuous space, where semantically similar words are positioned close to each other.
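    To make "positioned close to each other" concrete, here is a minimal Python sketch using cosine similarity. It is not taken from the post: the three-dimensional vectors are hand-made toy values rather than trained embeddings, and the helper name `cosine_similarity` is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors chosen by hand for illustration (assumption, not real data).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])

# Related words score higher than unrelated ones in this toy setup.
print(f"cat~dog: {sim_cat_dog:.3f}, cat~car: {sim_cat_car:.3f}")
```

    With real embeddings the vectors would have hundreds of dimensions and come from a trained model, but the comparison works the same way.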
  • Open

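    The Equioscillation Theorem: A Necessary and Sufficient Condition for Optimal Polynomial Approximation
    While reading recently, I came across the "Equioscillation Theorem" on optimal polynomial approximation. Its proof also involves differentiating the infinity norm, and both the result and the proof struck me as quite novel, so I am recording them here. Reference: "...  ( 7 min )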

  • Open

    Marseille mais non Marseille!
    Visit the post for more.  ( 9 min )

  • Open

    a novel discrepancy measure
    My friend EJ Wagenmakers, along with his colleague  Raoul Grasman, have proposed a novel measure of discrepancy between two distributions that they formulate through a basic Bayesian lens as an expected posterior probability, written as (with no priors harmed in the process!) in case both distributions are absolutely continuous wrt a common dominating measure, with […]  ( 9 min )

  • Open

    Brussels snapshot [jatp]
    Visit the post for more.  ( 9 min )
  • Open

    A Gentle Introduction to SHAP for Tree-Based Models
    Machine learning models have become increasingly sophisticated, but this complexity often comes at the cost of interpretability.
  • Open

    Cosmin Pohoata and Daniel G. Zhu: Hypergraphic Zonotopes and Acyclohedra
    I would like to draw your attention to the short beautiful paper Hypergraphic Zonotopes and Acyclohedra by Cosmin Pohoata and Daniel G. Zhu. The paper introduces a higher-uniformity analogue of graphic zonotopes and permutohedra, and provides formulas for their volume, and, … Continue reading →  ( 15 min )
  • Open

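    Decoding Xiaomi MiMo-VL: How a Small 7B Model Achieves Multimodal SOTA Performance
    Recently, the Xiaomi open-source community released the technical report for the MiMo-VL-7B vision-language model, whose SFT (supervised fine-tuning) and RL (reinforcement learning) versions, in […]  ( 3 min )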

  • Open

    The Good, The Bad, and The Science
    Who should define what is "good" and "bad" science?
  • Open

    Using Quantized Models with Ollama for Application Development
    Quantization is a frequently used strategy applied to production machine learning models, particularly large and complex ones, to make them lightweight by reducing the numerical precision of the model’s parameters (weights) — usually from 32-bit floating-point to lower representations like 8-bit integers.
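    As a rough illustration of the precision reduction described above, here is a minimal Python sketch of symmetric 8-bit quantization. It is not taken from the article and makes no claim about how Ollama quantizes models; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [v * scale for v in q]

# Sample weights (Python floats are 64-bit; they stand in for float32 here).
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.6f}", f"max_error={max_error:.6f}")
```

    Each weight now needs one byte instead of four, at the cost of a small rounding error bounded by half the scale.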

  • Open

    Tokenizers in Language Models
    This post is divided into five parts; they are: • Naive Tokenization • Stemming and Lemmatization • Byte-Pair Encoding (BPE) • WordPiece • SentencePiece and Unigram The simplest form of tokenization splits text into tokens based on whitespace.
    10 Python Libraries That Speed Up Model Development
    Machine learning model development often feels like navigating a maze, exciting but filled with twists, dead ends, and time sinks.
  • Open

    Logic at HUJI
    As part of a day-long meeting on organization of science in Prague, I gave a 10-minute presentation on Science and Diversity in a small Country (click for my slides) and devoted a few minutes and one slide  to the amazing … Continue reading →  ( 14 min )

  • Open

    PyTorch Hangzhou Meetup Recap: Exploring the AI Open Source Ecosystem and Cutting-Edge Technology Practices
    On May 17, the PyTorch Meetup was successfully held in Hangzhou, drawing nearly 60 developers and industry experts from companies including Huawei, Tencent, Ant Group, and ByteDance. The event focused...  ( 40 min )
  • Open

    This Is Fine
    There is no reproducibility crisis in science. So what is JD Vance on about?
  • Open

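    QwenLong-L1: A Leap in Long-Context Reasoning for Large Models via Reinforcement Learning
    In recent years, Large Reasoning Models (LRMs), in tasks such as mathematics, programming, and logical reasoning, […]  ( 4 min )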

  • Open

    Recent Academic Events
    With members of the discrete geometry seminar in Prague. The 2025 Colloquia in Combinatorics took place in QMUL and UCL, 7 and 8 May 2025. 2025 is the 18th year of the Colloquia in Combinatorics, and like every year, the … Continue reading →  ( 15 min )
  • Open

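    Musings on Generative Diffusion Models (30): From Instantaneous Velocity to Average Velocity
    As is well known, slow generation has long been a pain point of diffusion models. To address it, everyone has shown their own ingenuity, like the proverbial "Eight Immortals crossing the sea", proposing all kinds of solutions, yet for a long time no single work has managed to stand out and become the standard. What kind of work could achieve this...  ( 7 min )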

  • Open

    The Open Source Legacy and AI’s Licensing Challenge
    Open source licensing revolutionized software development, creating a thriving ecosystem built on shared innovation and collaboration. Licenses like MIT and Apache-2.0 gave developers a standard, legally robust way to share...  ( 40 min )
    Featured Sessions: Exploring Innovation at PyTorch Day China 2025
    PyTorch Day China 2025, proudly hosted by the PyTorch Foundation, will take place on June 7 in Beijing, China collocated with...  ( 38 min )
  • Open

    Discrete Diffusion: Continuous-Time Markov Chains
    A tutorial explaining some key intuitions behind continuous time Markov chains for machine learners interested in discrete diffusion models: alternative representations, connections to point processes, and the memoryless property.  ( 9 min )
  • Open

    Loading Pydantic models from JSON without running out of memory
    You have a large JSON file, and you want to load the data into Pydantic. Unfortunately, this uses a lot of memory, to the point where large JSON files are very difficult to read. What to do? Assuming you’re stuck with JSON, in this article we’ll cover: The high memory usage you get with Pydantic’s default JSON loading. How to reduce memory usage by switching to another JSON library. Going further by switching to dataclasses with slots.  ( 4 min )

  • Open

    Accelerating GPU Performance with Triton: April 30th PyTorch ATX Event
    The PyTorch ATX Triton event, sponsored by Red Hat, was held on April 30, 2025, at the University of Texas. It was an exciting gathering focused on the Triton framework...  ( 34 min )
  • Open

    Quantum Duel in Prague
    Later this evening at 18:00 Prague time there will be a debate between Matthias Christandl and me, about the possibility of quantum computation. The debate will be live-streamed on YouTube. The beautiful poster was designed by Helena Gráfová. Click here … Continue reading →  ( 19 min )
  • Open

    Advancing Gemini's security safeguards
    We’ve made Gemini 2.5 our most secure model family to date.  ( 6 min )
    Fuel your creativity with new generative media models and tools
    Introducing Veo 3 and Imagen 4, and a new tool for filmmaking called Flow.  ( 16 min )
    Our vision for building a universal AI assistant
    We’re extending Gemini to become a world model that can make plans and imagine new experiences by simulating aspects of the world.  ( 15 min )
    SynthID Detector — a new portal to help identify AI-generated content
    Learn about the new SynthID Detector portal we announced at I/O to help people understand how the content they see online was generated.  ( 15 min )
    Announcing Gemma 3n preview: Powerful, efficient, mobile-first AI
    Gemma 3n is a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio, empowering developers to build live, interactive applications and sophisticated audio-centric experiences.  ( 5 min )
    Gemini 2.5: Our most intelligent models are getting even better
    Gemini 2.5 Pro continues to be loved by developers as the best model for coding, and 2.5 Flash is getting even better with a new update. We’re bringing new capabilities to our models, including Deep Think, an experimental enhanced reasoning mode for 2.5 Pro.  ( 17 min )

  • Open

    Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models
    Alberto Testolin  ( 2 min )
    Statistical Guarantees for Link Prediction using Graph Neural Networks
    Alan Chung, Amin Saberi, Morgane Austern  ( 2 min )
    Learning from Time Series under Temporal Label Noise
    Sujay Nagaraj, Walter Gerych, Sana Tonekaboni, Anna Goldenberg, Berk Ustun, Thomas Hartvigsen  ( 2 min )
    Scaling laws for learning with real and surrogate data
    Ayush Jain, Andrea Montanari, Eren Sasoglu  ( 2 min )
    Simple online learning with consistent oracle
    Alexander Kozachinskiy, Tomasz Steifer  ( 2 min )

  • Open

    Classical Verification of Quantum Learning
    Matthias C. Caro, Marcel Hinsche, Marios Ioannou, Alexander Nietner, Ryan Sweke  ( 3 min )
    An Empirical Study of Self-supervised Learning with Wasserstein Distance
    Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai  ( 3 min )
    Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding
    Rob Brekelmans, Frank Nielsen  ( 2 min )
    Provably Efficient UCB-type Algorithms For Learning Predictive State Representations
    Ruiquan Huang, Yingbin Liang, Jing Yang  ( 2 min )
    Large Margin Mechanism and Pseudo Query Set on Cross-Domain Few-Shot Learning
    Jia-Fong Yeh, Hsin-Ying Lee, Bing-Chen Tsai, Yi-Rong Chen, Ping-Chia Huang, Winston H. Hsu  ( 2 min )
    Complete intersections in binomial and lattice ideals
    Hiram H. Lopez, Rafael H. Villarreal  ( 2 min )
    Projective nested cartesian codes
    Cicero Carvalho, V. G. Lopez Neumann, Hiram H. Lopez  ( 2 min )
    Affine cartesian codes
    Hiram H. Lopez, Carlos Renteria, Rafael H. Villarreal  ( 2 min )
    Parameterized affine codes
    Hiram H. Lopez, Eliseo Sarmiento, Maria Vaz Pinto, Rafael H. Villarreal  ( 2 min )
    Computing the degree of a lattice ideal of dimension one
    Hiram H. Lopez, Rafael H. Villarreal  ( 2 min )
    Complete intersection vanishing ideals on degenerate tori over finite fields
    Hiram H. Lopez, Rafael H. Villarreal, Leticia Zarate  ( 2 min )
    Rank distribution of Delsarte codes
    Javier de la Cruz, Elisa Gorla, Hiram H. Lopez, Alberto Ravagnani  ( 2 min )

  • Open

    On the combinatorics of commutators of Lie algebras
    Eduardo Hitomi, Felipe Yasumura  ( 2 min )
    Geometric properties of some totally ordered compact sets
    Mohammad Daher, Khalil Saadi  ( 2 min )
    In constructive and informal mathematics, in contradistinction to any empirical science, the predicate of the current knowledge in the subject is necessary
    Apoloniusz Tyszka  ( 3 min )
    Holographic Phase Transitions in (2+1)-dimensional black hole spacetimes in NMG
    E. Abdalla, Jeferson de Oliveira, A. B. Pavan, C. E. Pellicer  ( 2 min )
    Impulse control maximising average cost per unit time: a non-uniformly ergodic case
    Jan Palczewski, Lukasz Stettner  ( 2 min )
    Nilpotent orbit theorem in $p$-adic Hodge theory
    Mohammad Reza Rahmati, Gerardo Flores  ( 3 min )

  • Open

    From geometry to invertibility preservers
    Hans Havlicek, Peter Šemrl  ( 2 min )
    Cayley's surface revisited
    Hans Havlicek  ( 2 min )
    On distant-isomorphisms of projective lines
    Andrea Blunck, Hans Havlicek  ( 2 min )
    Lifting of divisible designs
    Andrea Blunck, Hans Havlicek, Corrado Zanella  ( 2 min )
    Divisible designs from twisted dual numbers
    Andrea Blunck, Hans Havlicek, Corrado Zanella  ( 2 min )

  • Open

    Comparing Machine Learning Algorithms by Union-Free Generic Depth
    Hannah Blocher, Georg Schollmeyer, Malte Nalenz, Christoph Jansen  ( 2 min )
    Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI
    Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Jose Miguel Hernandez Lobato, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang  ( 2 min )
    Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data
    Yue Xing, Xiaofeng Lin, Namjoon Suh, Qifan Song, Guang Cheng  ( 2 min )
    Efficient Exploration for LLMs
    Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy  ( 2 min )

  • Open

    A Purely Algebraic Approach to The Generalized Jacobian Conjecture
    Susumu Oda  ( 3 min )
    Mapping Exoplanets
    Nicolas B. Cowan, Yuka Fujii  ( 2 min )
    Atoms of None of the Elements Ionize While Atoms of Inert Behavior Split by Photonic Current
    Mubarak Ali  ( 3 min )

  • Open

    The Lieb-Thirring inequality revisited
    Rupert L. Frank, Dirk Hundertmark, Michal Jex, Phan Thành Nam  ( 2 min )
    Algebraic approximations of compact Kähler threefolds
    Hsueh-Yung Lin  ( 2 min )
    On the dual positive cones and the algebraicity of a compact Kähler manifold
    Hsueh-Yung Lin  ( 2 min )
    Branching process descriptions of information cascades on Twitter
    James P. Gleeson, Tomokatsu Onaga, Peter Fennell, James Cotter, Raymond Burke, David J. P. O'Sullivan  ( 2 min )
    Tautological classes and symmetry in Khovanov-Rozansky homology
    Eugene Gorsky, Matthew Hogancamp, Anton Mellit  ( 2 min )
    Neural networks for geospatial data
    Wentao Zhan, Abhirup Datta  ( 2 min )
    Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing
    Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar  ( 2 min )
    Effect of Weight Quantization on Learning Models by Typical Case Analysis
    Shuhei Kashiwamura, Ayaka Sakata, Masaaki Imaizumi  ( 2 min )
    Multiple Yield Curve Modeling and Forecasting using Deep Learning
    Ronald Richman, Salvatore Scognamiglio  ( 2 min )
    Unified Transfer Learning Models in High-Dimensional Linear Regression
    Shuo Shuo Liu  ( 2 min )

  • Open

    The sample complexity of multi-distribution learning
    Binghui Peng  ( 2 min )
    Computer Vision Self-supervised Learning Methods on Time Series
    Daesoo Lee, Erlend Aune  ( 2 min )
    A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics
    Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Vignesh Bhethanabotla, Nakul Rampal, Omar Yaghi, Christian Borgs, Anima Anandkumar, Hongyu Guo, Jennifer Chayes  ( 2 min )
    Adversarial Attacks on Graph Neural Networks via Meta Learning
    Daniel Zügner, Stephan Günnemann  ( 2 min )
    Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
    Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Di He, Jingjing Xu, Zhi Zhang, Hongxia Yang, Liwei Wang  ( 2 min )

  • Open

    Rank Jumps and Growth of Shafarevich--Tate Groups for Elliptic Curves in $\mathbb{Z}/p\mathbb{Z}$-Extensions
    Lea Beneish, Debanjana Kundu, Anwesh Ray  ( 2 min )
    Categorical Koszul duality
    Julian Holstein, Andrey Lazarev  ( 2 min )
    Fast-HotStuff: A Fast and Resilient HotStuff Protocol
    Mohammad M. Jalalzai, Jianyu Niu, Chen Feng, Fangyu Gai  ( 2 min )
    Some Rigidity Theorem for Anosov Geodesic Flows
    Ítalo Dowell, Sergio Romaña  ( 2 min )
    Affine Anosov representations and proper actions
    Sourav Ghosh, Nicolaus Treib  ( 2 min )
    Signature Methods in Machine Learning
    Terry Lyons, Andrew D. McLeod  ( 3 min )
    ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
    Yi-Lin Sung, Jaehong Yoon, Mohit Bansal  ( 3 min )
    A multiobjective continuation method to compute the regularization path of deep neural networks
    Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz  ( 3 min )
    Understanding Disparities in Post Hoc Machine Learning Explanation
    Vishwali Mhasawade, Salman Rahman, Zoe Haskell-Craig, Rumi Chunara  ( 2 min )
    High-dimensional Functional Graphical Model Structure Learning via Neighborhood Selection Approach
    Boxin Zhao, Percy S. Zhai, Y. Samuel Wang, Mladen Kolar  ( 3 min )

  • Open

    Building a Nest by an Automaton
    Jurek Czyzowicz, Dariusz Dereniowski, Andrzej Pelc  ( 2 min )
    Non-fattening of mean curvature flow at singularities of mean convex type
    Or Hershkovits, Brian White  ( 2 min )
    RETRACTED: Yang-Mills theory for bundle gerbes
    Varghese Mathai, David Roberts  ( 2 min )
    A Two-Page "Derivation" of Schroedinger's Equation
    C. Baumgarten  ( 2 min )
    Quadratic fields, Artin-Schreier extensions, and Bell numbers
    Yoshinosuke Hirakawa  ( 2 min )
    CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks
    Andrei Tomut, Saeed S. Jahromi, Sukhbinder Singh, Faysal Ishtiaq, Cesar Muñoz, Prabdeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, Mehrazin Alizadeh, David Montero, Pablo Martin-Ramiro, Muhammad Ibrahim, Oussama Tahiri Alaoui, John Malcolm, Samuel Mugel, Roman Orus  ( 2 min )
    Transfer Learning for Contextual Multi-armed Bandits
    Changxiao Cai, T. Tony Cai, Hongzhe Li  ( 2 min )
    At the junction between deep learning and statistics of extremes: formalizing the landslide hazard definition
    Ashok Dahal, Raphaël Huser, Luigi Lombardo  ( 3 min )
    A Systematic Approach to Robustness Modelling for Deep Convolutional Neural Networks
    Charles Meyers, Mohammad Reza Saleh Sedghpour, Tommy Löfstedt, Erik Elmroth  ( 3 min )
    A V2X-based Privacy Preserving Federated Measuring and Learning System
    Levente Alekszejenkó, Tadeusz Dobrowiecki  ( 2 min )

  • Open

    Probabilistic Demand Forecasting with Graph Neural Networks
    Nikita Kozodoi, Elizaveta Zinovyeva, Simon Valentin, João Pereira, Rodrigo Agundez  ( 2 min )
    Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning
    Thomas Baldwin-McDonald, Mauricio A. Álvarez  ( 2 min )
    Differentially Private Distributed Estimation and Learning
    Marios Papachristou, M. Amin Rahimian  ( 3 min )
    Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint
    Zhongjie Shi, Fanghui Liu, Yuan Cao, Johan A.K. Suykens  ( 3 min )
    Full Bayesian Significance Testing for Neural Networks
    Zehua Liu, Zimeng Li, Jingyuan Wang, Yue He  ( 2 min )

  • Open

    Deep multitask neural networks for solving some stochastic optimal control problems
    Christian Yeo  ( 2 min )
    Bayesian identification of nonseparable Hamiltonians with multiplicative noise using deep learning and reduced-order modeling
    Nicholas Galioto, Harsh Sharma, Boris Kramer, Alex Arkady Gorodetsky  ( 2 min )
    A Stability Principle for Learning under Non-Stationarity
    Chengpiao Huang, Kaizheng Wang  ( 2 min )
    Deep Neural Network Benchmarks for Selective Classification
    Andrea Pugnana, Lorenzo Perini, Jesse Davis, Salvatore Ruggieri  ( 2 min )
    Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target Annotation
    Shoaib Meraj Sami, Md Mahedi Hasan, Nasser M. Nasrabadi, Raghuveer Rao  ( 3 min )

  • Open

    Quantum-classical phase transition with spontaneous superposition breaking and single photon detection
    Vladan Pankovic  ( 3 min )
    Orbital period derivative of a binary system using an exact orbital energy equation
    Vikram H. Zaveri  ( 2 min )
    Periodic relativity: the theory of gravity in flat space time
    Vikram H. Zaveri  ( 3 min )
    Bayesian nonparametric modeling for mean residual life regression
    Valerie Poynor, Athanasios Kottas  ( 3 min )
    Collapsing 4-manifolds under a lower curvature bound
    Takao Yamaguchi  ( 2 min )

  • Open

    Braid group actions from categorical symmetric Howe duality on deformed Webster algebras
    Mikhail Khovanov, Aaron D. Lauda, Joshua Sussan, Yasuyoshi Yonezawa  ( 2 min )
    Characterizing Exoplanet Habitability
    Tyler D. Robinson  ( 2 min )

  • Open

    Labeling Neural Representations with Inverse Recognition
    Kirill Bykov, Laura Kopf, Shinichi Nakajima, Marius Kloft, Marina M.-C. Höhne  ( 2 min )
    Upper and lower bounds for the Lipschitz constant of random neural networks
    Paul Geuchen, Thomas Heindl, Dominik Stöger, Felix Voigtlaender  ( 2 min )
    Training Neural Networks is NP-Hard in Fixed Dimension
    Vincent Froese, Christoph Hertrich  ( 2 min )
    Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression
    Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar  ( 3 min )
    Querying Easily Flip-flopped Samples for Deep Active Learning
    Seong Jin Cho, Gwangsu Kim, Junghyun Lee, Jinwoo Shin, Chang D. Yoo  ( 2 min )

  • Open

    Expected Utility Networks
    Pierfrancesco La Mura, Yoav Shoham  ( 2 min )
    Game Networks
    Pierfrancesco La Mura  ( 2 min )
    Projective Expected Utility
    Pierfrancesco La Mura  ( 2 min )
    Team Decision Problems with Classical and Quantum Signals
    Adam Brandenburger, Pierfrancesco La Mura  ( 2 min )
    Local regularity of the Bergman projection on a class of pseudoconvex domains of finite type
    Tran Vu Khanh, Andrew Raich  ( 2 min )
    Algebraic solutions of tropical optimization problems
    N. Krivulin  ( 2 min )

  • Open

    Tangent functor on microformal morphisms, and non-linear pullbacks for forms and cohomology
    Theodore Th. Voronov  ( 2 min )
    Universal transient behavior in large dynamical systems on networks
    Wojciech Tarnowski, Izaak Neri, Pierpaolo Vivo  ( 3 min )
    Geometric analysis of 1+1 dimensional quasilinear wave equations
    Leonardo Enrique Abbrescia, Willie Wai Yeung Wong  ( 2 min )
    Global Optimization of Gaussian processes
    Artur M. Schweidtmann, Dominik Bongartz, Daniel Grothe, Tim Kerkenhoff, Xiaopeng Lin, Jaromil Najman, Alexander Mitsos  ( 2 min )
    The Ramanujan-Petersson and Selberg conjectures for Maass forms
    André Unterberger  ( 2 min )
    Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions
    Thorsten Glüsenkamp  ( 3 min )
    Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks
    Jing An, Jianfeng Lu  ( 2 min )
    Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via Mixed-Effect Models and Hierarchical Clustering
    Lorenzo Valleggi, Marco Scutari, Federico Mattia Stefanini  ( 3 min )
    Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models
    Naoki Egami, Musashi Hinck, Brandon M. Stewart, Hanying Wei  ( 3 min )
    Modeling Latent Selection with Structural Causal Models
    Leihao Chen, Onno Zoeter, Joris M. Mooij  ( 2 min )

  • Open

    Explaining Neural Networks without Access to Training Data
    Sascha Marton, Stefan Lüdtke, Christian Bartelt, Andrej Tschalzev, Heiner Stuckenschmidt  ( 2 min )
    Controlling Moments with Kernel Stein Discrepancies
    Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey  ( 2 min )
    A finite sample analysis of the benign overfitting phenomenon for ridge function estimation
    Emmanuel Caron, Stephane Chretien  ( 3 min )
    On the Query Complexity of Training Data Reconstruction in Private Learning
    Prateeti Mukherjee, Satya Lokam  ( 3 min )
    EC-NAS: Energy Consumption Aware Tabular Benchmarks for Neural Architecture Search
    Pedram Bakhtiarifard, Christian Igel, Raghavendra Selvan  ( 2 min )

  • Open

    An Explainable Stacked Ensemble Model for Static Route-Free Estimation of Time of Arrival
    Sören Schleibaum, Jörg P. Müller, Monika Sester  ( 3 min )
    ARMA Cell: A Modular and Effective Approach for Neural Autoregressive Modeling
    Philipp Schiele, Christoph Berninger, David Rügamer  ( 2 min )
    Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
    Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto  ( 2 min )
    GPEX, A Framework For Interpreting Artificial Neural Networks
    Amir Akbarnejad, Gilbert Bigras, Nilanjan Ray  ( 3 min )
    CP-PINNs: Changepoints Detection in PDEs using Physics Informed Neural Networks with Total-Variation Penalty
    Zhikang Dong, Pawel Polak  ( 2 min )

  • Open

    A Theoretical View of Linear Backpropagation and Its Convergence
    Ziang Li, Yiwen Guo, Haodi Liu, Changshui Zhang  ( 2 min )
    GANDALF: Gated Adaptive Network for Deep Automated Learning of Features
    Manu Joseph, Harsh Raj  ( 2 min )
    Hierarchical Correlation Clustering and Tree Preserving Embedding
    Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani  ( 2 min )
    Adaptive joint distribution learning
    Damir Filipovic, Michael Multerer, Paul Schneider  ( 2 min )
    Generalized Optimistic Methods for Convex-Concave Saddle Point Problems
    Ruichen Jiang, Aryan Mokhtari  ( 3 min )

  • Open

    New bounds on the strength of some restrictions of Hindman's Theorem
    Lorenzo Carlucci, Leszek Aleksander Kołodziejczyk, Francesco Lepore, Konrad Zdanowski  ( 2 min )
    The zonoid algebra, generalized mixed volumes, and random determinants
    Paul Breiding, Peter Bürgisser, Antonio Lerario, Léo Mathis  ( 2 min )
    Virasoro Constraints for Toric Bundles
    Tom Coates, Alexander Givental, Hsian-Hua Tseng  ( 2 min )
    Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition
    Hamidreza Kasaei, Songsong Xiong  ( 3 min )
    On the Evolution of A.I. and Machine Learning: Towards a Meta-level Measuring and Understanding Impact, Influence, and Leadership at Premier A.I. Conferences
    Rafael B. Audibert, Henrique Lemos, Pedro Avelar, Anderson R. Tavares, Luís C. Lamb  ( 3 min )
    Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning
    Wenhan Xia, Chengwei Qin, Elad Hazan  ( 2 min )
    Optimal rates of approximation by shallow ReLU$^k$ neural networks and applications to nonparametric regression
    Yunfei Yang, Ding-Xuan Zhou  ( 2 min )
    Non-separable Covariance Kernels for Spatiotemporal Gaussian Processes based on a Hybrid Spectral Method and the Harmonic Oscillator
    Dionissios T. Hristopulos  ( 3 min )
    General-Purpose In-Context Learning by Meta-Learning Transformers
    Louis Kirsch, James Harrison, Jascha Sohl-Dickstein, Luke Metz  ( 2 min )
    Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes
    Hyunouk Ko, Xiaoming Huo  ( 2 min )
2025-06-17T03:19:40.856Z osmosfeed 1.15.1