osmos::feed

Open

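An Efficient Method for Inverting "Diagonal + Low-Rank" Triangular Matrices

From the article "A Brief History of Linear Attention: From Imitation and Innovation to Giving Back", we can see that DeltaNet and the linear Attention models that followed it are essentially all tied to the inverse matrix $(\boldsymbol{I} + \boldsymbol{... ( 6 min )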

Open

Bilingualism as a bonus for the brain

Is being bilingual good for your brain? Perhaps. Learning languages offers other, more concrete benefits. Economist (6/27/25). Yes! I won't mince words. At least in my case, multilingualism has been very good for my brain. In my rural Ohio high school, I took Latin and French, which is what were on offer. I enjoyed both of them […] ( 10 min )
Open

a journal of the cherry year

Read La Maison des veilleurs (last part of Le Cycle de Syffe) by Patrick K. Dewdney, still a pleasure to read (and even more to complete) if a wee too much skewed towards self-inspection by the main character, Syffe, and some unnecessary and homely episode. (And with a profusion of characters that I have mostly […] ( 9 min )
Open

Your First Local LLM API Project in Python Step-By-Step

Interested in leveraging a large language model (LLM) API locally on your machine using Python and not-too-overwhelming tools and frameworks? In this step-by-step article, you will set up a local API where you'll be able to send prompts to an LLM downloaded on your machine and obtain responses back.

Linear Layers and Activation Functions in Transformer Models

This post is divided into three parts; they are: • Why Linear Layers and Activations are Needed in Transformers • Typical Design of the Feed-Forward Network • Variations of the Activation Functions The attention layer is the core function of a transformer model.

Open

De pierre et d’os

Visit the post for more. ( 8 min )

Open

what the hell..?!

When seeing that Jorge Bergoglio’s autobiography, Hope, was part of the weekly book review in Nature, I was at the end of my Latin [customary French expression for it’s all Greek to me] as I could not get the point since the brief stint of his as a chemical technician did not sound like a […] ( 9 min )
Open

A Breakthrough of an Unusual Nature: the Media Control Symbol “Play” was Successfully Embedded into the London Skyline!

Three pictures showing that the media control symbol “play” was successfully embedded into London’s skyline 🙂 . The fourth rare picture is a screenshot with the control symbol “play” appearing side by side with its natural London demonstration. And here … Continue reading → ( 14 min )
Open

Computational phylogeny of Indo-European

Alexei S. Kassian and George Starostin, "Do 'language trees with sampled ancestors' really support a 'hybrid model' for the origin of Indo-European? Thoughts on the most recent attempt at yet another IE phylogeny". Humanities and Social Sciences Communications, 12, no. 682 (May 16, 2025). Abstract In this paper, we present a brief critical analysis of […] ( 13 min )

Open

Thalys static

[When returning from Brussels, after the privaCI workshop, I happened to sit in the Thalys train to Paris, next to three French (?) lobbyists (??) discussing very loudly about the specifics of their trade, apparently acting for the main agriculture union FNSEA (???) model of industrial and intensive agriculture at the European Commission or Parliament […] ( 9 min )
Open

Animal calls are not comparable to human speech

But can they still tell us something useful about language? Here are two new papers that address that question: I. "What the Hidden Rhythms of Orangutan Calls Can Tell Us about Language – New Research." De Gregorio, Chiara. The Conversation, May 27, 2025. In the dense forests of Indonesia, you can hear strange and haunting […] ( 11 min )
Open

7 AI Agent Frameworks for Machine Learning Workflows in 2025

Machine learning practitioners spend countless hours on repetitive tasks: monitoring model performance, retraining pipelines, data quality checks, and experiment tracking.

A Gentle Introduction to Attention Masking in Transformer Models

This post is divided into four parts; they are: • Why Attention Masking is Needed • Implementation of Attention Masks • Mask Creation • Using PyTorch's Built-in Attention

Open

cont[rump]inuing [mag]attacks on reproductive rights

[From the Center for Reproductive Rights:] Cancelling the Emergency Medical Treatment and Labor Act (EMTALA), which was requiring hospitals to provide life-saving abortion care Blocking funding from Title X, a federal program dedicated to providing family planning and preventive reproductive health care services Supporting the Born-Alive Abortion Survivors Protection Act (H.R. 21) Removing information on […] ( 9 min )
Open

PyTorch + vLLM = ♥️

Key takeaways: PyTorch and vLLM are both critical to the AI ecosystem and are increasingly being used together for cutting-edge generative AI applications, including inference, post-training, and agentic systems at... ( 41 min )

FlagGems Joins the PyTorch Ecosystem: Triton-Powered Operator Library for Universal AI Acceleration

No content preview ( 39 min )

Presenting Flux Fast: Making Flux go brrr on H100s

In our earlier post, diffusion-fast, we showed how the Stable Diffusion XL (SDXL) pipeline can be optimized up to 3x using native PyTorch code. Back then, SDXL was an open... ( 39 min )
Open

Taiwanese Twosome: tea and Sino-Korean

Even if you can't understand spoken Taiwanese, you can learn a lot from these two videos because of the excellent visuals, plus it is nice just to hear the clearly spoken Taigi and compare terms in Taigi with their parallels in Sino-Korean. The first is a video from Taiwan's public TV (公視台語台) on the interesting distribution […] ( 14 min )

Unit utility

Today's xkcd: The mouseover title: "'This HAZMAT container contains radioactive material with activity of one becquerel.' 'So, like, a single banana slice?'" explainxkcd currently fails to explain the strip's implicit reference to the entry for bogosity in the Jargon File: 1. [orig. CMU, now very common] The degree to which something is bogus. Bogosity is measured […] ( 19 min )

Linguistics vs. archeology and (physical) anthropology

Subtitle: "A cautionary note on the application of limited linguistics studies to whole populations" A prefatory note on "anthropology". In the early 90s, I was deeply involved in the first ancient DNA studies on the Tarim mummies* with Paolo Francalacci, an anthropologist at the University of Sassari, Sardinia. Paolo was deputed to work with me […] ( 13 min )
Open

10 Essential Machine Learning Key Terms Explained

Artificial intelligence (AI) is an umbrella computer science discipline focused on building software systems capable of mimicking human or animal intelligence capabilities to solve a task.

Open

Nature snipets [17 April 2025]

From the Nature 17 April Issue: the “usual” wishful tribunes with no implementation spreadsheet (incentivizing big tech companies to make the digital world safer, make quantum tech sustainable and ethical) Trump’s bull-in-a-China-shop attitude towards science and academia (US pullback from Antarctica, NSF halving PhD fellowships in 2025, reflecting on how the US became a science […] ( 9 min )
Open

Machines of Faithful Obedience

[Crossposted on LessWrong] Throughout history, technological and scientific advances have had both good and ill effects, but their overall impact has been overwhelmingly positive. Thanks to scientific progress, most people on earth live longer, healthier, and better than they did centuries or even decades ago. I believe that AI (including AGI and ASI) can do … Continue reading Machines of Faithful Obedience ( 23 min )
Open

Sergey Avvakumov and Alfredo Hubard Construct Cubical Spheres with Many Facets!

In this post, I discuss a remarkable new paper Cubulating the sphere with many facets by Sergey Avvakumov and Alfredo Hubard Abstract: For each we construct cube complexes homeomorphic to the -sphere with vertices in which the number of facets … Continue reading → ( 17 min )
Open

Individual experiences and collective evidence

Jessica Dai on theory for the world as it could be
Open

Combining XGBoost and Embeddings: Hybrid Semantic Boosted Trees?

The intersection of traditional machine learning and modern representation learning is opening up new possibilities.
Open

Mi, mi, mi

[first draft written June 9-10, 2025 in Bemidji, Minnesota, where the famous giant statues of Paul Bunyan and Babe the Blue Ox stand next to beautiful Lake Bemidji*] During my peregrinations in upper midwest USA, I noticed a proliferation of place names beginning with "mi-". Because there are 10,000 big and little glacial lakes up […] ( 15 min )

Open

death on the Großglockner

Visit the post for more. ( 8 min )
Open

A Gentle Introduction to Multi-Head Latent Attention (MLA)

This post is divided into three parts; they are: • Low-Rank Approximation of Matrices • Multi-head Latent Attention (MLA) • PyTorch Implementation Multi-Head Attention (MHA) and Grouped-Query Attention (GQA) are the attention mechanisms used in almost all transformer models.

Converting Pandas DataFrames to PyTorch DataLoaders for Custom Deep Learning Model Training

Pandas DataFrames are powerful and versatile data manipulation and analysis tools.
Open

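Computing Singular Value Clipping mclip via msign (Part 2)

Previously, in "Computing Singular Value Clipping mclip via msign (Part 1)", we discussed the numerical computation of singular value clipping $\newcommand{mclip}{\mathop{\text{mclip}}}\mclip$; the core idea... ( 7 min )

What Can the Matrix Sign Function mcsgn Compute?

In the article "The Derivative of msign", we formally introduced two matrix sign functions, $\newcommand{msign}{\mathop{\text{msign}}}\msign$ and $\newcommand{mcsg... ( 6 min )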
Open

"… and its launch it got."

There are several different types of "fronting" or "preposing" in English, sometimes categorized in syntactic terms (e.g. wh-movement) and sometimes in pragmatic terms (e.g. topicalization). Here's a recent example of a familiar type, for which I don't know a standard name: The stage was set for Tesla to get its launch, and its launch it got. […] ( 10 min )

Open

THAMES for mixtures, a reply from the authors

[Here is a reply to my comments on THAMES sent by the first author of the paper, Martin Metodiev. The above replica of the cover of Rivers of London is obviously unrelated with the reply or the original blog, beyond presenting a fantasy map of the Thames!] Thank you for your review of our article! […] ( 10 min )

Pegasus bridge half marathon [1:26:26, 114/5227, 1/109 M5M and first over 55]

Tough race when compared with last year despite good weather conditions (if quite warm), and bagpipes at the start, maybe due to overtraining. I barely managed to keep my first position in the M5M category, by a mere 2s, not that I was aware of the threat to my crown! The briefly leased Hoka One […] ( 9 min )
Open

The importance of rhythm for memorization

My wonderful 2nd grade teacher taught me how to spell Mississippi with a special sing-song rhythm, and I've never forgotten it thereafter. Her jingle makes spelling "Mississippi" — whose shape is as contorted as its riverine course and scared me the first few times I tried to spell it myself, before she taught me the […] ( 19 min )

Open

A cautionary note on the application of limited genetics studies to whole populations

"Unraveling the origins of the sogdians: Evidence of genetic admixture between ancient central and East Asians", Jiashuo Zhang, Yongdi Wang, Naifan Zhang, Jiawei Li, Youyang Qu, Cunshi Zhu, Fan Zhang, Dawei Cai, and Chao Ning, Journal of Archaeological Science: Reports (Volume 61, February 2025, 104957) Highlights: Genome-wide data was generated for two individuals from a […] ( 11 min )

Open

BayesComp 2025.4

The third and final day of the (main) conference started with Emtiyaz Khan’s plenary talk on adaptive Bayesian intelligence. Or, imho, [adaptive [Bayesian]] intelligence, with the brackets indicating redundancy since intelligence need include adaptivity and [intelligent] adaptivity need proceed in a Bayesian way! Focussing first on the Bayesian learning rule via variational Bayes (with a […] ( 11 min )
Open

Fault Tolerant Llama: training with 2000 synthetic failures every ~15 seconds and no checkpoints on Crusoe L40S

Collaborators: Less Wright, Howard Huang, Chien-Chin Huang, Crusoe: Martin Cala, Ethan Petersen tl;dr: we used torchft and torchtitan to train a model in a real-world environment with extreme synthetic failure... ( 45 min )
Open

"AI" == "vehicle"?

Back in March, the AAAI ("Association for the Advancement of Artificial Intelligence") published an "AAAI Presidential Panel Report on the Future of AI Research": The AAAI 2025 presidential panel on the future of AI research aims to help all AI stakeholders navigate the recent significant transformations in AI capabilities, as well as AI research methodologies, […] ( 12 min )

Incredulous, incredible, whatever. . .

I thought this use of incredulous in a recent Forbes article was a malapropism for incredible: If you thought that my May 23 report, confirming the leak of login data totaling an astonishing 184 million compromised credentials, was frightening, I hope you are sitting down now. Researchers have just confirmed what is also certainly the […] ( 14 min )

Bopomofo Cafe

Chris Button saw this bubble tea place at 3:45 PM today in Hollywood: From the cafe's website: BOPOMOFO CAFE draws its name from the phonetic Traditional Chinese Alphabets. ㄅ, ㄆ, ㄇ, and ㄈ [bo, po, mo, and fo] are the “ABCs” of the Mandarin Chinese alphabet symbolizing nostalgia and strength as the building blocks of Mandarin […] ( 11 min )
Open

One out of five AI researchers

To figure out the purpose of forecasting, I put on my Dan Davies hat and ask, “What do forecasts do?”
Open

Beyond GridSearchCV: Advanced Hyperparameter Tuning Strategies for Scikit-learn Models

Ever felt like trying to find a needle in a haystack? That’s part of the process of building and optimizing machine learning models, particularly complex ones like ensembles and neural networks, where several hyperparameters need to be manually set by us before training them.
Open

On Coping with the War — and a 1931 Postcard from Akitsugu Kawaguchi to Abraham Fraenkel

One question that I came across on social media (paraphrased here) was: How can you celebrate colleagues’ birthdays or attend conferences while the terrible war that began on October 7 — marked by senseless horror, death, and destruction in both … Continue reading → ( 15 min )
Open

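A Brief History of Linear Attention: From Imitation and Innovation to Giving Back

In the Chinese-speaking community, this site was probably among the earlier ones to pay attention to linear Attention; when the first related blog post, "Exploring Linear Attention: Does Attention Need a Softmax?", was written in 2020, the discussion was still mainly about BERT... ( 8 min )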

Open

PyTorch Docathon 2025: Wrap Up

Huge congratulations and a massive thank you to all the amazing participants of the PyTorch Docathon 2025! Over the past two weeks (June 3rd-15th), our virtual Docathon brought together over... ( 37 min )
Open

Probability Is Only A Game

Ruthless subjective probability without Bayes' Rule
Open

Shakhar Smorodinsky’s Solution to a Radon-Type Problem

A brief update: Since Friday June 13 Israel has been engaged in a direct war with Iran. This follows two major missile attacks by Iran against Israel in April and October 2024, as well as Iran’s central role in the … Continue reading → ( 15 min )

Open

DeepNVMe: Affordable I/O scaling for Deep Learning Applications

Introduction We introduced DeepNVMe in summer 2024 as a suite of optimizations for tackling I/O bottlenecks in Deep Learning (DL). DeepNVMe delivers significant speedups for I/O bound DL workloads by leveraging storage... ( 41 min )
Open

Restatements or Forecasts?

A slightly more sophisticated example of Defensive Forecasting.

Open

death of the artist (Sebastião Salgado, 1944-2025)

Visit the post for more. ( 8 min )
Open

In Defense of Defensive Forecasting

Is superforecasting just sneaky accounting?
Open

Unicode CJK Unified Ideographs Extension J and the nature of the sinographic writing system

Submitted by Charles Belov: I've been browsing through the proposed Unicode 17 changes, currently undergoing a comment period, with interest. While I don't have the knowledge to intelligently comment on the proposals, it's good to see that they are actively improving language access. I'm puzzled that some new characters have been added to the existing […] ( 11 min )
Open

How to Combine Scikit-learn, CatBoost, and SHAP for Explainable Tree Models

Machine learning workflows often involve a delicate balance: you want models that perform exceptionally well, but you also need to understand and explain their predictions.

Open

off to Singapore (BayesComp 2025)

Visit the post for more. ( 8 min )
Open

Dungan radio broadcasts from 2018-2021

We've talked about Dungan a lot on Language Log. That's the northwest Sinitic topolect written in Cyrillic that has been transplanted to Central Asia. See "Selected readings" below. For those of you who are interested and would like to hear what it sounds like in real life — spoken and sung by male and female […] ( 11 min )
Open

Positional Encodings in Transformer Models

This post is divided into five parts; they are: • Understanding Positional Encodings • Sinusoidal Positional Encodings • Learned Positional Encodings • Rotary Positional Encodings (RoPE) • Relative Positional Encodings Consider these two sentences: "The fox jumps over the dog" and "The dog jumps over the fox".

Open

Pegasus bridge half [37ième]

Visit the post for more. ( 8 min )
Open

Conversation with a Chinese restaurateur in a west central Mississippi town

Running down the road in Clarksdale, Mississippi, I screeched to a halt (felt like Road Runner) when I passed by a Chinese restaurant with the odd name Rice Bowl (in Chinese it was Fànwǎn lóu 饭碗楼, the only characters I saw on the premises). It was a tiny, nondescript establishment, with six or so […] ( 17 min )

Open

2025 IMS International Conference on Statistics and Data Science in Sevilla [Dec. 15-18]

Visit the post for more. ( 8 min )
Open

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

The field of large language models is shifting toward lower-precision computation. This shift necessitates a rethinking of scaling laws to account for the effects of quantization on resulting quantized model... ( 42 min )
Open

Persian language in the Indian subcontinent

That's the title of a valuable Wikipedia article. I have no idea who wrote it, but I'm very glad to have access to this comprehensive article, since it touches on so many topics that concern my ongoing research. Here are some highlights: Before British colonisation, the Persian language was the lingua franca of the Indian […] ( 13 min )

Plato's cave

The first two panels from SMBC a few days ago: The rest of the strip: The aftercomic: The mouseover title: "Would you rather sit with friends watching shadows on the bigscreen or spending your time arguing with Plato about whether poetry should be legal?" This expands on the 9/9/2015 SMBC: Wikipedia explains the Allegory of […] ( 9 min )

The linguistic pragmatics of LLMs

"Does GPT-4 Surpass Human Performance in Linguistic Pragmatics?" Bojic, Ljubiša et al. Humanities and Social Sciences Communications 12, no. 1 (June 10, 2025). Ljubiša Bojić, Predrag Kovačević, & Milan Čabarkapa. Humanities and Social Sciences Communications volume 12, Article number: 794 (2025) Cite this article Abstract As Large Language Models (LLMs) become increasingly integrated into everyday life as general-purpose […] ( 10 min )
Open

Strunk and White for Science

Validity as a style guide for telling stories about correlations
Open

Behind “ANCESTRA”: combining Veo with live-action filmmaking

We partnered with Darren Aronofsky, Eliza McNitt and a team of more than 200 people to make a film using Veo and live-action filmmaking. ( 16 min )
Open

Advanced Feature Engineering Using Scikit-Learn Pipelines with Pandas’ ColumnTransformer and NumPy Arrays

Pandas, NumPy, and Scikit-learn.
Open

The Derivative of msign

In this article we derive the differentiation formula for the $\newcommand{msign}{\mathop{\text{msign}}}\msign$ operator. If readers want to, as in "Test-Time Training Done... ( 6 min )

Open

a journal of the chaos (en cuisine) year

Read La Fille du Grand Hiver (The Daughter of the Great Winter) by Isabelle Autissier (also a sailor who was the first woman to complete a solo world race in 1991). This is a novelised version of the story of Arnarulunguaq, who accompanied Knud Rasmussen on the Fifth Thule Expedition over several years, all the […] ( 10 min )
Open

A 12th-century influencer

From Ada Palmer, "Inventing the Renaissance: The Myth of a Golden Age": The new scholastic method was so exciting! that when Peter Abelard got kicked out of his monastery (for proving its founding saint didn’t exist—that pissed off the abbot, who’d have guessed?) and went to live as a hermit in the wilderness of Champagne, […] ( 11 min )

Boop?

The latest xkcd: Mouseover title: "With a good battery, the device can easily last for 5 or 10 years, although the walls probably won't." The joke worked for me, although I was pretty sure that a (current) MacBook makes no sound when a usb device connects. I checked, and that's true. A current Windows 11 […] ( 10 min )

The grammar and sense of a poetic line

Randy Alexander is not a professional Sinologist, but when it comes to reading Chinese poetry, he's as serious as one can be. The following poem is by Du Fu (712-770), said by some to be "China's greatest poet". In the presentation below, I will first give the text with its transcription, and then Randy's translation. […] ( 17 min )
Open

Step-by-Step Guide to Deploying Machine Learning Models with FastAPI and Docker

You've trained your machine learning model, and it's performing great on test data.
Open

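Mistral Magistral: A Reasoning Engine Forged by Pure Reinforcement Learning That Upends the LLM Training Paradigm

No distillation, no SFT: Mistral uses pure reinforcement learning to achieve a 50% performance leap on math and code reasoning tasks. Recently, Mistral […] ( 4 min )

Xiaohongshu dots.llm1: Redefining the MoE Efficiency Frontier, with 14B Activated Parameters Challenging the Limits of 72B Dense Models

Core breakthrough: a super-brain with minimal activation. Three technical pillars underpin the SOTA performance. 1. Data engineering: the alchemy of 11.2T high-quality tokens […] ( 4 min )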
Open

A Few Announcements

Trevisan Prize 2025 Here is a call for nominations for a new theoretical computer science prize, in memory of Luca Trevisan. (h/t Alon Rosen.) Three Near Future Events at HUJI While the Erdős Lectures 2025 given by Mehtaab S. Sawhney, … Continue reading → ( 14 min )

Open

[fool’s] gold standard science

In this new presidential order of 23 May 2025, Trump pretends to “restore the scientific integrity policies of my first Administration and ensures that agencies practice data transparency, acknowledge relevant scientific uncertainties, are transparent about the assumptions and likelihood of scenarios used, approach scientific findings objectively, and communicate scientific data accurately” repeating his goal in […] ( 10 min )
Open

Milton Friedman's p-values

Remind me what happens when a measure becomes a target.
Open

Implementing Vector Search from Scratch: A Step-by-Step Tutorial

There’s no doubt that search is one of the most fundamental problems in computing.

Open

exceptional OWABI web/sem’inar [19 June, BayesComp²⁵]

Exceptionally, the next One World Approximate Bayesian Inference (OWABI) Seminar will be hybrid as it is scheduled to take place during BayesComp 2025 in Singapore, on Thursday 19 June at 8pm Singapore time (1pm in Tórshavn) and two talks, one by Filippo Pagani on Approximate Bayesian Fusion Bayesian Fusion is a powerful approach that enables […] ( 10 min )
Open

How to Optimize Language Model Size for Deployment

The rise of language models, and more specifically large language models (LLMs), has been of such a magnitude that it has permeated every aspect of modern AI applications — from chatbots and search engines to enterprise automation and coding assistants.

Open

death of a benefactor of humanity (Etienne-Emile Baulieu, 1926-2025)

Visit the post for more. ( 9 min )
Open

HuggingFace Safetensors Support in PyTorch Distributed Checkpointing

Summary PyTorch Distributed Checkpointing (DCP) is making investments into addressing the interoperability blockers to ensure that popular formats, like HuggingFace safetensors, can work well with PyTorch’s ecosystem. Since HuggingFace has... ( 38 min )
Open

The agonies of an ABC learning Chinese

As most readers of Language Log know, ABC means "American-born Chinese". Depending upon how (in)sensitive their parents are, learning Chinese can be hell, and leave them scarred for life. The actors in this video are brilliant and the tale it tells reveals so much about the trials and pitfalls of learning Chinese overseas. If only […] ( 10 min )

"The girls are fighting"

The news has been full of the Musk-Trump feud. Among the linguistic aspects, there's an interesting amount of explicit or implied gender association; here's Alexandria Ocasio-Cortez in a memic clip widely linked on social media: From the other end of the political spectrum, check out Nellie […] ( 11 min )

Buena

Following up on the issue of English spelling variation, this picture has been making the rounds on social media: I thought of it when I was reminded that the New Jersey borough of Buena is pronounced /ˈbjuːnə/ — so that the first syllable is the same as the first syllable of beauty. It's not clear […] ( 13 min )

The gender of gender

For English speakers, a mind-boggling letter to the editor on linguistic gender from the Times Literary Supplement (3/9/25): Masculine and feminine In Cristina Rivera Garza’s Death Takes Me, reviewed by Lucy Popescu (In Brief, April 18), a character points out that “in Spanish, the word victim, or victima, is always feminine”. This is evidently true, but […] ( 12 min )
Open

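Computing mclip (Singular Value Clipping) via msign

Previously, we used two articles, "Newton-Schulz Iteration for the msign Operator (Part 1)" and "Newton-Schulz Iteration for the msign Operator (Part 2)", to discuss the matrix $\newcommand{msign}{\mat... ( 6 min )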
Open

Dealing with Missing Data Strategically: Advanced Imputation Techniques in Pandas and Scikit-learn

Missing values appear more often than not in many real-world datasets.
Open

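Qwen3 Embedding Technical Analysis: A New Benchmark for Multilingual Text Embedding and Reranking

The Qwen3 Embedding model series released by Alibaba's Tongyi Lab, in text embedding (Embedding) and reranking (R […] ( 4 min )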

Open

optimal importance sampling for stochastic optimisation

A recent arXival by Liviu Aolaritei, Bart Van Parys, Henry Lam, and Michael Jordan (a co-PI in our ERC Synergy Ocean project) discusses optimal importance sampling schemes for stochastic optimisation, processed by an iterative Robbins-Monro algorithm improvement (with the Polyak-Ruppert improvement). “Despite its popularity, IS is often described as a `double-edged sword.’ Its performance depends […] ( 10 min )
Open

Introducing the PyTorch Ecosystem Working Group and Project Spotlights

The PyTorch Ecosystem goes back several years, with some of its earliest projects like Hugging Face, Fast.ai, and PyTorch Lightning going on to grow incredible communities of their own. The... ( 41 min )
Open

"Artificial Intelligence and its evil twin, Darwinism"

In Daniel Dennett's 1995 book Darwin's Dangerous Idea: Evolution and the Meanings of Life, the chapter titled "Chomsky contra Darwin, Four Episodes" ends with this provocative sentence: The hostility to Artificial Intelligence and its evil twin, Darwinism, lies just beneath the surface of much of the most influential work in recent twentieth-century philosophy. What Dennett […] ( 14 min )

"A tricky little area of semantics"

Elizabeth Ribbens, "How the use of a word in the Guardian has gotten some readers upset", The Guardian 6/4/2025: ‘Got’ was changed during the editing of an opinion piece, leading to correspondence lamenting a slide into American English. But language isn’t a fortress. In Shakespeare’s Henry VI, Part II, a messenger breathlessly announces to the […] ( 13 min )
Open

Loss Functions Explained: Understand the Maths in Just 2 Minutes Each

I must say, with the ongoing hype around machine learning, a lot of people jump straight to the application side without really understanding how things work behind the scenes.

10 MLOps Tools for Machine Learning Practitioners to Know

Machine learning is not just about building models.
Open

The Open Marketplace of Ideas

If peer review only refers to papers, it is not defensible
Open

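Newton-Schulz Iteration for the msign Operator (Part 2)

In the previous article, "Newton-Schulz Iteration for the msign Operator (Part 1)", we tried to find a better Newton-Schulz iteration for the $\mathop{\text{msign}}$ operator, in the hope of reaching, within a limited number of iteration steps, ... ( 8 min )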
Open

Erdős Lectures 2025: Mehtaab S. Sawhney, June 5,9 & 11

(Click to enlarge) Today, at 14:30, Monday June 9 at 11:00 and Wednesday June 11, at 11:20 Mehtaab S. Sawhney will deliver the 2025 Erdős lecture. Three talks, each representing a monumental achievement! ( 14 min )

Open

introduction to Bayesian methods for the social sciences (18-22 Aug, Università della Svizzera italiana, Lugano)

Visit the post for more. ( 9 min )
Open

Mapping the exposome

More than 20 years ago, I posted about the explosion of -ome and -omic words in biology: "-ome is where the heart is", 10/27/2004. I listed more than 40 examples: behaviourome, cellome, clinome, complexome, cryptome, crystallome, ctyome, degradome, enzymome, epigenome, epitome, expressome, fluxome, foldome, functome, glycome, immunome, ionome, interactome, kinome, ligandome, localizome, metallome, methylome, morphome, nucleome, […] ( 11 min )
Open

Open Source AI is Transforming the Economy—Here’s What the Data Shows

No content preview ( 40 min )

Build Responsible AI Products with your own Yellow Teaming LLM

The tools we use to build AI are evolving fast, with PyTorch at the heart of many advances. But unless we evolve the way we approach building AI systems, we... ( 43 min )
Open

NumPy Ninjutsu: Mastering Array Operations for High-Performance Machine Learning

Machine learning workflows typically involve plenty of numerical computations in the form of mathematical and algebraic operations upon data stored as large vectors, matrices, or even tensors — matrix counterparts with three or more dimensions.

Open

a reason why the Dept of Education is needed (and so are independent universities)

Visit the post for more. ( 9 min )
Open

Acronymomania, part 2

A brief collection of "Chinese words for Adults!", with the last one being "KPI", which I had to look up in English. (Posted by UFL – University Of Foreign Languages – LE on Monday, May 26, 2025.) A performance indicator or key performance indicator (KPI) is a type of performance measurement. KPIs evaluate the success of an organization or of a particular activity […] ( 10 min )
Open

Advanced audio dialog and generation with Gemini 2.5

Gemini 2.5 has new capabilities in AI-powered audio dialog and generation. ( 14 min )
Open

A Defense of Peer Review

Wait, what?
Open

Ethereum Foundation Talk and Conversation: A Critical View on Quantum Computing & A geometry day honoring Micha Sharir

Ethereum Foundation talk, today: This afternoon (Tuesday, June 3, 2025) at 17:00 Israel time I give a Zoom lecture on A Critical View on Quantum Computing. The lecture is hosted by the Ethereum Foundation and the 90 minute events will … ( 15 min )
Open

10 Python One-Liners That Will Simplify Feature Engineering

Feature engineering is a key process in most data analysis workflows, especially when constructing machine learning models.

Open

bridging ratio estimators

Visit the post for more. ( 9 min )
Open

Word Embeddings in Language Models

This post is divided into five parts; they are: • Understanding Word Embeddings • Using Pretrained Word Embeddings • Training Word2Vec with Gensim • Training Word2Vec with PyTorch • Embeddings in Transformer Models Word embeddings represent words as dense vectors in a continuous space, where semantically similar words are positioned close to each other.
Open

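The Equioscillation Theorem: A Necessary and Sufficient Condition for Optimal Polynomial Approximation

While reading recently, I came across the "Equioscillation Theorem" on optimal polynomial approximation. Its proof also involves differentiating the infinity norm, and both the result and the proof struck me as quite novel, so I am recording them here. References: "... ( 7 min )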

Open

Marseille mais non Marseille!

Visit the post for more. ( 9 min )

Open

Advancing Gemini's security safeguards

We’ve made Gemini 2.5 our most secure model family to date. ( 6 min )

Fuel your creativity with new generative media models and tools

Introducing Veo 3 and Imagen 4, and a new tool for filmmaking called Flow. ( 16 min )

Our vision for building a universal AI assistant

We’re extending Gemini to become a world model that can make plans and imagine new experiences by simulating aspects of the world. ( 15 min )

SynthID Detector — a new portal to help identify AI-generated content

Learn about the new SynthID Detector portal we announced at I/O to help people understand how the content they see online was generated. ( 15 min )

Announcing Gemma 3n preview: Powerful, efficient, mobile-first AI

Gemma 3n is a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio, empowering developers to build live, interactive applications and sophisticated audio-centric experiences. ( 5 min )

Gemini 2.5: Our most intelligent models are getting even better

Gemini 2.5 Pro continues to be loved by developers as the best model for coding, and 2.5 Flash is getting even better with a new update. We’re bringing new capabilities to our models, including Deep Think, an experimental enhanced reasoning mode for 2.5 Pro. ( 17 min )

Open

Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models

Alberto Testolin ( 2 min )

Statistical Guarantees for Link Prediction using Graph Neural Networks

Alan Chung, Amin Saberi, Morgane Austern ( 2 min )

Learning from Time Series under Temporal Label Noise

Sujay Nagaraj, Walter Gerych, Sana Tonekaboni, Anna Goldenberg, Berk Ustun, Thomas Hartvigsen ( 2 min )

Scaling laws for learning with real and surrogate data

Ayush Jain, Andrea Montanari, Eren Sasoglu ( 2 min )

Simple online learning with consistent oracle

Alexander Kozachinskiy, Tomasz Steifer ( 2 min )

Open

Classical Verification of Quantum Learning

Matthias C. Caro, Marcel Hinsche, Marios Ioannou, Alexander Nietner, Ryan Sweke ( 3 min )

An Empirical Study of Self-supervised Learning with Wasserstein Distance

Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai ( 3 min )

Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding

Rob Brekelmans, Frank Nielsen ( 2 min )

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

Ruiquan Huang, Yingbin Liang, Jing Yang ( 2 min )

Large Margin Mechanism and Pseudo Query Set on Cross-Domain Few-Shot Learning

Jia-Fong Yeh, Hsin-Ying Lee, Bing-Chen Tsai, Yi-Rong Chen, Ping-Chia Huang, Winston H. Hsu ( 2 min )

Complete intersections in binomial and lattice ideals

Hiram H. Lopez, Rafael H. Villarreal ( 2 min )

Projective nested cartesian codes

Cicero Carvalho, V. G. Lopez Neumann, Hiram H. Lopez ( 2 min )

Affine cartesian codes

Hiram H. Lopez, Carlos Renteria, Rafael H. Villarreal ( 2 min )

Parameterized affine codes

Hiram H. Lopez, Eliseo Sarmiento, Maria Vaz Pinto, Rafael H. Villarreal ( 2 min )

Computing the degree of a lattice ideal of dimension one

Hiram H. Lopez, Rafael H. Villarreal ( 2 min )

Complete intersection vanishing ideals on degenerate tori over finite fields

Hiram H. Lopez, Rafael H. Villarreal, Leticia Zarate ( 2 min )

Rank distribution of Delsarte codes

Javier de la Cruz, Elisa Gorla, Hiram H. Lopez, Alberto Ravagnani ( 2 min )

Open

On the combinatorics of commutators of Lie algebras

Eduardo Hitomi, Felipe Yasumura ( 2 min )

Geometric properties of some totally ordered compact sets

Mohammad Daher, Khalil Saadi ( 2 min )

In constructive and informal mathematics, in contradistinction to any empirical science, the predicate of the current knowledge in the subject is necessary

Apoloniusz Tyszka ( 3 min )

Holographic Phase Transitions in (2+1)-dimensional black hole spacetimes in NMG

E. Abdalla, Jeferson de Oliveira, A. B. Pavan, C. E. Pellicer ( 2 min )

Impulse control maximising average cost per unit time: a non-uniformly ergodic case

Jan Palczewski, Lukasz Stettner ( 2 min )

Nilpotent orbit theorem in $p$-adic Hodge theory

Mohammad Reza Rahmati, Gerardo Flores ( 3 min )

Open

From geometry to invertibility preservers

Hans Havlicek, Peter Šemrl ( 2 min )

Cayley's surface revisited

Hans Havlicek ( 2 min )

On distant-isomorphisms of projective lines

Andrea Blunck, Hans Havlicek ( 2 min )

Lifting of divisible designs

Andrea Blunck, Hans Havlicek, Corrado Zanella ( 2 min )

Divisible designs from twisted dual numbers

Andrea Blunck, Hans Havlicek, Corrado Zanella ( 2 min )

Open

Comparing Machine Learning Algorithms by Union-Free Generic Depth

Hannah Blocher, Georg Schollmeyer, Malte Nalenz, Christoph Jansen ( 2 min )

Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI

Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Jose Miguel Hernandez Lobato, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang ( 2 min )

Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data

Yue Xing, Xiaofeng Lin, Namjoon Suh, Qifan Song, Guang Cheng ( 2 min )

Efficient Exploration for LLMs

Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy ( 2 min )

Open

A Purely Algebraic Approach to The Generalized Jacobian Conjecture

Susumu Oda ( 3 min )

Mapping Exoplanets

Nicolas B. Cowan, Yuka Fujii ( 2 min )

Atoms of None of the Elements Ionize While Atoms of Inert Behavior Split by Photonic Current

Mubarak Ali ( 3 min )

Open

The Lieb-Thirring inequality revisited

Rupert L. Frank, Dirk Hundertmark, Michal Jex, Phan Thành Nam ( 2 min )

Algebraic approximations of compact Kähler threefolds

Hsueh-Yung Lin ( 2 min )

On the dual positive cones and the algebraicity of a compact Kähler manifold

Hsueh-Yung Lin ( 2 min )

Branching process descriptions of information cascades on Twitter

James P. Gleeson, Tomokatsu Onaga, Peter Fennell, James Cotter, Raymond Burke, David J. P. O'Sullivan ( 2 min )

Tautological classes and symmetry in Khovanov-Rozansky homology

Eugene Gorsky, Matthew Hogancamp, Anton Mellit ( 2 min )

Neural networks for geospatial data

Wentao Zhan, Abhirup Datta ( 2 min )

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing

Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, Anima Anandkumar ( 2 min )

Effect of Weight Quantization on Learning Models by Typical Case Analysis

Shuhei Kashiwamura, Ayaka Sakata, Masaaki Imaizumi ( 2 min )

Multiple Yield Curve Modeling and Forecasting using Deep Learning

Ronald Richman, Salvatore Scognamiglio ( 2 min )

Unified Transfer Learning Models in High-Dimensional Linear Regression

Shuo Shuo Liu ( 2 min )

Open

The sample complexity of multi-distribution learning

Binghui Peng ( 2 min )

Computer Vision Self-supervised Learning Methods on Time Series

Daesoo Lee, Erlend Aune ( 2 min )

A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics

Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Vignesh Bhethanabotla, Nakul Rampal, Omar Yaghi, Christian Borgs, Anima Anandkumar, Hongyu Guo, Jennifer Chayes ( 2 min )

Adversarial Attacks on Graph Neural Networks via Meta Learning

Daniel Zügner, Stephan Günnemann ( 2 min )

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Di He, Jingjing Xu, Zhi Zhang, Hongxia Yang, Liwei Wang ( 2 min )

Open

Rank Jumps and Growth of Shafarevich--Tate Groups for Elliptic Curves in $\mathbb{Z}/p\mathbb{Z}$-Extensions

Lea Beneish, Debanjana Kundu, Anwesh Ray ( 2 min )

Categorical Koszul duality

Julian Holstein, Andrey Lazarev ( 2 min )

Fast-HotStuff: A Fast and Resilient HotStuff Protocol

Mohammad M. Jalalzai, Jianyu Niu, Chen Feng, Fangyu Gai ( 2 min )

Some Rigidity Theorem for Anosov Geodesic Flows

Ítalo Dowell, Sergio Romaña ( 2 min )

Affine Anosov representations and proper actions

Sourav Ghosh, Nicolaus Treib ( 2 min )

Signature Methods in Machine Learning

Terry Lyons, Andrew D. McLeod ( 3 min )

ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models

Yi-Lin Sung, Jaehong Yoon, Mohit Bansal ( 3 min )

A multiobjective continuation method to compute the regularization path of deep neural networks

Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz ( 3 min )

Understanding Disparities in Post Hoc Machine Learning Explanation

Vishwali Mhasawade, Salman Rahman, Zoe Haskell-Craig, Rumi Chunara ( 2 min )

High-dimensional Functional Graphical Model Structure Learning via Neighborhood Selection Approach

Boxin Zhao, Percy S. Zhai, Y. Samuel Wang, Mladen Kolar ( 3 min )

Open

Building a Nest by an Automaton

Jurek Czyzowicz, Dariusz Dereniowski, Andrzej Pelc ( 2 min )

Non-fattening of mean curvature flow at singularities of mean convex type

Or Hershkovits, Brian White ( 2 min )

RETRACTED: Yang-Mills theory for bundle gerbes

Varghese Mathai, David Roberts ( 2 min )

A Two-Page "Derivation" of Schroedinger's Equation

C. Baumgarten ( 2 min )

Quadratic fields, Artin-Schreier extensions, and Bell numbers

Yoshinosuke Hirakawa ( 2 min )

CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks

Andrei Tomut, Saeed S. Jahromi, Sukhbinder Singh, Faysal Ishtiaq, Cesar Muñoz, Prabdeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, Mehrazin Alizadeh, David Montero, Pablo Martin-Ramiro, Muhammad Ibrahim, Oussama Tahiri Alaoui, John Malcolm, Samuel Mugel, Roman Orus ( 2 min )

Transfer Learning for Contextual Multi-armed Bandits

Changxiao Cai, T. Tony Cai, Hongzhe Li ( 2 min )

At the junction between deep learning and statistics of extremes: formalizing the landslide hazard definition

Ashok Dahal, Raphaël Huser, Luigi Lombardo ( 3 min )

A Systematic Approach to Robustness Modelling for Deep Convolutional Neural Networks

Charles Meyers, Mohammad Reza Saleh Sedghpour, Tommy Löfstedt, Erik Elmroth ( 3 min )

A V2X-based Privacy Preserving Federated Measuring and Learning System

Levente Alekszejenkó, Tadeusz Dobrowiecki ( 2 min )

Open

Probabilistic Demand Forecasting with Graph Neural Networks

Nikita Kozodoi, Elizaveta Zinovyeva, Simon Valentin, João Pereira, Rodrigo Agundez ( 2 min )

Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning

Thomas Baldwin-McDonald, Mauricio A. Álvarez ( 2 min )

Differentially Private Distributed Estimation and Learning

Marios Papachristou, M. Amin Rahimian ( 3 min )

Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint

Zhongjie Shi, Fanghui Liu, Yuan Cao, Johan A.K. Suykens ( 3 min )

Full Bayesian Significance Testing for Neural Networks

Zehua Liu, Zimeng Li, Jingyuan Wang, Yue He ( 2 min )

Open

Deep multitask neural networks for solving some stochastic optimal control problems

Christian Yeo ( 2 min )

Bayesian identification of nonseparable Hamiltonians with multiplicative noise using deep learning and reduced-order modeling

Nicholas Galioto, Harsh Sharma, Boris Kramer, Alex Arkady Gorodetsky ( 2 min )

A Stability Principle for Learning under Non-Stationarity

Chengpiao Huang, Kaizheng Wang ( 2 min )

Deep Neural Network Benchmarks for Selective Classification

Andrea Pugnana, Lorenzo Perini, Jesse Davis, Salvatore Ruggieri ( 2 min )

Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target Annotation

Shoaib Meraj Sami, Md Mahedi Hasan, Nasser M. Nasrabadi, Raghuveer Rao ( 3 min )

Open

Quantum-classical phase transition with spontaneous superposition breaking and single photon detection

Vladan Pankovic ( 3 min )

Orbital period derivative of a binary system using an exact orbital energy equation

Vikram H. Zaveri ( 2 min )

Periodic relativity: the theory of gravity in flat space time

Vikram H. Zaveri ( 3 min )

Bayesian nonparametric modeling for mean residual life regression

Valerie Poynor, Athanasios Kottas ( 3 min )

Collapsing 4-manifolds under a lower curvature bound

Takao Yamaguchi ( 2 min )

Open

Braid group actions from categorical symmetric Howe duality on deformed Webster algebras

Mikhail Khovanov, Aaron D. Lauda, Joshua Sussan, Yasuyoshi Yonezawa ( 2 min )

Characterizing Exoplanet Habitability

Tyler D. Robinson ( 2 min )

Open

Labeling Neural Representations with Inverse Recognition

Kirill Bykov, Laura Kopf, Shinichi Nakajima, Marius Kloft, Marina M.-C. Höhne ( 2 min )

Upper and lower bounds for the Lipschitz constant of random neural networks

Paul Geuchen, Thomas Heindl, Dominik Stöger, Felix Voigtlaender ( 2 min )

Training Neural Networks is NP-Hard in Fixed Dimension

Vincent Froese, Christoph Hertrich ( 2 min )

Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression

Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar ( 3 min )

Querying Easily Flip-flopped Samples for Deep Active Learning

Seong Jin Cho, Gwangsu Kim, Junghyun Lee, Jinwoo Shin, Chang D. Yoo ( 2 min )

Open

Expected Utility Networks

Pierfrancesco La Mura, Yoav Shoham ( 2 min )

Game Networks

Pierfrancesco La Mura ( 2 min )

Projective Expected Utility

Pierfrancesco La Mura ( 2 min )

Team Decision Problems with Classical and Quantum Signals

Adam Brandenburger, Pierfrancesco La Mura ( 2 min )

Local regularity of the Bergman projection on a class of pseudoconvex domains of finite type

Tran Vu Khanh, Andrew Raich ( 2 min )

Algebraic solutions of tropical optimization problems

N. Krivulin ( 2 min )

Open

Tangent functor on microformal morphisms, and non-linear pullbacks for forms and cohomology

Theodore Th. Voronov ( 2 min )

Universal transient behavior in large dynamical systems on networks

Wojciech Tarnowski, Izaak Neri, Pierpaolo Vivo ( 3 min )

Geometric analysis of 1+1 dimensional quasilinear wave equations

Leonardo Enrique Abbrescia, Willie Wai Yeung Wong ( 2 min )

Global Optimization of Gaussian processes

Artur M. Schweidtmann, Dominik Bongartz, Daniel Grothe, Tim Kerkenhoff, Xiaopeng Lin, Jaromil Najman, Alexander Mitsos ( 2 min )

The Ramanujan-Petersson and Selberg conjectures for Maass forms

André Unterberger ( 2 min )

Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions

Thorsten Glüsenkamp ( 3 min )

Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

Jing An, Jianfeng Lu ( 2 min )

Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via Mixed-Effect Models and Hierarchical Clustering

Lorenzo Valleggi, Marco Scutari, Federico Mattia Stefanini ( 3 min )

Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models

Naoki Egami, Musashi Hinck, Brandon M. Stewart, Hanying Wei ( 3 min )

Modeling Latent Selection with Structural Causal Models

Leihao Chen, Onno Zoeter, Joris M. Mooij ( 2 min )

Open

Explaining Neural Networks without Access to Training Data

Sascha Marton, Stefan Lüdtke, Christian Bartelt, Andrej Tschalzev, Heiner Stuckenschmidt ( 2 min )

Controlling Moments with Kernel Stein Discrepancies

Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey ( 2 min )

A finite sample analysis of the benign overfitting phenomenon for ridge function estimation

Emmanuel Caron, Stephane Chretien ( 3 min )

On the Query Complexity of Training Data Reconstruction in Private Learning

Prateeti Mukherjee, Satya Lokam ( 3 min )

EC-NAS: Energy Consumption Aware Tabular Benchmarks for Neural Architecture Search

Pedram Bakhtiarifard, Christian Igel, Raghavendra Selvan ( 2 min )

Open

An Explainable Stacked Ensemble Model for Static Route-Free Estimation of Time of Arrival

Sören Schleibaum, Jörg P. Müller, Monika Sester ( 3 min )

ARMA Cell: A Modular and Effective Approach for Neural Autoregressive Modeling

Philipp Schiele, Christoph Berninger, David Rügamer ( 2 min )

Linear Spaces of Meanings: Compositional Structures in Vision-Language Models

Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto ( 2 min )

GPEX, A Framework For Interpreting Artificial Neural Networks

Amir Akbarnejad, Gilbert Bigras, Nilanjan Ray ( 3 min )

CP-PINNs: Changepoints Detection in PDEs using Physics Informed Neural Networks with Total-Variation Penalty

Zhikang Dong, Pawel Polak ( 2 min )

Open

A Theoretical View of Linear Backpropagation and Its Convergence

Ziang Li, Yiwen Guo, Haodi Liu, Changshui Zhang ( 2 min )

GANDALF: Gated Adaptive Network for Deep Automated Learning of Features

Manu Joseph, Harsh Raj ( 2 min )

Hierarchical Correlation Clustering and Tree Preserving Embedding

Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani ( 2 min )

Adaptive joint distribution learning

Damir Filipovic, Michael Multerer, Paul Schneider ( 2 min )

Generalized Optimistic Methods for Convex-Concave Saddle Point Problems

Ruichen Jiang, Aryan Mokhtari ( 3 min )

Open

New bounds on the strength of some restrictions of Hindman's Theorem

Lorenzo Carlucci, Leszek Aleksander Kołodziejczyk, Francesco Lepore, Konrad Zdanowski ( 2 min )

The zonoid algebra, generalized mixed volumes, and random determinants

Paul Breiding, Peter Bürgisser, Antonio Lerario, Léo Mathis ( 2 min )

Virasoro Constraints for Toric Bundles

Tom Coates, Alexander Givental, Hsian-Hua Tseng ( 2 min )

Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition

Hamidreza Kasaei, Songsong Xiong ( 3 min )

On the Evolution of A.I. and Machine Learning: Towards a Meta-level Measuring and Understanding Impact, Influence, and Leadership at Premier A.I. Conferences

Rafael B. Audibert, Henrique Lemos, Pedro Avelar, Anderson R. Tavares, Luís C. Lamb ( 3 min )

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

Wenhan Xia, Chengwei Qin, Elad Hazan ( 2 min )

Optimal rates of approximation by shallow ReLU$^k$ neural networks and applications to nonparametric regression

Yunfei Yang, Ding-Xuan Zhou ( 2 min )

Non-separable Covariance Kernels for Spatiotemporal Gaussian Processes based on a Hybrid Spectral Method and the Harmonic Oscillator

Dionissios T. Hristopulos ( 3 min )

General-Purpose In-Context Learning by Meta-Learning Transformers

Louis Kirsch, James Harrison, Jascha Sohl-Dickstein, Luke Metz ( 2 min )

Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes

Hyunouk Ko, Xiaoming Huo ( 2 min )