Tuesday, March 11, 2025
Opening (09:20 – 09:30)
09:30 – 10:10: Jakob Zech, Heidelberg University.
Statistical Learning Theory for Neural Operators
In this talk, we present new results on the sample size required to learn surrogates of nonlinear mappings between infinite-dimensional Hilbert spaces. Such surrogate models have a wide range of applications and can be used in uncertainty quantification and parameter estimation problems in fields such as classical mechanics, fluid mechanics, electrodynamics, and the earth sciences. Here, the operator input determines the problem configuration, such as initial conditions, material properties, or forcing terms of a partial differential equation (PDE) governing the underlying physics. The operator output corresponds to the PDE solution. Our analysis shows that, for certain neural network architectures, empirical risk minimization can overcome the curse of dimensionality. Specifically, we show that both the number of network parameters and the quantity of input-output data pairs required for training remain manageable, with the error converging at an algebraic rate. Additionally, we provide numerical experiments comparing different architectures.
10:10 – 10:50: Kansei Ushiyama, The University of Tokyo.
Performance estimation problems for convergence rate analysis of continuous-time models for optimization algorithms
Joint work with: Shun Sato (The University of Tokyo), Takayasu Matsuo (The University of Tokyo)
Optimization algorithms are fundamental numerics for machine learning. The convergence rates of these algorithms, defined as the decreasing speed of f(x_k) − f(x*), ‖∇f(x_k)‖, or ‖x_k − x*‖, where x_k is the output of the k-th iteration of the algorithm and x* is the optimal solution, represent the performance of these algorithms. Recently, the derivation and analysis of optimization algorithms using their continuous-time analogues have been attracting attention. This approach requires proving the convergence rates of the continuous-time analogues of algorithms, expressed through ordinary differential equations (ODEs). Previous analyses of ODEs have relied on special Lyapunov functions that can reveal the convergence rates. However, identifying Lyapunov functions that can prove an anticipated convergence rate is a challenging task in most cases. In [1], a novel approach to analyzing convergence rates of ODEs is presented. This is a continuous-time analogue of the performance estimation problem proposed in [2], a framework for analyzing the convergence rates of discrete-time algorithms by formalizing the convergence rate of an algorithm as the solution to another optimization problem. However, this approach has certain limitations: it is only applicable to a restricted range of ODEs, and in some cases, the convergence rates it yields are not what we anticipate. In this talk, we propose a new continuous analogue of the performance estimation problem to overcome these limitations, as outlined in [3]. By discretizing an ODE and using results from this work, we can derive the fastest known optimization algorithm for a certain class of problems.
References
- J. Kim and I. Yang, Convergence analysis of ODE models for accelerated first-order methods via positive semidefinite kernels. In Advances in Neural Information Processing Systems, volume 36, 2023.
- Y. Drori and M. Teboulle, Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program., 145(1-2, Ser. A):451-482, 2014.
- K. Ushiyama, S. Sato, and T. Matsuo, Deriving optimal rates of continuous-time accelerated first-order methods via performance estimation problems, preprint, (2024).
Coffee break (10:50 – 11:20)
11:20 – 12:00: Eloi Martinet, University of Würzburg.
Meshless Shape Optimization using Neural Networks and Partial Differential Equations on Graphs
Joint work with: Leon Bungert (University of Würzburg)
Shape optimization involves the minimization of a cost function defined over a set of shapes, often governed by a partial differential equation (PDE). Since analytical solutions are typically unavailable, we need to rely on numerical methods to find an approximate solution. The level set method, when coupled with finite element analysis, is one of the most versatile numerical shape optimization approaches. However, its reliance on meshing introduces limitations inherent to mesh-based methods.
In this talk, we present a fully meshless level set framework that leverages neural networks to parameterize the level set function and employs the graph Laplacian to solve the underlying PDE. This approach enables precise computation of geometric quantities such as normals and curvature. Furthermore, we exploit the flexibility of neural networks to address optimization problems within the class of convex shapes.
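The graph Laplacian component can be sketched on a random point cloud: a Gaussian-weighted graph is assembled from pairwise distances, and a screened-Poisson-type system is solved on it. The point count, bandwidth, and right-hand side below are arbitrary illustrative choices, not those of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))            # random point cloud in the unit square

# Gaussian edge weights w_ij = exp(-|x_i - x_j|^2 / eps^2), no self-loops
eps = 0.2
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / eps ** 2)
np.fill_diagonal(W, 0.0)

# unnormalized graph Laplacian L = D - W: symmetric, PSD, zero row sums
L = np.diag(W.sum(axis=1)) - W

# screened-Poisson-type problem (L + I) u = f, well-posed since L + I is SPD
f = pts[:, 0]                               # illustrative right-hand side
u = np.linalg.solve(L + np.eye(len(pts)), f)
```

The same Laplacian acts on any scattered set of points, which is what makes the framework meshless: the level set function supplied by the neural network is simply evaluated at the cloud.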
Lunch break (12:00 – 14:00)
14:00 – 14:40: Amy Braverman, Jet Propulsion Laboratory, California Institute of Technology.
Simulation-based Uncertainty Quantification
Simulation-based inference (SBI) [1] is a relatively new paradigm for statistical inference that enables inference without parametric model assumptions. It provides the general basis for techniques such as approximate Bayesian computation (ABC) and likelihood-free inference procedures that have been used in statistics for some time. The basic idea is that one does not need to know the likelihood function if one has a mechanism for simulating samples from the data-generating mechanism. For example, a data set produced by a carefully designed simulation experiment utilizing a high-fidelity computational model would suffice. One can produce as much data as one wants or needs by running the computational model, or a fast emulator of it provided by modern machine learning methods. Here I introduce simulation-based uncertainty quantification (SBUQ) [2, 3] which utilizes similar ideas, particularly in the context of inverse problems. For context, I use the real-world problem that led us to SBUQ: uncertainty quantification for remote sensing inversions. This is an ill-posed problem wherein spectra observed from Earth-orbiting satellites are used to infer physical characteristics of the Earth.
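The simulation-based idea can be sketched with its simplest instance, ABC rejection sampling: draw a parameter from the prior, run the simulator, and keep the draw if the simulated summary statistic matches the observed one. All numbers below (prior range, tolerance, sample sizes) are illustrative choices, not from the talk:

```python
import random
import statistics

random.seed(0)

# "observed" data from an unknown mean theta_true = 2.0
theta_true = 2.0
obs = [random.gauss(theta_true, 1.0) for _ in range(100)]
obs_mean = statistics.fmean(obs)

def simulator(theta, n=100):
    """Stand-in for an expensive computational model (or a fast emulator)."""
    return statistics.fmean(random.gauss(theta, 1.0) for _ in range(n))

accepted = []
for _ in range(20000):
    theta = random.uniform(-5.0, 5.0)            # draw from the prior
    if abs(simulator(theta) - obs_mean) < 0.2:   # tolerance on the summary
        accepted.append(theta)

posterior_mean = statistics.fmean(accepted)
```

No likelihood is ever written down; only the ability to simulate from the data-generating mechanism is used, which is the property SBUQ exploits in the remote-sensing inversion setting.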
References
- Kyle Cranmer, Johann Brehmer, and Gilles Louppe (2020), The frontier of simulation-based inference, PNAS, Volume 117, Number 48, pages 30055-30062. doi: 10.1073/pnas.1912789117.
- Jonathan M. Hobbs, Amy Braverman, Noel Cressie, Robert Granat, and Michael Gunson (2017). Simulation Based Uncertainty Quantification for Estimating Atmospheric CO2 from Satellite Data, SIAM/ASA Journal on Uncertainty Quantification, Volume 5, Number 1, pages 956–985. doi: 10.1137/16M1060765.
- Amy Braverman, Jonathan M. Hobbs, Joaquim Teixeira, and Michael Gunson (2021), Post hoc Uncertainty Quantification for Remote Sensing Observing Systems, SIAM/ASA Journal on Uncertainty Quantification, Volume 9, Number 3, pages 1064–1093. doi: 10.1137/19M1304283.
14:40 – 15:20: Tan Bui-Thanh, The University of Texas at Austin.
Learn2Solve: A Deep Learning Framework for Real-Time Solutions of Forward, Inverse, and UQ Problems
Digital models (DMs) are designed to be replicas of systems and processes. At the core of a digital model (DM) is a physical/mathematical model that captures the behavior of the real system across temporal and spatial scales. One of the key roles of DMs is enabling “what if” scenario testing of hypothetical simulations to understand the implications at any point throughout the life cycle of the process, to monitor the process, to calibrate parameters to match the actual process and to quantify the uncertainties. In this talk, we will present various (faster than) real-time Scientific Deep Learning (SciDL) approaches for forward, inverse, and UQ problems. Both theoretical and numerical results for various problems including transport, heat, Burgers, (transonic and supersonic) Euler, and Navier-Stokes equations will be presented.
Coffee break (15:20 – 16:00)
16:00 – 16:40: Holger Fröning, Heidelberg University.
Bayesian Machines: Unlocking the Potential of Bayesian Neural Networks for Enhanced Uncertainty Reasoning
Deep neural networks (DNNs) [1] are a prominent approach for decision-making in scenarios involving uncertainty. These networks have significantly enhanced performance in various prediction tasks, such as image recognition, speech processing, and signal analysis. However, these networks require substantial computational resources and memory. While DNNs perform well under uncertainty, they lack the ability to reason about uncertainty itself. It is crucial to identify situations where a neural network cannot provide a reliable prediction.
In real-world scenarios, training data is rarely complete, necessitating reasoning about uncertainty when operating on out-of-distribution data. Similarly, sensor measurements often include noise, which is typically manageable under normal conditions but can degrade significantly under adverse circumstances, such as poor weather. In such cases, a model’s inability to deliver dependable predictions should manifest as increased prediction uncertainty.
This limitation has led to growing interest in probabilistic models. For example, Bayesian neural networks (BNNs) [2] offer a way to account for uncertainty but come with significantly higher computational demands. While traditional methods to accelerate DNNs focus on techniques like quantization and pruning [3], speeding up BNNs involves approximating their underlying probability distributions. This requires balancing cost and quality, a topic we will explore through various approaches.
Furthermore, while DNNs are well-suited for GPU acceleration [4], BNNs are less compatible with such hardware. Given this fundamental deployment challenge, we will examine the potential of specialized hardware for probabilistic tasks and introduce the concept of a “Bayesian Machine” [5] as a first step toward addressing these challenges.
References
- Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521, 436–444 (2015). https://doi.org/10.1038/nature14539
- L. V. Jospin, H. Laga, F. Boussaid, W. Buntine and M. Bennamoun, Hands-On Bayesian Neural Networks—A Tutorial for Deep Learning Users, IEEE Computational Intelligence Magazine, vol. 17, no. 2, pp. 29-48, May 2022, https://doi.org/10.1109/MCI.2022.3155327
- W. Roth, G. Schindler, B. Klein, R. Peharz, S. Tschiatschek, H. Fröning, F. Pernkopf and Z. Ghahramani, Resource-Efficient Neural Networks for Embedded Systems, Journal of Machine Learning Research, 25(50), 1–51, 2024, http://jmlr.org/papers/v25/18-566.html
- Sara Hooker. 2021. The hardware lottery. Commun. ACM 64, 12 (December 2021), 58–65. https://doi.org/10.1145/3467017
- F. Brückerhoff-Plückelmann, H. Borras, B. Klein, A. Varri, M. Becker, J. Dijkstra, M. Brückerhoff, C.D. Wright, M. Salinga, H. Bhaskaran, B. Risse, H. Fröning & W. Pernice, Probabilistic photonic computing with chaotic light, Nat Commun 15, 10445 (2024). https://doi.org/10.1038/s41467-024-54931-6
16:40 – 17:20: Michael Feischl, TU Wien.
Towards optimal hierarchical training of neural networks
Joint work with: Alexander Rieder (TU Wien), Fabian Zehetgruber (TU Wien)
We propose a hierarchical training algorithm for standard feed-forward neural networks that adaptively extends the network architecture as soon as the optimization reaches a stationary point. By solving small (low-dimensional) optimization problems, the extended network provably escapes any local minimum or stationary point. Under some assumptions on the approximability of the data with stable neural networks, we show that the algorithm achieves an optimal convergence rate s, in the sense that the loss is bounded by the number of parameters to the power −s. As a byproduct, we obtain computable indicators which judge the optimality of the training state of a given network and derive a new notion of generalization error.
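The network-extension idea can be sketched in its simplest form as function-preserving widening: a hidden neuron is appended with zero outgoing weight, so the realized function is unchanged while new descent directions become available at the stationary point. This toy sketch (one hidden layer, tanh activation) is our illustration, not the algorithm of the paper:

```python
import numpy as np

def forward(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))

# a small feed-forward net: 3 -> 4 -> 1
params = (rng.normal(size=(3, 4)), rng.normal(size=4),
          rng.normal(size=(4, 1)), rng.normal(size=1))

def widen(params, rng):
    """Append one hidden neuron with zero outgoing weight: the realized
    function is unchanged, but the optimizer gains new directions."""
    W1, b1, W2, b2 = params
    w_new = rng.normal(size=(W1.shape[0], 1))           # fresh incoming weights
    W1w = np.hstack([W1, w_new])
    b1w = np.append(b1, rng.normal())
    W2w = np.vstack([W2, np.zeros((1, W2.shape[1]))])   # zero outgoing weight
    return (W1w, b1w, W2w, b2)

wide = widen(params, rng)
```

Because the output is preserved exactly, the loss cannot increase at the moment of extension; any subsequent decrease comes from the newly opened directions.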
References
- Michael Feischl, Alexander Rieder, Fabian Zehetgruber (2024), Towards optimal hierarchical training of neural networks, arXiv:2407.02242
Wednesday, March 12, 2025
09:30 – 10:10: Dejan Slepčev, Carnegie Mellon University.
Interacting particle dynamics for sampling in high dimensions
Joint work with: Lantian Xu (Carnegie Mellon University) and Elias Hess-Childs (Carnegie Mellon University)
Motivated by the task of sampling measures in high dimensions we introduce a new geometry on the space of measures called Radon-Wasserstein geometry and show that gradient flows of Kullback-Leibler divergence with respect to the Radon-Wasserstein geometry can be approximated well by interacting particles in high dimensions. We will discuss the properties of the mean-field flow and its convergence towards the desired measure. We will also show that the flow of the interacting particle system can be computed accurately and efficiently in high dimensions using a slicing technique. Finally we will discuss the numerical performance of the method.
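The slicing mechanism can be sketched as follows: particles are repeatedly projected onto random directions, and in each one-dimensional projection they move toward the rank-matched projections of target samples. This toy version (dimension, step size, and Gaussian target are arbitrary choices) illustrates the mechanism, not the Radon-Wasserstein flow itself:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 400
Y = rng.normal(size=(n, d)) + 3.0     # samples of the target measure
X = rng.normal(size=(n, d)) * 0.1     # initial particles near the origin

def sliced_step(X, Y, rng, n_proj=20, step=0.5):
    """One step: average rank-matched 1D transport over random directions."""
    X = X.copy()
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        px, py = X @ theta, np.sort(Y @ theta)
        disp = np.empty(n)
        disp[np.argsort(px)] = py - np.sort(px)   # 1D displacement by rank
        X += (step / n_proj) * disp[:, None] * theta[None, :]
    return X

for _ in range(200):
    X = sliced_step(X, Y, rng)
```

Each update touches only sorted one-dimensional projections, so the per-step cost stays modest even when the ambient dimension is large, which is the point of slicing.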
10:10 – 10:50: Damien Garreau, University of Würzburg.
Are Ensembles Getting Better all the Time?
Ensemble methods combine the predictions of several base models. Does including more models always improve their average performance? In this talk I will show that the answer depends on the kind of ensemble considered, as well as the predictive metric chosen. I will focus on situations where all members of the ensemble are a priori expected to perform equally well, which is the case for several popular methods such as random forests or deep ensembles. In this setting, I will show that ensembles are getting better all the time if, and only if, the considered loss function is convex; when the loss function is nonconvex, ensembles of good models keep getting better, and ensembles of bad models keep getting worse.
Preprint: https://arxiv.org/abs/2311.17885
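The convex direction of the claim is an instance of Jensen's inequality: for a convex loss, the loss of the averaged prediction is at most the average of the individual losses. A small numerical check with squared loss, where the "models" are synthetic noisy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=200)                      # ground truth
preds = y + rng.normal(size=(10, 200))        # 10 equally good noisy models

sq = lambda p: np.mean((p - y) ** 2)          # convex loss
ensemble_loss = sq(preds.mean(axis=0))        # loss of the averaged prediction
mean_individual_loss = np.mean([sq(p) for p in preds])

# Jensen: the ensemble is at least as good as the average member
assert ensemble_loss <= mean_individual_loss
```

With a nonconvex loss this inequality can fail, which is exactly the dichotomy the talk develops.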
Coffee break (10:50 – 11:20)
11:20 – 12:00: Petr Knobloch, Charles University, Prague.
Computation of stabilization parameters using machine learning
Joint work with: Manoj Prakash (Charles University, Prague)
For various types of partial differential equations, standard finite element discretizations are often unstable, which can be cured by adding suitable stabilization terms. Typically, these terms contain user-chosen parameters whose optimal choice is usually not known but which considerably influence the quality of the approximate solution. In this talk, we will consider stabilized methods for steady convection-diffusion equations. A typical example is the streamline upwind Petrov–Galerkin (SUPG) method [2]. It is possible to compute the stabilization parameters a posteriori in an adaptive way by minimizing a target functional characterizing the quality of the approximate solution [3]; however, this functional is often difficult to design. Moreover, the solution of this high-dimensional constrained nonlinear optimization problem is usually very time-consuming. Therefore, our aim is to develop methods based on techniques from machine learning in order to select (nearly) optimal stabilization parameters in a cheap way. The idea is to compute these parameters locally based on properties of the SUPG solution obtained with standard (nonoptimal) parameters. The training phase will use parameters computed by the mentioned minimization approach employing accurate approximate solutions which can be obtained using nonlinear (and hence again costly) approaches [1]. We will report our first experiences with this strategy.
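For orientation, the classical one-dimensional SUPG parameter on an element of size h, with convection b and diffusion ε, is often taken as τ = h/(2|b|)·(coth(Pe) − 1/Pe) with the local Péclet number Pe = |b|h/(2ε); formulas of this kind are the standard (nonoptimal) choices that the learned approach aims to improve upon. A quick sketch:

```python
import math

def supg_tau(h, b, eps):
    """Classical 1D 'optimal' SUPG stabilization parameter."""
    pe = abs(b) * h / (2.0 * eps)          # local Peclet number
    if pe < 1e-8:                          # series limit to avoid 0/0
        return h * pe / (6.0 * abs(b))     # coth(Pe) - 1/Pe ~ Pe/3
    return h / (2.0 * abs(b)) * (1.0 / math.tanh(pe) - 1.0 / pe)
```

The formula interpolates between the diffusion-dominated limit τ ≈ h²/(12ε) and the convection-dominated limit τ ≈ h/(2|b|); in higher dimensions no such closed-form optimum exists, which motivates learning the parameters locally.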
References
- Gabriel R. Barrenechea, Volker John, and Petr Knobloch (2024), Finite element methods respecting the discrete maximum principle for convection-diffusion equations, SIAM Rev. 66, 3–88.
- A. Brooks and T. Hughes (1982), Streamline upwind/Petrov-Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier–Stokes equations, Comput. Methods Appl. Mech. Engrg. 32, 199–259.
- V. John, P. Knobloch, and S.B. Savescu (2011), A posteriori optimization of parameters in stabilized methods for convection–diffusion problems – Part I, Comput. Methods Appl. Mech. Engrg. 200, 2916–2929.
Lunch break (12:00 – 14:00)
14:00 – 14:40: Kathrin Hellmuth, University of Würzburg.
Experimental Design for Inverse Problems through Random Sampling of the Gauss-Newton Hessian
Joint work with: Christian Klingenberg (University of Würzburg), Qin Li (University of Wisconsin-Madison)
Numerical reconstructability of unknown parameters in inverse problems heavily relies on the chosen data. Therefore, it is crucial to design an experiment that yields data that is sensitive to the parameters. We introduce a novel perspective on experimental design as a down-sampling task for the input-to-output map. Framing the inverse problem as a least squares optimization, we propose a general framework that provides an efficient down-sampling strategy that can select data preserving the strict positivity of the Gauss-Newton Hessian at the global minimum of the objective function. The choice of the sampling distribution is pivotal and is based on a matrix sketching technique from randomized linear algebra for the Hessian. Gradient-free sampling methods are integrated to draw the samples from this distribution and thus execute the data selection. Numerical experiments demonstrate the effectiveness of this method in selecting sensor locations for Schrödinger potential reconstruction.
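The sampling idea can be sketched with a basic randomized-linear-algebra estimator: rows of the Jacobian J are drawn with probability proportional to their squared norm and rescaled, giving an unbiased sketch of the Gauss-Newton Hessian JᵀJ from a subset of the data. The matrix and dimensions here are synthetic, and this generic sketch is not the specific distribution proposed in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(500, 5))            # synthetic Jacobian (data x parameters)
H = J.T @ J                              # full Gauss-Newton Hessian

# squared-row-norm sampling distribution (a basic matrix-sketching choice)
p = (J ** 2).sum(axis=1)
p /= p.sum()

m = 200                                  # number of sampled rows (data points)
idx = rng.choice(len(J), size=m, p=p)
H_sketch = sum(np.outer(J[i], J[i]) / (m * p[i]) for i in idx)

rel_err = np.linalg.norm(H_sketch - H) / np.linalg.norm(H)
```

Each sampled row corresponds to keeping one measurement, so a sketch that preserves the positivity of H is exactly a data selection that preserves reconstructability.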
14:40 – 15:20: Zhi-Song Liu, Lappeenranta-Lahti University of Technology.
Iterative Inversion for 3D Point Clouds Upsampling
Iterative inversion is a trending formulation for supervised signal restoration. It has been used in image processing and has shown great potential for solving fast image inverse problems. Different from images, whose pixels are attached to a static grid, point clouds are unordered and irregular, invariant to permutations of their members. Existing point cloud upsampling approaches learn the conditional noise-to-data diffusion mapping via an L2 loss, resulting in slow inference and costly training computation. We argue that employing a data-to-data diffusion approach is more efficient for point cloud upsampling, especially with specific sensor data. Hence, we propose an efficient iterative inversion approach to directly upsample sparse point clouds into denser ones. To build a tractable step-wise upsampling, we propose using the shortest path interpolation to bridge the sparse and dense point clouds. In experiments on various object-centric and real-world datasets, our method shows promising results with notable margin improvements in upsampling quality and a significant reduction in running time. We also show that even though our model is trained on paired data, it is robust to various types of noise in real applications.
Coffee break (15:20 – 16:00)
16:00 – 16:40: Satoru Iwasaki, Osaka University.
Surrogate Modeling for Thin Domain PDEs via Reduction Theory
Nonlinear partial differential equations (PDEs) are, in general, analytically intractable, prompting extensive investigations into methodologies for addressing such equations. One prominent approach in this context is spatial reduction theory, which simplifies a given PDE by mapping it onto a PDE in lower-dimensional spatial domain [1]. A central focus of spatial reduction theory lies in rigorously quantifying the error between the solutions (or attractors) of the original PDE and those of the reduced PDE defined on the lower-dimensional domain.
In practical applications, PDEs are addressed through numerical approximation methods due to the general infeasibility of obtaining analytical solutions. However, the numerical treatment of three-dimensional PDEs for real-world problems typically necessitates the discretization of the domain into a large number of voxels, resulting in prohibitive computational costs. Consequently, such analyses often require access to high-performance computational resources.
To overcome these challenges, numerous approaches have been proposed to accelerate the numerical computation of PDEs with a compromise in accuracy. These methods endeavor to achieve an optimal balance between computational efficiency and solution fidelity.
Surrogate models constitute a versatile class of machine learning frameworks designed to approximate the solutions of mathematical models more efficiently than conventional numerical approaches. Among these, latent space modeling has emerged as a notable technique for PDEs, whereby state variables are embedded into a reduced latent space, and the dynamics are modeled within this reduced framework. Nevertheless, this methodology is not without its limitations. First, the training process typically involves a large number of parameters, imposing significant computational demands. Second, the latent variables, being the output of an encoder network, are often challenging to interpret from a physical standpoint.
To address these limitations, this study proposes a surrogate modeling framework specifically designed for PDEs that are amenable to spatial reduction theory, enabling rigorous theoretical analysis. The architecture of the proposed model is depicted in Figure 1. By incorporating spatially reduced PDEs derived from reduction theory to define latent space dynamics, the proposed approach seeks to develop a surrogate model that combines computational efficiency with enhanced interpretability.
In the presentation, we will provide a detailed exposition of the surrogate model’s architecture and present the results of numerical experiments.
References
- G. Raugel (1995), Dynamics of partial differential equations on thin domains, 4th Chapter of Lecture Notes in Mathematics 1609, Springer, pp.208-315.

16:40 – 17:20: Jon Cockayne, University of Southampton.
Calibrated Computation-Aware Gaussian Processes
Gaussian processes are notorious for scaling cubically with the size of the training set, preventing application to very large regression problems. Computation-aware Gaussian processes (CAGPs) tackle this scaling issue by exploiting probabilistic linear solvers to reduce complexity, widening the posterior with additional computational uncertainty due to reduced computation. However, the most commonly used CAGP framework results in (sometimes dramatically) conservative uncertainty quantification, making the posterior unrealistic in practice. In this work, we prove that if the utilised probabilistic linear solver is calibrated, in a rigorous statistical sense, then so too is the induced CAGP. We thus propose a new CAGP framework, CAGP-GS, based on using Gauss-Seidel iterations for the underlying probabilistic linear solver. CAGP-GS performs favourably compared to existing approaches when the test set is low-dimensional and few iterations are performed. We test the calibratedness on a synthetic problem, and compare the performance to existing approaches on a large-scale global temperature regression problem.
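The underlying computational component, a stationary iterative solver for the kernel system, can be sketched as follows: Gauss-Seidel sweeps approximate v = K⁻¹y, and the GP posterior mean k(x*, X)v improves as more computation is spent. This sketch deliberately omits the probabilistic part of the solver (the widened, computation-aware posterior) and uses an arbitrary RBF kernel with jitter:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(size=30))
y = np.sin(6 * X) + 0.1 * rng.normal(size=30)

def rbf(a, b, ell=0.2):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

K = rbf(X, X) + 0.1 * np.eye(30)          # kernel matrix with jitter (SPD)

def gauss_seidel(K, y, sweeps):
    """Gauss-Seidel iteration for K v = y (convergent since K is SPD)."""
    v = np.zeros_like(y)
    for _ in range(sweeps):
        for i in range(len(y)):
            # solve row i for v[i] using the latest values of the others
            v[i] = (y[i] - K[i] @ v + K[i, i] * v[i]) / K[i, i]
    return v

res = lambda v: np.linalg.norm(K @ v - y)
v1, v50 = gauss_seidel(K, y, 1), gauss_seidel(K, y, 50)
```

In the CAGP-GS framework, the residual left after a fixed budget of such sweeps is not discarded but converted into additional posterior variance; the calibration result of the talk concerns exactly that conversion.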
Dinner (18:30)
Thursday, March 13, 2025
09:30 – 10:10: Andreas Hauptmann, University of Oulu, Finland; University College London, UK.
Learned iterative reconstructions with applications to linear and nonlinear inverse problems
In recent years, the paradigm of data-driven reconstruction has garnered considerable attention, due to its success in improving reconstruction quality as well as computational speed. Nevertheless, the majority of such data-driven approaches still come without a thorough mathematical understanding. While we cannot resolve this shortcoming yet, we will provide a conceptual overview of data-driven approaches with an emphasis on learned iterative reconstructions and their varying flavours. We will also give examples of theoretical guarantees that can be achieved in this setting [1,2].
We will furthermore discuss how learned iterative reconstructions can be applied to practical inverse problems with experimental data, and the modifications to the methods that this requires. Specifically, we provide examples for the linear problem of photoacoustic tomography as well as nonlinear diffuse optical tomography [3].
References
- Mukherjee, S., Hauptmann, A., Öktem, O., Pereyra, M., and Schönlieb, C. B. (2023). Learned reconstruction methods with convergence guarantees: A survey of concepts and applications. IEEE Signal Processing Magazine, 40(1), 164-182.
- Hauptmann, A., Mukherjee, S., Schönlieb, C. B., and Sherry, F. (2024). Convergent regularization in inverse problems and linear plug-and-play denoisers. Foundations of Computational Mathematics, 1-34.
- Mozumder, M., Hauptmann, A., Nissilä, I., Arridge, S. R., and Tarvainen, T. (2022). A model-based iterative learning approach for diffuse optical tomography. IEEE Transactions on Medical Imaging, 41(5), 1289-1299.
10:10 – 10:50: Alice Oberacker, Saarland University.
Reducing Motion Artifacts in Nano-CT Imaging with a Learned RESESOP-Kaczmarz Method
Tomographic X-ray imaging at the nano-scale helps reveal the structures of materials like alloys and biological tissue. However, environmental perturbations during data acquisition can cause motion between the object and scanner. To reduce noise in the back-projection, a learned version of the RESESOP-Kaczmarz method was investigated. The deep network was trained with simulated imaging data to unroll the iterative reconstruction process, allowing the network to learn the back-projected image after a fixed number of iterations.
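The classical Kaczmarz iteration at the core of the method sweeps over the rows of the system Ax = b, orthogonally projecting the iterate onto each row's hyperplane; the learned variant unrolls such iterations into a network. A plain, unlearned sketch on a synthetic consistent system:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
x_true = rng.normal(size=20)
b = A @ x_true                            # consistent system

def kaczmarz(A, b, sweeps):
    """Cyclic Kaczmarz: project onto one row's hyperplane at a time."""
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            a = A[i]
            x += (b[i] - a @ x) / (a @ a) * a
    return x

x_rec = kaczmarz(A, b, 200)
```

Unrolling replaces the fixed projection steps with learned operators trained on simulated data, while keeping this row-by-row structure.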
Coffee break (10:50 – 11:20)
11:20 – 12:00: Chen Song, ABB Corporate Research Center.
Physics-Informed Machine Learning in Non-invasive Measurement Techniques
This presentation delves into the innovative use of Physics-Informed Neural Networks (PINNs) to enhance non-invasive temperature sensing in low-conductivity pipes. Traditional non-invasive temperature sensors often struggle with long response times when used with materials like fiberglass, leading to delays in process control and reduced efficiency.
PINNs offer a solution by embedding physical laws directly into the neural network’s learning process, significantly improving the accuracy and response time of these sensors. This approach leverages multiple sensor readings over time to provide a dynamic and precise estimation of fluid temperatures within pipes. The presentation will cover the principles behind PINNs, the methodology for solving inverse problems in temperature measurement, and the results of numerical experiments that demonstrate the effectiveness of this approach.
By integrating machine learning with fundamental physical principles, this work showcases the potential of PINNs to revolutionize industrial temperature sensing, offering faster and more reliable solutions for enhanced operational efficiency.
Lunch break (12:00 – 14:00)
14:00 – 14:40: Kota Takeda, Kyoto University.
Error analysis and numerical issues in data assimilation
Joint work with: Takashi Sakajo (Kyoto University)
Data assimilation is a theory for the seamless integration of data into numerical models. A typical application is numerical weather prediction, in which integrating observational data into atmospheric models enables accurate forecasting. In this talk, we consider sequential state estimation problems from noisy observations for a class of nonlinear dynamical systems on Hilbert spaces, including the two-dimensional Navier-Stokes equations and atmospheric toy models. For such nonlinear dynamical systems, the ensemble Kalman filter (EnKF) is often used to approximate the mean and covariance of the probability distribution representing uncertainty in the state estimation. We review current results on the error analysis of the EnKF applied to nonlinear dynamical systems, including my result [1]. In addition, we discuss numerical issues that appear in the theory and applications of data assimilation.
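A single EnKF analysis step can be sketched as follows: the Kalman gain is formed from the ensemble covariance, and each member is updated with a perturbed observation. The two-dimensional state and all noise levels here are toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ens, truth = 500, np.array([2.0, -1.0])
H = np.array([[1.0, 0.0]])                # observe the first component only
R = np.array([[0.25]])                    # observation error covariance

ens = rng.normal(size=(n_ens, 2))         # forecast ensemble (biased: mean ~ 0)
y = H @ truth + rng.normal(scale=0.5, size=1)   # noisy observation

# ensemble covariance and Kalman gain
P = np.cov(ens.T)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)

# perturbed-observation update of each ensemble member
pert = rng.normal(scale=0.5, size=(n_ens, 1))
analysis = ens + (y + pert - ens @ H.T) @ K.T

fore_gap = abs(ens[:, 0].mean() - y[0])
anal_gap = abs(analysis[:, 0].mean() - y[0])
```

The analysis mean is pulled toward the observation and the ensemble spread contracts in the observed direction; the error analysis in the talk quantifies how well this finite-ensemble approximation tracks the true filtering distribution.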
References
- K. Takeda and T. Sakajo (2024), Uniform error bounds of the ensemble transform Kalman filter for chaotic dynamics with multiplicative covariance inflation, SIAM/ASA Journal on Uncertainty Quantification, 12(4), 1315–1335.
14:40 – 15:20: Yuka Hashimoto, NTT Network Service Systems Laboratories / RIKEN AIP.
Reproducing kernel Hilbert C*-module and spectral truncation kernel
Reproducing kernel Hilbert C*-module (RKHM) is a generalization of reproducing kernel Hilbert space (RKHS) and is characterized by a C*-algebra-valued positive definite kernel and the inner product induced by this kernel. The application of RKHSs to data analysis has been investigated extensively. The representation power and solid theoretical understanding of RKHSs have made RKHS-based methods useful tools with wide applications, such as principal component analysis, support vector machines, and regression. In addition, their reproducing property enables us to implement algorithms in RKHSs easily.
In this talk, we generalize the data analysis in RKHSs to RKHMs. The theory of RKHM has been studied in mathematical physics and pure mathematics [1, 2]. However, its application to data analysis has not been discussed before our work. The advantages of applying RKHMs instead of RKHSs are that we can enlarge representation spaces and construct positive definite kernels using the product structure in the C*-algebra.
We first show fundamental theoretical properties in RKHMs, such as the representer theorem [3, 4]. The representer theorem is an important theorem for data analysis to guarantee that solutions of a minimization problem are represented only with given data. We generalize the representer theorem for RKHSs to that for RKHMs. Since the C*-algebra can be infinite-dimensional, the analysis is not straightforward. For the generalization, we need an additional assumption regarding the closedness of a module. To remove this additional assumption, we also show an approximate version of the representer theorem, which guarantees that solutions of the minimization problem are sufficiently approximated by given data.
To further our fundamental understanding of analysis with C*-algebraic kernels, we propose a new class of positive definite kernels based on the spectral truncation [5]. Spectral truncation has been discussed in the fields of noncommutative geometry and C*-algebra, and it is characterized by Toeplitz matrices. We focus on kernels whose inputs and outputs are vectors or functions and generalize typical kernels by introducing the noncommutativity of the products appearing in the kernels. We show that the noncommutativity is a governing factor for capturing local and global dependencies of the outputs on the inputs. The flexibility of the proposed class of kernels allows us to go beyond previous kernels, addressing two of the foremost issues regarding learning in RKHSs for vector- or function-valued outputs, namely the choice of the kernel and the computational cost [6].
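The Toeplitz structure behind spectral truncation can be sketched directly: truncating a function on the circle to n Fourier modes yields the Toeplitz matrix of its Fourier coefficients, and truncations of commuting functions generally fail to commute. This toy check uses cos and sin as our own choice of example:

```python
import numpy as np

def truncation(coeffs, n):
    """Spectral truncation of a circle function: the n x n Toeplitz
    matrix T[i, j] = c_{i-j} of its Fourier coefficients."""
    T = np.zeros((n, n), dtype=complex)
    for k, c in coeffs.items():
        if abs(k) < n:
            T += c * np.eye(n, k=-k)     # place c_k on the offset i - j = k
    return T

# Fourier coefficients: cos = (e^{it} + e^{-it})/2, sin = (e^{it} - e^{-it})/(2i)
Tc = truncation({1: 0.5, -1: 0.5}, 8)
Ts = truncation({1: -0.5j, -1: 0.5j}, 8)

# cos and sin commute as functions, but their truncations do not
comm_norm = np.linalg.norm(Tc @ Ts - Ts @ Tc)
```

This residual noncommutativity of the truncated products is precisely the tunable ingredient that the proposed kernel class exposes.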
References
- Shigeru Itoh. Reproducing kernels in modules over C*-algebras and their applications. Journal of Mathematics in Nature Science, pages 1–20, 1990.
- Jaeseong Heo. Reproducing kernel Hilbert C*-modules and kernels associated with cocycles. Journal of Mathematical Physics, 49(10):103507, 2008.
- Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda, Fuyuta Komura, Takeshi Katsura, and Yoshinobu Kawahara. Reproducing kernel Hilbert C*-module and kernel mean embeddings. Journal of Machine Learning Research, 22(267):1–56, 2021 (update version: arXiv:2101.11410v2).
- Yuka Hashimoto, Fuyuta Komura, and Masahiro Ikeda. Hilbert C*-module for analyzing structured data. In Matrix and Operator Equations, pages 633–659, 2023.
- Walter D. van Suijlekom. Gromov–Hausdorff convergence of state spaces for spectral truncations. Journal of Geometry and Physics, 162:104075, 2021.
- Yuka Hashimoto, Ayoub Hafid, Masahiro Ikeda, and Hachem Kadri. Spectral truncation kernels: noncommutativity in C*-algebraic kernel machines. arXiv:2405.17823, 2024.