Dong-Kyum Kim
Postdoc @ Max Planck Institute for Security and Privacy (MPI-SP)
I am a postdoctoral researcher at the Max Planck Institute for Security and Privacy (MPI-SP). I completed my Ph.D. in Physics at KAIST, where I focused on AI applications in nonequilibrium physics. My research interests lie at the intersection of AI, complex systems, and nonequilibrium physics, with a particular focus on deep learning approaches.
I am passionate about using interdisciplinary approaches to better understand how AI algorithms work and how to improve them. To this end, I am currently exploring methods from neuroscience to gain insight into the behavior and performance of deep learning models. I believe this approach, known as brain-inspired AI, has the potential to lead to significant advances in the field.
For more information, check out my publications and CV. I am always open to discussing my research and potential collaboration opportunities.
publications & preprints
- ICLR: Bilinear relational structure fixes reversal curse and enables consistent model editing. Dong-Kyum Kim, Minsung Kim, Jea Kwon, Nakyeong Yang, and Meeyoung Cha. In The Fourteenth International Conference on Learning Representations, 2026.
The reversal curse – a language model’s (LM) inability to infer an unseen fact “B is A” from a learned fact “A is B” – is widely considered a fundamental limitation. We show that this is not an inherent failure but an artifact of how models encode knowledge. By training LMs from scratch on a synthetic dataset of relational knowledge graphs, we demonstrate that bilinear relational structure emerges in their hidden representations. This structure substantially alleviates the reversal curse, enabling LMs to infer unseen reverse facts. Crucially, we also find that this bilinear structure plays a key role in consistent model editing. When a fact is updated in an LM with this structure, the edit correctly propagates to its reverse and other logically dependent facts. In contrast, models lacking this representation not only suffer from the reversal curse but also fail to generalize edits, further introducing logical inconsistencies. Our results establish that training on a relational knowledge dataset induces the emergence of bilinear internal representations, which in turn enable LMs to behave in a logically consistent manner after editing. This implies that the success of model editing depends critically not just on editing algorithms but on the underlying representational geometry of the knowledge being modified.
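For readers unfamiliar with the term, the toy sketch below illustrates what a bilinear relational score looks like; the dimensions, the relation matrix W_r, and the hidden states are hypothetical placeholders, not the probing setup used in the paper.

```python
import torch

# Toy bilinear relational structure: the plausibility of a triple (A, r, B)
# is scored as h_A^T W_r h_B, where h_A, h_B are entity representations and
# W_r is a relation-specific matrix (all names here are illustrative).
d_model = 64
W_r = torch.randn(d_model, d_model)   # relation matrix for relation "r"
h_A = torch.randn(d_model)            # hidden state encoding entity A
h_B = torch.randn(d_model)            # hidden state encoding entity B

score_forward = h_A @ W_r @ h_B       # score for "A r B"
score_reverse = h_B @ W_r.T @ h_A     # scoring the reverse fact with W_r^T
                                      # recovers the same value
assert torch.allclose(score_forward, score_reverse, atol=1e-4)
```

The equality of the two scores is the intuition for why a bilinear structure makes reverse facts recoverable from the same learned parameters.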
@inproceedings{kim2026bilinear,
  title = {Bilinear relational structure fixes reversal curse and enables consistent model editing},
  author = {Kim, Dong-Kyum and Kim, Minsung and Kwon, Jea and Yang, Nakyeong and Cha, Meeyoung},
  year = {2026},
  url = {https://openreview.net/forum?id=pdNaYcApbz},
  booktitle = {The Fourteenth International Conference on Learning Representations},
}
- ICLR: Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning. Nakyeong Yang, Dong-Kyum Kim, Jea Kwon, Minsung Kim, Kyomin Jung, and Meeyoung Cha. In The Fourteenth International Conference on Learning Representations, 2026.
Large language models trained on web-scale data can memorize private or sensitive knowledge, raising significant privacy risks. Although some unlearning methods mitigate these risks, they remain vulnerable to "relearning" during subsequent training, allowing a substantial portion of forgotten knowledge to resurface. In this paper, we show that widely used unlearning methods cause shallow alignment: instead of faithfully erasing target knowledge, they generate spurious unlearning neurons that amplify negative influence to hide it. To overcome this limitation, we introduce Ssiuu, a new class of unlearning methods that employs attribution-guided regularization to prevent spurious negative influence and faithfully remove target knowledge. Experimental results confirm that our method reliably erases target knowledge and outperforms strong baselines across two practical retraining scenarios: (1) adversarial injection of private data, and (2) benign attack using an instruction-following benchmark. Our findings highlight the necessity of robust and faithful unlearning methods for safe deployment of language models.
@inproceedings{kim2026unlearnneuron,
  title = {Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning},
  author = {Yang, Nakyeong and Kim, Dong-Kyum and Kwon, Jea and Kim, Minsung and Jung, Kyomin and Cha, Meeyoung},
  year = {2026},
  url = {https://openreview.net/forum?id=z2zFk9jYpw},
  booktitle = {The Fourteenth International Conference on Learning Representations},
}
- Preprint: How Training Data Shapes the Use of Parametric and In-Context Knowledge in Language Models. Minsung Kim, Dong-Kyum Kim, Jea Kwon, Nakyeong Yang, Kyomin Jung, and Meeyoung Cha. arXiv preprint arXiv:2510.02370, 2025.
Large language models leverage not only parametric knowledge acquired during training but also in-context knowledge provided at inference time, despite the absence of explicit training objectives for using both sources. Prior work has further shown that when these knowledge sources conflict, models resolve the tension based on their internal confidence, preferring parametric knowledge for high-confidence facts while deferring to contextual information for less familiar ones. However, the training conditions that give rise to such knowledge utilization behaviors remain unclear. To address this gap, we conduct controlled experiments in which we train language models while systematically manipulating key properties of the training data. Our results reveal a counterintuitive finding: three properties commonly regarded as detrimental must co-occur for robust knowledge utilization and conflict resolution to emerge: (i) intra-document repetition of information, (ii) a moderate degree of within-document inconsistency, and (iii) a skewed knowledge frequency distribution. We further validate that the same training dynamics observed in our controlled setting also arise during real-world language model pretraining, and we analyze how post-training procedures can reshape models’ knowledge preferences. Together, our findings provide concrete empirical guidance for training language models that harmoniously integrate parametric and in-context knowledge.
@article{kim2026parametric,
  title = {How Training Data Shapes the Use of Parametric and In-Context Knowledge in Language Models},
  author = {Kim, Minsung and Kim, Dong-Kyum and Kwon, Jea and Yang, Nakyeong and Jung, Kyomin and Cha, Meeyoung},
  journal = {arXiv preprint arXiv:2510.02370},
  year = {2025},
}
- Preprint: Uncovering Emergent Physics Representations Learned In-Context by Large Language Models. Yeongwoo Song, Jaeyong Bae, Dong-Kyum Kim, and Hawoong Jeong. arXiv preprint arXiv:2508.12448, 2025.
Large language models (LLMs) exhibit impressive in-context learning (ICL) abilities, enabling them to solve a wide range of tasks via textual prompts alone. As these capabilities advance, the range of applicable domains continues to expand significantly. However, identifying the precise mechanisms or internal structures within LLMs that allow successful ICL across diverse, distinct classes of tasks remains elusive. Physics-based tasks offer a promising testbed for probing this challenge. Unlike synthetic sequences such as basic arithmetic or symbolic equations, physical systems provide experimentally controllable, real-world data based on structured dynamics grounded in fundamental principles. This makes them particularly suitable for studying the emergent reasoning behaviors of LLMs in a realistic yet tractable setting. Here, we mechanistically investigate the ICL ability of LLMs, especially focusing on their ability to reason about physics. Using a dynamics forecasting task in physical systems as a proxy, we evaluate whether LLMs can learn physics in context. We first show that the performance of dynamics forecasting in context improves with longer input contexts. To uncover how such a capability emerges in LLMs, we analyze the model’s residual stream activations using sparse autoencoders (SAEs). Our experiments reveal that the features captured by SAEs correlate with key physical variables, such as energy. These findings demonstrate that meaningful physical concepts are encoded within LLMs during in-context learning. In sum, our work provides a novel case study that broadens our understanding of how LLMs learn in context.
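As a rough illustration of the analysis tool mentioned in the abstract, here is a minimal sparse autoencoder of the kind commonly trained on residual-stream activations; the dimensions, L1 coefficient, and variable names are placeholders, and the paper's exact SAE configuration may differ.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Generic SAE: overcomplete ReLU dictionary with an L1 sparsity penalty."""
    def __init__(self, d_model=512, d_hidden=4096):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))   # sparse feature activations
        x_hat = self.dec(z)           # reconstruction of the activation vector
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages few active features.
    return ((x - x_hat) ** 2).mean() + l1_coeff * z.abs().mean()
```

Features learned this way can then be correlated with quantities of interest (in the paper's setting, physical variables such as energy).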
@article{song2025uncovering,
  title = {Uncovering Emergent Physics Representations Learned In-Context by Large Language Models},
  author = {Song, Yeongwoo and Bae, Jaeyong and Kim, Dong-Kyum and Jeong, Hawoong},
  journal = {arXiv preprint arXiv:2508.12448},
  year = {2025},
}
- ICLR Blogpost: In Search of the Engram in LLMs: A Neuroscience Perspective on the Memory Functions in AI Models. Minsung Kim, Jea Kwon, Dong-Kyum Kim, and Meeyoung Cha. In The Fourth Blogpost Track at ICLR 2025.
Large Language Models (LLMs) are enhancing our daily lives but also pose risks like spreading misinformation and violating privacy, highlighting the importance of understanding how they process and store information. This blogpost offers a fresh, neuroscience-inspired perspective on LLMs’ memory functions, based on the concept of engrams, the physical substrate of memory in living organisms. We discuss the synergy between AI research and neuroscience, as both fields grapple with the complexities of intelligent systems.
@inproceedings{kim2025search,
  title = {In Search of the Engram in LLMs: A Neuroscience Perspective on the Memory Functions in AI Models},
  author = {Kim, Minsung and Kwon, Jea and Kim, Dong-Kyum and Cha, Meeyoung},
  booktitle = {The Fourth Blogpost Track at ICLR 2025},
  year = {2025},
}
- Nat. Comm.: Spontaneous emergence of rudimentary music detectors in deep neural networks. Gwangsu Kim, Dong-Kyum Kim, and Hawoong Jeong. Nature Communications, 2024.
Music exists in almost every society, has universal acoustic features, and is processed by distinct neural circuits in humans even with no experience of musical training. However, it remains unclear how these innate characteristics emerge and what functions they serve. Here, using an artificial deep neural network that models the auditory information processing of the brain, we show that units tuned to music can spontaneously emerge by learning natural sound detection, even without learning music. The music-selective units encoded the temporal structure of music in multiple timescales, following the population-level response characteristics observed in the brain. We found that the process of generalization is critical for the emergence of music-selectivity and that music-selectivity can work as a functional basis for the generalization of natural sound, thereby elucidating its origin. These findings suggest that evolutionary adaptation to process natural sounds can provide an initial blueprint for our sense of music.
@article{kim2024music,
  author = {Kim, Gwangsu and Kim, Dong-Kyum and Jeong, Hawoong},
  title = {Spontaneous emergence of rudimentary music detectors in deep neural networks},
  year = {2024},
  volume = {15},
  issue = {148},
  journal = {Nature Communications},
}
- NeurIPS: Transformer as a hippocampal memory consolidation model based on NMDAR-inspired nonlinearity. Dong-Kyum Kim, Jea Kwon, Meeyoung Cha, and C. Justin Lee. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
The hippocampus plays a critical role in learning, memory, and spatial representation, processes that depend on the NMDA receptor (NMDAR). Here we build on recent findings comparing deep learning models to the hippocampus and develop a new nonlinear activation function based on NMDAR dynamics. We find that NMDAR-like nonlinearity is essential for shifting short-term working memory into long-term reference memory in transformers, thus enhancing a process that resembles memory consolidation in the mammalian brain. We design a navigation task assessing these two memory functions and show that manipulating the activation function (i.e., mimicking the Mg^2+-gating of NMDAR) disrupts long-term memory processes. Our experiments suggest that place cell-like functions and reference memory reside in the feed-forward network layer of transformers and that nonlinearity drives these processes. We discuss the role of NMDAR-like nonlinearity in establishing this striking resemblance between transformer architecture and hippocampal spatial representation.
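The sketch below is only meant to convey the flavor of an NMDAR-inspired gated activation; it assumes a SiLU-style gate x·σ(αx) in which the parameter α stands in for the strength of Mg²⁺ gating, and the exact parameterization used in the paper may differ.

```python
import torch

def nmda_like(x, alpha=1.0):
    # Gated nonlinearity: large alpha closes the gate sharply on negative inputs
    # (ReLU-like), small alpha lets inputs pass more linearly. Alpha here is a
    # stand-in for Mg2+-dependent gating strength, not the paper's exact form.
    return x * torch.sigmoid(alpha * x)

x = torch.linspace(-4, 4, 9)
print(nmda_like(x, alpha=0.5))   # weak gating
print(nmda_like(x, alpha=5.0))   # strong gating, approaches ReLU
```

In the paper's experiments, manipulating this kind of gating in the transformer's feed-forward layers is what disrupts the long-term (reference) memory behavior.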
@inproceedings{kim2023nmda,
  author = {Kim, Dong-Kyum and Kwon, Jea and Cha, Meeyoung and Lee, C. Justin},
  title = {Transformer as a hippocampal memory consolidation model based on NMDAR-inspired nonlinearity},
  year = {2023},
  url = {https://openreview.net/forum?id=vKpVJxplmB},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
}
- IJCV: SUBTLE: An Unsupervised Platform with Temporal Link Embedding that Maps Animal Behavior. Jea Kwon, Sunpil Kim, Dong-Kyum Kim, Jinhyeong Joo, SoHyung Kim, Meeyoung Cha, and C. Justin Lee. International Journal of Computer Vision, 2024.
While huge strides have recently been made in language-based machine learning, the ability of artificial systems to comprehend the sequences that comprise animal behavior has been lagging behind. In contrast, humans instinctively recognize behaviors by finding similarities in behavioral sequences. Here, we develop an unsupervised behavior-mapping framework, SUBTLE (spectrogram-UMAP-based temporal-link embedding), to capture comparable behavioral repertoires from 3D action skeletons. To find the best embedding method, we devise a temporal proximity index (TPI) as a new metric to gauge temporal representation in the behavioral embedding space. The method achieves the best TPI score compared to current embedding strategies. Its spectrogram-based UMAP clustering not only identifies subtle inter-group differences but also matches human-annotated labels. The SUBTLE framework automates the tasks of both identifying behavioral repertoires like walking, grooming, standing, and rearing, and profiling individual behavior signatures like subtle inter-group differences by age. SUBTLE highlights the importance of temporal representation in the behavioral embedding space for human-like behavioral categorization.
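The pipeline below sketches the general spectrogram-then-UMAP idea described in the abstract using off-the-shelf tools (SciPy, umap-learn); the frame rate, window length, and keypoint format are assumptions, and SUBTLE's own preprocessing and temporal-link embedding step are not reproduced here.

```python
import numpy as np
from scipy.signal import spectrogram
import umap  # pip install umap-learn

# keypoints: (T, D) array of 3D skeleton coordinates over time (placeholder data).
rng = np.random.default_rng(0)
keypoints = rng.normal(size=(10_000, 24))
fs, nperseg = 20, 64          # frame rate and window length are assumptions

# Per-coordinate spectrograms, stacked into one feature vector per time bin.
specs = []
for d in range(keypoints.shape[1]):
    f, t, Sxx = spectrogram(keypoints[:, d], fs=fs, nperseg=nperseg)
    specs.append(Sxx)
features = np.concatenate(specs, axis=0).T   # (time bins, coords * freqs)

# 2D behavioral embedding; clustering on top of this would yield repertoires.
embedding = umap.UMAP(n_components=2).fit_transform(features)
print(embedding.shape)
```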
@article{Kwon2024,
  author = {Kwon, Jea and Kim, Sunpil and Kim, Dong-Kyum and Joo, Jinhyeong and Kim, SoHyung and Cha, Meeyoung and Lee, C. Justin},
  title = {SUBTLE: An Unsupervised Platform with Temporal Link Embedding that Maps Animal Behavior},
  year = {2024},
  doi = {10.1007/s11263-024-02072-0},
  journal = {International Journal of Computer Vision},
}
- PRR: Multidimensional entropic bound: Estimator of entropy production for Langevin dynamics with an arbitrary time-dependent protocol. Sangyun Lee, Dong-Kyum Kim, Jong-Min Park, Won Kyu Kim, Hyunggyu Park, and Jae Sung Lee. Phys. Rev. Research, March 2023.
Entropy production (EP) is a key quantity in thermodynamics, and yet measuring EP has remained a challenging task. Here we introduce an EP estimator, called multidimensional entropic bound (MEB), utilizing an ensemble of trajectories. The MEB can accurately estimate the EP of overdamped Langevin systems with an arbitrary time-dependent protocol. Moreover, it provides a unified platform to accurately estimate the EP of underdamped Langevin systems under certain conditions. In addition, the MEB is computationally efficient because optimization is unnecessary. We apply our developed estimator to three physical systems driven by time-dependent protocols pertaining to experiments using optical tweezers: A dragged Brownian particle, a pulling process of a harmonic chain, and an unfolding process of an RNA hairpin. Numerical simulations confirm the validity and efficiency of our method.
@article{PhysRevResearch.5.013194,
  title = {Multidimensional entropic bound: Estimator of entropy production for Langevin dynamics with an arbitrary time-dependent protocol},
  author = {Lee, Sangyun and Kim, Dong-Kyum and Park, Jong-Min and Kim, Won Kyu and Park, Hyunggyu and Lee, Jae Sung},
  journal = {Phys. Rev. Research},
  volume = {5},
  issue = {1},
  pages = {013194},
  numpages = {14},
  year = {2023},
  month = mar,
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevResearch.5.013194},
  url = {https://link.aps.org/doi/10.1103/PhysRevResearch.5.013194},
}
- BigComp: Neural Classification of Terrestrial Biomes. Vyacheslav Shen, Dong-Kyum Kim, Elke Zeller, and Meeyoung Cha. In IEEE International Conference on Big Data and Smart Computing, 2023.
Predicting vegetation changes under climate change is crucial because it will alter the distribution of different plants and have repercussions for ecosystems. To detect changes in vegetation, we employ biome classification that assigns vegetation distributions to specific biomes. Conventional methods have used empirical formulas or simple vegetation models. Based on previous research that showed the use of convolutional neural networks (CNN), this work employs multiple deep models to classify biomes with the goal of predicting future changes. Experiments over multiple datasets demonstrate that Transformer models can be a suitable alternative to the CNN model. In addition, we observe that the use of additional climate variables helps improve the prediction accuracy without overfitting the data, which previous studies have not considered. We discuss the future directions of machine learning for biome classification as a complement to traditional biome classification methods.
@inproceedings{shen2023biome,
  author = {Shen, Vyacheslav and Kim, Dong-Kyum and Zeller, Elke and Cha, Meeyoung},
  title = {Neural Classification of Terrestrial Biomes},
  year = {2023},
  booktitle = {IEEE International Conference on Big Data and Smart Computing},
}
- NeurIPS-W: Transformer needs NMDA receptor nonlinearity for long-term memory. Dong-Kyum Kim, Jea Kwon, Meeyoung Cha, and C. Justin Lee. In NeurIPS 2022 Memory in Artificial and Real Intelligence workshop, 2022.
The NMDA receptor (NMDAR) in the hippocampus is essential for learning and memory. We find an interesting resemblance between deep models’ nonlinear activation function and the NMDAR’s nonlinear dynamics. In light of a recent study that compared the transformer architecture to the formation of hippocampal memory, this paper presents new findings that NMDAR-like nonlinearity may be essential for consolidating short-term working memory into long-term reference memory. We design a navigation task assessing these two memory functions and show that manipulating the activation function (i.e., mimicking the Mg^2+-gating of NMDAR) disrupts long-term memory formation. Our experimental data suggest that the concept of place cells and reference memory may reside in the feed-forward network and that nonlinearity plays a key role in these processes. Our findings suggest that the transformer architecture and hippocampal spatial representation resemble each other through the shared concept of NMDAR-like nonlinearity.
- PRR: Estimating entropy production with odd-parity state variables via machine learning. Dong-Kyum Kim, Sangyun Lee, and Hawoong Jeong. Phys. Rev. Research, April 2022.
Entropy production (EP) is a central measure in nonequilibrium thermodynamics, as it can quantify the irreversibility of a process as well as its energy dissipation in special cases. Using the time-reversal asymmetry in a system’s path probability distribution, many methods have been developed to estimate EP from only trajectory data. However, for systems with odd-parity variables that prevail in nonequilibrium systems, EP estimation via machine learning has not been covered. In this study, we develop a machine-learning method for estimating the EP in a stochastic system with odd-parity variables through multiple neural networks, which enables us to measure EP with only trajectory data and parity information. We demonstrate our method with two systems, an underdamped bead-spring model and a one-particle odd-parity Markov jump process.
@article{PhysRevResearch.4.023051,
  title = {Estimating entropy production with odd-parity state variables via machine learning},
  author = {Kim, Dong-Kyum and Lee, Sangyun and Jeong, Hawoong},
  journal = {Phys. Rev. Research},
  volume = {4},
  issue = {2},
  pages = {023051},
  numpages = {7},
  year = {2022},
  month = apr,
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevResearch.4.023051},
}
- PRR: Inferring dissipation maps from videos using convolutional neural networks. Youngkyoung Bae, Dong-Kyum Kim, and Hawoong Jeong. Phys. Rev. Research, August 2022.
In the study of living organisms at mesoscopic scales, attaining a measure of dissipation or entropy production (EP) is essential to gain an understanding of their nonequilibrium dynamics. However, when tracking the relevant variables is impractical, it is challenging to figure out where and to what extent dissipation occurs from recorded time-series images from experiments. In this paper we develop an estimator that can, without detailed knowledge of the given systems, quantify the stochastic EP and produce a spatiotemporal pattern of the EP (or dissipation map) from videos through an unsupervised learning algorithm. Applying a convolutional neural network (CNN), our estimator allows us to visualize where the dissipation occurs as well as its time evolution in a video by looking at an attention map of the CNN’s last layer. We demonstrate that our estimator accurately measures the stochastic EP and provides a locally heterogeneous dissipation map, which is mainly concentrated in the origins of a nonequilibrium state, from generated Brownian videos of various models. We further confirm high performance even with noisy, low-spatial-resolution data and partially observed situations. Our method will provide a practical way to obtain dissipation maps and ultimately contribute to uncovering the source and the dissipation mechanisms of complex nonequilibrium phenomena.
@article{PhysRevResearch.4.033094,
  title = {Inferring dissipation maps from videos using convolutional neural networks},
  author = {Bae, Youngkyoung and Kim, Dong-Kyum and Jeong, Hawoong},
  journal = {Phys. Rev. Research},
  volume = {4},
  issue = {3},
  pages = {033094},
  numpages = {9},
  year = {2022},
  month = aug,
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevResearch.4.033094},
}
- PRR: Deep reinforcement learning for feedback control in a collective flashing ratchet. Dong-Kyum Kim and Hawoong Jeong. Phys. Rev. Research, April 2021.
A collective flashing ratchet transports Brownian particles using a spatially periodic, asymmetric, and time-dependent on-off switchable potential. The net current of the particles in this system can be substantially increased by feedback control based on the particle positions. Several feedback policies for maximizing the current have been proposed, but optimal policies have not been found for a moderate number of particles. Here, we use deep reinforcement learning (RL) to find optimal policies, with results showing that policies built with a suitable neural network architecture outperform the previous policies. Moreover, even in a time-delayed feedback situation where the on-off switching of the potential is delayed, we demonstrate that the policies provided by deep RL provide higher currents than the previous strategies.
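Below is a minimal sketch of the kind of position-based feedback policy the abstract refers to: a permutation-invariant (DeepSets-style) network mapping the set of particle positions to an on/off probability for the potential. The architecture, sizes, and any training algorithm built on top of it are illustrative assumptions rather than the exact setup of the paper.

```python
import torch
import torch.nn as nn

class SwitchPolicy(nn.Module):
    """Permutation-invariant feedback policy: particle positions -> P(potential on)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, positions):
        # positions: (batch, N, 1) particle coordinates; the ordering of
        # particles should not matter, hence the symmetric mean pooling.
        pooled = self.phi(positions).mean(dim=1)
        return torch.sigmoid(self.rho(pooled))   # probability of switching the potential on

policy = SwitchPolicy()
x = torch.rand(8, 10, 1)                         # 8 states of 10 particles each
print(policy(x).shape)                           # torch.Size([8, 1])
```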
@article{PhysRevResearch.3.L022002,
  title = {Deep reinforcement learning for feedback control in a collective flashing ratchet},
  author = {Kim, Dong-Kyum and Jeong, Hawoong},
  journal = {Phys. Rev. Research},
  volume = {3},
  issue = {2},
  pages = {L022002},
  numpages = {6},
  year = {2021},
  month = apr,
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevResearch.3.L022002},
}
- PRL: Learning Entropy Production via Neural Networks. Dong-Kyum Kim, Youngkyoung Bae, Sangyun Lee, and Hawoong Jeong. Phys. Rev. Lett., October 2020.
This Letter presents a neural estimator for entropy production (NEEP) that estimates entropy production (EP) from trajectories of relevant variables without detailed information on the system dynamics. For steady state, we rigorously prove that the estimator, which can be built up from different choices of deep neural networks, provides stochastic EP by optimizing the objective function proposed here. We verify the NEEP with the stochastic processes of the bead-spring and discrete flashing ratchet models and also demonstrate that our method is applicable to high-dimensional data and can provide coarse-grained EP for Markov systems with unobservable states.
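A compact sketch of the idea behind the objective is given below: a network h_θ scores ordered transitions, its antisymmetrized output plays the role of stochastic EP, and the objective rewards large ΔS while penalizing exp(−ΔS). Network size, state dimension, and variable names are placeholders, so consult the paper for the actual architectures and training details.

```python
import torch
import torch.nn as nn

# h_theta scores an ordered transition (s_t, s_{t+1}); the antisymmetric
# combination below serves as the estimate of stochastic entropy production.
h_theta = nn.Sequential(nn.Linear(2 * 2, 128), nn.ReLU(), nn.Linear(128, 1))

def delta_S(s, s_next):
    fwd = h_theta(torch.cat([s, s_next], dim=-1))
    bwd = h_theta(torch.cat([s_next, s], dim=-1))
    return fwd - bwd                      # antisymmetric under time reversal by construction

def neep_objective(s, s_next):
    dS = delta_S(s, s_next)
    # Maximizing E[dS] - E[exp(-dS)] drives dS toward the stochastic EP.
    return dS.mean() - torch.exp(-dS).mean()

# One gradient step on toy 2D trajectory data (placeholder tensors).
opt = torch.optim.Adam(h_theta.parameters(), lr=1e-3)
s, s_next = torch.randn(256, 2), torch.randn(256, 2)
opt.zero_grad()
loss = -neep_objective(s, s_next)         # minimize the negative objective
loss.backward()
opt.step()
```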
@article{kim2020neep,
  title = {Learning Entropy Production via Neural Networks},
  author = {Kim, Dong-Kyum and Bae, Youngkyoung and Lee, Sangyun and Jeong, Hawoong},
  journal = {Phys. Rev. Lett.},
  volume = {125},
  issue = {14},
  pages = {140604},
  numpages = {6},
  year = {2020},
  month = oct,
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevLett.125.140604},
}
- JKPS: Multi-label classification of historical documents by using hierarchical attention networks. Dong-Kyum Kim, Byunghwee Lee, Daniel Kim, and Hawoong Jeong. Journal of the Korean Physical Society, 2020.
The quantitative analysis of digitized historical documents has begun in earnest in recent years. Text classification is of particular importance for quantitative historical analysis because it helps to search literature efficiently and to determine the important subjects of a particular age. While numerous historians have joined together to classify large-scale historical documents, consistent classification among individual researchers has not been achieved. In this study, we present a classification method for large-scale historical data that uses a recently developed supervised learning algorithm called the Hierarchical Attention Network (HAN). By applying various classification methods to the Annals of the Joseon Dynasty (AJD), we show that HAN is more accurate than conventional techniques with word-frequency-based features. HAN quantifies the extent to which a particular sentence or word contributes to the classification process through a quantitative value called ‘attention’. We extract the representative keywords from various categories by using the attention mechanism and show the evolution of the keywords over the 472-year span of the AJD. Our results reveal two broad groups of event categories in the AJD. In one group, the representative keywords of the categories remained stable over long periods, while in the other group the keywords varied rapidly, reflecting categories whose character repeatedly changed. Observing such macroscopic changes of representative words may provide insight into how a particular topic changes over a historical period.
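To make the model concrete, here is a stripped-down hierarchical attention classifier in the spirit of HAN: word-level attention pools words into sentence vectors, sentence-level attention pools sentences into a document vector, and a sigmoid head gives multi-label scores. Sizes and names are placeholders, and the published model (GRU encoders and their specific attention parameterization) is richer than this sketch.

```python
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Additive attention pooling: weights a sequence of vectors and sums them."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Parameter(torch.randn(dim))

    def forward(self, x):                          # x: (batch, seq, dim)
        scores = torch.tanh(self.proj(x)) @ self.context
        weights = torch.softmax(scores, dim=-1)    # attention over the sequence
        return (weights.unsqueeze(-1) * x).sum(dim=1)

class TinyHAN(nn.Module):
    def __init__(self, vocab=5000, dim=64, n_labels=20):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.word_pool = AttnPool(dim)             # words -> sentence vector
        self.sent_pool = AttnPool(dim)             # sentences -> document vector
        self.head = nn.Linear(dim, n_labels)       # multi-label logits

    def forward(self, docs):                       # docs: (batch, n_sents, n_words) token ids
        b, s, w = docs.shape
        words = self.emb(docs.view(b * s, w))
        sents = self.word_pool(words).view(b, s, -1)
        doc = self.sent_pool(sents)
        return torch.sigmoid(self.head(doc))       # independent per-label probabilities

model = TinyHAN()
print(model(torch.randint(0, 5000, (2, 8, 16))).shape)   # torch.Size([2, 20])
```

The attention weights computed inside AttnPool are what a HAN-style analysis would read off to identify the words and sentences that drive each category.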
@article{kim2020multi,
  title = {Multi-label classification of historical documents by using hierarchical attention networks},
  author = {Kim, Dong-Kyum and Lee, Byunghwee and Kim, Daniel and Jeong, Hawoong},
  journal = {Journal of the Korean Physical Society},
  volume = {76},
  number = {5},
  pages = {368--377},
  year = {2020},
  publisher = {Springer},
}