Post-training methods for pre-trained language models (LMs) typically depend on human supervision, in the form of demonstrations or preference feedback, to specify desired behaviors. However, this approach faces critical limitations as tasks and model…
Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs
