It is difficult to provide a definitive answer as research in this area is ongoing, and the performance of models can be highly dependent on the specific task, data, and evaluation metrics. However, some recent studies have proposed effective LSTM-based approaches for time series forecasting tasks.
In 2017, the paper "DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks" introduced an LSTM-based architecture called DeepAR. It trains an autoregressive recurrent neural network (RNN) to output a probability distribution over future values, which gives you uncertainty estimates alongside the point forecasts. The paper reported state-of-the-art results on a range of time series datasets, and you can find some implementations online for the Airline dataset.
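To make the autoregressive idea concrete, here is a minimal sketch of how a model of this kind can produce forecasts with uncertainty bands: at each step it predicts the parameters of a distribution, draws a sample, and feeds that sample back in as the next input. This is not the DeepAR implementation. `toy_step` is a hypothetical hand-written recurrence standing in for a trained LSTM, and its coefficients, the Gaussian likelihood, and all function names are assumptions chosen for illustration.

```python
import math
import random


def toy_step(prev_value, hidden):
    """Hypothetical stand-in for one trained LSTM step.

    Takes the previous series value and hidden state, and returns the new
    hidden state plus the mean and standard deviation of the next value.
    """
    new_hidden = math.tanh(0.5 * hidden + 0.1 * prev_value)
    mu = 0.9 * prev_value + new_hidden
    sigma = 0.1 + 0.05 * abs(new_hidden)  # kept strictly positive
    return new_hidden, mu, sigma


def sample_paths(history, horizon, num_paths, seed=0):
    """Draw sample forecast paths by feeding each sample back as input."""
    rng = random.Random(seed)

    # Warm up the hidden state on the observed values (all but the last,
    # which becomes the first input of the forecast range).
    hidden = 0.0
    for value in history[:-1]:
        hidden, _, _ = toy_step(value, hidden)

    paths = []
    for _ in range(num_paths):
        h, prev = hidden, history[-1]
        path = []
        for _ in range(horizon):
            h, mu, sigma = toy_step(prev, h)
            prev = rng.gauss(mu, sigma)  # the sample is the next step's input
            path.append(prev)
        paths.append(path)
    return paths


def quantile_forecast(paths, q):
    """Empirical q-quantile across the sampled paths at each forecast step."""
    result = []
    for t in range(len(paths[0])):
        column = sorted(path[t] for path in paths)
        index = min(len(column) - 1, int(q * len(column)))
        result.append(column[index])
    return result
```

Calling `sample_paths(history, horizon=12, num_paths=200)` and then `quantile_forecast(paths, 0.1)` and `quantile_forecast(paths, 0.9)` gives an approximate 80% interval for each forecast step.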
Another study, "Multi-Horizon Time Series Forecasting with Hierarchical Attention Recurrent Neural Networks", proposed an LSTM-based approach called HARNet. HARNet uses a hierarchical attention mechanism to selectively attend to relevant temporal features and has achieved good results on several benchmark time series datasets, including the dataset from the M4 competition, a large-scale forecasting competition that covers various types of time series data.
As for why state-of-the-art solutions for the Airline Passengers dataset are not as widely reported as those for datasets like ImageNet: time series forecasting is a more specialized field than image recognition, which is a more general and widely studied area in machine learning. Additionally, the performance of a model on a particular dataset can vary depending on factors such as the size and complexity of the dataset, the quality of the data, and the evaluation metrics used, among others. No single model is best everywhere; which one does better depends on the problem.
Other well-known time series forecasting datasets that you can benchmark LSTM solutions against include the M4 competition dataset mentioned earlier, as well as the datasets from the M3 and M5 competitions. These datasets cover a diverse range of time series data, including economic and financial indicators, energy demand, and retail sales, among others.
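Since results on these benchmarks depend on the evaluation metric, it helps to compute the same metrics the competitions use. The M4 competition, for instance, scored forecasts with sMAPE and MASE. Below is a minimal sketch of both in plain Python; the function names and the choice to count a term as zero when actual and forecast are both zero are my assumptions, not part of any official scoring code.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumption: when both values are zero, count the error as zero.
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - f) / denom)
    return 100.0 * sum(terms) / len(terms)


def mase(train, actual, forecast, season=1):
    """Mean absolute scaled error.

    The forecast's mean absolute error is divided by the in-sample mean
    absolute error of a seasonal naive forecast with period `season`.
    """
    naive_errors = [
        abs(train[i] - train[i - season]) for i in range(season, len(train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale
```

A MASE below 1 means the forecast beats the naive baseline on average, which makes it a handy sanity check before comparing an LSTM against published results. For monthly data such as Airline Passengers, `season=12` is the natural choice.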