Compiler Inlining with Machine Learning
During compilation, inlining replaces a call to a function or constructor with the body of the item being called. Good inlining can expose opportunities for further optimization, reduce function-call overhead, decrease cache misses, prevent thrashing, and yield a faster, smaller executable. Too much inlining, however, causes code bloat, register spilling, and slower execution.

Deciding which call sites to inline is NP-hard: choosing a set of inlinings under a code-size budget amounts to solving a knapsack problem. The difficulty is compounded because the benefits or detriments of inlining may not manifest until subsequent optimization passes. In most compilers, the inlining decision is therefore made by a small number of hand-tuned heuristics, a "best guess" effort by the compilers' developers, and, given the complexity of the problem, the resulting decisions are not optimal.

This proposal describes my plan to explore the performance of the Glasgow Haskell Compiler's inliner by examining real-world data gathered from Hackage, where developers have marked poorly inlined code with pragmas such as INLINE and NOINLINE. Combining these examples with existing benchmarks, the inlining problem can be formulated as a reinforcement learning task, in which a software agent (the inliner) takes actions (whether or not to inline each call site) in an environment (the compilation pass) in order to maximize a cumulative reward (e.g., the reduction in execution time). Deep reinforcement learning would be well suited to this formulation. In particular, genetic algorithms may be a good alternative to gradient-descent methods: recent work suggests they are robust against delayed rewards, sparse rewards, and local optima, and their fitness evaluations can be parallelized.
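To make the knapsack connection concrete, here is a minimal, illustrative sketch (in Python, purely for exposition): each call site is assigned a hypothetical speed benefit and code-size cost, and the optimizer must pick the subset to inline under a size budget. The `CallSite` record and all numbers are invented for illustration; they are not GHC's actual cost model.

```python
# Illustrative knapsack view of inlining: pick the set of call sites
# to inline that maximizes total benefit under a code-size budget.
# The CallSite fields and all numbers here are hypothetical.
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class CallSite:
    name: str
    benefit: int    # estimated cycles saved by inlining this site
    size_cost: int  # code growth (bytes) if inlined

def best_inline_set(budget, sites):
    """Brute-force knapsack: maximize total benefit within the size
    budget.  Exponential in the number of call sites, which is why
    real compilers fall back on cheap heuristics instead."""
    best = ()
    for r in range(len(sites) + 1):
        for subset in combinations(sites, r):
            if (sum(s.size_cost for s in subset) <= budget and
                    sum(s.benefit for s in subset) >
                    sum(s.benefit for s in best)):
                best = subset
    return list(best)
```

The brute-force search is exact but exponential; heuristics trade away that optimality for compile-time speed, which is exactly the gap a learned policy aims to close.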
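The gradient-free search mentioned above can be sketched as a toy (1+1) evolution strategy, a simple relative of genetic algorithms. The genome here is a vector of hypothetical inlining thresholds, and the fitness function is a stand-in for "negative benchmark runtime"; in the real task each fitness evaluation would be a compile-and-benchmark run, and because those runs are independent they parallelize naturally.

```python
# Toy (1+1) evolution strategy over hypothetical inlining thresholds.
# fitness() is a stand-in objective; a real evaluation would compile
# a benchmark suite with the candidate thresholds and time it.
import random

def fitness(genome):
    # Stand-in objective: best when every threshold equals 0.5.
    return -sum((x - 0.5) ** 2 for x in genome)

def evolve(genome, generations, rng):
    for _ in range(generations):
        # Mutate every gene by a small uniform perturbation.
        candidate = [x + rng.uniform(-0.05, 0.05) for x in genome]
        # Keep the candidate only if it strictly improves fitness,
        # so fitness never decreases across generations.
        if fitness(candidate) > fitness(genome):
            genome = candidate
    return genome
```

Because selection needs only the scalar fitness of each candidate, nothing here requires differentiating through compilation or benchmarking, which is the appeal of this family of methods for delayed, sparse rewards.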