Deep neural networks (DNNs) increasingly rely on parallel structures to improve performance and efficiency. However, current machine learning compilers (MLCs) struggle to optimize these structures due to a limited scope of parallel fusion and insufficient use of intra-operator information. This paper presents Magneto, a novel framework that accelerates parallel structures in DNNs by co-optimizing parallel operators. By expanding the scope of parallel operator fusion and introducing a dedicated co-tuning algorithm, Magneto unlocks new opportunities for co-optimization. Our approach addresses the unique challenges of optimizing parallel structures, enabling significant performance improvements across a range of hardware platforms. Experimental results show that Magneto outperforms the state-of-the-art NVIDIA TensorRT and AMD MIGraphX, achieving speedups of 6.30× and 8.69×, respectively, on RNNs. For multi-branch CNNs and MoEs, Magneto delivers speedups of up to 1.89× over TensorRT and 2.39× over MIGraphX.