A hardware accelerator includes a processing core including a plurality of multipliers configured to perform one-dimensional (1D) sub-word parallelization between signs and mantissas of a first tensor and signs and mantissas of a second tensor, a first processing device configured to operate in a two-dimensional (2D) operation mode in which the first tensor and the second tensor are coupled to each other, and a second processing device configured to operate in a three-dimensional (3D) operation mode in which the calculation results of the plurality of multipliers are accumulated in a channel direction and the accumulated result is then output.
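
The 3D operation mode can be illustrated with a minimal software sketch. It is not the claimed hardware: the (sign bit, mantissa) tuple encoding, the function names, and the omission of exponent handling are all assumptions made for illustration. Each list element stands in for one multiplier lane, and the products are summed across channels before being output.

```python
from typing import List, Tuple

# Hypothetical encoding: (sign bit, unsigned mantissa). Exponents are omitted.
SignMantissa = Tuple[int, int]


def multiply_lane(a: SignMantissa, b: SignMantissa) -> int:
    """One multiplier: XOR the sign bits, multiply the mantissas."""
    sign = a[0] ^ b[0]
    product = a[1] * b[1]
    return -product if sign else product


def multiply_lanes(first: List[SignMantissa],
                   second: List[SignMantissa]) -> List[int]:
    """Model the parallel multipliers as one product per lane."""
    if len(first) != len(second):
        raise ValueError("both operands need the same number of lanes")
    return [multiply_lane(a, b) for a, b in zip(first, second)]


def accumulate_channels(first: List[List[SignMantissa]],
                        second: List[List[SignMantissa]]) -> List[int]:
    """Assumed reading of the 3D mode: sum lane products over channels."""
    if not first:
        return []
    totals = [0] * len(first[0])
    for channel_a, channel_b in zip(first, second):
        for lane, product in enumerate(multiply_lanes(channel_a, channel_b)):
            totals[lane] += product
    return totals


# Two channels, two lanes per channel.
first = [[(0, 3), (1, 2)], [(0, 1), (0, 4)]]
second = [[(0, 5), (0, 7)], [(1, 6), (0, 2)]]
print(accumulate_channels(first, second))  # [9, -6]
```

In this sketch the outer list index plays the role of the channel direction, so only one accumulated value per lane is output rather than one value per channel.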