Overview
On a 14-inch MacBook Pro with the Apple M4 Pro chip (14-core CPU, 20-core GPU, 16-core Neural Engine), we compare the CPU and GPU training speed of ResNet and VGG16 models implemented in TensorFlow, and check how large the gap is against an M1 Mac mini.
Background
MPS (Metal Performance Shaders)
MPS (Metal Performance Shaders) is a high-performance computing library built on Metal, Apple's GPU-acceleration API. It is used primarily to accelerate machine learning and graphics workloads on Apple Silicon (M1, M2, M3, M4, and so on) and macOS.
Key Features of MPS
1. Apple's GPU-optimized framework
• MPS is optimized for Apple Silicon and Metal GPUs, extracting the most from the GPU.
2. Accelerated machine learning and graphics processing
• Tensor operations, matrix multiplication, convolution, and other ML workloads run efficiently on the GPU.
3. Built on the Metal API
• Metal is Apple's low-level graphics API and minimizes communication overhead between the CPU and GPU.
4. TensorFlow and PyTorch support
• Both TensorFlow and PyTorch provide GPU acceleration through MPS (see the PyTorch sketch after this list).
• TensorFlow: tensorflow-metal
• PyTorch: torch.backends.mps
5. Native support on macOS
• Provides native GPU support for Apple Silicon on macOS and iOS.
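As a quick illustration of item 4 on the PyTorch side, here is a minimal sketch (it assumes a PyTorch 1.12+ build with MPS support, which is not otherwise used in this post) of selecting the Metal-backed device; the TensorFlow-side check appears in the code section below.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")          # Metal-backed GPU device
    x = torch.ones(2, 2, device=device)   # tensor allocated on the GPU
    print(x.device)                       # -> mps:0
else:
    print("MPS not available; falling back to CPU")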
Pros and Cons of MPS
✅ Pros:
1. Optimized for Apple Silicon: maximizes GPU performance.
2. Low overhead: efficient data transfer thanks to the Metal API.
3. TensorFlow and PyTorch support: compatible with existing ML frameworks.
❌ Cons:
1. Optimization gaps: still less mature than NVIDIA CUDA.
2. Compatibility issues: some TensorFlow ops may not be fully supported (a fallback sketch follows this list).
3. Smaller community: fewer users and less documentation than CUDA.
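For the second drawback, TensorFlow's soft device placement is a standard workaround: ops that lack a Metal kernel fall back to the CPU instead of raising an error. Whether it helps depends on the specific op, but the switch itself is stock TensorFlow API:
import tensorflow as tf

# Let ops that lack a GPU (Metal) kernel silently run on the CPU
# instead of raising an error.
tf.config.set_soft_device_placement(True)
# Set to True to log which device each op actually runs on (verbose):
tf.debugging.set_log_device_placement(False)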
Installed TensorFlow and Metal Versions
(tf) % pip list | grep tensorflow
tensorflow-addons 0.18.0
tensorflow-datasets 4.6.0
tensorflow-estimator 2.12.0
tensorflow-hub 0.12.0
tensorflow-io-gcs-filesystem 0.37.1
tensorflow-macos 2.12.0
tensorflow-metadata 1.10.0
tensorflow-metal 0.8.0
tensorflow-model-optimization 0.7.3
(tf) % pip list | grep metal
tensorflow-metal 0.8.0
Code Implementation
import tensorflow as tf
# Check the TensorFlow version
print("✅ TensorFlow Version:", tf.__version__)
# Check available GPU devices
print("✅ GPU Devices:", tf.config.list_physical_devices('GPU'))
✅ TensorFlow Version: 2.12.0
✅ GPU Devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
# Run a computation on the GPU
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    result = tf.matmul(a, b)
    print("✅ GPU (MPS) result:", result)
Metal device set to: Apple M4 Pro
systemMemory: 24.00 GB
maxCacheSize: 8.00 GB
✅ GPU (MPS) result: tf.Tensor(
[[1. 3.]
[3. 7.]], shape=(2, 2), dtype=float32)
import time

def benchmark(device, size):
    with tf.device(device):
        x = tf.random.normal([size, size])
        y = tf.random.normal([size, size])
        start_time = time.time()
        result = tf.matmul(x, y)
        print(f"{device} Time for {size}x{size}: {time.time() - start_time:.4f} sec")
print("🚀 Benchmarking CPU with large matrix...")
benchmark('/CPU:0', 20000)
print("🚀 Benchmarking GPU (MPS) with large matrix...")
benchmark('/GPU:0', 20000)
🚀 Benchmarking CPU with large matrix...
/CPU:0 Time for 20000x20000: 13.3907 sec
🚀 Benchmarking GPU (MPS) with large matrix...
/GPU:0 Time for 20000x20000: 0.0357 sec
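The GPU number above is too good to be true: TensorFlow dispatches GPU ops asynchronously, so time.time() stops before the matmul has actually finished (finishing a 20000x20000 matmul in 0.036 s would imply hundreds of FP32 TFLOPS, well beyond this hardware). A fairer sketch forces the result back to the host before stopping the clock, at the cost of also timing the device-to-host copy:
def benchmark_sync(device, size):
    with tf.device(device):
        x = tf.random.normal([size, size])
        y = tf.random.normal([size, size])
        start_time = time.time()
        result = tf.matmul(x, y)
        _ = result.numpy()  # blocks until the matmul has actually completed
        print(f"{device} Time for {size}x{size}: {time.time() - start_time:.4f} sec")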
# Reset the Keras session and collect garbage between experiments
import gc
tf.keras.backend.clear_session()
gc.collect()
ResNet Model and Training Implementation
from tensorflow.keras import layers, models, optimizers, losses, datasets
import matplotlib.pyplot as plt
import pandas as pd
# ✅ Device setup
DEVICE_CPU = '/CPU:0'
DEVICE_GPU = '/GPU:0' if tf.config.list_physical_devices('GPU') else '/CPU:0'
# ✅ Load the dataset
def load_mnist():
    (x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    x_train = x_train[..., tf.newaxis]
    x_test = x_test[..., tf.newaxis]
    return (x_train, y_train), (x_test, y_test)
# ✅ Define the ResNet model (a small residual network with two residual blocks)
def create_resnet_model():
    inputs = layers.Input(shape=(28, 28, 1))
    x = layers.Conv2D(32, 3, activation='relu', padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    for _ in range(2):
        shortcut = x
        x = layers.Conv2D(32, 3, activation='relu', padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Conv2D(32, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Add()([x, shortcut])
        x = layers.Activation('relu')(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.Adam(0.001),
                  loss=losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    return model
# ✅ Training function
def train_model(device, x_train, y_train, x_test, y_test, epochs=5):
    with tf.device(device):
        model = create_resnet_model()
        start_time = time.time()
        history = model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
        training_time = time.time() - start_time
        test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    return history, training_time, test_loss, test_accuracy
# ✅ Load the data
(x_train, y_train), (x_test, y_test) = load_mnist()
# ✅ Train on CPU
print("\n🚀 Training on CPU...")
history_cpu, cpu_time, cpu_loss, cpu_accuracy = train_model(DEVICE_CPU, x_train, y_train, x_test, y_test)
# ✅ Train on GPU
print("\n🚀 Training on GPU (MPS)...")
history_gpu, gpu_time, gpu_loss, gpu_accuracy = train_model(DEVICE_GPU, x_train, y_train, x_test, y_test)
# ✅ Compare results
results = {
    'Environment': ['CPU', 'GPU (MPS)'],
    'Training Time (s)': [cpu_time, gpu_time],
    'Final Loss': [cpu_loss, gpu_loss],
    'Final Accuracy': [cpu_accuracy, gpu_accuracy]
}
df_results = pd.DataFrame(results)
print("\n📊 Final Results:")
print(df_results)
# ✅ Visualization
epochs = range(1, len(history_cpu.history['loss']) + 1)
plt.figure(figsize=(12, 5))
# 📈 Loss comparison
plt.subplot(1, 2, 1)
plt.plot(epochs, history_cpu.history['loss'], label='CPU Loss')
plt.plot(epochs, history_gpu.history['loss'], label='GPU Loss')
plt.title('Training Loss Comparison')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
# 📊 Accuracy comparison
plt.subplot(1, 2, 2)
plt.plot(epochs, history_cpu.history['accuracy'], label='CPU Accuracy')
plt.plot(epochs, history_gpu.history['accuracy'], label='GPU Accuracy')
plt.title('Training Accuracy Comparison')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()
# ✅ Print a results summary
from tabulate import tabulate
print("\n📊 Final Results Table:")
print(tabulate(df_results, headers='keys', tablefmt='grid'))
print(f"Speedup (GPU over CPU): {cpu_time / gpu_time:.2f}x")
M4 Pro Results
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on CPU...
Epoch 1/5
2025-01-03 21:28:13.375533: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
1875/1875 [==============================] - 22s 12ms/step - loss: 0.2466 - accuracy: 0.9476 - val_loss: 0.1070 - val_accuracy: 0.9679
Epoch 2/5
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0606 - accuracy: 0.9837 - val_loss: 0.0547 - val_accuracy: 0.9859
Epoch 3/5
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0440 - accuracy: 0.9879 - val_loss: 0.0510 - val_accuracy: 0.9849
Epoch 4/5
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0368 - accuracy: 0.9891 - val_loss: 0.1195 - val_accuracy: 0.9619
Epoch 5/5
1875/1875 [==============================] - 28s 15ms/step - loss: 0.0323 - accuracy: 0.9904 - val_loss: 0.1020 - val_accuracy: 0.9667
🚀 Training on GPU (MPS)...
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
Epoch 1/5
1875/1875 [==============================] - 14s 7ms/step - loss: 0.2340 - accuracy: 0.9531 - val_loss: 0.1176 - val_accuracy: 0.9687
Epoch 2/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0602 - accuracy: 0.9838 - val_loss: 0.0705 - val_accuracy: 0.9805
Epoch 3/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0444 - accuracy: 0.9871 - val_loss: 0.1368 - val_accuracy: 0.9596
Epoch 4/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0370 - accuracy: 0.9887 - val_loss: 0.0362 - val_accuracy: 0.9883
Epoch 5/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0322 - accuracy: 0.9901 - val_loss: 0.0778 - val_accuracy: 0.9757
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 135.716913 0.102032 0.9667
1 GPU (MPS) 65.438438 0.077789 0.9757
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 135.717 | 0.102032 | 0.9667 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 65.4384 | 0.0777885 | 0.9757 |
+----+---------------+---------------------+--------------+------------------+
Speedup (GPU over CPU): 2.07x
M1 Results
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on CPU...
Epoch 1/5
2025-01-03 21:28:31.091589: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
1875/1875 [==============================] - 36s 19ms/step - loss: 0.2559 - accuracy: 0.9476 - val_loss: 0.2412 - val_accuracy: 0.9204
Epoch 2/5
1875/1875 [==============================] - 37s 19ms/step - loss: 0.0616 - accuracy: 0.9835 - val_loss: 0.1210 - val_accuracy: 0.9618
Epoch 3/5
1875/1875 [==============================] - 40s 21ms/step - loss: 0.0467 - accuracy: 0.9868 - val_loss: 0.0411 - val_accuracy: 0.9873
Epoch 4/5
1875/1875 [==============================] - 38s 20ms/step - loss: 0.0376 - accuracy: 0.9888 - val_loss: 0.0423 - val_accuracy: 0.9877
Epoch 5/5
1875/1875 [==============================] - 39s 21ms/step - loss: 0.0321 - accuracy: 0.9905 - val_loss: 0.0593 - val_accuracy: 0.9823
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on GPU (MPS)...
Epoch 1/5
1875/1875 [==============================] - 19s 10ms/step - loss: 0.2394 - accuracy: 0.9498 - val_loss: 0.0988 - val_accuracy: 0.9729
Epoch 2/5
1875/1875 [==============================] - 19s 10ms/step - loss: 0.0594 - accuracy: 0.9842 - val_loss: 0.0881 - val_accuracy: 0.9726
Epoch 3/5
1875/1875 [==============================] - 18s 10ms/step - loss: 0.0429 - accuracy: 0.9875 - val_loss: 0.0700 - val_accuracy: 0.9775
Epoch 4/5
1875/1875 [==============================] - 18s 10ms/step - loss: 0.0354 - accuracy: 0.9897 - val_loss: 0.0606 - val_accuracy: 0.9809
Epoch 5/5
1875/1875 [==============================] - 18s 10ms/step - loss: 0.0321 - accuracy: 0.9904 - val_loss: 0.0706 - val_accuracy: 0.9776
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 189.489760 0.059286 0.9823
1 GPU (MPS) 93.482331 0.070639 0.9776
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 189.49 | 0.0592865 | 0.9823 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 93.4823 | 0.0706386 | 0.9776 |
+----+---------------+---------------------+--------------+------------------+
Speedup (GPU over CPU): 2.03x
VGG16 Model and Training Implementation
# ✅ Define the VGG16 model (a VGG16-style network, trimmed to three conv blocks to fit the 28x28 input)
def create_vgg16_model():
    model = models.Sequential([
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer=optimizers.Adam(0.001),
        loss=losses.SparseCategoricalCrossentropy(),
        metrics=['accuracy']
    )
    return model
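# Optional sanity check (a sketch; sanity_model is a throwaway name used
# only here): instantiate the model once and print the layer stack and
# parameter count before training.
sanity_model = create_vgg16_model()
sanity_model.summary()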
# ✅ Training function
def train_model(device, x_train, y_train, x_test, y_test, epochs=5):
    with tf.device(device):
        model = create_vgg16_model()
        start_time = time.time()
        history = model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
        training_time = time.time() - start_time
        test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    return history, training_time, test_loss, test_accuracy
# ✅ Load the data
(x_train, y_train), (x_test, y_test) = load_mnist()
# ✅ Train on CPU
print("\n🚀 Training on CPU...")
history_cpu, cpu_time, cpu_loss, cpu_accuracy = train_model(DEVICE_CPU, x_train, y_train, x_test, y_test)
# ✅ Train on GPU
print("\n🚀 Training on GPU (MPS)...")
history_gpu, gpu_time, gpu_loss, gpu_accuracy = train_model(DEVICE_GPU, x_train, y_train, x_test, y_test)
# ✅ Compare results
results = {
    'Environment': ['CPU', 'GPU (MPS)'],
    'Training Time (s)': [cpu_time, gpu_time],
    'Final Loss': [cpu_loss, gpu_loss],
    'Final Accuracy': [cpu_accuracy, gpu_accuracy]
}
df_results = pd.DataFrame(results)
print("\n📊 Final Results:")
print(df_results)
# ✅ Visualization
epochs = range(1, len(history_cpu.history['loss']) + 1)
plt.figure(figsize=(12, 5))
# 📈 Loss comparison
plt.subplot(1, 2, 1)
plt.plot(epochs, history_cpu.history['loss'], label='CPU Loss')
plt.plot(epochs, history_gpu.history['loss'], label='GPU Loss')
plt.title('Training Loss Comparison')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
# 📊 Accuracy comparison
plt.subplot(1, 2, 2)
plt.plot(epochs, history_cpu.history['accuracy'], label='CPU Accuracy')
plt.plot(epochs, history_gpu.history['accuracy'], label='GPU Accuracy')
plt.title('Training Accuracy Comparison')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()
# ✅ Print a results summary
from tabulate import tabulate
print("\n📊 Final Results Table:")
print(tabulate(df_results, headers='keys', tablefmt='grid'))
# ✅ Speedup summary
print("\n📊 Final Results:")
print(f"Speedup (GPU over CPU): {cpu_time / gpu_time:.2f}x")
M4 Pro Results
🚀 Training on CPU...
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
Epoch 1/5
1875/1875 [==============================] - 106s 56ms/step - loss: 0.2618 - accuracy: 0.9158 - val_loss: 0.0540 - val_accuracy: 0.9853
Epoch 2/5
1875/1875 [==============================] - 105s 56ms/step - loss: 0.0712 - accuracy: 0.9819 - val_loss: 0.0397 - val_accuracy: 0.9894
Epoch 3/5
1875/1875 [==============================] - 105s 56ms/step - loss: 0.0566 - accuracy: 0.9859 - val_loss: 0.0331 - val_accuracy: 0.9909
Epoch 4/5
1875/1875 [==============================] - 106s 56ms/step - loss: 0.0480 - accuracy: 0.9879 - val_loss: 0.0343 - val_accuracy: 0.9908
Epoch 5/5
1875/1875 [==============================] - 105s 56ms/step - loss: 0.0448 - accuracy: 0.9889 - val_loss: 0.0531 - val_accuracy: 0.9884
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on GPU (MPS)...
Epoch 1/5
1875/1875 [==============================] - 20s 10ms/step - loss: 2.3020 - accuracy: 0.1118 - val_loss: 2.3012 - val_accuracy: 0.1135
Epoch 2/5
1875/1875 [==============================] - 18s 10ms/step - loss: 2.3015 - accuracy: 0.1124 - val_loss: 2.3011 - val_accuracy: 0.1135
Epoch 3/5
1875/1875 [==============================] - 17s 9ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3012 - val_accuracy: 0.1135
Epoch 4/5
1875/1875 [==============================] - 17s 9ms/step - loss: 2.3013 - accuracy: 0.1124 - val_loss: 2.3011 - val_accuracy: 0.1135
Epoch 5/5
1875/1875 [==============================] - 17s 9ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3010 - val_accuracy: 0.1135
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 527.098848 0.053075 0.9884
1 GPU (MPS) 89.892101 2.301009 0.1135
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 527.099 | 0.0530755 | 0.9884 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 89.8921 | 2.30101 | 0.1135 |
+----+---------------+---------------------+--------------+------------------+
📊 Final Results:
Speedup (GPU over CPU): 5.86x
M1 Results
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on CPU...
Epoch 1/5
1875/1875 [==============================] - 252s 134ms/step - loss: 0.2832 - accuracy: 0.9070 - val_loss: 0.0618 - val_accuracy: 0.9843
Epoch 2/5
1875/1875 [==============================] - 251s 134ms/step - loss: 0.0705 - accuracy: 0.9818 - val_loss: 0.0448 - val_accuracy: 0.9871
Epoch 3/5
1875/1875 [==============================] - 250s 133ms/step - loss: 0.0544 - accuracy: 0.9865 - val_loss: 0.0426 - val_accuracy: 0.9887
Epoch 4/5
1875/1875 [==============================] - 250s 133ms/step - loss: 0.0464 - accuracy: 0.9884 - val_loss: 0.0325 - val_accuracy: 0.9913
Epoch 5/5
1875/1875 [==============================] - 251s 134ms/step - loss: 0.0423 - accuracy: 0.9900 - val_loss: 0.0413 - val_accuracy: 0.9902
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on GPU (MPS)...
Epoch 1/5
1875/1875 [==============================] - 55s 29ms/step - loss: 2.3020 - accuracy: 0.1105 - val_loss: 2.3011 - val_accuracy: 0.1135
Epoch 2/5
1875/1875 [==============================] - 55s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3013 - val_accuracy: 0.1135
Epoch 3/5
1875/1875 [==============================] - 55s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3010 - val_accuracy: 0.1135
Epoch 4/5
1875/1875 [==============================] - 54s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3009 - val_accuracy: 0.1135
Epoch 5/5
1875/1875 [==============================] - 54s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3011 - val_accuracy: 0.1135
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 1254.501517 0.041344 0.9902
1 GPU (MPS) 273.759162 2.301116 0.1135
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 1254.5 | 0.041344 | 0.9902 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 273.759 | 2.30112 | 0.1135 |
+----+---------------+---------------------+--------------+------------------+
📊 Final Results:
Speedup (GPU over CPU): 4.58x
Results Summary
📊 M4 Pro vs. M1 (ResNet training speed)
1. CPU training time:
• The M4 Pro is about 28.38% faster than the M1.
2. GPU training time:
• The M4 Pro is about 30.00% faster than the M1.
📊 M4 Pro vs. M1 (VGG16 training speed)
1. CPU training time:
• The M4 Pro is about 58.0% faster than the M1.
2. GPU training time:
• The M4 Pro is about 67.2% faster than the M1.
One caveat: on both machines the VGG16 GPU (MPS) runs failed to learn (loss stuck near 2.30, accuracy near 11.35%, i.e., random guessing over 10 classes), so the VGG16 GPU timings measure raw throughput rather than useful training. A hedged mitigation sketch follows.
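The root cause of the MPS divergence is not pinned down in this post. As one untested idea (reusing optimizers and losses from the VGG16 code above), a smaller learning rate combined with gradient clipping sometimes stabilizes training; whether it fixes this particular tensorflow-metal case is unverified, and the values below are hypothetical:
# Untested mitigation sketch for the VGG16 divergence on MPS:
# lower the learning rate and clip gradient norms when compiling.
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4, clipnorm=1.0),  # hypothetical values
    loss=losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)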