Overview
On a 14-inch MacBook Pro with the Apple M4 Pro chip (14-core CPU, 20-core GPU, 16-core Neural Engine), we compare the CPU and GPU training speed of ResNet and VGG16 models implemented in TensorFlow, and check how large the gap is against an M1 Mac mini.
Background
MPS (Metal Performance Shaders)
MPS (Metal Performance Shaders) is a high-performance computing library built on Metal, Apple's GPU-acceleration API. It is used primarily to accelerate machine learning and graphics workloads on Apple Silicon (M1, M2, M3, M4, and so on) and macOS.
Key Features of MPS
1. Apple's GPU-optimized framework
• MPS is optimized for Apple Silicon and Metal GPUs, extracting the most from the GPU.
2. Accelerated machine learning and graphics processing
• Tensor operations, matrix multiplication, convolution, and other ML workloads run efficiently on the GPU.
3. Built on the Metal API
• Metal is Apple's low-level graphics API and minimizes communication overhead between the CPU and GPU.
4. TensorFlow and PyTorch support
• Both TensorFlow and PyTorch provide GPU acceleration through MPS (see the PyTorch sketch after this list).
• TensorFlow: tensorflow-metal
• PyTorch: torch.backends.mps
5. Native support on macOS
• Provides native GPU support for Apple Silicon on macOS and iOS.
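As a quick illustration of item 4 on the PyTorch side, here is a minimal sketch (it assumes a PyTorch 1.12+ build with MPS support, which is not otherwise used in this post) of selecting the Metal-backed device; the TensorFlow-side check appears in the code section below.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")          # Metal-backed GPU device
    x = torch.ones(2, 2, device=device)   # tensor allocated on the GPU
    print(x.device)                       # -> mps:0
else:
    print("MPS not available; falling back to CPU")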
Pros and Cons of MPS
✅ Pros:
1. Optimized for Apple Silicon: maximizes GPU performance.
2. Low overhead: efficient data transfer thanks to the Metal API.
3. TensorFlow and PyTorch support: compatible with existing ML frameworks.
❌ Cons:
1. Optimization gaps: still less mature than NVIDIA CUDA.
2. Compatibility issues: some TensorFlow ops may not be fully supported (a fallback sketch follows this list).
3. Smaller community: fewer users and less documentation than CUDA.
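For the second drawback, TensorFlow's soft device placement is a standard workaround: ops that lack a Metal kernel fall back to the CPU instead of raising an error. Whether it helps depends on the specific op, but the switch itself is stock TensorFlow API:
import tensorflow as tf

# Let ops that lack a GPU (Metal) kernel silently run on the CPU
# instead of raising an error.
tf.config.set_soft_device_placement(True)
# Set to True to log which device each op actually runs on (verbose):
tf.debugging.set_log_device_placement(False)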
Installed TensorFlow and Metal Versions
(tf) % pip list | grep tensorflow
tensorflow-addons 0.18.0
tensorflow-datasets 4.6.0
tensorflow-estimator 2.12.0
tensorflow-hub 0.12.0
tensorflow-io-gcs-filesystem 0.37.1
tensorflow-macos 2.12.0
tensorflow-metadata 1.10.0
tensorflow-metal 0.8.0
tensorflow-model-optimization 0.7.3
(tf) % pip list | grep metal
tensorflow-metal 0.8.0
Code Implementation
import tensorflow as tf
# Check the TensorFlow version
print("✅ TensorFlow Version:", tf.__version__)
# Check available GPU devices
print("✅ GPU Devices:", tf.config.list_physical_devices('GPU'))
✅ TensorFlow Version: 2.12.0
✅ GPU Devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
# Run a computation on the GPU
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    result = tf.matmul(a, b)
    print("✅ GPU (MPS) result:", result)
Metal device set to: Apple M4 Pro
systemMemory: 24.00 GB
maxCacheSize: 8.00 GB
✅ GPU (MPS) result: tf.Tensor(
[[1. 3.]
[3. 7.]], shape=(2, 2), dtype=float32)
import time

def benchmark(device, size):
    with tf.device(device):
        x = tf.random.normal([size, size])
        y = tf.random.normal([size, size])
        start_time = time.time()
        result = tf.matmul(x, y)
        print(f"{device} Time for {size}x{size}: {time.time() - start_time:.4f} sec")
print("🚀 Benchmarking CPU with large matrix...")
benchmark('/CPU:0', 20000)
print("🚀 Benchmarking GPU (MPS) with large matrix...")
benchmark('/GPU:0', 20000)
🚀 Benchmarking CPU with large matrix...
/CPU:0 Time for 20000x20000: 13.3907 sec
🚀 Benchmarking GPU (MPS) with large matrix...
/GPU:0 Time for 20000x20000: 0.0357 sec
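The GPU number above is too good to be true: TensorFlow dispatches GPU ops asynchronously, so time.time() stops before the matmul has actually finished (finishing a 20000x20000 matmul in 0.036 s would imply hundreds of FP32 TFLOPS, well beyond this hardware). A fairer sketch forces the result back to the host before stopping the clock, at the cost of also timing the device-to-host copy:
def benchmark_sync(device, size):
    with tf.device(device):
        x = tf.random.normal([size, size])
        y = tf.random.normal([size, size])
        start_time = time.time()
        result = tf.matmul(x, y)
        _ = result.numpy()  # blocks until the matmul has actually completed
        print(f"{device} Time for {size}x{size}: {time.time() - start_time:.4f} sec")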
# Reset the Keras session and collect garbage between experiments
import gc
tf.keras.backend.clear_session()
gc.collect()
ResNet Model and Training Implementation
from tensorflow.keras import layers, models, optimizers, losses, datasets
import matplotlib.pyplot as plt
import pandas as pd
# ✅ Device setup
DEVICE_CPU = '/CPU:0'
DEVICE_GPU = '/GPU:0' if tf.config.list_physical_devices('GPU') else '/CPU:0'
# ✅ Load the dataset
def load_mnist():
    (x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    x_train = x_train[..., tf.newaxis]
    x_test = x_test[..., tf.newaxis]
    return (x_train, y_train), (x_test, y_test)
# ✅ Define the ResNet model (a small residual network with two residual blocks)
def create_resnet_model():
    inputs = layers.Input(shape=(28, 28, 1))
    x = layers.Conv2D(32, 3, activation='relu', padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D()(x)
    for _ in range(2):
        shortcut = x
        x = layers.Conv2D(32, 3, activation='relu', padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Conv2D(32, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Add()([x, shortcut])
        x = layers.Activation('relu')(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.Adam(0.001),
                  loss=losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    return model
# ✅ Training function
def train_model(device, x_train, y_train, x_test, y_test, epochs=5):
    with tf.device(device):
        model = create_resnet_model()
        start_time = time.time()
        history = model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
        training_time = time.time() - start_time
        test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    return history, training_time, test_loss, test_accuracy
# ✅ Load the data
(x_train, y_train), (x_test, y_test) = load_mnist()
# ✅ Train on CPU
print("\n🚀 Training on CPU...")
history_cpu, cpu_time, cpu_loss, cpu_accuracy = train_model(DEVICE_CPU, x_train, y_train, x_test, y_test)
# ✅ Train on GPU
print("\n🚀 Training on GPU (MPS)...")
history_gpu, gpu_time, gpu_loss, gpu_accuracy = train_model(DEVICE_GPU, x_train, y_train, x_test, y_test)
# ✅ Compare results
results = {
    'Environment': ['CPU', 'GPU (MPS)'],
    'Training Time (s)': [cpu_time, gpu_time],
    'Final Loss': [cpu_loss, gpu_loss],
    'Final Accuracy': [cpu_accuracy, gpu_accuracy]
}
df_results = pd.DataFrame(results)
print("\n📊 Final Results:")
print(df_results)
# ✅ Visualization
epochs = range(1, len(history_cpu.history['loss']) + 1)
plt.figure(figsize=(12, 5))
# 📈 Loss comparison
plt.subplot(1, 2, 1)
plt.plot(epochs, history_cpu.history['loss'], label='CPU Loss')
plt.plot(epochs, history_gpu.history['loss'], label='GPU Loss')
plt.title('Training Loss Comparison')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
# 📊 Accuracy comparison
plt.subplot(1, 2, 2)
plt.plot(epochs, history_cpu.history['accuracy'], label='CPU Accuracy')
plt.plot(epochs, history_gpu.history['accuracy'], label='GPU Accuracy')
plt.title('Training Accuracy Comparison')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()
# ✅ Print a results summary
from tabulate import tabulate
print("\n📊 Final Results Table:")
print(tabulate(df_results, headers='keys', tablefmt='grid'))
print(f"Speedup (GPU over CPU): {cpu_time / gpu_time:.2f}x")
M4 Pro Results
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on CPU...
Epoch 1/5
2025-01-03 21:28:13.375533: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
1875/1875 [==============================] - 22s 12ms/step - loss: 0.2466 - accuracy: 0.9476 - val_loss: 0.1070 - val_accuracy: 0.9679
Epoch 2/5
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0606 - accuracy: 0.9837 - val_loss: 0.0547 - val_accuracy: 0.9859
Epoch 3/5
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0440 - accuracy: 0.9879 - val_loss: 0.0510 - val_accuracy: 0.9849
Epoch 4/5
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0368 - accuracy: 0.9891 - val_loss: 0.1195 - val_accuracy: 0.9619
Epoch 5/5
1875/1875 [==============================] - 28s 15ms/step - loss: 0.0323 - accuracy: 0.9904 - val_loss: 0.1020 - val_accuracy: 0.9667
🚀 Training on GPU (MPS)...
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
Epoch 1/5
1875/1875 [==============================] - 14s 7ms/step - loss: 0.2340 - accuracy: 0.9531 - val_loss: 0.1176 - val_accuracy: 0.9687
Epoch 2/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0602 - accuracy: 0.9838 - val_loss: 0.0705 - val_accuracy: 0.9805
Epoch 3/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0444 - accuracy: 0.9871 - val_loss: 0.1368 - val_accuracy: 0.9596
Epoch 4/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0370 - accuracy: 0.9887 - val_loss: 0.0362 - val_accuracy: 0.9883
Epoch 5/5
1875/1875 [==============================] - 13s 7ms/step - loss: 0.0322 - accuracy: 0.9901 - val_loss: 0.0778 - val_accuracy: 0.9757
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 135.716913 0.102032 0.9667
1 GPU (MPS) 65.438438 0.077789 0.9757
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 135.717 | 0.102032 | 0.9667 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 65.4384 | 0.0777885 | 0.9757 |
+----+---------------+---------------------+--------------+------------------+
Speedup (GPU over CPU): 2.07x
M1 Results
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on CPU...
Epoch 1/5
2025-01-03 21:28:31.091589: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
1875/1875 [==============================] - 36s 19ms/step - loss: 0.2559 - accuracy: 0.9476 - val_loss: 0.2412 - val_accuracy: 0.9204
Epoch 2/5
1875/1875 [==============================] - 37s 19ms/step - loss: 0.0616 - accuracy: 0.9835 - val_loss: 0.1210 - val_accuracy: 0.9618
Epoch 3/5
1875/1875 [==============================] - 40s 21ms/step - loss: 0.0467 - accuracy: 0.9868 - val_loss: 0.0411 - val_accuracy: 0.9873
Epoch 4/5
1875/1875 [==============================] - 38s 20ms/step - loss: 0.0376 - accuracy: 0.9888 - val_loss: 0.0423 - val_accuracy: 0.9877
Epoch 5/5
1875/1875 [==============================] - 39s 21ms/step - loss: 0.0321 - accuracy: 0.9905 - val_loss: 0.0593 - val_accuracy: 0.9823
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on GPU (MPS)...
Epoch 1/5
1875/1875 [==============================] - 19s 10ms/step - loss: 0.2394 - accuracy: 0.9498 - val_loss: 0.0988 - val_accuracy: 0.9729
Epoch 2/5
1875/1875 [==============================] - 19s 10ms/step - loss: 0.0594 - accuracy: 0.9842 - val_loss: 0.0881 - val_accuracy: 0.9726
Epoch 3/5
1875/1875 [==============================] - 18s 10ms/step - loss: 0.0429 - accuracy: 0.9875 - val_loss: 0.0700 - val_accuracy: 0.9775
Epoch 4/5
1875/1875 [==============================] - 18s 10ms/step - loss: 0.0354 - accuracy: 0.9897 - val_loss: 0.0606 - val_accuracy: 0.9809
Epoch 5/5
1875/1875 [==============================] - 18s 10ms/step - loss: 0.0321 - accuracy: 0.9904 - val_loss: 0.0706 - val_accuracy: 0.9776
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 189.489760 0.059286 0.9823
1 GPU (MPS) 93.482331 0.070639 0.9776
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 189.49 | 0.0592865 | 0.9823 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 93.4823 | 0.0706386 | 0.9776 |
+----+---------------+---------------------+--------------+------------------+
Speedup (GPU over CPU): 2.03x
VGG16 Model and Training Implementation
# ✅ Define the VGG16 model (a VGG16-style network, trimmed to three conv blocks to fit the 28x28 input)
def create_vgg16_model():
    model = models.Sequential([
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer=optimizers.Adam(0.001),
        loss=losses.SparseCategoricalCrossentropy(),
        metrics=['accuracy']
    )
    return model
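# Optional sanity check (a sketch; sanity_model is a throwaway name used
# only here): instantiate the model once and print the layer stack and
# parameter count before training.
sanity_model = create_vgg16_model()
sanity_model.summary()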
# ✅ Training function
def train_model(device, x_train, y_train, x_test, y_test, epochs=5):
    with tf.device(device):
        model = create_vgg16_model()
        start_time = time.time()
        history = model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
        training_time = time.time() - start_time
        test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    return history, training_time, test_loss, test_accuracy
# ✅ Load the data
(x_train, y_train), (x_test, y_test) = load_mnist()
# ✅ Train on CPU
print("\n🚀 Training on CPU...")
history_cpu, cpu_time, cpu_loss, cpu_accuracy = train_model(DEVICE_CPU, x_train, y_train, x_test, y_test)
# ✅ Train on GPU
print("\n🚀 Training on GPU (MPS)...")
history_gpu, gpu_time, gpu_loss, gpu_accuracy = train_model(DEVICE_GPU, x_train, y_train, x_test, y_test)
# ✅ Compare results
results = {
    'Environment': ['CPU', 'GPU (MPS)'],
    'Training Time (s)': [cpu_time, gpu_time],
    'Final Loss': [cpu_loss, gpu_loss],
    'Final Accuracy': [cpu_accuracy, gpu_accuracy]
}
df_results = pd.DataFrame(results)
print("\n📊 Final Results:")
print(df_results)
# ✅ Visualization
epochs = range(1, len(history_cpu.history['loss']) + 1)
plt.figure(figsize=(12, 5))
# 📈 Loss comparison
plt.subplot(1, 2, 1)
plt.plot(epochs, history_cpu.history['loss'], label='CPU Loss')
plt.plot(epochs, history_gpu.history['loss'], label='GPU Loss')
plt.title('Training Loss Comparison')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
# 📊 Accuracy comparison
plt.subplot(1, 2, 2)
plt.plot(epochs, history_cpu.history['accuracy'], label='CPU Accuracy')
plt.plot(epochs, history_gpu.history['accuracy'], label='GPU Accuracy')
plt.title('Training Accuracy Comparison')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()
# ✅ Print a results summary
from tabulate import tabulate
print("\n📊 Final Results Table:")
print(tabulate(df_results, headers='keys', tablefmt='grid'))
# ✅ Speedup summary
print("\n📊 Final Results:")
print(f"Speedup (GPU over CPU): {cpu_time / gpu_time:.2f}x")
M4 Pro Results
🚀 Training on CPU...
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
Epoch 1/5
1875/1875 [==============================] - 106s 56ms/step - loss: 0.2618 - accuracy: 0.9158 - val_loss: 0.0540 - val_accuracy: 0.9853
Epoch 2/5
1875/1875 [==============================] - 105s 56ms/step - loss: 0.0712 - accuracy: 0.9819 - val_loss: 0.0397 - val_accuracy: 0.9894
Epoch 3/5
1875/1875 [==============================] - 105s 56ms/step - loss: 0.0566 - accuracy: 0.9859 - val_loss: 0.0331 - val_accuracy: 0.9909
Epoch 4/5
1875/1875 [==============================] - 106s 56ms/step - loss: 0.0480 - accuracy: 0.9879 - val_loss: 0.0343 - val_accuracy: 0.9908
Epoch 5/5
1875/1875 [==============================] - 105s 56ms/step - loss: 0.0448 - accuracy: 0.9889 - val_loss: 0.0531 - val_accuracy: 0.9884
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on GPU (MPS)...
Epoch 1/5
1875/1875 [==============================] - 20s 10ms/step - loss: 2.3020 - accuracy: 0.1118 - val_loss: 2.3012 - val_accuracy: 0.1135
Epoch 2/5
1875/1875 [==============================] - 18s 10ms/step - loss: 2.3015 - accuracy: 0.1124 - val_loss: 2.3011 - val_accuracy: 0.1135
Epoch 3/5
1875/1875 [==============================] - 17s 9ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3012 - val_accuracy: 0.1135
Epoch 4/5
1875/1875 [==============================] - 17s 9ms/step - loss: 2.3013 - accuracy: 0.1124 - val_loss: 2.3011 - val_accuracy: 0.1135
Epoch 5/5
1875/1875 [==============================] - 17s 9ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3010 - val_accuracy: 0.1135
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 527.098848 0.053075 0.9884
1 GPU (MPS) 89.892101 2.301009 0.1135
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 527.099 | 0.0530755 | 0.9884 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 89.8921 | 2.30101 | 0.1135 |
+----+---------------+---------------------+--------------+------------------+
📊 Final Results:
Speedup (GPU over CPU): 5.86x
M1 Results
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on CPU...
Epoch 1/5
1875/1875 [==============================] - 252s 134ms/step - loss: 0.2832 - accuracy: 0.9070 - val_loss: 0.0618 - val_accuracy: 0.9843
Epoch 2/5
1875/1875 [==============================] - 251s 134ms/step - loss: 0.0705 - accuracy: 0.9818 - val_loss: 0.0448 - val_accuracy: 0.9871
Epoch 3/5
1875/1875 [==============================] - 250s 133ms/step - loss: 0.0544 - accuracy: 0.9865 - val_loss: 0.0426 - val_accuracy: 0.9887
Epoch 4/5
1875/1875 [==============================] - 250s 133ms/step - loss: 0.0464 - accuracy: 0.9884 - val_loss: 0.0325 - val_accuracy: 0.9913
Epoch 5/5
1875/1875 [==============================] - 251s 134ms/step - loss: 0.0423 - accuracy: 0.9900 - val_loss: 0.0413 - val_accuracy: 0.9902
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
🚀 Training on GPU (MPS)...
Epoch 1/5
1875/1875 [==============================] - 55s 29ms/step - loss: 2.3020 - accuracy: 0.1105 - val_loss: 2.3011 - val_accuracy: 0.1135
Epoch 2/5
1875/1875 [==============================] - 55s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3013 - val_accuracy: 0.1135
Epoch 3/5
1875/1875 [==============================] - 55s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3010 - val_accuracy: 0.1135
Epoch 4/5
1875/1875 [==============================] - 54s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3009 - val_accuracy: 0.1135
Epoch 5/5
1875/1875 [==============================] - 54s 29ms/step - loss: 2.3014 - accuracy: 0.1124 - val_loss: 2.3011 - val_accuracy: 0.1135
📊 Final Results:
Environment Training Time (s) Final Loss Final Accuracy
0 CPU 1254.501517 0.041344 0.9902
1 GPU (MPS) 273.759162 2.301116 0.1135
📊 Final Results Table:
+----+---------------+---------------------+--------------+------------------+
| | Environment | Training Time (s) | Final Loss | Final Accuracy |
+====+===============+=====================+==============+==================+
| 0 | CPU | 1254.5 | 0.041344 | 0.9902 |
+----+---------------+---------------------+--------------+------------------+
| 1 | GPU (MPS) | 273.759 | 2.30112 | 0.1135 |
+----+---------------+---------------------+--------------+------------------+
📊 Final Results:
Speedup (GPU over CPU): 4.58x
Results Summary
📊 M4 Pro vs. M1 (ResNet training speed)
1. CPU training time:
• The M4 Pro is about 28.38% faster than the M1.
2. GPU training time:
• The M4 Pro is about 30.00% faster than the M1.
📊 M4 Pro vs. M1 (VGG16 training speed)
1. CPU training time:
• The M4 Pro is about 58.0% faster than the M1.
2. GPU training time:
• The M4 Pro is about 67.2% faster than the M1.
One caveat: on both machines the VGG16 GPU (MPS) runs failed to learn (loss stuck near 2.30, accuracy near 11.35%, i.e., random guessing over 10 classes), so the VGG16 GPU timings measure raw throughput rather than useful training. A hedged mitigation sketch follows.
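The root cause of the MPS divergence is not pinned down in this post. As one untested idea (reusing optimizers and losses from the VGG16 code above), a smaller learning rate combined with gradient clipping sometimes stabilizes training; whether it fixes this particular tensorflow-metal case is unverified, and the values below are hypothetical:
# Untested mitigation sketch for the VGG16 divergence on MPS:
# lower the learning rate and clip gradient norms when compiling.
model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-4, clipnorm=1.0),  # hypothetical values
    loss=losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)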