Cloud Access
Cloud access is an important aspect of data engineering because it enables data aggregation, storage, retrieval, and enterprise scale-up. The three largest cloud service providers are Amazon, Microsoft, and Google. Their services cover hosting of servers and containers, data storage as files or databases, and network and access functions. Cloud-based storage and computing scale up as an application needs more compute resources or storage space.
Service | Amazon Web Services (AWS) | Microsoft Azure | Google Cloud Platform (GCP) |
---|---|---|---|
Virtual Servers | Elastic Compute Cloud (EC2) | Virtual Machines | Google Compute Engine |
Serverless Computing | Lambda | Azure Functions | Cloud Functions |
Kubernetes Management | Elastic Kubernetes Service | Azure Kubernetes Service | Google Kubernetes Engine |
Object Storage | Simple Storage Service (S3) | Azure Blob Storage | Cloud Storage |
File Storage | Elastic File System (EFS) | Azure Files | Filestore |
Block Storage | Elastic Block Store (EBS) | Azure Disk Storage | Persistent Disk |
Relational Database | Relational Database Service | SQL Database | Cloud SQL |
NoSQL Database | DynamoDB | Cosmos DB | Firestore |
Virtual Network | Virtual Private Cloud | Azure VNet | Virtual Private Cloud |
Content Delivery Network | CloudFront | Azure CDN | Cloud CDN |
DNS Service | Route 53 | Azure DNS | Cloud DNS |
Authentication and Authorization | IAM | Microsoft Entra ID (formerly Azure Active Directory) | Cloud IAM |
Key Management | AWS KMS | Azure Key Vault | Cloud KMS |
Network Security | AWS WAF | Application Gateway | Cloud Armor |
Tutorials for specific data access on each platform are best obtained from the cloud hosting provider. Each provider offers tutorials on accessing data with Python, including several excellent tutorials for beginners. Students are typically given free or reduced-cost access to the platforms for learning. Each platform requires registration and an account to use the services.
Cloud services start with the definition of public, private, or hybrid cloud access. The base layer, Infrastructure as a Service (IaaS), includes bare metal servers, virtual machines, disk space, networking, and load balancers. The next level is Platform as a Service (PaaS), with platforms that use IaaS to run applications, databases, servers, and data lakes that span multiple storage units. These compute, storage, and networking services are built upon further to create cloud applications, or Software as a Service (SaaS), as a complete solution. Clients (computer browsers, mobile apps, IoT devices) connect to the SaaS applications to send data, retrieve results, and use the functions they provide. Applications are designed to scale up resources as needed and to provide distributed computing and data storage so that the SaaS is resilient to unplanned outages.
✅ Activity
Create a Python program to monitor a directory for a photo and remove the background with the rembg Python package. Deploy the application as a service to process photos.
Service to remove image background
The rembg package installation may downgrade a number of packages such as numpy, so it is recommended to set up a virtual environment (venv) for the installation as shown with Install Packages.
python3 -m venv backgrm
source backgrm/bin/activate
With the virtual environment activated, install rembg with python3 -m pip install rembg. The extra [gpu] option can be used if running on Google Colab with pip install rembg[gpu]. The first time rembg runs, it downloads a 176 MB machine-learned model.
Test the rembg application with the remove() function. Use other photos to test the performance. Replace image_input.jpg with the path to the input image.
from rembg import remove
from PIL import Image
import urllib.request

input_path = 'image_input.jpg'
output_path = 'image_output.png'

# download the sample image (skip this step when using a local photo)
url = 'http://apmonitor.com/dde/uploads/Main/'+input_path
urllib.request.urlretrieve(url, input_path)

# open the image, remove the background, and save as PNG
img = Image.open(input_path)
output = remove(img)
output.save(output_path)
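To compare performance across photos, a small timing helper can wrap the call to remove(). The sketch below is one possible approach, not part of the original activity: time_call and fake_remove are hypothetical names, and fake_remove is only a stand-in so the example runs without rembg installed. Taking the best of several runs reduces the influence of one-time costs such as the model download.

```python
import time

def time_call(func, *args, repeats=3):
    """Run func(*args) several times; return (last result, best seconds)."""
    best = float('inf')
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

# Stand-in workload (hypothetical) so the sketch runs without rembg
def fake_remove(data):
    return data[::-1]

result, seconds = time_call(fake_remove, b'abc')
print(f'best time: {seconds:.6f} s')
```

With rembg installed, the same helper could be called as output, seconds = time_call(remove, img) for each test photo.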
Process Local Folder Images to Remove Background
This solution demonstrates how to monitor a local computer folder and automatically remove the background from any images placed into the input folder. This input folder could be a Dropbox or Google Drive folder where image files are placed for automatic background removal.
Create a virtual environment and install rembg.
python3 -m venv backgrm
source backgrm/bin/activate
python3 -m pip install rembg
Create a new file to run the service. The program runs for about 200 seconds and checks the fpath_in folder for new files once per second. If a file is found, the program removes the background from the image and saves the new image as a PNG file in the fpath_out folder. Finally, it deletes the input image. This simple program can be used to receive online image submissions and display the output through a webpage.
import os
import time
import glob
from rembg import remove
from PIL import Image

# create directories to store images
fpath_in = './input'
fpath_out = './output'
for f in [fpath_in, fpath_out]:
    os.makedirs(f, exist_ok=True)

i = 0
while i <= 200:
    # scan input directory every second
    time.sleep(1.0); i += 1
    fp = glob.glob(fpath_in+'/*')
    for f in fp:
        print(f)
        # open image (the file is closed again before it is deleted)
        with Image.open(f) as img:
            # remove background
            out = remove(img)
        # get file name
        img_name = os.path.basename(f)
        # save to output folder
        out.save(fpath_out+'/'+img_name+'.png')
        # remove input folder image
        os.remove(f)
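The loop above treats every file in the input folder as an image, so a stray text file would stop the program. A minimal sketch of a more defensive single pass is shown below. It is an illustration under stated assumptions: process_folder, IMAGE_EXTS, and the transform argument are hypothetical names, and the demo uses a stand-in transform so that it runs without rembg. The remove() function from rembg also accepts and returns bytes, so it could be passed as transform.

```python
import tempfile
from pathlib import Path

# Extensions treated as images (an assumed list; adjust as needed)
IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.bmp', '.webp'}

def process_folder(fpath_in, fpath_out, transform):
    """Apply transform(bytes) -> bytes to each image file in one pass.

    Returns the list of output paths that were written.
    """
    out_dir = Path(fpath_out)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for f in sorted(Path(fpath_in).iterdir()):
        if not f.is_file() or f.suffix.lower() not in IMAGE_EXTS:
            continue  # leave non-image files untouched
        try:
            result = transform(f.read_bytes())
        except Exception as err:
            print(f'skipped {f.name}: {err}')
            continue  # keep the input so it can be inspected
        target = out_dir / (f.stem + '.png')  # photo.jpg -> photo.png
        target.write_bytes(result)
        f.unlink()  # delete the input only after a successful save
        written.append(target)
    return written

# Demo with a stand-in transform in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, 'input'), Path(tmp, 'output')
    src.mkdir()
    (src / 'photo.jpg').write_bytes(b'fake image data')
    (src / 'notes.txt').write_text('not an image')
    outputs = process_folder(src, dst, lambda b: b.upper())
    names = [p.name for p in outputs]
    remaining = sorted(p.name for p in src.iterdir())
    print(names, remaining)
```

Calling process_folder inside the one-second polling loop would keep the same behavior as the program above while naming outputs by the file stem instead of appending .png to the full file name.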
Web API with Websockets to Remove Background
Create a virtual environment and install rembg and websockets.
python3 -m venv backgrm
source backgrm/bin/activate
python3 -m pip install rembg websockets
Save as server.py and run in the Python virtual environment.
import asyncio
import io
import websockets
from PIL import Image
from rembg import remove

# path is optional so the handler works with older and newer websockets releases
async def handle_image(websocket, path=None):
    # receive the image as bytes
    image_bytes = await websocket.recv()
    image = Image.open(io.BytesIO(image_bytes))
    # remove image background
    image = remove(image)
    # return the result as PNG bytes
    new_image_bytes = io.BytesIO()
    image.save(new_image_bytes, format='PNG')
    await websocket.send(new_image_bytes.getvalue())

async def main():
    # serve on port 8765 until the program is stopped
    async with websockets.serve(handle_image, "localhost", 8765):
        await asyncio.Future()

asyncio.run(main())
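The handler above passes whatever it receives to Image.open(), so a text message or an unrelated file raises an exception inside the server. The websockets library also limits incoming messages to 1 MiB by default (the max_size argument of serve and connect), which larger photos can exceed. A minimal sketch of a payload check is shown below. The names check_image_payload, MAX_BYTES, and SIGNATURES are hypothetical, and only the PNG and JPEG file signatures are tested.

```python
# Assumed limit, chosen to match the websockets default max_size of 1 MiB
MAX_BYTES = 2**20

# Leading bytes (file signatures) of the two accepted formats
SIGNATURES = {
    'png': b'\x89PNG\r\n\x1a\n',
    'jpeg': b'\xff\xd8\xff',
}

def check_image_payload(data, max_bytes=MAX_BYTES):
    """Return the detected format name, or raise ValueError."""
    if not isinstance(data, (bytes, bytearray)):
        raise ValueError('expected a binary message')
    if len(data) > max_bytes:
        raise ValueError(f'payload of {len(data)} bytes exceeds {max_bytes}')
    for name, magic in SIGNATURES.items():
        if bytes(data[:len(magic)]) == magic:
            return name
    raise ValueError('not a PNG or JPEG file')

# Example with the first bytes of a PNG file
print(check_image_payload(b'\x89PNG\r\n\x1a\n' + b'\x00' * 16))
```

One option is to call the check at the top of handle_image and send the error text back to the client instead of an image when it raises ValueError.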
Save as client.py and run on the local computer. To run the client from another networked computer, the server must listen on an address that is reachable from the network (such as 0.0.0.0 instead of localhost) and the client must replace localhost with the server address.
import asyncio
import urllib.request
import websockets

# download image or replace with another image file
input_path = 'image_input.jpg'
url = 'http://apmonitor.com/dde/uploads/Main/'+input_path
urllib.request.urlretrieve(url, input_path)

async def send_image():
    async with websockets.connect("ws://localhost:8765") as websocket:
        # send the image file as bytes
        with open(input_path, "rb") as image_file:
            image_bytes = image_file.read()
        await websocket.send(image_bytes)
        # receive the image with the background removed
        new_image_bytes = await websocket.recv()
        with open("new_image.png", "wb") as new_image_file:
            new_image_file.write(new_image_bytes)

asyncio.run(send_image())
See WebSocket Transfer for more information about data transfer.
Docker Container to Remove Background
Create a folder to store the files Dockerfile, server.py, and requirements.txt. First, create Dockerfile as a text file with no file extension.
# Base image (assumed here; any recent official Python image should work)
FROM python:3.11
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Open port 8765
EXPOSE 8765
COPY server.py ./
CMD [ "python", "./server.py" ]
Create requirements.txt and save in the docker build folder. It lists the packages that server.py imports (Pillow is installed as a dependency of rembg).
rembg
websockets
Create server.py and save in the docker build folder.
import asyncio
import io
import websockets
from PIL import Image
from rembg import remove

# path is optional so the handler works with older and newer websockets releases
async def handle_image(websocket, path=None):
    # receive the image as bytes
    image_bytes = await websocket.recv()
    image = Image.open(io.BytesIO(image_bytes))
    # remove image background
    image = remove(image)
    # return the result as PNG bytes
    new_image_bytes = io.BytesIO()
    image.save(new_image_bytes, format='PNG')
    await websocket.send(new_image_bytes.getvalue())

async def main():
    # listen on all interfaces so the published container port is reachable
    async with websockets.serve(handle_image, "0.0.0.0", 8765):
        await asyncio.Future()

asyncio.run(main())
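Build the docker image from inside the build folder. The image name rembg-server is only an example; any name can be used.
docker build -t rembg-server .
Run the docker image and publish port 8765 to the host.
docker run -p 8765:8765 rembg-server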
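Before running the client, it can help to confirm that something accepts connections on port 8765. The sketch below uses only the Python standard library. The function name port_open is hypothetical, and the check only shows that a TCP connection can be opened, not that the background removal service itself is working.

```python
import socket

def port_open(host='localhost', port=8765, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print('port 8765 reachable:', port_open())
```

If the result is False while the container is running, the port mapping and the address that server.py listens on are the first things to check.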
After the docker image is running, create client.py and run on the local computer or on another networked computer.
import asyncio
import urllib.request
import websockets

# download image or replace with another image file
input_path = 'image_input.jpg'
url = 'http://apmonitor.com/dde/uploads/Main/'+input_path
urllib.request.urlretrieve(url, input_path)

async def send_image():
    async with websockets.connect("ws://localhost:8765") as websocket:
        # send the image file as bytes
        with open(input_path, "rb") as image_file:
            image_bytes = image_file.read()
        await websocket.send(image_bytes)
        # receive the image with the background removed
        new_image_bytes = await websocket.recv()
        with open("new_image.png", "wb") as new_image_file:
            new_image_file.write(new_image_bytes)

asyncio.run(send_image())
See WebSocket Transfer for more information about data transfer.
✅ Knowledge Check
1. Which of the following is NOT a cloud service provider mentioned in this review?
- Incorrect. Google is one of the mentioned cloud service providers. Its Google Cloud Platform had an 11% market share in 2023.
- Correct. IBM is not listed as one of the three major cloud service providers. In 2023, it had 3% of the cloud computing market. Other notable cloud service providers are Alibaba (4%), Salesforce (3%), Oracle (2%), and Tencent (2%).
- Incorrect. Microsoft is one of the mentioned cloud service providers. Its Microsoft Azure had a 22% market share in 2023.
- Incorrect. Amazon is one of the mentioned cloud service providers. Its Amazon Web Services (AWS) had a 32% market share in 2023.
2. What does the rembg package primarily help with?
- Incorrect. The rembg package is not for deploying applications but for removing image backgrounds.
- Incorrect. The rembg package does not monitor directories but removes image backgrounds.
- Correct. The primary function of the rembg package is to remove the background from images. It is deployed as a cloud service in the activity.
- Incorrect. While the content suggests setting up a virtual environment before installing rembg, the package itself does not set up virtual environments.