MIT No Attribution

Copyright <year> Sona Tau Estrada Rivera <sona@stau.space>

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+333
README.md
# Project

To run this project use:

```sh
make run
```

To compile this project use:

```sh
make build
```

# TODO
- [ ] createdb.py
- [ ] testdb.py
- [ ] ls.py
- [ ] meta-data.py
    - [ ] implement `reg` to receive information from the data nodes


# Assignment 04: Distributed File systems

The components to implement are:

* **Metadata server**, which will function as an inode repository
* **Data servers**, which will serve as the disk space for file data blocks
* **List client**, which will list the files available in the DFS
* **Copy client**, which will copy files to and from the DFS

# Objectives

* Study the main components of a distributed file system
* Become familiar with file management
* Implement a distributed system

# Prerequisites

* Python:
    * [www.python.org](http://www.python.org/)
* Python `socketserver` library: for **TCP** socket communication.
    * https://docs.python.org/3/library/socketserver.html
* uuid: to generate unique IDs for the data blocks
    * https://docs.python.org/3/library/uuid.html
* **Optionally** you may read about the json and sqlite3 libraries used in the
skeleton of the program.
    * https://docs.python.org/3/library/json.html
    * https://docs.python.org/3/library/sqlite3.html

### **The metadata server's database manipulation functions.**

No expertise in database management is required to accomplish this project.
However, sqlite3 is used to store the file inodes in the metadata server. You
don't need to understand how the functions are implemented, but you do need to
read the documentation of the functions that interact with the database. The
metadata server database functions are defined in the file mds\_db.py.

#### **Inode**

For this implementation an **inode** consists of:

* File name
* File size
* List of blocks

#### **Block List**

The **block list** consists of a list of:

* data node address \- to know the data node where the block is stored
* data node port \- to know the service port of the data node
* data node block\_id \- the id assigned to the block
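
To make the inode and block list layout concrete, here is a minimal sketch of how they could be modeled in memory. The class names `BlockRef` and `Inode` are hypothetical and are not part of mds\_db.py; only the fields come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockRef:
    """One block list entry (hypothetical name)."""
    address: str   # data node address
    port: int      # data node service port
    block_id: str  # id assigned to the block


@dataclass
class Inode:
    """File name, file size, and list of blocks (hypothetical name)."""
    name: str
    size: int
    blocks: list[BlockRef] = field(default_factory=list)


# Build an inode for a 200-byte file stored as two blocks on two data nodes.
inode = Inode(name="/home/hola.txt", size=200)
inode.blocks.append(BlockRef("127.0.0.1", 8001, "block-a"))
inode.blocks.append(BlockRef("127.0.0.1", 8002, "block-b"))
```

The order of `inode.blocks` matters in this sketch: reading the blocks back in list order is what lets a client rebuild the file.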

Functions:

* AddDataNode(address, port): Adds a new data node to the metadata server.
Receives the IP address and port, i.e. the information needed to connect to the
data node.
* GetDataNodes(): Returns a list of the registered data node tuples
**(address, port)**. Useful to know to which data nodes the data blocks can be
sent.
* InsertFile(filename, fsize): Inserts a filename with its file size into the
database.
* GetFiles(): Returns a list of the attributes of the files stored in the DFS.
(addr, file size)
* AddBlockToInode(filename, blocks): Adds the list of data block information of
a file. The data block information consists of (address, port, block\_id).
* GetFileInode(filename): Returns the file size and the list of data block
information of a file. (fsize, block\_list)

### **The packet manipulation functions:**

The packet library is designed to serialize the communication data using the
json library. No expertise with json is required to accomplish this assignment.
These functions were developed to ease the packet generation process of the
project. The packet library is defined in the file Packet.py.

In this project all packet objects have a packet type among the following
command type options:

* reg: to register a data node
* list: to ask for a list of files
* put: to put a file in the DFS
* get: to get files from the DFS
* dblks: to add the data block ids to the files.
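
As an illustration of what serializing a typed packet with json can look like, here is a small sketch. It is not the Packet.py implementation: the function names `encode_packet` and `decode_packet` and the dictionary keys (`command`, `addr`, `port`) are assumptions made for this example.

```python
import json

# The five command types listed above.
COMMANDS = {"reg", "list", "put", "get", "dblks"}


def encode_packet(command: str, **fields) -> bytes:
    """Serialize a command and its fields into bytes ready for a socket."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return json.dumps({"command": command, **fields}).encode("utf-8")


def decode_packet(raw: bytes) -> dict:
    """Turn received bytes back into a dictionary and validate the command."""
    packet = json.loads(raw.decode("utf-8"))
    if packet.get("command") not in COMMANDS:
        raise ValueError("packet has no valid command")
    return packet


# A registration packet as a data node might send it (hypothetical layout).
raw = encode_packet("reg", addr="127.0.0.1", port=8001)
packet = decode_packet(raw)
```

Validating the command on both sides keeps a malformed message from reaching the handler code.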

#### **Functions:**

##### **General Functions**

* getEncodedPacket(): Returns a serialized packet ready to send through the
network. First you need to build the packet. See the Build**\<X\>**Packet
functions.
* DecodePacket(packet): Receives a serialized message and turns it into a
packet object.
* getCommand(): Returns the command type of the packet.

##### **Packet Registration Functions**

* BuildRegPacket(addr, port): Builds a registration packet.
* getAddr(): Returns the IP address of a server. Useful for registration
packets.
* getPort(): Returns the port number of a server. Useful for registration
packets.

##### **Packet List Functions**

* BuildListPacket(): Builds a list packet for file listing.
* BuildListResponse(filelist): Builds a list response packet with the list of
files.
* getFileArray(): Returns a list of files.

##### **Get Packet Functions**

* BuildGetPacket(fname): Builds a get packet to request a file by name.
* BuildGetResponse(metalist, fsize): Builds a list of data node servers with
the blocks of a file, and the file size.
* getFileName(): Returns the file name in a packet.
* getDataNodes(): Returns a list of data servers.

##### **Put Packet Functions (Put Blocks)**

* BuildPutPacket(fname, size): Builds a put packet to put fname and file size
in the metadata server.
* getFileInfo(): Returns the file info in a packet.
* BuildPutResponse(metalist): Builds a list of data node servers where the data
blocks of a file can be stored, i.e. a list of available data servers.
* BuildDataBlockPacket(fname, block\_list): Builds a data block packet.
Contains the file name and the list of blocks for the file. See [block
list](http://ccom.uprrp.edu/~jortiz/clases/ccom4017/asig04/#block_list) to
review the content of a block list.
* getDataBlocks(): Returns a list of data blocks.

##### **Get Data block Functions (Get Blocks)**

* BuildGetDataBlockPacket(blockid): Builds a get data block packet. Useful
when requesting a data block from a data node.
* getBlockID(): Returns the block\_id from a packet.

# Instructions

Write and complete code for an unreliable and insecure distributed file server
following the specifications below.

### **Design specifications.**

For this project you will design and complete a distributed file system. You
will write a DFS with tools to list the files, and to copy files to and from
the DFS.

Your DFS will consist of:

* A metadata server: which will contain the metadata (inode) information of the
files in your file system. It will also keep a registry of the data servers
that are connected to the DFS.
* Data nodes: The data nodes will contain chunks (some blocks) of the file that
you are storing in the DFS.
* List command: A command to list the files stored in the DFS.
* Copy command: A command that will copy files to and from the DFS.

### **The metadata server**

The metadata server contains the metadata (inode) information of the files in
your file system. It will also keep a registry of the data servers that are
connected to the DFS.

Your metadata server must provide the following services:

1. Listen to the data nodes that are part of the DFS. Every time a new data
node registers to the DFS the metadata server must keep the contact information
of that data node. This is (IP Address, Listening Port).
    * To ease the implementation of the DFS, the directory file system must
contain three things:
        * the path of the file in the file system (filename)
        * the nodes that contain the data blocks of the files
        * the file size
2. Every time a client (commands list or copy) contacts the metadata server
for:
    * get: requesting to read a file: the metadata server must check if the file
is in the DFS database, and if it is, it must return the nodes with the
block\_ids that contain the file.
    * put: requesting to write a file: the metadata server must:
        * insert in the database the path of the new file (with its name), and its
size.
        * return a list of available data nodes where to write the chunks of the
file
        * dblks: then store the data blocks that have the information of the data
nodes and the block ids of the file.
    * list: requesting to list files:
        * the metadata server must return a list with the files in the DFS and
their size.

The metadata server must be run as:

python meta-data.py \<port, default=8000\>

If no port is specified, port 8000 will be used by default.

### **The data node server**

The data node is the process that receives and saves the data blocks of the
files. It must first register with the metadata server as soon as it starts its
execution. The data node receives the data from the clients when the client
wants to write a file, and returns the data when the client wants to read a
file.

Your data node must provide the following services:

1. put: Listen to writes:
    * The data node will receive blocks of data, store them using a unique id,
and return the unique id.
    * Each node must have its own block storage path. You may run more than one
data node per system.
2. get: Listen to reads
    * The data node will receive requests for data blocks, and it must read the
data block, and return its content.
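
The two data node services can be sketched as a pair of functions that write a block under a freshly generated id and read it back. This is only a sketch under assumptions: the function names `store_block` and `read_block` are hypothetical, each block is stored as one file named after its id, and the network part is left out.

```python
import tempfile
import uuid
from pathlib import Path


def store_block(data_path: Path, data: bytes) -> str:
    """put: save a block under a new unique id and return that id."""
    block_id = str(uuid.uuid4())
    (data_path / block_id).write_bytes(data)
    return block_id


def read_block(data_path: Path, block_id: str) -> bytes:
    """get: return the content of a previously stored block."""
    return (data_path / block_id).read_bytes()


# Round trip inside a temporary directory standing in for <data path>.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    block_id = store_block(root, b"hello")
    round_trip = read_block(root, block_id)
```

Because every data node gets its own `data_path`, two nodes on the same machine do not overwrite each other's blocks.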

The data nodes must be run as:

python data-node.py \<server address\> \<port\> \<data path\> \<metadata port, default=8000\>

Server address is the metadata server address, port is the data-node port
number, data path is a path to a directory to store the data blocks, and
metadata port is the optional metadata server port, needed only if the metadata
server runs on a port other than the default.

**Note:** Since you most probably do not have many different computers at your
disposal, you may run more than one data-node on the same computer, but each
must use a different listening port and a different data block directory.

### **The list client**

The list client just sends a list request to the metadata server and then waits
for a list of file names with their size.

The output must look like:

/home/cheo/asig.cpp 30 bytes
/home/hola.txt 200 bytes
/home/saludos.dat 2000 bytes

The list client must be run as:

python ls.py \<server\>:\<port, default=8000\>

Where server is the metadata server IP and port is the metadata server port. If
the port is not indicated, the default port 8000 is used and no ':' character
is necessary.

### **The copy client**

The copy client is more complicated than the list client. It is in charge of
copying the files to and from the DFS.

The copy client must:

1. Write files in the DFS
    * The client must send to the metadata server the file name and size of the
file to write.
    * Wait for the metadata server response with the list of available data
nodes.
    * Send the data blocks to each data node.
        * You may decide to divide the file over the number of data servers.
        * You may divide the file into X size blocks and send it to the data
servers in round robin.
2. Read files from the DFS
    * Contact the metadata server with the file name to read.
    * Wait for the block list with the block id and data server information
    * Retrieve the file blocks from the data servers.
        * This part will depend on the division algorithm used in step (1).
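
The second division option, fixed size blocks sent in round robin, can be sketched as follows. The function names and the block size of 4 bytes are choices made for this example only; sending the blocks over the network is omitted.

```python
def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the file content into blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_round_robin(blocks, nodes):
    """Pair each block with a data node, cycling through the node list."""
    if not nodes:
        raise ValueError("no data nodes available")
    return [(nodes[i % len(nodes)], block) for i, block in enumerate(blocks)]


nodes = [("127.0.0.1", 8001), ("127.0.0.1", 8002)]
blocks = split_blocks(b"abcdefghij", 4)
plan = assign_round_robin(blocks, nodes)

# Reading back in plan order reproduces the original content.
restored = b"".join(block for _, block in plan)
```

Keeping the plan in block order is what makes the read step simple: the client requests the blocks in the order stored in the block list and concatenates them.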

The copy client must be run as:

Copy from DFS:

python copy.py \<server\>:\<port\>:\<dfs file path\> \<destination file\>

To DFS:

python copy.py \<source file\> \<server\>:\<port\>:\<dfs file path\>

Where server is the metadata server IP address, and port is the metadata server
port.

# Creating an empty database

The script createdb.py generates an empty database *dfs.db* for the project.

    python createdb.py

# Deliverables

* The source code of the programs (well documented)
* A README file with:
    * description of the programs, including a brief description of how they
work.
    * who helped you or discussed issues with you to finish the program.
* Video description of the project with implementation details. If you have any
doubts, please consult the professor.

# Rubric

* (10 pts) the programs run
* (80 pts) quality of the working solutions
    * (20 pts) Metadata server implemented correctly
    * (25 pts) Data server implemented correctly
    * (10 pts) List client implemented correctly
    * (25 pts) Copy client implemented correctly
* (10 pts) quality of the README
    * (10 pts) description of the programs, including how they work.
* No project will be graded without submission of the video explaining how the
project was implemented.
{
  description = "Declarations for the environment that this project will use.";

  # Flake inputs
  inputs.nixpkgs.url = "https://flakehub.com/f/NixOS/nixpkgs/0.1";

  # Flake outputs
  outputs = inputs:
    let
      # The systems supported for this flake
      supportedSystems = [
        "x86_64-linux" # 64-bit Intel/AMD Linux
        "aarch64-linux" # 64-bit ARM Linux
        "x86_64-darwin" # 64-bit Intel macOS
        "aarch64-darwin" # 64-bit ARM macOS
      ];

      # Helper to provide system-specific attributes
      forEachSupportedSystem = f: inputs.nixpkgs.lib.genAttrs supportedSystems (system: f {
        pkgs = import inputs.nixpkgs { inherit system; };
      });
    in
    {
      devShells = forEachSupportedSystem ({ pkgs }: {
        default = pkgs.mkShell {
          # The Nix packages provided in the environment
          # Add any you need here
          packages = with pkgs; [
            tinycc
            gcc
            gnumake
            clang-tools
            lldb
          ];

          # Set any environment variables for your dev shell
          env = { };

          # Add any shell logic you want executed any time the environment is activated
          shellHook = ''
          '';
        };
      });
    };
}
+333
instructions.md
# Assignment 04: Distributed File systems

University of Puerto Rico at Rio Piedras

Department of Computer Science

CCOM4017: Operating Systems

# Introduction

In this project the student will implement the main components of a file system
by implementing a simple, yet functional, distributed file system (DFS). The
project will expand students' knowledge of the main components of a file system
(inodes and data blocks), will further develop students' skills in
inter-process communication, and will increase their system security awareness.

The components to implement are:

* **Metadata server**, which will function as an inode repository
* **Data servers**, which will serve as the disk space for file data blocks
* **List client**, which will list the files available in the DFS
* **Copy client**, which will copy files to and from the DFS

# Objectives

* Study the main components of a distributed file system
* Become familiar with file management
* Implement a distributed system

# Prerequisites

* Python:
    * [www.python.org](http://www.python.org/)
* Python `socketserver` library: for **TCP** socket communication.
    * [https://docs.python.org/3/library/socketserver.html](https://docs.python.org/3/library/socketserver.html)
* uuid: to generate unique IDs for the data blocks
    * [https://docs.python.org/3/library/uuid.html](https://docs.python.org/3/library/uuid.html)
* **Optionally** you may read about the json and sqlite3 libraries used in the
skeleton of the program.
    * [https://docs.python.org/3/library/json.html](https://docs.python.org/3/library/json.html)
    * [https://docs.python.org/3/library/sqlite3.html](https://docs.python.org/3/library/sqlite3.html)

### **The metadata server's database manipulation functions.**

No expertise in database management is required to accomplish this project.
However, sqlite3 is used to store the file inodes in the metadata server. You
don't need to understand how the functions are implemented, but you do need to
read the documentation of the functions that interact with the database. The
metadata server database functions are defined in the file mds\_db.py.

#### **Inode**

For this implementation an **inode** consists of:

* File name
* File size
* List of blocks

#### **Block List**

The **block list** consists of a list of:

* data node address \- to know the data node where the block is stored
* data node port \- to know the service port of the data node
* data node block\_id \- the id assigned to the block

Functions:

* AddDataNode(address, port): Adds a new data node to the metadata server.
Receives the IP address and port, i.e. the information needed to connect to the
data node.
* GetDataNodes(): Returns a list of the registered data node tuples
**(address, port)**. Useful to know to which data nodes the data blocks can be
sent.
* InsertFile(filename, fsize): Inserts a filename with its file size into the
database.
* GetFiles(): Returns a list of the attributes of the files stored in the DFS.
(addr, file size)
* AddBlockToInode(filename, blocks): Adds the list of data block information of
a file. The data block information consists of (address, port, block\_id).
* GetFileInode(filename): Returns the file size and the list of data block
information of a file. (fsize, block\_list)
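
For readers curious about how such functions can sit on top of sqlite3, here is a self-contained sketch using an in-memory database. The table layout and the function names `insert_file`, `add_blocks`, and `get_inode` are assumptions made for this example; the real schema and code in mds\_db.py may differ.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inode (fid INTEGER PRIMARY KEY, fname TEXT UNIQUE, fsize INTEGER);
    CREATE TABLE block (fid INTEGER, address TEXT, port INTEGER, block_id TEXT);
""")


def insert_file(fname, fsize):
    """Roughly what InsertFile does: record a file name and its size."""
    conn.execute("INSERT INTO inode (fname, fsize) VALUES (?, ?)", (fname, fsize))


def add_blocks(fname, blocks):
    """Roughly what AddBlockToInode does: attach (address, port, block_id) tuples."""
    (fid,) = conn.execute("SELECT fid FROM inode WHERE fname = ?", (fname,)).fetchone()
    conn.executemany(
        "INSERT INTO block VALUES (?, ?, ?, ?)",
        [(fid, addr, port, bid) for addr, port, bid in blocks],
    )


def get_inode(fname):
    """Roughly what GetFileInode does: return (fsize, block_list) or None."""
    row = conn.execute("SELECT fid, fsize FROM inode WHERE fname = ?", (fname,)).fetchone()
    if row is None:
        return None
    fid, fsize = row
    blocks = conn.execute(
        "SELECT address, port, block_id FROM block WHERE fid = ? ORDER BY rowid", (fid,)
    ).fetchall()
    return fsize, blocks


insert_file("/home/hola.txt", 200)
add_blocks("/home/hola.txt", [("127.0.0.1", 8001, "b1"), ("127.0.0.1", 8002, "b2")])
result = get_inode("/home/hola.txt")
```

Ordering the blocks by insertion keeps them in the order the copy client wrote them, which the read path relies on.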

### **The packet manipulation functions:**

The packet library is designed to serialize the communication data using the
json library. No expertise with json is required to accomplish this assignment.
These functions were developed to ease the packet generation process of the
project. The packet library is defined in the file Packet.py.

In this project all packet objects have a packet type among the following
command type options:

* reg: to register a data node
* list: to ask for a list of files
* put: to put a file in the DFS
* get: to get files from the DFS
* dblks: to add the data block ids to the files.

#### **Functions:**

##### **General Functions**

* getEncodedPacket(): Returns a serialized packet ready to send through the
network. First you need to build the packet. See the Build**\<X\>**Packet
functions.
* DecodePacket(packet): Receives a serialized message and turns it into a
packet object.
* getCommand(): Returns the command type of the packet.

##### **Packet Registration Functions**

* BuildRegPacket(addr, port): Builds a registration packet.
* getAddr(): Returns the IP address of a server. Useful for registration
packets.
* getPort(): Returns the port number of a server. Useful for registration
packets.

##### **Packet List Functions**

* BuildListPacket(): Builds a list packet for file listing.
* BuildListResponse(filelist): Builds a list response packet with the list of
files.
* getFileArray(): Returns a list of files.

##### **Get Packet Functions**

* BuildGetPacket(fname): Builds a get packet to request a file by name.
* BuildGetResponse(metalist, fsize): Builds a list of data node servers with
the blocks of a file, and the file size.
* getFileName(): Returns the file name in a packet.
* getDataNodes(): Returns a list of data servers.

##### **Put Packet Functions (Put Blocks)**

* BuildPutPacket(fname, size): Builds a put packet to put fname and file size
in the metadata server.
* getFileInfo(): Returns the file info in a packet.
* BuildPutResponse(metalist): Builds a list of data node servers where the data
blocks of a file can be stored, i.e. a list of available data servers.
* BuildDataBlockPacket(fname, block\_list): Builds a data block packet.
Contains the file name and the list of blocks for the file. See [block
list](http://ccom.uprrp.edu/~jortiz/clases/ccom4017/asig04/#block_list) to
review the content of a block list.
* getDataBlocks(): Returns a list of data blocks.

##### **Get Data block Functions (Get Blocks)**

* BuildGetDataBlockPacket(blockid): Builds a get data block packet. Useful
when requesting a data block from a data node.
* getBlockID(): Returns the block\_id from a packet.

# Instructions

Write and complete code for an unreliable and insecure distributed file server
following the specifications below.

### **Design specifications.**

For this project you will design and complete a distributed file system. You
will write a DFS with tools to list the files, and to copy files to and from
the DFS.

Your DFS will consist of:

* A metadata server: which will contain the metadata (inode) information of the
files in your file system. It will also keep a registry of the data servers
that are connected to the DFS.
* Data nodes: The data nodes will contain chunks (some blocks) of the file that
you are storing in the DFS.
* List command: A command to list the files stored in the DFS.
* Copy command: A command that will copy files to and from the DFS.

### **The metadata server**

The metadata server contains the metadata (inode) information of the files in
your file system. It will also keep a registry of the data servers that are
connected to the DFS.

Your metadata server must provide the following services:

1. Listen to the data nodes that are part of the DFS. Every time a new data
node registers to the DFS the metadata server must keep the contact information
of that data node. This is (IP Address, Listening Port).
    * To ease the implementation of the DFS, the directory file system must
contain three things:
        * the path of the file in the file system (filename)
        * the nodes that contain the data blocks of the files
        * the file size
2. Every time a client (commands list or copy) contacts the metadata server
for:
    * get: requesting to read a file: the metadata server must check if the file
is in the DFS database, and if it is, it must return the nodes with the
block\_ids that contain the file.
    * put: requesting to write a file: the metadata server must:
        * insert in the database the path of the new file (with its name), and its
size.
        * return a list of available data nodes where to write the chunks of the
file
        * dblks: then store the data blocks that have the information of the data
nodes and the block ids of the file.
    * list: requesting to list files:
        * the metadata server must return a list with the files in the DFS and
their size.

The metadata server must be run as:

python meta-data.py \<port, default=8000\>

If no port is specified, port 8000 will be used by default.
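
The optional port argument can be handled with a few lines. This is a sketch, not the skeleton's code: `parse_port` and `DEFAULT_PORT` are hypothetical names, and `argv` is passed in explicitly so the function is easy to test.

```python
DEFAULT_PORT = 8000


def parse_port(argv: list[str]) -> int:
    """Return the port from argv[1], or the default when it is missing."""
    if len(argv) < 2:
        return DEFAULT_PORT
    port = int(argv[1])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Examples of both invocations.
default_port = parse_port(["meta-data.py"])
custom_port = parse_port(["meta-data.py", "9000"])
```

In a real script the call would typically be `parse_port(sys.argv)`.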

### **The data node server**

The data node is the process that receives and saves the data blocks of the
files. It must first register with the metadata server as soon as it starts its
execution. The data node receives the data from the clients when the client
wants to write a file, and returns the data when the client wants to read a
file.

Your data node must provide the following services:

1. put: Listen to writes:
    * The data node will receive blocks of data, store them using a unique id,
and return the unique id.
    * Each node must have its own block storage path. You may run more than one
data node per system.
2. get: Listen to reads
    * The data node will receive requests for data blocks, and it must read the
data block, and return its content.

The data nodes must be run as:

python data-node.py \<server address\> \<port\> \<data path\> \<metadata port, default=8000\>

Server address is the metadata server address, port is the data-node port
number, data path is a path to a directory to store the data blocks, and
metadata port is the optional metadata server port, needed only if the metadata
server runs on a port other than the default.

**Note:** Since you most probably do not have many different computers at your
disposal, you may run more than one data-node on the same computer, but each
must use a different listening port and a different data block directory.

### **The list client**

The list client just sends a list request to the metadata server and then waits
for a list of file names with their size.

The output must look like:

/home/cheo/asig.cpp 30 bytes
/home/hola.txt 200 bytes
/home/saludos.dat 2000 bytes

The list client must be run as:

python ls.py \<server\>:\<port, default=8000\>

Where server is the metadata server IP and port is the metadata server port. If
the port is not indicated, the default port 8000 is used and no ':' character
is necessary.
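
A short sketch of the two pieces the list client needs besides networking: splitting the `<server>:<port>` argument and printing the listing in the required format. The function names are hypothetical, and the file list is assumed to arrive as (name, size) pairs.

```python
def parse_endpoint(arg: str, default_port: int = 8000) -> tuple[str, int]:
    """Split 'server' or 'server:port' into (server, port)."""
    host, sep, port = arg.partition(":")
    if not host:
        raise ValueError("missing server address")
    return host, int(port) if sep else default_port


def format_listing(files) -> str:
    """Render (name, size) pairs as '<name> <size> bytes', one per line."""
    return "\n".join(f"{name} {size} bytes" for name, size in files)


without_port = parse_endpoint("localhost")
with_port = parse_endpoint("10.0.0.5:9000")
listing = format_listing([("/home/cheo/asig.cpp", 30), ("/home/hola.txt", 200)])
```

`str.partition` returns an empty separator when no ':' is present, which is how the sketch decides to fall back to the default port.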

### **The copy client**

The copy client is more complicated than the list client. It is in charge of
copying the files to and from the DFS.

The copy client must:

1. Write files in the DFS
    * The client must send to the metadata server the file name and size of the
file to write.
    * Wait for the metadata server response with the list of available data
nodes.
    * Send the data blocks to each data node.
        * You may decide to divide the file over the number of data servers.
        * You may divide the file into X size blocks and send it to the data
servers in round robin.
2. Read files from the DFS
    * Contact the metadata server with the file name to read.
    * Wait for the block list with the block id and data server information
    * Retrieve the file blocks from the data servers.
        * This part will depend on the division algorithm used in step (1).

The copy client must be run as:

Copy from DFS:

python copy.py \<server\>:\<port\>:\<dfs file path\> \<destination file\>

To DFS:

python copy.py \<source file\> \<server\>:\<port\>:\<dfs file path\>

Where server is the metadata server IP address, and port is the metadata server
port.
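
Since the direction of the copy is determined only by which argument has the `<server>:<port>:<dfs file path>` form, the argument handling can be sketched as below. The function name and the returned tuple layout are assumptions of this example, and it treats an argument as remote only when its second field is numeric.

```python
def parse_copy_args(first: str, second: str):
    """Return (direction, server, port, dfs_path, local_path)."""

    def split_remote(arg):
        # A remote argument looks like server:port:path with a numeric port.
        parts = arg.split(":", 2)
        if len(parts) != 3 or not parts[1].isdigit():
            return None
        server, port, path = parts
        return server, int(port), path

    src, dst = split_remote(first), split_remote(second)
    if src and not dst:
        return ("get", *src, second)   # copy from the DFS to a local file
    if dst and not src:
        return ("put", *dst, first)    # copy a local file to the DFS
    raise ValueError("exactly one argument must be <server>:<port>:<dfs file path>")


from_dfs = parse_copy_args("localhost:8000:/home/hola.txt", "hola.txt")
to_dfs = parse_copy_args("hola.txt", "localhost:8000:/home/hola.txt")
```

Rejecting the case where both or neither argument is remote gives the user a clear error instead of a confusing network failure.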

# Creating an empty database

The script createdb.py generates an empty database *dfs.db* for the project.

    python createdb.py

# Deliverables

* The source code of the programs (well documented)
* A README file with:
    * description of the programs, including a brief description of how they
work.
    * who helped you or discussed issues with you to finish the program.
* Video description of the project with implementation details. If you have any
doubts, please consult the professor.

# Rubric

* (10 pts) the programs run
* (80 pts) quality of the working solutions
    * (20 pts) Metadata server implemented correctly
    * (25 pts) Data server implemented correctly
    * (10 pts) List client implemented correctly
    * (25 pts) Copy client implemented correctly
* (10 pts) quality of the README
    * (10 pts) description of the programs, including how they work.
* No project will be graded without submission of the video explaining how the
project was implemented.
#ifndef ENCHUFE_H_
#define ENCHUFE_H_
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

// Macro to test whether a number is negative. In general, this library
// prefers to crash the program rather than let the user handle an error.
#define try(a) do { \
    if ((a) < 0) { \
        fprintf(stderr, "[ERROR]: %s:%d %s\n", __FILE__, __LINE__, strerror(errno)); \
        exit (EXIT_FAILURE); \
    } \
} while(0)

typedef int FD; // short for FileDescriptor
typedef uint16_t Port;
typedef uint8_t Byte;

// Byte buffer, you can use it for anything.
typedef struct {
    size_t len;
    Byte* buf;
} Buffer;

// Converts a C string to a byte buffer.
Buffer atob(const char* str);


// Type for an IPv4 address. It also helps you convert between little endian
// and big endian. The byte data appears as bytes[3]bytes[2]bytes[1]bytes[0] in
// memory.
typedef union {
    Byte bytes[4];
    uint32_t ip;
} IPv4;

// Enchufe (socket).
typedef struct {
    FD fd;
    struct sockaddr_in addr;
    socklen_t addrlen;
} Enchufe;

// Receptaculo (receptacle).
typedef struct {
    struct sockaddr_in addr;
    socklen_t addrlen;
} Receptaculo;
5454+5555+// Crea un file descriptor nuevo para un enchufe.
5656+inline FD nuevo() {
5757+ FD fd = socket(PF_INET, SOCK_STREAM, 0);
5858+ try (fd);
5959+ return fd;
6060+}
6161+6262+// Crea un receptaculo.
6363+inline Receptaculo receptaculo(IPv4 ip, Port port) {
6464+ struct sockaddr_in name = {
6565+ .sin_family = AF_INET,
6666+ .sin_port = port,
6767+ .sin_addr = {
6868+ .s_addr = ip.ip,
6969+ },
7070+ };
7171+ return (Receptaculo){
7272+ .addr = name,
7373+ .addrlen = sizeof(name),
7474+ };
7575+}
7676+7777+// Takes a file descriptor and a receptaculo and joins them. In other words, it
7878+// squashes them together. An enchufe is basically a file descriptor with an IP.
7979+static inline Enchufe aplasta(FD fd, Receptaculo rec) {
8080+ return (Enchufe){
8181+ .fd = fd,
8282+ .addr = rec.addr,
8383+ .addrlen = rec.addrlen,
8484+ };
8585+}
8686+8787+// This function creates an enchufe.
8888+Enchufe enchufa(IPv4 ip, Port port);
8989+9090+// This function opens the connection from your computer to wherever the
9191+// enchufe is.
9292+void conecta(Enchufe enchufe);
9393+9494+// This function binds the IP address given to the enchufe to its file
9595+// descriptor. Sometimes you will not want them bound, such as when you do not
9696+// care which address a client connecting to a server will have, so by
9797+// default enchufa(ip, port) does not bind the file descriptor to the
9898+// port.
9999+void amarra(Enchufe enchufe);
100100+101101+// Tells the enchufe how many pending connections can be queued. By default
102102+// no connections can be made, so if you are writing a server, you have to
103103+// call this function.
104104+void escucha(Enchufe enchufe, size_t len);
105105+106106+// This function blocks the thread until a client connects. It returns the
107107+// client's enchufe so you can communicate with the client. You have to
108108+// unplug it (desenchufa) when you are done with the connection.
109109+Enchufe acepta(Enchufe enchufe);
110110+111111+// Sends a byte buffer to a client.
112112+void zumba(Enchufe enchufe, Buffer in_buf);
113113+114114+// Receives a byte buffer from a client. Returns the number of bytes that
115115+// were read. If it returns 0, the client closed the connection.
116116+size_t recibe(Enchufe enchufe, Buffer out_buf);
117117+118118+// This function releases the resources held by an enchufe.
119119+void desenchufa(Enchufe enchufe);
120120+121121+#ifdef ENCHUFE_IMPLEMENTATION
122122+#include <unistd.h> // read, close and other POSIX functions
123123+#include <sys/socket.h> // all the socket functions
124124+#include <netinet/in.h> // sockaddr_in
125125+#include <arpa/inet.h> // inet_pton
126126+127127+// This function calls three other inline functions.
128128+Enchufe enchufa(IPv4 ip, Port port) {
129129+ return aplasta(nuevo(), receptaculo(ip, port));
130130+}
131131+132132+// Wrapper for connect.
133133+void conecta(Enchufe enchufe) {
134134+ try (connect(enchufe.fd, (const struct sockaddr*)&enchufe.addr, enchufe.addrlen));
135135+}
136136+137137+// Wrapper for bind.
138138+void amarra(Enchufe enchufe) {
139139+ try (bind(enchufe.fd, (struct sockaddr*)&enchufe.addr, enchufe.addrlen));
140140+}
141141+142142+// Wrapper for listen.
143143+void escucha(Enchufe enchufe, size_t len) {
144144+ try (listen(enchufe.fd, (int)len));
145145+}
146146+147147+// Wrapper for accept.
148148+Enchufe acepta(Enchufe enchufe) {
149149+ FD fd = accept(enchufe.fd, (struct sockaddr*)&enchufe.addr, &enchufe.addrlen);
150150+ try (fd);
151151+ return (Enchufe){
152152+ .fd = fd,
153153+ .addr = enchufe.addr,
154154+ .addrlen = enchufe.addrlen,
155155+ };
156156+}
157157+158158+// Wrapper for write.
159159+void zumba(Enchufe enchufe, Buffer buf) {
160160+ try (write(enchufe.fd, buf.buf, buf.len));
161161+}
162162+163163+// Wrapper for read.
164164+size_t recibe(Enchufe enchufe, Buffer buf) {
165165+ int64_t bytes_read = read(enchufe.fd, buf.buf, buf.len);
166166+ try (bytes_read);
167167+ return (size_t)bytes_read;
168168+}
169169+170170+// Wrapper for close.
171171+void desenchufa(Enchufe enchufe) {
172172+ close(enchufe.fd);
173173+}
174174+175175+// This function converts a string to a buffer without copying it.
176176+Buffer atob(const char* str) {
177177+ return (Buffer){
178178+ .buf = (Byte*)str,
179179+ .len = strlen(str),
180180+ };
181181+}
182182+#endif
183183+184184+#endif // ENCHUFE_H_ header
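
The header above stores `port` in `sin_port` exactly as passed, so whether callers convert it to network byte order is not visible in this file. The following is a minimal, self-contained sketch of how the address could be built if the port arrives in host byte order. `make_addr` and `check_loopback` are hypothetical names for illustration, not part of the library, and the `IPv4` union is redeclared locally so the block compiles on its own.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

// Local copy of the IPv4 union so this sketch is self-contained.
typedef union {
    uint8_t bytes[4];
    uint32_t ip;
} IPv4;

// Hypothetical helper: builds a sockaddr_in from an IPv4 whose bytes[0] is the
// first octet, and from a port given in HOST byte order (an assumption).
static struct sockaddr_in make_addr(IPv4 ip, uint16_t host_port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    // sin_port must hold the port in network byte order.
    addr.sin_port = htons(host_port);
    // bytes[0..3] are already laid out first octet first, which is network
    // byte order, so the 32-bit view can be stored without conversion.
    addr.sin_addr.s_addr = ip.ip;
    return addr;
}

// Usage example: 127.0.0.1:8080 should match the standard loopback constant.
static void check_loopback(void) {
    IPv4 lo = { .bytes = {127, 0, 0, 1} };
    struct sockaddr_in addr = make_addr(lo, 8080);
    assert(addr.sin_addr.s_addr == htonl(INADDR_LOOPBACK));
    assert(ntohs(addr.sin_port) == 8080);
}
```

If callers of `receptaculo` already pass `htons(port)`, no change is needed there; the sketch only shows where the conversion has to happen at some point before `connect` or `bind`.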
+184
src/lib/lib.h
···11+#ifndef LIB_H_
22+#define LIB_H_
33+#include "enchufe.h"
44+#include <assert.h>
55+66+// Macro that detects whether a pointer is NULL. Use this macro if you prefer
77+// to crash the program when you encounter a NULL pointer.
88+#define exists(a) do { \
99+ if ((a) == NULL) { \
1010+ fprintf(stderr, "[ERROR]: %s:%d Null pointer encountered, %s\n", __FILE__, __LINE__, strerror(errno)); \
1111+ exit (EXIT_FAILURE); \
1212+ } \
1313+ } while(0)
1414+1515+// This is needed in case you want Proc to hold a different unit of
1616+// time.
1717+typedef Byte Time;
1818+1919+// This is what gets sent and received over the socket.
2020+typedef struct {
2121+ Time time;
2222+ Buffer program;
2323+} Proc;
2424+2525+// This is for building a dynamic list of Procs.
2626+typedef struct {
2727+ Proc* procs;
2828+ size_t len;
2929+} Procs;
3030+3131+void* copy(void* src, size_t nbytes);
3232+3333+// This function converts a buffer into a list of Procs.
3434+Procs deserialize(Buffer out_buf, size_t msg_len);
3535+3636+// This function converts a Proc into a buffer.
3737+Buffer serialize(Proc);
3838+3939+// This function converts a string representing an IPv4 address into an IPv4.
4040+IPv4 parse_address(const char* str);
4141+4242+// This function verifies that the string sent over the socket is valid.
4343+Buffer validate_str(Buffer str, size_t max_len);
4444+4545+// This function creates a copy of a buffer in memory and returns it.
4646+Buffer bufcpy(Buffer in);
4747+4848+// Like strlen, but never reads more than max_len bytes.
4949+size_t safe_strlen(const char* str, size_t max_len);
5050+5151+#ifdef LIB_IMPLEMENTATION
5252+#include "log.h"
5353+#include <stdlib.h>
5454+#include <string.h>
5555+5656+void* copy(void* src, size_t nbytes) {
5757+ void* out = malloc(nbytes);
5858+ memcpy(out, src, nbytes);
5959+ return out;
6060+}
6161+6262+// Allocates new memory from src and returns a buffer to that memory. The user
6363+// must free that memory.
6464+Buffer bufcpy(Buffer src) {
6565+ Byte* buf = (Byte*)malloc(src.len * sizeof(Byte));
6666+ exists(buf);
6767+ memcpy(buf, src.buf, src.len);
6868+ return (Buffer){
6969+ .buf = buf,
7070+ .len = src.len,
7171+ };
7272+}
7373+7474+// Checks whether the buffer contains a valid string and that the size provided
7575+// matches the size of that string.
7676+Buffer validate_str(Buffer str, size_t max_len) {
7777+ size_t calculated_len = safe_strlen((const char*)str.buf, max_len);
7878+ if (str.len != calculated_len) {
7979+ log(ERROR, "%s:%d String's length (%zu) is not equal to given length (%zu).", __FILE__, __LINE__, calculated_len, str.len);
8080+8181+ printf("\nBuffer contains: ");
8282+ for (size_t i = 0; i < max_len; ++i) printf("[%d] ", str.buf[i]);
8383+ printf("\n");
8484+8585+ exit(1);
8686+ }
8787+ if (str.len > max_len) {
8888+ log(ERROR, "%s:%d String's length (%zu) is larger than the buffer that contains it (%zu).\n", __FILE__, __LINE__, str.len, max_len);
8989+9090+ printf("\nBuffer contains: ");
9191+ for (size_t i = 0; i < max_len; ++i) printf("[%d] ", str.buf[i]);
9292+ printf("\n");
9393+9494+ exit(1);
9595+ }
9696+ return str;
9797+}
9898+9999+// Converts the buffer received from a socket into an array of processes. The
100100+// user must free this memory.
101101+Procs deserialize(Buffer out_buf, size_t msg_len) {
102102+ Procs procs = {
103103+ .procs = (Proc*)calloc(1, sizeof(Proc)),
104104+ .len = 1,
105105+ };
106106+ exists(procs.procs);
107107+108108+ // This loop will continue until all Proc's have been deserialized.
109109+ size_t buf_idx = 0;
110110+ for (size_t j = 0; buf_idx < msg_len; ++j) {
111111+ // Reallocate new Proc if more than one proc was received.
112112+ if (j == procs.len) {
113113+ procs.procs = (Proc*)realloc(procs.procs, (procs.len + 1) * sizeof(Proc));
114114+ procs.len = procs.len + 1;
115115+ }
116116+117117+ // curr represents the first byte where the Proc lives in the message.
118118+ Byte* curr = out_buf.buf + buf_idx;
119119+120120+ // Parse out the data members.
121121+ Time time = *(Time*)curr;
122122+ size_t len = *(size_t*)(curr + sizeof(Time));
123123+ Byte* str_buf = curr + sizeof(Time) + sizeof(size_t);
124124+125125+ // Validate the string in the buffer.
126126+ Buffer program = validate_str((Buffer){.len = len, .buf = str_buf}, msg_len - buf_idx);
127127+128128+ // Insert everything into the proc list.
129129+ procs.procs[j] = (Proc){
130130+ .time = time,
131131+ .program = bufcpy(program),
132132+ };
133133+134134+ // This calculates where the next Proc will be.
135135+ buf_idx += sizeof(Time) + sizeof(size_t) + program.len + 1;
136136+ }
137137+ return procs;
138138+}
139139+140140+// This function turns a proc into a buffer. In order to do that, this function
141141+// reinterprets everything on the proc as a sequence of bytes.
142142+Buffer serialize(Proc proc) {
143143+ // First, determine how long the buffer has to be.
144144+ size_t len = sizeof(Time) + sizeof(size_t) + proc.program.len + 1;
145145+146146+ // Allocate the bytes in the buffer.
147147+ Buffer buf = {
148148+ .len = len,
149149+ .buf = (Byte*)calloc(len, sizeof(Byte)),
150150+ };
151151+ exists(buf.buf);
152152+153153+ // Copy everything in the buffer.
154154+ memcpy((void*)buf.buf, (void*)&proc.time, sizeof(Time));
155155+ memcpy((void*)(buf.buf + sizeof(Time)), (void*)&proc.program.len, sizeof(size_t));
156156+ memcpy((void*)(buf.buf + sizeof(Time) + sizeof(size_t)), (void*)proc.program.buf, proc.program.len);
157157+ return buf;
158158+}
159159+160160+// This function takes a string representing an IPv4 address and converts it
161161+// into an IPv4 type.
162162+IPv4 parse_address(const char* str) {
163163+ size_t len = safe_strlen(str, 15);
164164+165165+ IPv4 ip = {0};
166166+ size_t curr_byte = 0;
167167+ for (size_t i = 0; i < len; ++i) {
168168+ if (str[i] == '.') {
169169+ ++curr_byte;
170170+ } else {
171171+ ip.bytes[curr_byte] = (Byte)(ip.bytes[curr_byte] * 10 + (str[i] - '0'));
172172+ }
173173+ }
174174+175175+ return ip;
176176+}
177177+178178+// Uses memchr to calculate strlen; returns max_len if no terminator is found.
179179+size_t safe_strlen(const char* str, size_t max_len) {
180180+ const char* end = (const char*)memchr(str, '\0', max_len); return end ? (size_t)(end - str) : max_len;
181181+}
182182+#endif
183183+184184+#endif // LIB_H_
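
The wire layout that `serialize` writes and `deserialize` reads is one `Time` byte, a `size_t` length, the program bytes, and a trailing NUL. The block below is a hedged, self-contained sketch of that round trip with bounds checks before each read. `pack`, `unpack`, and `roundtrip_demo` are hypothetical names, the types are local copies, and returning -1 on malformed input is an assumption of this sketch (the library itself prefers to exit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Local stand-ins for the library's types so the sketch compiles alone.
typedef uint8_t Byte;
typedef struct { size_t len; Byte* buf; } Buffer;
typedef struct { Byte time; Buffer program; } Proc;

// Size of the fixed part of a record: [time][len].
#define HEADER_LEN (sizeof(Byte) + sizeof(size_t))

// Writes [time][len][program bytes][NUL]. The caller frees the result.
static Buffer pack(Proc p) {
    size_t total = HEADER_LEN + p.program.len + 1;
    Buffer out = { .len = total, .buf = calloc(total, 1) };
    if (out.buf == NULL) abort();
    out.buf[0] = p.time;
    memcpy(out.buf + 1, &p.program.len, sizeof(size_t));
    memcpy(out.buf + HEADER_LEN, p.program.buf, p.program.len);
    return out; // calloc already zeroed the trailing NUL
}

// Reads one record starting at *off and advances *off past it.
// Returns 0 on success, -1 if the record is truncated or malformed.
static int unpack(Buffer msg, size_t* off, Proc* out) {
    if (*off > msg.len || msg.len - *off < HEADER_LEN) return -1;
    const Byte* cur = msg.buf + *off;
    size_t len;
    memcpy(&len, cur + 1, sizeof(size_t)); // avoids an unaligned size_t read
    size_t remaining = msg.len - *off - HEADER_LEN;
    // The payload plus its NUL terminator must fit in what is left.
    if (remaining == 0 || len > remaining - 1) return -1;
    if (cur[HEADER_LEN + len] != 0) return -1;
    Byte* copy = malloc(len + 1);
    if (copy == NULL) abort();
    memcpy(copy, cur + HEADER_LEN, len + 1);
    out->time = cur[0];
    out->program = (Buffer){ .len = len, .buf = copy };
    *off += HEADER_LEN + len + 1;
    return 0;
}

// Usage example: pack one Proc and read it back.
static void roundtrip_demo(void) {
    Proc in = { .time = 7, .program = { .len = 2, .buf = (Byte*)"ls" } };
    Buffer wire = pack(in);
    size_t off = 0;
    Proc out;
    assert(unpack(wire, &off, &out) == 0);
    assert(out.time == 7 && out.program.len == 2);
    assert(off == wire.len);
    free(out.program.buf);
    free(wire.buf);
}
```

Because the length field is copied as a raw `size_t`, this layout only works when both ends share the same word size and endianness; that limitation applies to the sketch as written.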