Scala DataInputStream, for comprehension, Try, currying, tailrec
This post is from 2016; I hope it's helpful (it's aimed at me when I've forgotten how).
Trying to come up with a functional way to read a data file, I came across lots of alternatives, but along the way I wrote the code below, which introduces quite a few concepts.
The code comes first, which you can cut and paste, with some notes and explanation afterwards while it's relatively fresh in my mind.
package com.jonathanplay.tryandforcomprehension

import java.io.{DataInputStream, FileInputStream}

import scala.annotation.tailrec
import scala.collection.mutable.ArrayBuffer
import scala.language.reflectiveCalls // avoids the feature warning for the structural type in `using`
import scala.util.{Failure, Success, Try}

object OpenAndReadAFile extends App {

  def readingData(fileName: String): Option[List[(Int, Long, Int)]] = {

    // Reads (Int, Long, Int) records until no bytes remain; a read error is thrown, not wrapped
    @tailrec
    def readDataInputStream(acc: ArrayBuffer[(Int, Long, Int)], dis: DataInputStream): Try[List[(Int, Long, Int)]] = {
      if (dis.available() == 0) Success(acc.toList)
      else readDataInputStream(acc += Tuple3(dis.readInt(), dis.readLong, dis.readInt), dis)
    }

    // Runs f against the resource and always closes the resource afterwards
    def using[A <: {def close() : Unit}, B](closeable: A)(f: A => B): B =
      try f(closeable) finally closeable.close()

    def readIndexFileStream(fis: FileInputStream): Try[List[(Int, Long, Int)]] = {
      for {
        dis <- Try(new DataInputStream(fis))
        listOfTuples <- readDataInputStream(ArrayBuffer.empty[(Int, Long, Int)], dis)
      } yield listOfTuples
    }

    // The outer Try catches anything thrown while opening or reading; flatten removes the nesting
    def readIndexFile(): Option[List[(Int, Long, Int)]] = {
      Try(using(new FileInputStream(fileName))(readIndexFileStream)).flatten
    } match {
      case Success(result) => Some(result)
      case Failure(ex) =>
        println(s"Could not read file $fileName, detail ${ex.getClass.getName}:${ex.getMessage}")
        None
    }

    readIndexFile()
  }

  readingData("FileDoesNotExist")
  readingData("C:\\tmp\\exampleindexfile.idx")
}
Scala upper bound parameter <:
We want to only accept a parameter which implements a close method.
[A <: {def close() : Unit}, B]
A must be of a type which implements close.
def using[A <: {def close() : Unit}, B](closeable: A)(f: A => B): B =
The function called using has two type parameters: A must have a close method, and B can be anything. The function then has two parameter lists (it is curried). The first takes closeable, which is of type A, while the second takes f, a function which takes a parameter of type A and returns type B.
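To see the bound at work, here is a minimal sketch, assuming Scala 2. FakeResource is a made-up class for illustration: it does not extend Closeable, it just happens to have a close method, and that is enough to satisfy the bound. The sketch also checks that close is called even when f throws.

```scala
import scala.language.reflectiveCalls
import scala.util.Try

object UsingDemo {
  // Hypothetical resource: no Closeable in sight, it only has a close() method
  class FakeResource {
    var closed = false
    def read(): String = "data"
    def close(): Unit = { closed = true }
  }

  // Same definition as in the post
  def using[A <: {def close() : Unit}, B](closeable: A)(f: A => B): B =
    try f(closeable) finally closeable.close()

  // Returns (value read, closed after success?, closed after failure?)
  def demo(): (String, Boolean, Boolean) = {
    val ok = new FakeResource
    val result = using(ok)(_.read())

    val bad = new FakeResource
    val attempt = Try(using(bad)(_ => throw new RuntimeException("boom")))

    (result, ok.closed, bad.closed && attempt.isFailure)
  }
}
```

A type with no close method at all, such as String, would be rejected at compile time. If everything you pass in is a Java resource, a plain bound of A <: AutoCloseable would probably do the same job without reflection.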
How to close a resource in Scala?
This is a copy paste from the thousands of other sites which show ‘using’
// A has to have a close function, and f is a function to run but we always close after
def using[A <: {def close() : Unit}, B](closeable: A)(f: A => B): B =
  try f(closeable) finally closeable.close()
Which is fine, you call this curried function as below, and note that there are lots of weird syntax versions to call it. The function which actually reads the file is ‘readIndexFileStream’
// With no _ or (_), Scala can determine the type and know that readIndexFileStream is a function
// In some cases it won't be able to, in which case you must use (_) or _ depending on your aim
using(new FileInputStream(fileName))(readIndexFileStream)
// Alternative syntax, where (_) means pass in as a function value - evaluated as it is used
using(new FileInputStream(fileName))(readIndexFileStream(_))
// Or the trailing _, which means a partially applied function
using(new FileInputStream(fileName))(readIndexFileStream _)
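For a method with a single parameter list, all three forms should end up doing the same thing. A small sketch, assuming Scala 2, where twice and increment are made-up names for illustration:

```scala
object CallFormsDemo {
  // Hypothetical curried function: a value first, then a function to apply to it twice
  def twice[A](x: A)(f: A => A): A = f(f(x))

  def increment(n: Int): Int = n + 1

  val bare: Int = twice(1)(increment)            // the compiler converts the method for us
  val placeholder: Int = twice(1)(increment(_))  // anonymous function: x => increment(x)
  val eta: Int = twice(1)(increment _)           // explicit conversion of method to function
}
```

The forms only start to differ when the method has arguments that are evaluated before the function is built.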
Scala function value or partially applied function??
All praise to Scala Puzzlers, puzzle 12 for this information.
Scala allows you to leave off the underscore after a method name when it knows the expected type is a function, and the type of the function is consistent with the signature of the method.
A function value is not evaluated until it is used, and it is evaluated every time.
A partially applied function has parameters evaluated at the time… tell you what, read Scala Puzzlers, puzzle 12! (also called “Count Me Now, Count Me Later”)
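The gist of it, as I understand it, can be sketched like this. It is my own cut-down version of the idea rather than the puzzler's exact code, and it assumes Scala 2 behaviour; counter and add are illustrative names.

```scala
object CountMeDemo {
  // Returns List(x after both definitions, viaPlaceholder(10), viaEta(10), final x)
  def run(): List[Int] = {
    var x = 0
    def counter: Int = { x += 1; x }          // side effect: bumps x on every evaluation
    def add(a: Int)(b: Int): Int = a + b

    // Function value: add(counter)(_) means b => add(counter)(b),
    // so counter is evaluated each time the function is called
    val viaPlaceholder: Int => Int = add(counter)(_)

    // Partially applied function: counter is evaluated once, right here
    val viaEta: Int => Int = add(counter) _

    val afterDefs = x                          // only viaEta has touched counter so far
    val r1 = viaPlaceholder(10)                // evaluates counter now
    val r2 = viaEta(10)                        // reuses the value captured at definition
    List(afterDefs, r1, r2, x)
  }
}
```

Under Scala 2 I would expect run() to give List(1, 12, 11, 2): the partially applied version bumped the counter once when it was defined, the function value bumped it again when it was called.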
How to use Try for exception processing in Scala?
The for comprehension has a series of generators which are called in turn, provided a previous one did not fail. We are using an aspect of Try for this: Try is great as it says the functions may fail in a recoverable way, and the first Failure short-circuits the rest. The for comprehension is composing operations which return Try with minimum syntax overhead.
def readIndexFileStream(fis: FileInputStream): Try[List[(Int, Long, Int)]] = {
  for {
    // gotcha here: once one generator returns a Try, the rest must return a Try too (a Seq will not compile)
    dis <- Try(new DataInputStream(fis))
    listOfTuples <- readDataInputStream(ArrayBuffer.empty[(Int, Long, Int)], dis)
  } yield listOfTuples
}
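A smaller example of the same pattern, with nothing to do with files, may make the short-circuiting easier to see. The names parse, divide and parseAndDivide are made up for this sketch:

```scala
import scala.util.Try

object TryForDemo {
  // Each step may throw; Try(...) captures the exception as a Failure
  def parse(s: String): Try[Int] = Try(s.trim.toInt)
  def divide(a: Int, b: Int): Try[Int] = Try(a / b)

  // The generators run in order; the first Failure becomes the overall result
  def parseAndDivide(x: String, y: String): Try[Int] =
    for {
      a <- parse(x)
      b <- parse(y)
      q <- divide(a, b)
    } yield q
}
```

parseAndDivide("10", "2") is a Success(5), while "ten" fails at the first generator with a NumberFormatException and a divisor of "0" fails at the last one with an ArithmeticException. In neither failing case do the later steps run.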
Scala nestable Try flattening
I want to open the file, which can throw a FileNotFoundException, and then call using, which can throw IOExceptions but which will at least close the FileInputStream. Wrapping that call in Try creates a nested Try, because readIndexFileStream already returns a Try, and without the flatten you get a type mismatch which reads:
Expression of type Some[Try[List[(Int,Long,Int)]]] doesn't conform to the expected type Option[List[(Int,Long,Int)]]
So, Try is like Option here: flatten collapses a Try[Try[T]] into a Try[T], and the mismatch goes away…
def readIndexFile(): Option[List[(Int, Long, Int)]] = {
  Try(using(new FileInputStream(fileName))(readIndexFileStream)).flatten
} match {
  case Success(result) => Some(result)
  case Failure(ex) =>
    println(s"Could not read file $fileName, detail ${ex.getClass.getName}:${ex.getMessage}")
    None
}
Obviously my Failure handling is silly, but it shows what is going on.
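Stripped of the file handling, the nesting and flattening look roughly like this. It is a sketch with made-up names (inner, outer, run), and it uses toOption in place of my match when all you want is an Option and no message:

```scala
import scala.util.{Failure, Success, Try}

object FlattenDemo {
  // Stands in for readIndexFileStream: already returns a Try
  def inner(ok: Boolean): Try[Int] =
    if (ok) Success(42) else Failure(new IllegalStateException("inner failed"))

  // Stands in for Try(using(...)(...)): the body may throw before inner is even reached
  def outer(throwOutside: Boolean, innerOk: Boolean): Try[Try[Int]] = Try {
    if (throwOutside) throw new IllegalArgumentException("outer failed")
    inner(innerOk)
  }

  // flatten turns Try[Try[Int]] into Try[Int]; toOption drops the exception detail
  def run(throwOutside: Boolean, innerOk: Boolean): Option[Int] =
    outer(throwOutside, innerOk).flatten.toOption
}
```

Either kind of failure, thrown outside or returned from inside, ends up as the same single-level Failure after flatten.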
Tailrec
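This reads the file, which contains repeating Int, Long, Int records, until it gets to the end. It uses a mutable ArrayBuffer to append values as they are read in, and converts it to an immutable List when we hit the end.
tailrec is better explained by Google, but the short version is that a tail-recursive call reuses the same stack frame, so it's efficient and will not overflow the stack on a big file. The @tailrec annotation makes the compiler reject the function if the recursive call is not in tail position.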
@tailrec
def readDataInputStream(acc: ArrayBuffer[(Int, Long, Int)], dis: DataInputStream): Try[List[(Int, Long, Int)]] = {
  if (dis.available() == 0) Success(acc.toList)
  else readDataInputStream(acc += Tuple3(dis.readInt(), dis.readLong, dis.readInt), dis)
}
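The same accumulator shape on a simpler problem, as a sketch (sumTo and loop are illustrative names): the recursive call is the very last thing the function does, which is what lets the compiler turn it into a loop.

```scala
import scala.annotation.tailrec

object TailrecDemo {
  // Sums 1 to n, carrying the running total in an accumulator
  def sumTo(n: Int): Long = {
    @tailrec
    def loop(i: Int, acc: Long): Long =
      if (i > n) acc               // finished: hand back the accumulator
      else loop(i + 1, acc + i)    // tail call: nothing left to do after it returns
    loop(1, 0L)
  }
}
```

Writing the last line as i + loop(i + 1, acc) would leave work to do after the call returns, and @tailrec would refuse to compile it.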
The test file
You should create a file at C:\tmp\exampleindexfile.idx or equivalent for your Linux/PC/etc.
Type some random garbage into the file and run the code and you will get the output:
// Expected output, this is because the example file contains rubbish, so it's just exceptions
// Could not read file FileDoesNotExist, detail java.io.FileNotFoundException:FileDoesNotExist (The system cannot find the file specified)
// Could not read file C:\tmp\exampleindexfile.idx, detail java.io.EOFException:null
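If you would rather see real data come back than an EOFException, the file needs to hold whole Int, Long, Int records. A rough sketch of writing one with DataOutputStream and reading it back follows. The helper names are made up, and it writes to a temp file rather than the path above so that it cleans up after itself:

```scala
import java.io.{DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

object WriteIndexFileDemo {
  // Writes each record as Int, Long, Int: the layout readDataInputStream expects
  def writeRecords(file: File, records: Seq[(Int, Long, Int)]): Unit = {
    val out = new DataOutputStream(new FileOutputStream(file))
    try records.foreach { case (a, b, c) =>
      out.writeInt(a)
      out.writeLong(b)
      out.writeInt(c)
    } finally out.close()
  }

  // Plain loop version of the read, only here to check the round trip
  def readRecords(file: File): List[(Int, Long, Int)] = {
    val in = new DataInputStream(new FileInputStream(file))
    try {
      val buf = List.newBuilder[(Int, Long, Int)]
      while (in.available() > 0) buf += ((in.readInt(), in.readLong(), in.readInt()))
      buf.result()
    } finally in.close()
  }

  def roundTrip(records: Seq[(Int, Long, Int)]): List[(Int, Long, Int)] = {
    val tmp = File.createTempFile("exampleindexfile", ".idx")
    try { writeRecords(tmp, records); readRecords(tmp) } finally tmp.delete()
  }
}
```

Pointing writeRecords at the example path instead of a temp file should give readingData something it can parse.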
Is this the way to read data?
There are lots of Scala libraries out there, but I did not google them all, as it should be pretty simple and I wanted to do it myself and learn.
References
There are so many I will list them another time, but a word of warning, even the best of them are aging fast as Scala changes. This post is from 2016 - so if you are in the future you should double check it all.