How to Authenticate Kafka Using Kerberos (SASL), Spark, and Jupyter Notebook

 3 years ago
source link: https://hackernoon.com/how-to-authenticate-kafka-using-kerberos-sasl-spark-and-jupyter-notebook-rwal35bx

Artem Gogin (@artemg)

Data Engineer, Teacher and Technical Writer.

In brief, the goal here is to get permission to read from and write to Kafka topics.

Our Kafka is protected by Kerberos, which means that before we can access Kafka we need to obtain a ticket from Kerberos. To get the ticket we have to provide a keytab, an authentication file issued for each user. All of this has to happen automatically, because the commands that access Kafka give us no opportunity to present the keytab manually. To make it work, we need to put the right parameters and configurations in the right places.
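
Before wiring the keytab into Kafka or Spark, it can help to confirm that the keytab and principal are accepted by Kerberos at all. The sketch below is not part of the original setup. It assumes the MIT Kerberos kinit tool is installed, and the helper names build_kinit_command and obtain_ticket, as well as the sample principal, are made up for illustration.

```python
import subprocess


def build_kinit_command(keytab_path, principal):
    """Return the argv list for a non-interactive kinit with a keytab."""
    # -k: authenticate with a keytab instead of prompting for a password
    # -t: path to the keytab file
    return ["kinit", "-k", "-t", keytab_path, principal]


def obtain_ticket(keytab_path, principal):
    """Run kinit; raises CalledProcessError if the keytab is rejected."""
    subprocess.run(build_kinit_command(keytab_path, principal), check=True)


if __name__ == "__main__":
    # Hypothetical values; replace them with your own keytab and principal.
    print(" ".join(build_kinit_command("alice.keytab", "alice@EXAMPLE.COM")))
```

If obtain_ticket succeeds, running klist in the same shell should list a ticket for the principal.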

Here is my environment (your tools and versions may vary but the approach still should work):

  • Cloudera Hadoop cluster v. 5+
  • Kafka v. 2+ (a topic with Kerberos auth already exists)
  • Spark v. 2+
  • Kerberos v. 5
  • Jupyter Notebook with Pyspark

To begin, let’s access the protected Kafka topic from the terminal. Access to the topic should only be granted if we obtain a Kerberos ticket for the right user. For this operation we need to prepare the following (it will be smoother if all the files are in the same directory):

  • User’s keytab file (for Kerberos)
  • File jaas.conf:
    KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="${PATH_TO_YOUR_KEYTAB}"
    principal="${USER_NAME}@${REALM}";
    };
  • File kafka_security.properties:
    security.protocol=SASL_PLAINTEXT
    sasl.kerberos.service.name=kafka
    sasl.mechanism=GSSAPI
  • File krb5.conf (probably located in /etc/krb5.conf or /etc/kafka/krb5.conf; see the JDK’s Kerberos Requirements for more details)
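
The two small files in the list above can also be generated from a script, which avoids hand-editing placeholders for each user. This is only a sketch under assumptions: the function names render_jaas and write_client_configs are hypothetical, and the sample user and realm are invented.

```python
from pathlib import Path

# Same content as the jaas.conf shown above; doubled braces are literal braces.
JAAS_TEMPLATE = """KafkaClient {{
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="{keytab}"
principal="{principal}";
}};
"""

# Same content as kafka_security.properties shown above.
KAFKA_SECURITY_PROPERTIES = """security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
sasl.mechanism=GSSAPI
"""


def render_jaas(keytab, user, realm):
    """Fill the JAAS template with a keytab path and a user@REALM principal."""
    return JAAS_TEMPLATE.format(keytab=keytab, principal=f"{user}@{realm}")


def write_client_configs(directory, keytab, user, realm):
    """Write jaas.conf and kafka_security.properties into one directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    jaas_path = directory / "jaas.conf"
    props_path = directory / "kafka_security.properties"
    jaas_path.write_text(render_jaas(keytab, user, realm))
    props_path.write_text(KAFKA_SECURITY_PROPERTIES)
    return jaas_path, props_path


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(render_jaas("alice.keytab", "alice", "EXAMPLE.COM"))
```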

Then we need to export a variable that points to jaas.conf and krb5.conf:

export KAFKA_OPTS="-Djava.security.auth.login.config=jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf"

Now we can write to and read from the Kafka topic in the terminal.

For writing:

/bin/kafka-console-producer --broker-list ${KAFKA_BROKERS_WITH_PORTS} --topic ${TOPIC_NAME} --producer.config kafka_security.properties

For reading:

/bin/kafka-console-consumer --bootstrap-server ${KAFKA_BROKERS_WITH_PORTS} --topic ${TOPIC_NAME} --from-beginning --consumer.config kafka_security.properties

Hope everything worked!

Let’s do the same thing using Spark.

The challenge here is that we want Spark to access Kafka not only with the application driver but also with every executor. It means each executor needs to obtain a ticket from Kerberos with our keytab. To make Spark do this, we need to specify the right configurations.

First, we need the same jaas.conf:

KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="${YOUR_KEYTAB_FILE}"
principal="${USER_NAME}@${REALM}";
};

Before launching Spark, we also need to export the variable:

export SPARK_KAFKA_VERSION=0.10

In the Spark code we will access Kafka with these options (the first five are mandatory):

kafka.bootstrap.servers=${KAFKA_BROKERS_WITH_PORTS}
kafka.security.protocol=SASL_PLAINTEXT
kafka.sasl.kerberos.service.name=kafka
kafka.sasl.mechanism=GSSAPI
subscribe=${TOPIC_NAME}
startingOffsets=latest
maxOffsetsPerTrigger=1000

You can pass this options map to:

spark.readStream.
 format("kafka").
 options(myOptionsMap).
 load()
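
In PySpark the same options can be collected in a plain dict. The following is a minimal sketch rather than the author's code: the function name kafka_read_options and the sample broker and topic values are assumptions, while the option keys are the ones listed above.

```python
def kafka_read_options(brokers, topic, starting_offsets="latest",
                       max_offsets_per_trigger=None):
    """Build the options dict for a Kerberos-protected Kafka source."""
    options = {
        # The first five options are the mandatory ones from the list above.
        "kafka.bootstrap.servers": brokers,
        "kafka.security.protocol": "SASL_PLAINTEXT",
        "kafka.sasl.kerberos.service.name": "kafka",
        "kafka.sasl.mechanism": "GSSAPI",
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }
    if max_offsets_per_trigger is not None:
        # Source options are passed as strings.
        options["maxOffsetsPerTrigger"] = str(max_offsets_per_trigger)
    return options


if __name__ == "__main__":
    # Hypothetical broker and topic names for illustration only.
    for key, value in kafka_read_options("broker1:9092", "my_topic", "latest", 1000).items():
        print(f"{key}={value}")
```

With an existing session, the dict can be unpacked into the reader, for example spark.readStream.format("kafka").options(**kafka_read_options("broker1:9092", "my_topic")).load().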

Before starting Spark we can define the shell variable.

JAVA_OPTIONS="-Djava.security.auth.login.config=jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf"

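We will also need two copies of the user’s keytab with different names: one is passed to spark.yarn.keytab and the other is shipped with --files, and Spark on YARN typically refuses to distribute the same file twice. If we already have one keytab, we can create the second copy with this command: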

cp $USER_NAME.keytab ${USER_NAME}_2.keytab

To launch the Spark application we run this command:

spark2-submit \
--master yarn \
--conf "spark.yarn.keytab=${USER_NAME}_2.keytab" \
--conf "spark.yarn.principal=$USER_NAME@$REALM" \
--conf "spark.driver.extraJavaOptions=$JAVA_OPTIONS" \
--conf "spark.executor.extraJavaOptions=$JAVA_OPTIONS" \
--class "org.example.MyClass" \
--jars spark-sql-kafka-0-10_2.11-2.4.0.jar \
--files "jaas.conf","${USER_NAME}.keytab" \
my_spark.jar
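
When the job is launched from a script rather than typed by hand, the same command can be assembled as an argument list. This is a sketch under assumptions: build_submit_command is a hypothetical helper, the sample user and realm are invented, and the flags simply mirror the spark2-submit call above.

```python
import shlex

# Same JVM flags as the JAVA_OPTIONS shell variable defined earlier.
JAVA_OPTIONS = ("-Djava.security.auth.login.config=jaas.conf "
                "-Djava.security.krb5.conf=/etc/krb5.conf")


def build_submit_command(user, realm, app_jar, main_class,
                         kafka_jar="spark-sql-kafka-0-10_2.11-2.4.0.jar"):
    """Return the spark2-submit argv list for a Kerberos-enabled Kafka job."""
    return [
        "spark2-submit",
        "--master", "yarn",
        # The renamed keytab copy goes to YARN for HDFS access.
        "--conf", f"spark.yarn.keytab={user}_2.keytab",
        "--conf", f"spark.yarn.principal={user}@{realm}",
        "--conf", f"spark.driver.extraJavaOptions={JAVA_OPTIONS}",
        "--conf", f"spark.executor.extraJavaOptions={JAVA_OPTIONS}",
        "--class", main_class,
        "--jars", kafka_jar,
        # jaas.conf and the original keytab are shipped to every executor.
        "--files", f"jaas.conf,{user}.keytab",
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical user and realm; the jar and class names come from the example above.
    cmd = build_submit_command("alice", "EXAMPLE.COM", "my_spark.jar", "org.example.MyClass")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting problems with the space inside the Java options.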

Or you can use the same configurations with spark-shell or pyspark.

Note: to allow Spark to access HDFS we specify spark.yarn.keytab and spark.yarn.principal. To allow Spark to access Kafka we specify spark.driver.extraJavaOptions and spark.executor.extraJavaOptions and ship the files jaas.conf and ${USER_NAME}.keytab with --files, so that every executor receives a copy of the files the Java options refer to and can authenticate. For the Spark Kafka dependency we provide the spark-sql-kafka jar that matches our Spark version. We can also use the --packages option instead of --jars.

Hope everything worked!

Let’s do the same trick in PySpark using Jupyter Notebook.

To set shell environment variables from Python we will use os.environ.

import os
import sys

os.environ['SPARK_KAFKA_VERSION'] = '0.10'
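
Because the notebook refers to jaas.conf, the keytabs and the Kafka jar by relative path, a missing file usually only shows up later as an authentication or class-loading error. A small pre-flight check can catch that earlier. This is a sketch, not part of the original article: missing_files is a hypothetical helper and the file names in the usage block are examples.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    # Example file names; adjust them to your own user and Spark version.
    required = ["jaas.conf", "alice.keytab", "alice_2.keytab",
                "spark-sql-kafka-0-10_2.11-2.4.0.jar"]
    absent = missing_files(required)
    if absent:
        print("Missing before starting Spark:", ", ".join(absent))
    else:
        print("All required files are present.")
```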

Then we should configure the Spark session.

from pyspark.sql import SparkSession

# Shell-style ${...} placeholders are not expanded by Python; replace them with real values.
spark = SparkSession.builder \
    .appName('KafkaSpark') \
    .config('spark.yarn.keytab', '${USER_NAME}_2.keytab') \
    .config('spark.yarn.principal', '${USER_NAME}@${REALM}') \
    .config('spark.jars', 'spark-sql-kafka-0-10_2.11-2.4.0.jar') \
    .config('spark.driver.extraJavaOptions', '-Djava.security.auth.login.config=jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf') \
    .config('spark.executor.extraJavaOptions', '-Djava.security.auth.login.config=jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf') \
    .config('spark.files', 'jaas.conf,${KEYTAB}') \
    .getOrCreate()

Then we can connect to Kafka like this:

# Replace the ${...} placeholders with your broker list and topic name.
kafka_raw = spark.readStream \
    .format('kafka') \
    .option('kafka.bootstrap.servers', '${KAFKA_BROKERS_WITH_PORTS}') \
    .option('kafka.security.protocol', 'SASL_PLAINTEXT') \
    .option('kafka.sasl.kerberos.service.name', 'kafka') \
    .option('kafka.sasl.mechanism', 'GSSAPI') \
    .option('startingOffsets', 'earliest') \
    .option('maxOffsetsPerTrigger', 10) \
    .option('subscribe', '${TOPIC_NAME}') \
    .load()

To access the data we can use:

query = kafka_raw. \
    writeStream. \
    format("console"). \
    start()
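
The Kafka source delivers key and value as binary columns, so the console output above shows raw bytes. A common follow-up is to cast them to strings before writing. This is a minimal sketch assuming the messages are text: decode_kafka_records is a hypothetical helper, and it relies only on the DataFrame selectExpr method.

```python
def decode_kafka_records(df):
    """Cast the binary key and value columns of a Kafka DataFrame to strings."""
    return df.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        # Metadata columns provided by the Kafka source.
        "topic",
        "partition",
        "offset",
        "timestamp",
    )
```

With the stream defined above, this could be used as decode_kafka_records(kafka_raw).writeStream.format("console").start().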

That’s it. I hope you found all the configurations you need to access Kafka with Kerberos any way you like.
